I hinted at this topic in my post about e-readers: How long will our digital data last?
I was first struck by the importance of this issue about 25 years ago when I bought a new computer and attempted to move a journal I had written to the new machine. Of course back then we didn’t have thumb drives, or easy access to the internet, and it wasn’t even standard yet to have floppy disk drives in all computers. I had no way to move a file. I tried tinkering with a null modem (a serial cable strung between the two computers, using modem protocols to send and receive the file) but I couldn’t get it to work. I finally found the solution by using a neighbour’s CompuServe account, sending the file up to the server then back down again.
The lesson stuck with me. Over the past couple of decades we’ve progressed through multiple sizes and capacities of floppy disks and then no floppies at all; through various formats of backup media (various sizes of tape, Bernoulli disks, Jaz drives, Zip drives, etc.); and through different hardware platforms (PC, Mac, etc.), different operating systems (Windows, Mac, multiple flavours of Unix, etc.) and different software platforms (WordStar, WordPerfect, MS Word, etc.). Through all of it, I wonder how we’re going to save our data. Think of what would be required today to retrieve a file saved on a 5 1/4″ floppy disk; where do you find a computer nowadays that has a drive for one? Or worse yet, a Jaz drive cartridge; seen one of those lately? What if the operating system used a different type of encoding or file directory on the disk? What if the file was saved in a format from a word processor for which you don’t have the software any more, or a database for which you don’t have the schema? (There are, of course, businesses that will recover data for you from obsolete media, but the service is a bit pricey and success is not always guaranteed.) All of these formats have come and gone just in the past couple of decades; how many more changes will take place in the next twenty (or fifty or hundred) years?
Somehow, though, a book created a couple of decades ago, or a couple of centuries ago, or even a millennium ago, is still legible.
My early work in standards was around SGML, which provides a platform-independent syntax for describing data. Tags applied to the text would describe how the document was structured: what parts were section heads or paragraphs or tables, etc. But unless you stored the schema together with the document instance, it was difficult to interpret the file. SGML’s successor, XML, does a much better job of this, in that the instance can be self-describing. But the most popular XML application, the MS Office suite, uses Microsoft’s own flavour of XML that is not necessarily compatible with other software. I wonder if an XML file created by MS Word will be readable a couple of decades down the road. Does anyone even use SGML any more?
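To make the "self-describing" point a little more concrete, here is a minimal sketch in Python. It assumes a small, well-formed XML instance whose tag names (`journal`, `entry`, `head`, `para`) are invented purely for illustration, and it uses only the standard library's `xml.etree.ElementTree`. The helper `outline` is hypothetical. It shows that the structure of such a file can be recovered with no schema at hand, which is roughly what XML buys you over SGML here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document instance; the tag names are made up for this example.
SAMPLE = """<journal>
  <entry>
    <head>New computer</head>
    <para>Spent the evening trying to move my files.</para>
  </entry>
</journal>"""


def outline(xml_text):
    """Return (depth, tag, text) tuples for every element, using no schema."""
    rows = []

    def walk(elem, depth):
        # elem.text is None for elements without leading text, so guard it.
        text = (elem.text or "").strip()
        rows.append((depth, elem.tag, text))
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return rows


# Print an indented outline of the document's structure.
for depth, tag, text in outline(SAMPLE):
    print("  " * depth + tag + (": " + text if text else ""))
```

Recovering the structure is only half the battle, of course: the parser can tell you that something is a `head`, but what the tags were meant to signify is still up to whoever reads the file decades later.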
I used to recommend to people that they migrate their data to new platforms every few years. This is a somewhat automatic practice, as people usually replace their computers every three to five years, and the latest version of software can usually read the data from the preceding version. So we may be okay here, as long as you move things every few years; if you don’t, though, you may be in trouble later. But now a bigger problem is that many people store their data online, on services that provide no guarantee about the longevity or safety of the data. There are a number of data backup services, either free or cheap; what happens when they go out of business? (Given that they’re free or cheap, that’s a strong possibility.) Or, for another example, my kids and others of their generation store most of their photos on Facebook. They take a picture and it goes straight online from the camera with no other copy made. Does Facebook promise that they will permanently save your photos, and guarantee that you will be able to access them for the rest of your life? What would happen if Facebook were to suddenly pull the plug on their servers tomorrow? “Oh, they would never do that,” you say. But they could, and if they did you would have no recourse, and no photos.
What photos are you going to be able to share with your kids when they grow up?
What about other data, such as financial records including bank statements and tax returns? These are all going paperless, as various institutions and agencies discover the cost-saving benefits of electronic processing. Cost savings are fine, but who is going to benefit? It’s not me. So I make sure that I have paper copies of everything; I’m not going to trust that someone else will always make available to me the information that I need, especially if my dispute is with them.
As for my personal papers, they’re personal, and I’d rather not trust them to someone else. I’m not going to store them in the cloud where who-knows-who will have access to them for data mining purposes. Promises to the contrary notwithstanding, a free or even a paid service can change the rules at any time, and there’s not much you’re going to be able to do about it.