6th December, 2011

My digital preservation utopia

At Build last month, Jeremy Keith gave a presentation about preserving our websites, documents, and personal timelines. He talked about avoiding data loss and shared his fears for the future. I really soaked up his thoughts, even though I had to get up and speak directly after him.

Anyhow, it’s an important topic that I’ve often lazily considered. I share a lot about myself on the web and even archive details about my family history in certain places. I want to feel that none of this is pointless and that it’ll leave some legacy. If I eventually have offspring, I’d like them to have a record of everything, and I’d hope they’d add to it, pass it down the generations, and keep this personal history intact.

So, long story short. When out and about with web folks, a few pints down the line, I often share my idea for some sort of centralised super data archive we’d all use to preserve our data for generations to come.

Now, I’m somewhat naive when it comes to complex storage systems, encryption methods, security and all that jazz, so go easy on me. Also, this is probably more worthy of a tweet than a post, but 140 characters isn’t enough. I’ve written this at 100mph without much care, so I apologise for the tone and anticipated errors.

My unachievable idea

In my utopian, misguided mind, I imagine the following, possibly flawed, scenario:

Over the next few years, all the services we love (Twitter, Flickr, Foursquare, Last.fm, etc.) make sure they follow Cameron’s Orbital Content model, allowing us to easily export all of our data in a raw format such as XML, with accompanying folders of raw assets such as photos. We could then take that data to other sites as services come and go, or inevitably get bought by Facebook. This first point might actually happen.

Now the dream. The government (or some other organisation we can supposedly trust) builds a massive Act-Of-God-proof data centre in a remote part of Northumberland, or somewhere like that.

Each year, perhaps on a set date or National Export Day or whatever, we each download our raw data from all of our services, back up our own sites, photos, important documents and stuff, and roll it all into our updated Super Zips. It’d be like doing an annual tax return, but slightly less painful. Maybe.

We’d then use some magic tool to encrypt our Super Zip folders if we’re security conscious, and upload these zips in the background, Backblaze-style, to the government’s (or whoever’s) Act-Of-God-proof data centre.

For this to be useful, our raw data might need to be converted or refactored every decade or so, should we fall out with XML, JSON, HTML or some other unexpected language madness. This refactoring would be a decision we made each year prior to submitting our Super Zips.

Each of us has two Keymasters that we choose from our families or friends and assign each year when we send our data. These Keymasters might change if people die or we divorce them or whatever, but essentially these people need to sync up to sign a release for our Super Zips should we die, go missing, get abducted by aliens, or go work at Facebook.

The government (or whoever holds our data) releases the Super Zip to the Keymasters, so they have a complete (or at least no more than 12 months old) copy of everything we wanted to pass down the line. Our Keymasters know our secret code, so they can decrypt our data.
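The Keymaster idea doesn’t say how the two of them would actually “sync up”. One way to make sure neither can unlock a Super Zip alone is to split the decryption key into two shares, one per Keymaster, so the key only exists when both shares are brought together. Here’s a minimal Python sketch of that, assuming a simple 2-of-2 XOR split; it’s an illustration rather than part of the scheme above, and the function names are hypothetical. A real archive would more likely lean on an established scheme such as Shamir’s secret sharing plus a vetted encryption library.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split a key into two shares; both are needed to rebuild it (2-of-2 XOR split)."""
    # Share A is a uniformly random pad the same length as the key.
    share_a = secrets.token_bytes(len(key))
    # Share B is the key XOR-ed with that pad, so on its own it is also just noise.
    share_b = bytes(k ^ a for k, a in zip(key, share_a))
    return share_a, share_b


def join_shares(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two shares by XOR-ing them byte by byte."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must be the same length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))


if __name__ == "__main__":
    # Hypothetical 256-bit key protecting one person's Super Zip.
    super_zip_key = secrets.token_bytes(32)
    first_keymaster, second_keymaster = split_key(super_zip_key)
    # Only when both Keymasters bring their shares together does the key reappear.
    assert join_shares(first_keymaster, second_keymaster) == super_zip_key
    print("Both Keymasters together recover the key")
```

Either share on its own is just random bytes, which is the point: a lone Keymaster (or an ex-spouse who kept one) learns nothing about the key.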

Now, another dream. Right now we view our photos, our check-ins, our articles and so on in a certain way, in certain frameworks, designed in certain ways. Over time, tastes will change, and platforms and operating systems will come and go. So, hopefully, our Super Zips of raw data could be plugged in or uploaded, interpreted by the sites or tools of the day, and used to display our articles, photos, and other stuff in a manner that future generations will appreciate.

Perhaps the XML of today would power some sort of augmented reality Minority Report headfuck in 2081, or be plugged directly into our Great Great Great Great Great Great Great Grandchildren’s brains and turned into an interactive maze or something. As with all of the previous points, I have absolutely no idea what I’m talking about.

So, there you are. I expect you are laughing at me.

I know this is flawed

I understand that the government is not a fine custodian and that they’d probably close it down in 50 years, or they’d spend billions on the Information Technology and end up with it being run off’ve WordPress or something. I know that JPG, PNG and other formats might not survive the millennium, which is why I suggest magic conversion tools at relevant periods in the future.

Most of all, I feel better for getting that out, but do appreciate that it’s probably ridiculous. I write it in the hope that in decades or centuries people will look back and I’ll seem like some sort of Nostradamus of the digital age and be posthumously offered a knighthood, which I hope my family would refuse on principle (the Empire and all that).

Then again, chances are that I’ll forget to renew my domain name in a few years, or get hacked and lose my site and all my articles. Along with my entire online history, this sooth that I say will be lost forever, just like all the stuff you put online today.
