"The future for digital storage is constant migration."
February 20, 2012 8:38 PM

"Most of the filmmakers surveyed...were not aware of the perishable nature of digital content or how short its unmanaged lifespan is." After the Motion Picture Academy's release last month of "The Digital Dilemma 2," a warning aimed at independent filmmakers and nonprofit archives, cinematographer John Bailey talks with one of the report's authors about the perils of data migration ("It’s not unreasonable to say that the term "digital preservation" is an oxymoron") and the need to educate filmmakers who are so "enamored with the perceived benefits of digital image capture and workflow" that they fail to realize preservation concerns start to appear almost immediately after their work is completed. Film professor David Bordwell covers the report in a detailed post about preserving "born-digital" films, sixth in his "Pandora's Digital Box" series about the worldwide conversion to digital projection, with lots of good links at the bottom.
posted by mediareport (87 comments total) 40 users marked this as a favorite

 
Via Bruce Sterling.
posted by mediareport at 8:43 PM on February 20, 2012


Long-term storage has been a huge problem and a pain in the side since well before the advent of digital. So much taped analog content has been lost due to storage errors and even to intentional tape wiping (or junking) that it is a true shame.
posted by bz at 8:50 PM on February 20, 2012 [5 favorites]


Somebody ram this down Apple's throat.
posted by a non e mouse at 9:00 PM on February 20, 2012


Somebody ram this down Apple's throat.

Why Apple in particular?
posted by schwa at 9:13 PM on February 20, 2012 [4 favorites]


There are computer records in my company going back to the 70s. I think it's pretty simple actually - back up in multiple locations and every once in a while get a new server. I wager that with large collections, economy of scale works in your favor, and storage would be cheaper than the cited figures.
posted by gyp casino at 9:13 PM on February 20, 2012


Somebody ram this down Apple's throat.

Not sure how it's all Apple's fault, when every company seems to be pushing it. Heck, Steam was one of the first to do it. Frankly the way Steam does it (and Apple has adopted) works wonders for me, having it stored somewhere else and me able to download it when needed, keeps me from worrying about if my zip drive is obsolete or if i can get a burner in the size i need.

Someone needs to get their asses working on a good and stable holographic type storage. Good size (at least 1TB for now), stable, etc. I know we are still in the baby steps area of digital (long term, it seems like we are old hats at this, but looking long term we are a fraction of length of things like film).

Honestly though, while i admire the "preserve everything", and would love to have that, i just remember going through antique stores and see the piles of old photos and realize that most of what could be saved usually ends up as stuff that gets sold as soon as the person who did them dies. There are also a lot of films made each year that one wonders if they are worth the effort to preserve. Not to mention that we have films that have been lost due to war, disinterest, etc (the Louise Brooks library of films, most on silver nitrate i believe, gone). So while digital might be more fleeting, it might be good that it takes more effort too, so people are actively knowing where these things are.
posted by usagizero at 9:14 PM on February 20, 2012 [3 favorites]


You want to archive your digital film? Simple. Put free copies on Netflix, Hulu, amazon, iTunes, YouTube, Vimeo and set up some torrents. The more people that can keep copies the better. True, this doesn't guarantee survival of a nuclear internet holocaust, but it's much more likely to survive than a hard drive under the bed. Someone somewhere will have a copy and the ability to play it or convert it.
posted by fungible at 9:18 PM on February 20, 2012 [7 favorites]


It seems like our brave new digital world remembers everything we'd rather forget and forgets what we want to remember.
posted by treepour at 9:20 PM on February 20, 2012 [19 favorites]


Babylon 5 has already had this happen. While they were making DVDs, it was discovered that the computer models had been lost, resulting in poor quality video on the DVDs. More than that, the latest revision of wikipedia doesn't mention it, and I had to dig into the past to even find mention of it, and this was only a decade ago.
posted by fragmede at 9:31 PM on February 20, 2012 [3 favorites]


It's not like preservation has never been an issue for film even before digital. All the mammoth restoration projects and the money which has been spent on various "hey, we have to save this movie" projects across the past 20 or so years are a testament to that. The number of films which were only ever on nitrate stock which are forever lost to history are a sad monument to the fragility of any of our media formats.

Granted, it's probably accelerated by digital media formats, seeing as to how quickly they evolve. VHS (not digital) to DVD to Blu-ray has happened in far less than my lifetime. As has 5.25 floppy to Zip Drive to the cloud. And having players which can actually interpret old formats? I'm probably lucky that using Apple machines means that I can still play my CD-ROM of The Madness Of Roland despite several evolutions of OS between 1992 and now.

No matter what the format, unless there is a small army working to preserve old material and continuing to make sure it doesn't degrade and is still playable in modern formats, we're going to discover a large amount of our media history is simply lost forever. The vinyl record is probably the only media format I can think of which may continue to be playable forever, as all it really takes is a needle mounted in a semi-fixed bracket and some kind of way to take those frozen sound waves and make them louder than they would be otherwise. I remember, as a child, holding a hand sewing needle in my hand and being able to faintly hear the record I held it to. You can't do that with movies in any format.

And thus is the conundrum we face -- we don't have ANY permanent medium for such things. And without applied energy, entropy is all we have to look forward to.
posted by hippybear at 9:32 PM on February 20, 2012 [3 favorites]


I always get the "Motion Picture Academy" confused with either the "Academy of Motion Picture Arts and Sciences" or the "Motion Picture Association of America". Not that it matters, they're all made up of roughly the same old white men. How do I know they're old white men? Who else would use a phrase like "motion picture" anymore?
posted by twoleftfeet at 9:36 PM on February 20, 2012


Twoleftfeet: the Motion Picture Academy and the Academy of Motion Picture Arts and Sciences are the same thing, aka AMPAS. The other one is the MPAA.
posted by BlahLaLa at 9:41 PM on February 20, 2012


Yeah, both articles spend time discussing preservation issues, successes and failures from the past. I liked the story of the Dawson City film cache, rediscovered under frozen earth and an ice rink after 50 years, at the start of Bordwell's piece. Then there's the quote at the end from the director of the UCLA Film and TV Archive:

If I found a reel of 35mm film in 500 years and didn’t know what it was, I could probably without too much trouble figure out a way to reverse engineer a projector. In any case, I can always look at the individual frames, even without a projector, and see what is there.

If I find a cache of Blu-rays and DCPs in 500 years, what do I have? Plastic waste. How do you reverse-engineer those media? Impossible. Without an understanding of the software and the hardware, you have zip. No way to look at it, no way to know even if it has any information on it.

posted by mediareport at 9:41 PM on February 20, 2012 [9 favorites]


Engraved metal flipbooks then?
posted by Joh at 9:45 PM on February 20, 2012


Something that can be copied with a few keystrokes can be deleted with even fewer.
posted by Ironmouth at 9:48 PM on February 20, 2012 [2 favorites]


Somebody ram this down Apple's throat.

And who do we blame for the fact that we are currently missing 90 percent of all silent movies ever made and 50 percent of all sound films made before 1950 because the film stock was inherently unstable?
posted by Bunny Ultramod at 9:50 PM on February 20, 2012 [7 favorites]


Wow.

"Speaking very broadly, with 4K scans of color films you wind up in the neighborhood of 128 MB per frame. . . . Figure that a typical motion picture has about 160,000 frames, and you wind up with around 24 TB per film. And that’s just the raw data. Now you process it to do things like removing dust, tears, and other digital restoration work. Each of those develops additional data streams and data files. We’ve decided, based upon our previous experience, that it is best to save the initial scans as well as the final processed files for the long term. Now we are up to 48 TB per film. In our nitrate collection alone, we have well over 30,000 titles. 48 TB x 30,000 = 1,440,000 TB or 1.44 EB (exabytes) of data."

Weissman adds with a trace of grim humor: “And of course you want to have a backup copy.”
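
The arithmetic in the quote is easy to check. Below is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB, 1 EB = 1,000,000 TB); the function names are made up for illustration and the inputs are the figures quoted above. The plain product of 128 MB and 160,000 frames comes out a little under the "around 24 TB" in the quote, which is explicitly a broad estimate, so the collection total below uses the quoted 48 TB per film rather than a derived figure.

```python
# Rough check of the storage figures quoted above.
# Assumption: decimal units (1 TB = 1,000,000 MB; 1 EB = 1,000,000 TB).
MB_PER_TB = 1_000_000
TB_PER_EB = 1_000_000


def raw_scan_tb(mb_per_frame: float, frames: int) -> float:
    """Size of one film's raw scans in TB (hypothetical helper)."""
    return mb_per_frame * frames / MB_PER_TB


def collection_eb(tb_per_film: float, titles: int) -> float:
    """Total size of a collection in EB (hypothetical helper)."""
    return tb_per_film * titles / TB_PER_EB


per_film = raw_scan_tb(128, 160_000)   # figures from the quote
total = collection_eb(48, 30_000)      # quoted 48 TB per film, 30,000 titles

print(f"raw scans per film: {per_film:.2f} TB")
print(f"nitrate collection: {total:.2f} EB")
```

A single backup copy doubles the collection figure, which is the point of Weissman's remark.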

posted by KevinSkomsvold at 9:57 PM on February 20, 2012 [7 favorites]


And who do we blame for the fact that we are currently missing 90 percent of all silent movies ever made and 50 percent of all sound films made before 1950 because the film stock was inherently unstable?

The bottom line mostly. Add to this the lack of preservation for TV series around the world, due to lack of technology, lack of foresight, and lack of money. Most of the time, I bet there were questions of potential royalties owed, and the assumption that there was no replay value nor archival value in any of the content.
posted by ZeusHumms at 10:01 PM on February 20, 2012


fungible, your response seems a bit glib (although I'm with you on the "get it out there" thing, for sure). A key issue is changes in software and devices over time. Bordwell links this clip of the director of the Academy Film Archive telling a "great cautionary tale" of the problems Pixar had trying to remaster Toy Story for DVD in 2000, five years after it came out, using software that no longer recognized the original files.

Pixar can afford the money (and, at least as big a factor, the time) to migrate their files regularly. Are you planning on someone at Amazon and Vimeo doing it for you? For free? For how long?
posted by mediareport at 10:10 PM on February 20, 2012 [2 favorites]


You want to archive your digital film? Simple. Put free copies on Netflix, Hulu, amazon, iTunes, YouTube, Vimeo and set up some torrents. The more people that can keep copies the better. True, this doesn't guarantee survival of a nuclear internet holocaust, but it's much more likely to survive than a hard drive under the bed. Someone somewhere will have a copy and the ability to play it or convert it.

Those wouldn't really be archives though. I mean, they'd be better than nothing, sure, but it would be like saying photos of the Mona Lisa would be acceptable in place of the real thing.

Now, if you're actually regularly torrenting multi TB master copies, then I salute you, sir, and wonder how much money you're paying for those racks.
posted by kmz at 10:13 PM on February 20, 2012 [4 favorites]


Spreading copies far and wide though helps ensure that the content is remembered as more than a paragraph here and a paragraph there though.
posted by ZeusHumms at 10:18 PM on February 20, 2012


Bunny Ultramod: " 'Somebody ram this down Apple's throat.'

And who do we blame for the fact that we are currently missing 90 percent of all silent movies ever made and 50 percent of all sound films made before 1950 because the film stock was inherently unstable?"

APPLE OBVIOUSLY DUH
posted by DoctorFedora at 10:23 PM on February 20, 2012 [8 favorites]


1. Tie copyright extensions to obligations to preserve.

2. Massive new jobs program for librarians.
posted by fatllama at 10:34 PM on February 20, 2012 [19 favorites]


Sorry, should have provided more info, but here's an example of moving into the future (personal annoyances aside, I see this, among other things, as why Apple needs to consider this issue more responsibly).
posted by a non e mouse at 10:45 PM on February 20, 2012


How hard would it be to come up with some kind of ISO-9000 Certified Archival Quality Codec for audio and video preservation? Seems like the main problem with digital preservation is simply a matter of "none of our current computers can play these old file formats."

Instead of shooting themselves in the foot with proprietary codecs and whatnot, just get everyone to agree on a format (fat chance, yeah) and submit a copy of their film to the AMPAS vault (step one: get a vault) and there ya go.

Other than the cost (which ought to be MUCH less expensive than film storage) what would be the downside of this kind of setup?
posted by ShutterBun at 10:45 PM on February 20, 2012


Sorry, should have provided more info, but here's an example of moving into the future (personal annoyances aside, I see this, among other things, as why Apple needs to consider this issue more responsibly).

Why should any part of this be Apple's responsibility? That's like finding the companies that made film stock 80-100 years ago, and making them responsible for the loss of films and degradation in color.
posted by ZeusHumms at 10:51 PM on February 20, 2012 [2 favorites]


I work for the Washington State Digital Archives (which does not endorse my posts here or anywhere...). We preserve over 100 million digital objects, will manage tens of millions more by the end of the year, and plan to manage and preserve most of them forever. We practice periodic forward-migration, regular refreshes of the hardware and backups in multiple locations, that sort of thing.

The famous failures to preserve digital objects--like the original moon landing movies--are not evidence that "digital preservation is an oxymoron" but rather evidence that some people are doing it wrong.
posted by LarryC at 10:53 PM on February 20, 2012 [6 favorites]


And yet notated music and play scripts from 500 years ago can be recreated with perfect accuracy.
posted by awfurby at 11:03 PM on February 20, 2012 [3 favorites]


I wouldn't say perfect accuracy; new interpretations of what composer X really meant piece Y to sound like keep popping up every few years or so, not to mention the slow evolution of musical instruments undsoweiter.
posted by MartinWisse at 11:12 PM on February 20, 2012 [2 favorites]


The scope for interpretations exists because there are aspects of musical performance that notated music can't specify at all or precisely enough. However, even for seminal works such as the Beethoven symphonies, an urtext edition was published within the last couple of decades.
posted by Gyan at 11:39 PM on February 20, 2012


Apple toyed with entering the professional film industry through the creation of its suite of NLE software. Many people have plied their trade using this software. Suddenly, that software is no longer maintained or supported at a professional level because Apple have deemed that there is more money to be made from the consumer market.

I take issue with this, in that it shows no real respect for the industry that it exploits, and less respect for those whose livelihood it ultimately sacrifices.

On a broader level, this is a fundamental issue that affects how we maintain and archive media (albeit a narrow one). I think that large organisations, like Apple, need to exercise a certain moral consideration of the sector/medium that they want to capitalise on.
posted by a non e mouse at 11:51 PM on February 20, 2012


I take issue with this, in that it shows no real respect for the industry that it exploits, and less respect for those whose livelihood it ultimately sacrifices.

How does this work? Did the software stop working?
posted by pompomtom at 12:02 AM on February 21, 2012


How does this work? Did the software stop working?

Expectation for the future of a platform strongly influences present decisions about using that platform. If you suspect (as I do) that Apple is losing interest in professionals of all kinds, then you start to wonder whether it’s wise to hang around. The software may work, but can you trust that it will continue to interface with new hardware or new file formats? If there are new bugs, will anyone care? If you’re the last guy using FCP, will you still be able to share media with your colleagues on other platforms?
posted by migurski at 12:11 AM on February 21, 2012


The first link makes a really good point about documentary footage. Fifty years from now it may be very hard to know what life was really like in 2012 because so many of the news broadcasts and ephemera like video clips and cellphone pictures aren't being archived, and won't be there like the old photographs usagizero mentions. I'm not sure what will happen to youtube clips and facebook pages decades from now, but it's hard to imagine those companies carrying around larger and larger amounts of data that no one accesses anymore.
posted by Kevin Street at 12:29 AM on February 21, 2012


a non e mouse: "I think that large organisations, like Apple, need to exercise a certain moral consideration of the sector/medium that they want to capitalise on."

While I'd tend to agree with you, it's worth noting that the very reason the Academy has created this report is because almost no-one else in the sector/medium is doing much to solve the problem either.

Apple arrogantly/ignorantly/stupidly dropping backwards compatibility in one piece of widely-used software displays a certain sort of attitude, but it pales into insignificance compared to the fact that the industry itself doesn't have an agreed-upon compulsory archival process that goes much beyond "stick it in a vault and hope you can read it in 5 years' time"…
posted by Pinback at 12:30 AM on February 21, 2012 [3 favorites]


The academy is working on a set of standards, IIF-ACES. It's designed to be used from capture to archive. If it's adopted, there will be a shared open format to work from.
posted by jade east at 1:11 AM on February 21, 2012


The first link makes a really good point about documentary footage. Fifty years from now it may be very hard to know what life was really like in 2012 because so many of the news broadcasts and ephemera like video clips and cellphone pictures aren't being archived, and won't be there like the old photographs usagizero mentions. I'm not sure what will happen to youtube clips and facebook pages decades from now, but it's hard to imagine those companies carrying around larger and larger amounts of data that no one accesses anymore.

The Internet Archive has all video from dozens of satellite and cable channels archived back into the 90s, with tapes being digitized going further back. Archive TEAM, which I am part of, made a full copy of Yahoo! Video before it went down. Don't give up hope!
posted by jscott at 1:35 AM on February 21, 2012 [6 favorites]


It's not like preservation has never been an issue for film even before digital.

I don't think this is really the point, unless you believe that "people have been murdering for a long time; why can't I?" is a valid defense.

To my mind, it's more that people tend to think of digital material as self-preserving, especially if you have back ups. The issues of loss of information in data transfer, keeping track of various parts and layers of a project, migration problems, maintaining proper documentation, etc seem to be below most people's radars. We can hope that awareness leads to some action by some people.
posted by GenjiandProust at 2:56 AM on February 21, 2012 [1 favorite]


jscott,

What's the present status in serious archival media, i.e. etching onto stone and the like?
posted by effugas at 4:01 AM on February 21, 2012


Seriously though, the future is not in constant migration. The future is in media that doesn't spin and doesn't rot. We might spin the read/write heads though.
posted by effugas at 4:09 AM on February 21, 2012


Also, it's important to realize that there's this ridiculous expansion of how much data people want to keep. It's not like we could maintain the sets for old movies, but because they're now digital, we're trying to hold onto them -- at every phase of processing.
posted by effugas at 4:10 AM on February 21, 2012 [1 favorite]


"it pales into insignificance compared to the fact that the industry itself doesn't have an agreed-upon compulsory archival process that goes much beyond "stick it in a vault and hope you can read it in 5 years time"

Thing is, with film, you don't need any technology to interpret the held data. Simply hold it up to the light. Even 5K uncompressed scans of every frame (probably out-resolving 35mm) are completely nonsensical without a computer that can interpret that data in a meaningful way.
posted by Magnakai at 4:29 AM on February 21, 2012


from listening in on conversations at malwart, some people are working very hard on keeping an oral history of many television shows
posted by Redhush at 4:37 AM on February 21, 2012 [4 favorites]


I remember, as a child, holding a hand sewing needle in my hand and being able to faintly hear the record I held it to. You can't do that with movies in any format.

You can make a homemade projector pretty easily. I helped a friend of mine make one for an art installation using the movement from a broken Bolex and some random lenses we took apart. Hand cranked. Threw a surprisingly good image.
posted by nathancaswell at 4:43 AM on February 21, 2012


The famous failures to preserve digital objects--like the original moon landing movies--are not evidence that "digital preservation is an oxymoron" but rather evidence that some people are doing it wrong.

I thought the "oxymoron" comment was referring not to human error/stupidity but to the "almost inevitable" degradation of information that can occur in large-scale data migration:

We talk about “migration” as if it were a foolproof system, like a text clone; but in fact, the ones and zeros that constitute a digital record of a two-hour feature length film at 2K resolution, is a monstrously hungry data system. How accurate is such complex media migration? Has enough time passed, and enough migration generations, for us to know the safety and accuracy of the migration process?

Just the opposite. We already see issues with migrated materials. And, while it may work for a very small number of titles with active management, migration doesn’t scale for larger collections: this from the Storage Networking Industry Association study. The Associate Librarian of Congress, a major migrator of digital data, admits that each time they migrate, they face the danger of losing information.

Is it safe to assume that most migration is done on robot systems without a human eye as intermediary to check for accuracy? If migration errors happen, and they are almost inevitable, how does this affect subsequent migrations? The question of migration errors is not addressed in the TDD2 report. Perhaps, it’s too early in migration history to even know the answer.

Our role was to ask about possible solutions for the long term. Migration, as stated by others (see above, Mavrotes) isn’t a long-term solution. It’s part of active management...
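
For the narrowest version of the accuracy question, a bit-for-bit copy from one storage system to another, the check does not need a human eye: compare a checksum taken before the copy with one taken after. The sketch below is only an illustration of that idea, not anything described in the report or the interview; the function names are hypothetical, and it says nothing about the harder case where migration also converts the file to a new format.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_with_fixity(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy is bit-identical.

    Hypothetical helper: raises OSError if the checksums differ,
    otherwise returns the checksum to record alongside the new copy.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"fixity check failed for {destination}")
    return after
```

Storing the returned checksum with the file lets the next migration start from a known-good value, so an error introduced in one generation is caught instead of being copied forward.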

posted by mediareport at 4:50 AM on February 21, 2012


I'm glad this issue is getting greater press. I work in a small library and everyone pushes digitization for every format, but even for those items without copyright restrictions, it's not like we have an unlimited budget (hah!) to plan every possible format change for even the rarest of the items. I mean, we still have some cassette tapes and floppy drives that were included as supplemental material in books, and I'm pretty sure we have (college) students who have never seen a floppy drive. We do have archival copies of some things-- digital-only DVDs of previously-transferred movies from the 20's and 30's being one of the ones I see the most. But that's only as good as the next big format change, really; at some point, I can see how the copies of the copies of the nearly-a-century-old-film will stop being as useful. We've also uploaded some scans of materials to Internet Archive, which is great, but again falls into the question of how stable it is as a site, and how stable those files will be fifty or sixty (or even thirty) years from now. I wish the documentary makers good luck. It's not an easy question, especially not when the common answer "Just digitize it! Make all the torrents!" is not especially easy.


(Was it on the blue that someone mentioned the microfilm state archives of Russia, because at least with a light source and magnifying glass you can still make out the text? Not that it's stable the way baked clay tablets are or engraved stone, but it's an interesting point.)
posted by jetlagaddict at 4:51 AM on February 21, 2012


The strength of digital media has always been almost exclusively about its convenience and has never been about any supposed archival strength or longevity. Digital formats are simply too dependent on the software and hardware factors to ever be considered archival.
posted by Thorzdad at 4:59 AM on February 21, 2012


In a peripheral field and another lifetime, I had a twenty year career as a microfilm archivist, which ended because the salesmen and digital hucksters of the world managed to successfully bamboozle the middle managers charged with archiving files into believing in the magical world of the future, where your data will be magically accessible anywhere in the world and will last forever. I was the "throwback" at so many conference room sales presentations, armed with science, data, and history while one moronic proprietary storage system after another would be brought up, trying desperately to convince people that archival storage required media that (A) was demonstrated to last a long time and (B) could be read by technology that will still exist in a hundred years, and in the end, I got tired of making a fraction of what the salesmen and the middle managers made while I had to bust my ass to save them from themselves.

I left the business in 2004, except for going back now and then for a bit of consulting work for the last surviving local microfilm company doing archival work with any integrity, and to be completely honest, I'm still bitter about what happened to that industry.

As it happens, I'm still pretty easy to track down on the internet, so I get the occasional email from a former customer, who's almost always in a panic because their idiotic proprietary Integrated Top-Down Data Access System™ no longer worked, leaving all their data locked up on wonderful little proprietary cartridge-loading magneto-optical disks in wonderful auto-loading jukebox machines made by forward-thinking companies like Wang.

I would like not to be a dick in these situations, but here we are.

"I've been tasked with retrieving these files for migration," they start, and they've almost lost me at saying that they've been "tasked," but I try to be sympathetic...and fail.

"Well, Hitachi hasn't made that disk media or readers since 1994. The manufacturer of your cart jukebox merged with XYZ Corp, who promptly fired all their employees and shut down that division. I have to say you're looking at a pretty big project."

There's a sort of pause, and a desperate, hair-pulling panic in the air, and what I want to say is that I sat across from a big ugly particleboard conference table with a faux-marble formica top and decorative oak edging with the caller and had salesmen and reps from Wang and other companies run down my suggestion to do a straight microfilm capture followed by a scan from the film to cover both the archival storage and the digital side, but when I think about those meetings, my blood boils. Still.

See, we knew that the bullshit the digital utopians were spinning was wrong. We said so. We provided ample, peer-reviewed documentation. We explained concepts, like how it doesn't matter how archival the medium is if the reader technology isn't stable (see NASA's disasters with the endless oddball tape formats they used for an example). We sat there and told our customers what would, not could, happen, and ended up being talked over by our own salesmen, who ground their teeth at the potential for profit in digital storage and the endless service contracts that such things require.

I'm still bitter about this because I grew up in that industry. My family's business was a part of some amazing projects to convert fragile documents into a form that created a high quality, 500 year copy that could be copied and distributed to scholars around the world. With fresh white gloves on, I've carefully handled every surviving letter between Martha Washington and the Marquis de Lafayette, and as I clicked off frames on my perfectly-maintained MRD-2 camera, I had to laugh at her jokes, her gossip, and all the recipes she sent, most of which required an enormous amount of suet. I've run the camera while the staff of the Walters Art Museum carefully turned the pages of tiny illuminated manuscripts with ivory tongs as we caught every fine nuance of the text on film.

I designed and built book boxes that took the strain off a binding to make filming easier, and then scanned the film, eliminating the whole process of either cutting off the binding or mashing the poor book onto a flatbed scanner and producing a versatile digital copy (in at least two formats) and a long-lived microfilm copy.

I received original sprocketed film from Birkenau, carefully cleaned the film, made a pair of safety silver print copies, and scanned everything, doing a careful A-B comparison to make sure not a single letter from a single name would ever disappear from history.

In the end, I'm just a throwback, a crusty old man with a rake, alas.

"Well, yes, we remembered what you said," these people say when they track me down, years and years later, "but we just need to access this data and you're our last contact."

The thing is, though, what they need is someone to rebuild the hot technology of 1988, which barely worked even then, and I can only shrug and offer my condolences. I burn to say "well, I did tell you what would happen, you fucking fucker," but I'm not like that, with a few key exceptions.

"I can set you up with some contacts, but my contracting fee is two hundred and forty dollars an hour."

This makes everyone balk, but tough. I find this, and them, tiresome almost beyond measure. See, the complaint at the time was to call me a luddite, and in the same way that virtually everyone who cries "luddite" really doesn't know the story behind that word, they don't get it even now. If you're working on an archive, you need to think things out, and think sideways, and think of impossible conditions, and think that companies go out of business, so if you've hitched your wagon to one of them, you're going to be screwed.

What killed it all for me was Nicholson Baker's sloppy, uninformed hatchet piece, Double Fold, a book that was suddenly in the hands of the middle managers on the other side of the table, and which was used in the most convoluted manner to push for digital storage, rather than evil, destructive microfilm (or at least what Baker claimed microfilm was). Just like there's nothing worse than a patient with the internet, diagnosing themselves, that was pretty much the final straw, and I left them all behind to rot with their fading archives.

In this case, "born digital" doesn't mean "stuck digital." They can print their digital films to actual high quality film and have an analogue copy that'll last. We can also move away from proprietary digital into the realm of open source digital, which at least gets us out of the realm of the unsupported file formats of the future. The medium, alas, is a problem so far, because recordable digital media don't last, and if they would, the means to read them can't be guaranteed. It's a complex problem that can be solved, but only if there's a genuine will to do so that's spread across all the fields that are involved. So far, we're not there, and bean counters, salesmen, and middle managers won't do it because their entire reason for being is to live in the short-term profit projection. If the problem won't arise until after you're retired, why should you care?

Right now, we're in the next phase of digital fuzzy magic, and we talk about migration and storing stuff in "the cloud" and dispersed storage across the internet as if those are solutions, but what happens if the MPAA and RIAA win their battle to lock down the internet? Think it can't happen? History begs to differ.

If we want all this digital stuff to be something more than an ephemeral performance, a dance that's beautiful and glorious and something to be left behind as we keep pushing forward, we're doing okay. If we want history to be something that doesn't just trail off suddenly at the start of the twenty-first century, we need to be smarter, and think in a holistic, comprehensive way.
posted by sonascope at 5:07 AM on February 21, 2012 [125 favorites]


from listening in on conversations at malwart, some people are working very hard on keeping an oral history of many television shows

I lol'ed.
posted by odinsdream at 5:08 AM on February 21, 2012


If we want all this digital stuff to be something more than an ephemeral performance, a dance that's beautiful and glorious and something to be left behind as we keep pushing forward, we're doing okay. If we want history to be something that doesn't just trail off suddenly at the start of the twenty-first century, we need to be smarter, and think in a holistic, comprehensive way.

So, what you're saying is we're fucked.
posted by odinsdream at 5:14 AM on February 21, 2012 [4 favorites]


The beautiful catch-22 is that the copyright policy the movie industry has pushed so hard for is the exact thing that prevents the preservation of their precious "property".

I weep for society, but laugh directly at those responsible for this.
posted by blue_beetle at 5:20 AM on February 21, 2012 [3 favorites]


What's the present status in serious archival media, i.e. etching onto stone and the like?

(Digressing, but I'll come back to digital....promise.)

Historically, pretty poor. Bit density is low, transport and writing costs are very high. Worse, large slabs of stone have suffered from the same reuse/recycle problem that silver based film stocks have. Thousands of stelae have been reused in construction -- sometimes as slab, but often worse. Limestone, in particular, turned out to be a lousy medium, because people would throw them into lime kilns for making mortar for their buildings.

There's no write protection, it's easy to recarve that information later (harder to do so undetectably, of course.) And, of course, even if they were left in situ, a large amount of these stelae were left outside, and weathering made them unreadable.

Finally, of course, there are encoding issues. There are a number of scripts that are only sketchily understood, and a few that are completely unknown.

An example: Ancient Egypt is known for stone carved texts, and indeed the script itself was recovered thanks to a lucky stela that had nearly identical text engraved in two Egyptian scripts (Hieroglyphic and Demotic) and one script that we knew (Greek). It also helped that Coptic was spoken until the mid 1600s, and Coptic was a clear descendant of the ancient Egyptian language.

However, the vast majority of information we have on Ancient Egypt isn't from stone stelae, it's from papyrus rolls, often written in a third script (Hieratic). It's likely that more of the stone carving has survived, but there was so much Hieratic on papyrus to recover.

Which, of course, gets to the crux of the problem.

(Hey, fully back on topic!)

The problem is the word "archive." The modern sense is that you would take X, make a copy, Y and put it into an archive, where it would then be safe. You then didn't care about Y', the other copies of X -- or in many cases, X itself. The issue with this is the full assumption that Y is safe in its archive, and as digital is discovering anew, it often isn't. Paper burns, stone wears, film stock degrades, and bits rot.

Digital, however, has something over most previous media. When you made Y in most other media, you could always tell that Y was not X. Sometimes it was obvious, sometimes it wasn't, but making Y=X was basically impossible, certainly at scale. Except, of course, for digital.

In digital copies, X=Y=Y'=Y", presuming you take the most basic care in the copy. This leads to a new method of archival, which is survival of the numerous. If there are 10,000,000 identical copies of X, the chance of X disappearing plummets in comparison to the one magic media holding a single copy of X.
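
To put rough numbers on "survival of the numerous", here is a back-of-envelope sketch. It assumes every copy is lost independently with the same probability over some period, which real copies are not (they share formats, vendors, and disasters), and the 0.5 figure is purely illustrative:

```python
def loss_probability(p_copy_lost: float, copies: int) -> float:
    """Chance that every one of `copies` independent copies is lost.

    Assumption (not from the thread): copies fail independently, each
    with probability `p_copy_lost`, so the chance of losing all of them
    is p_copy_lost ** copies.
    """
    if not 0.0 <= p_copy_lost <= 1.0:
        raise ValueError("p_copy_lost must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return p_copy_lost ** copies


if __name__ == "__main__":
    # Illustrative only: a coin-flip chance of losing any single copy.
    for n in (1, 2, 3, 10):
        print(f"{n:>2} copies: {loss_probability(0.5, n):.6f}")
```

Even with terrible per-copy odds, the chance of losing everything shrinks geometrically with each extra copy, which is the whole argument in one line of arithmetic.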

There are still issues. Encoding leaps to the fore -- having, say, 30,000 copies of "I_want_to_be_sedated.mp3" doesn't help if you can't decode an mp3. So, archiving the formats is critical as well. Finally, there's encryption - but then again, "encrypt" comes from "cryptography" from the Greek "kryptos"=hidden and "graphein"=writing. The point of an encrypted text is to *not* be generally read. Indeed, it's the very antithesis of archival: you are actively trying to prevent it from being read in the present, rather than making sure it can be read in the future.

So, the attributes of a digital archival system. First, keep the data alive -- all the data at rest issues disappear if you make sure the data can always be accessed. This means living media, in multiple places. Secondly, you have to archive the encoding schemes as well.* Finally, of course, those files must be clear text -- though cryptographic hashing is a great idea to ensure that your copies (which are always being made in a living data archive) are indeed correct, and X lives again.
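
A minimal sketch of the hashing step, assuming SHA-256 and ordinary files on disk; the function names are made up for illustration and this is not any particular archive's tooling:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def damaged_copies(original: Path, copies: list[Path]) -> list[Path]:
    """Return the copies whose contents no longer match the original."""
    expected = sha256_of(original)
    return [copy for copy in copies if sha256_of(copy) != expected]
```

Run something like `damaged_copies(x, [y1, y2])` on a schedule and re-copy anything it returns. It only tells you a copy differs from X as X is now, so it catches bit rot in the copies, not damage to X itself.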

This is hard. But then again, *all* archives are hard. The Library of Alexandria carefully stored the best works on very long lived media in multiple languages, and made sure scholars were reading, and copying. It then burned to the ground. Thankfully, many of those works were also elsewhere. Unfortunately, some were lost forever.

Will we lose data? Yes. We always have, and I think we always will -- even if it's only because everyone is convinced that data block X is junk, throws it away and reuses it, only to find out that no, we can read X now!

Will we lose the ability to archive data in a digital realm? Nonsense. It will be a very different archive than people are used to, it will take very different methods, indeed, it may well blur the line between archive and online. But, if we want to, if we're willing to take the time, it can be, and will be, retrievable.

The historic problem, of course, is that in most cases, nobody bothers. This was true in 3000BCE, and I'll bet it'll be true in 3000CE.




* Technical aside, IT geeks -- have you put a working media player in the offsite vault with your backup tapes? They mean nothing if they can't be read, and making the assumption that your tape drive works is a bad one.
posted by eriko at 6:00 AM on February 21, 2012 [16 favorites]


1. What eriko just said
2. If I found a reel of 35mm film in 500 years and didn’t know what it was, I could probably without too much trouble figure out a way to reverse engineer a projector. Um, if you found a reel of 35mm film in 500 years you'd be looking at dust.
3. I get the problems of software changing. But not all software is lost to time. I can still read a JPEG from 30 years ago. I can play Atari games on my iPhone thanks to an emulator.
4. Yeah, I guess a 4K master can be 48TB. But a 1080HD version in H.264 (such as that on youtube or vimeo) is less than 50GB. Is that sufficient as a "master" for future generations? On a basic level, yes. Maybe if I want to upgrade the film to Super-4D holographic projection someday, probably not.

But put it this way: who's more likely to migrate your film to future technologies, Google, Apple, torrenters, etc. or the hard drive under your bed?

Also, keep in mind that in 10 years, 48TB will probably fit on a thumb drive and cost $5. (Or download from the cloud in 4 minutes.)
posted by fungible at 6:19 AM on February 21, 2012


Um, if you found a reel of 35mm film in 500 years you'd be looking at dust.

We got B&W film and microfilm that's over 100 years old that is still good. I'd much rather put my money at that lasting 500 years than a hard drive.

Sorry about your fucking stutter too.
posted by marxchivist at 6:27 AM on February 21, 2012 [1 favorite]


For all my transhumanist/futurist/etc leanings, my faith in digital archiving is trapped in the stack of at least 3 hard drives that I can no longer access, but can't bear to throw away given their content.
posted by Uther Bentrazor at 6:32 AM on February 21, 2012


Sorry about your fucking stutter too.

Whoa. Chill out.
posted by weinbot at 6:35 AM on February 21, 2012 [2 favorites]


Whoa. Chill out.

Yes you're right sorry about that.
posted by marxchivist at 6:57 AM on February 21, 2012 [2 favorites]


But a 1080HD version in H.264 (such as that on youtube or vimeo) is less than 50GB. Is that sufficient as a "master" for future generations? On a basic level, yes.

A professionally encoded 1080p H.264 clip on YouTube or Vimeo may pass muster on my 46-inch LCD (I'd be shocked if a two-hour movie encoded at that quality took up anywhere near 50 GB, but whatever), but just barely, due to macroblocking, color-banding/posterization, and an obvious loss of high-frequency detail that occurs when low-pass filters are applied to the image. And a 50 GB Blu-ray (far superior to anything on YouTube or Vimeo) doesn't look great when you project it on a 50-foot movie-theater screen, either. As mentioned above, it's a picture of the Mona Lisa.

Did you happen to see Mission Impossible: Ghost Protocol projected on an IMAX screen? The Dubai sequence was fucking amazing. That's what we're looking to protect -- not some H.264 YouTube video that'll rock your socks off on a MacBook Pro.
posted by Joey Bagels at 6:58 AM on February 21, 2012 [4 favorites]


I really do believe that we'll crack it one day, but currently:


Um, if you found a reel of 35mm film in 500 years you'd be looking at dust.

As far as we know (or as far as I understand what we know), polyester film, when stored properly, will not degrade. Obviously, proper storage is an issue. Nitrate, when stored properly, still looks amazing.

I get the problems of software changing. But not all software is lost to time. I can still read a JPEG from 30 years ago. I can play Atari games on my iPhone thanks to an emulator.

Yes, those are wonderful ways to experience items created in the past. Let's examine both of them:

JPEG
The JPEG happens to be an incredibly widely understood format. I would wager that, without some kind of catastrophic loss of information, we'll be able to understand JPEGs in 100 years. However, unless something was created as a JPEG, they can only be considered lossy formats. They're not actually preserving anything, they're an imperfect replica. And if something does happen to have been created as a JPEG, we need to ensure that everything from that file, including any metadata that happens to have been recorded during creation, is readable. Otherwise we aren't interpreting that file faithfully.

There are plenty of video tapes in unusual formats that were widely used within broadcast and other non-consumer worlds. There is at least one major broadcasting studio with literally more tapes of some bizarre 80s format than they can ever possibly digitise. There are quite literally not enough compatible tape heads in existence to read everything they have, not even once.

Atari
Playing an Atari game on your iPhone is not really relevant in terms of archiving. It shows that there is an interactive representation of the game available on modern hardware, sure. But it's divorced from the original hardware. There are several schools of thought about how important that is to something that's "born digital", but I don't think anyone could possibly argue that it's how the creators intended for their game to be experienced. The high-res, progressive screen is miles away from an early-80s CRT television. The touch-screen controls are nothing like the original joystick controllers. So it's a fine way to experience it on your own terms, but it's not commonly thought of as preservation, and it's definitely ignoring the original artefacts.

Yeah, I guess a 4K master can be 48TB. But a 1080HD version in H.264 (such as that on youtube or vimeo) is less than 50GB. Is that sufficient as a "master" for future generations? On a basic level, yes. Maybe if I want to upgrade the film to Super-4D holographic projection someday, probably not.

No for both. It doesn't even have to be for a crazy futuristic straw-man project - it's not sufficient as a master in any sense of the word. The amount of data lost is enormous. We're already starting to expect 4K in the cinema, and it's only 2012. Also, once again the original artefact is being completely ignored.

I understand why people think that digitising is the answer - I used to assume that myself. But it's mostly only currently useful for access, plus restoration in the rarified world of Enough Money To Throw Away On A Vanity Project. Thankfully we have Martin Scorsese for that.
posted by Magnakai at 7:06 AM on February 21, 2012 [1 favorite]


We got B&W film and microfilm that's over 100 years old that is still good.

I'd be surprised if we had much in the way of B&W movies on stock that's 100 years old that can still be run through a projector, actually.
posted by hippybear at 7:09 AM on February 21, 2012


In what format do you store your archive of digital data formats?

Is there a data format designed specifically and explicitly for archival and extreme longevity, something that will not only be likely to work on arbitrary reading devices of the future, but which will also be amenable to relatively easy deciphering starting from scratch?

Is it crazy to suggest something like a three part file where the first part somehow defined the alphabet and the format for a plain text document, the second part consisted of a concise plain text document containing text specifying the file format used for the third part, and the third part was then your (arbitrary format) data file, now all in one handy bundle for future data archaeologists.
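
A toy sketch of that three-part bundle. It cheats on the hardest part: instead of defining an alphabet from scratch, part 1 is a few lines of plain ASCII that a future reader is assumed to recognise. The marker strings and function names are invented for illustration and are not an existing format:

```python
MAGIC = b"ARCHIVE-BUNDLE\n"  # hypothetical marker, not a real standard


def pack(description: str, payload: bytes) -> bytes:
    """Bundle a plain-text format description with the data it describes."""
    desc = description.encode("ascii")
    header = (
        MAGIC
        + b"PART 2 IS PLAIN ASCII TEXT DESCRIBING PART 3\n"
        + f"PART2-BYTES {len(desc)}\n".encode("ascii")
        + f"PART3-BYTES {len(payload)}\n".encode("ascii")
        + b"END-PART1\n"
    )
    return header + desc + payload


def unpack(bundle: bytes) -> tuple[str, bytes]:
    """Split a bundle back into (format description, raw payload)."""
    head, sep, rest = bundle.partition(b"END-PART1\n")
    if not sep or not head.startswith(MAGIC):
        raise ValueError("not a bundle")
    sizes = {}
    for line in head.decode("ascii").splitlines():
        key, _, value = line.partition(" ")
        if key in ("PART2-BYTES", "PART3-BYTES"):
            sizes[key] = int(value)
    n2, n3 = sizes["PART2-BYTES"], sizes["PART3-BYTES"]
    if len(rest) != n2 + n3:
        raise ValueError("truncated or padded bundle")
    return rest[:n2].decode("ascii"), rest[n2:]
```

For example, `pack("Part 3 is raw 8-bit greyscale, 2x2 pixels.", bytes([0, 255, 255, 0]))` round-trips through `unpack`. Whether a data archaeologist could bootstrap from part 1 without already knowing ASCII is exactly the open question.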

Obviously there's no evidence at all that they work, but techniques not a million miles away from some of this were used in the Arecibo message.
posted by motty at 7:17 AM on February 21, 2012


I'd be surprised if we had much in the way of B&W movies on stock that's 100 years old that can still be run through a projector, actually.

Plenty of hundred year old films can run through a projector, but for obvious reasons, they're usually used as a master in a restoration shop to make a workprint for the same reason that it's possible, but not reasonable, to curl up on the couch and thumb through an original Gutenberg Bible.
posted by sonascope at 7:31 AM on February 21, 2012


Sorry about your fucking stutter too.

I bring out the best in everyone!

But note it said 500 years not 100. All media is ephemeral, even those stone tablets will be dust some day.

It reminds me of reading about the New York Times time capsule in 2000 - they were attempting to make a time capsule that would last for 1000 years, which seems impossibly long. They went through several strategies of trying to figure out how to make sure it still can be found in the year 3000 under unimaginable circumstances (nuclear apocalypse, alien attack, our minds being subsumed into the Matrix, etc.)

The solution they came up with, strangely, was to place it out in the open in the Museum of Natural History. Their reasoning was that the best chance it had of being found in 3000 was to entrust people to take care of it and have them pass it from one generation to the next. I think of the digital world as not being so different from all that - if you keep your digital copies safe in a vault they will die. If you put them out in the world they will last. I have Metafilter posts that will probably outlive me.
posted by fungible at 8:22 AM on February 21, 2012 [2 favorites]


Is there a data format designed specifically and explicitly for archival and extreme longevity, something that will not only be likely to work on arbitrary reading devices of the future, but which will also be amenable to relatively easy deciphering starting from scratch?

Enter: Optar.

Data storage on durable paper or film stock for the bitless future. Complete with error correction, and open source in C. Portable to your future trinary computer language of choice. 200KB per A4 sheet, you do the math for linear 35mm film. Etch into titanium and infill with enamel, for the really important stuff.
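
Doing the math for paper, as a rough sketch: it takes the 200KB-per-sheet figure above and the 48TB master size quoted upthread at face value, assumes decimal units, and assumes a sheet is about 0.1 mm thick (my guess, not an Optar figure):

```python
SHEET_BYTES = 200 * 1000       # 200 KB per A4 sheet, as quoted (decimal KB assumed)
MASTER_BYTES = 48 * 1000**4    # 48 TB 4K master, figure quoted upthread
SHEET_THICKNESS_MM = 0.1       # assumption: ordinary office paper


def sheets_needed(data_bytes: int, bytes_per_sheet: int = SHEET_BYTES) -> int:
    """Whole sheets required, rounding up (ceiling division)."""
    return -(-data_bytes // bytes_per_sheet)


def stack_height_km(sheets: int, thickness_mm: float = SHEET_THICKNESS_MM) -> float:
    """Height of the stacked sheets in kilometres."""
    return sheets * thickness_mm / 1_000_000


if __name__ == "__main__":
    sheets = sheets_needed(MASTER_BYTES)
    print(f"{sheets:,} sheets, a stack about {stack_height_km(sheets):.0f} km tall")
```

Under those assumptions one 4K master is 240 million sheets, a stack on the order of 24 km high, so this really is for "the really important stuff" only.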

I am only half joking.
posted by helicomatic at 8:33 AM on February 21, 2012 [3 favorites]


It's actually a mercy that almost everything gets lost and forgotten. Without the memory hole humanity goes crazy in very short order from the oppression of remembering the past.
posted by jfuller at 9:26 AM on February 21, 2012


Mission Impossible: Ghost Protocol [...] That's what we're looking to protect

You know, I think human culture will survive if this isn't preserved in full resolution. :-)

Silliness aside, I'm not sure the fidelity matters that much past a certain point. Is it really super important that we see everything in uncompressed 4k? In an era where cellphone cameras take HD video, I can't see historians lacking for A/V data.

Many years from now, the resolution of movies will have improved several fold, and so will the image processing techniques. If you can save a 4k MP4, I'm betting that will be more than sufficient. The idea that one must experience something exactly as people used to down to the last mostly unnoticeable pixel really isn't well supported, IMO.
posted by smidgen at 9:26 AM on February 21, 2012 [1 favorite]


If we could perfect the trick of archiving to the mitochondrial DNA of specially bred cockroaches and flu viruses then all our problems would be solved.
posted by rongorongo at 10:10 AM on February 21, 2012 [2 favorites]


Perhaps we should require any filmmaker to encrypt their master video files with encryption keys that will take about 95 years to crack (the length of the copyright term after publishing), then post them on the internet. Solved!
posted by RobotVoodooPower at 10:29 AM on February 21, 2012 [1 favorite]


I have Metafilter posts that will probably outlive me.

This message is intended for my great-grandchildren and beyond, searching for ancient words of the pater familias: 2012 is pretty great. Wish you were here.
posted by yeti at 12:15 PM on February 21, 2012 [1 favorite]


Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)

Torvalds, Linus (1996-07-20). Post. linux.dev.kernel newsgroup.
posted by mikelieman at 1:15 PM on February 21, 2012


Fascinating discussion. Here's what's going on at the other end of the spectrum:
Is it acceptable to destroy cultural objects as if they were land mines? This is a question faced by archivists in Germany, where many of the country's historical films were shot on explosive nitrocellulose. A bitter fight has broken out in Germany over whether the film should be preserved or destroyed.
posted by muckster at 1:26 PM on February 21, 2012 [1 favorite]


Or rather, I s'pose it's part of the same problem. It's late here.
posted by muckster at 1:38 PM on February 21, 2012


In an era where cellphone cameras take HD video, I can't see historians lacking for A/V data.

I think this is terribly myopic. We've been in the "digital era" for a rather short time but already, pixel density means that just about anything not-a-feature-film from the 80s and 90s (and before) will look not terribly compelling on a current-day standard widescreen-format television. You will have your display size effectively cut in half because you're looking at a cube through a long rectangle. Widescreen-format content may fare better for the time being, but even now remaster is piling up on top of remaster and we're only a few pixel dimension upshifts away from all but the most painstakingly expensive original masters, looking forward.

And I should add that A/V Data encompasses quite a bit that you might not be considering. How often is each frame captured? 24 times a second is the beloved film standard, but modern displays can do 60 or 120 frames (although some of this is marketing and misdirection). What's the expected display's pixel density? Upscaling can only do so much, a postage stamp will become pixel art at fullscreen (or, conversely, your file will have incredibly sharp detail--all three by two inches of it.) We're also not considering color space or model, interlacing, and so on. Have you tried watching, say, the first season of Stargate SG1 or Buffy the Vampire Slayer fullscreen on a 2560x1600 monitor? It's blur soup. Theoretically we aren't seeing the quality the original master has (or maybe we need a remaster), but arguably that's part of the problem too.

Even considering all that, we're still only looking at what we have right now. We can't really predict the big changes--3D, curved display, holograph, virtual reality, procedural video, optical nerve interfaces, who knows. Historians will always be lacking for data, a historian that doesn't want more data must be in an understaffed department.
posted by Phyltre at 2:07 PM on February 21, 2012


I'm not sure the fidelity matters that much past a certain point. Is it really super important that we see everything in uncompressed 4k?

Nah. Come to think of it, the Louvre can probably get away with replacing the collections with inkjet prints, too.
posted by Thorzdad at 2:36 PM on February 21, 2012


Have you tried watching, say, the first season of Stargate SG1 or Buffy the Vampire Slayer fullscreen on a 2560x1600 monitor? It's blur soup.

Can't speak to Stargate, but the first season (or two) of Buffy was shot on 16mm film, so even the best possible digital scan is probably gonna look not-great.
posted by Mister Moofoo at 4:05 PM on February 21, 2012


We have a lot of data, but only a tiny percentage is worth preserving.

Masters of movies with amazing-for-2012 special effects? Yeah, people will sure be all open-mouthed at those in 100 years. Actually the image quality of a movie has almost nothing to do with how good it is.

I'm sure there will be enough data from the 20th and 21st centuries available to future civilisations to get a good flavour of what life was like now, but I'm far from convinced that any humans at all in 2534 will benefit from pristine archives of every episode of Baywatch, for example.
posted by dickasso at 4:06 PM on February 21, 2012 [3 favorites]


I'm so glad dickasso gets to choose this.
posted by jscott at 4:52 PM on February 21, 2012 [2 favorites]


More food for thought: Are Shakespeare's plays still with us because they were written on really awesome non-degradable paper, or because people have made so many copies?
posted by fungible at 6:28 PM on February 21, 2012 [1 favorite]


"I can still read a JPEG from 30 years ago."

Dude, some of us can actually remember when the JPEG standards were introduced, and it wasn't thirty years ago. Try twenty.
posted by Ivan Fyodorovich at 7:14 PM on February 21, 2012


Well, that's a peculiar comparison. Paper of various sorts has been known to survive for centuries. Nitrate film stock, which much of the early days of movie making used, is highly volatile and degrades quickly unless stored pretty carefully. Anyway, copies of the First Folio's first edition do still exist, so yeah, awesome non-degradable paper is part of it.

In any event, books get printed and reprinted, and even completely new editions of books don't mean you've lost anything meaningful as long as the text is accurate.

The text of movies is images, and they are much more difficult to preserve. Negatives degrade over time, too, and many films we have today we don't even have negatives for. This means we're already making duplicates which aren't true to the original, because the nature of copying a film from a print already introduces loss into the equation. With some projects, such as the restoration of The Wizard Of Oz from a couple of years ago, they were lucky enough to have well-preserved three-strip Technicolor nitrate stock to make their working scans from. That's a true rarity when it comes to film preservation and restoration.

Anyway, film stock simply doesn't last forever. Even modern film stock suffers from various problems, either color fade or other problems, often with steps taken to stop one causing others to accelerate.

Ultimately, going back to your comparison, if we were actually publishing movies like we do books, we'd actually be creating new film prints (or even new negatives, which suffer some quality loss but not as much as making prints from prints). Turns out, we hardly ever do this with movies. The same prints get used for decades until they're too badly damaged to send to theaters for revivals, etc. That's the opposite of Shakespeare's plays, which are published and republished without degradation so there are always new copies which are effectively identical in content to the First Folio.
posted by hippybear at 7:20 PM on February 21, 2012


I don’t know about movies, but most albums recorded today, no matter how big the act, are stored on a hard drive, usually with one back up. And they sit on a shelf. And they’re usually in a proprietary format (the tracks, not the final mixes). So I wouldn't count on any mixes or remixes 20 years down the road.

I have music projects from 10-15 years ago I would have trouble bringing up, and I’ve kept all the equipment.
posted by bongo_x at 7:53 PM on February 21, 2012


I heard that the Smithsonian was preserving music recordings on vinyl, because they’re smart and stuff.
posted by bongo_x at 7:54 PM on February 21, 2012


The fact of the matter is this: information is inherently fragile. Imagine a picture puzzle. There is one and only one arrangement of the pieces that forms a total, complete image. All other arrangements are nonsense, and every time you shake the box you end up with another, nonsensical arrangement of pieces (or bits). Ordered states are vastly outnumbered by disordered states. This, in a nutshell, is ENTROPY, or the second law of thermodynamics. Law, not theory, meaning there's Jack Shit you can do about it...it's what makes time go forward, btw, so, y'know...knock yourself out.

so what can be done about it? I came here to post a very interesting and surprisingly candid interview with Jeff Bezos explaining how Amazon.com maintains 100% up-time. alas, my google-fu has failed me...anyone? The system is pretty clever and operates on the assumption that EVERY data recording medium is fallible. Basically, it's just an automated triple backup system. Every bit of data entered into the server farm is recorded on to hard drive and then that data automatically copies itself into two other locations and periodically pings the two sister copies (which are often in differing geographic locations) to see if the hard drives that they are copied to still exist. If one of them fails to respond, it copies itself to another location. That way, if a drive should fail, or become outmoded, it can simply be removed and thrown away. This of course assumes that civilization does not collapse, but then any hard copies printed out then become subject to the elements and can be seen as fallible as well. We are also fast approaching the point (if we haven't passed it already) where making 'archival hard copies' (in film stock, vinyl, and paper) of all the digital data currently in existence becomes flatly impossible due to the limited quantities of celluloid, oil, and trees on the surface and in the crust of planet earth.
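
A toy, in-memory sketch of that keep-three-copies-and-ping loop. This is my reconstruction of the idea as described, not Amazon's actual design; the class, its method names, and the node names are all made up:

```python
import random

REPLICAS = 3  # the "triple backup" described above


class Cluster:
    """Pretend storage nodes, each holding a dict of key -> bytes."""

    def __init__(self, node_names, seed=0):
        self.alive = {name: {} for name in node_names}
        self.rng = random.Random(seed)

    def holders(self, key):
        """Nodes that are still up and still hold a copy of `key`."""
        return [n for n, store in self.alive.items() if key in store]

    def put(self, key, data):
        """Write the data to REPLICAS distinct nodes."""
        for node in self.rng.sample(sorted(self.alive), REPLICAS):
            self.alive[node][key] = data

    def fail(self, node):
        """Simulate a dead drive: the node and its copies vanish."""
        self.alive.pop(node, None)

    def repair(self, key):
        """The 'ping': count live copies, re-copy until REPLICAS exist."""
        holders = self.holders(key)
        if not holders:
            raise LookupError(f"all copies of {key!r} are gone")
        data = self.alive[holders[0]][key]
        spares = [n for n in sorted(self.alive) if n not in holders]
        missing = REPLICAS - len(holders)
        for node in self.rng.sample(spares, min(missing, len(spares))):
            self.alive[node][key] = data
        return len(self.holders(key))
```

Kill a holder with `fail()`, call `repair()`, and the count goes back to three as long as a spare node exists. The scheme only loses data if every holder dies between two pings.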
posted by sexyrobot at 8:40 PM on February 21, 2012 [1 favorite]


Etch into titanium and infill with enamel

Or onto nickel and put in a box.

The system is pretty clever and operates on the assumption that EVERY data recording medium is fallible.

Most of the big data companies (have to) operate on this assumption. One famously had a process which would go around and randomly kill off servers in their production farms, just to keep the engineers from getting complacent. (And pour encourager les autres?)
posted by hattifattener at 11:23 PM on February 21, 2012


Optar sounds like a wonderful idea, helicomatic! A neat way of taking the complexity out of digital storage. If we were still building pyramids Optar could be the modern version of hieroglyphs. If you used metal you could probably get more than 200kb a page, but you might need some kind of microscope attached to the camera.
posted by Kevin Street at 12:55 AM on February 22, 2012


"Many years from now, the resolution of movies will have improved several fold, and so will the image processing techniques. "

Many years from now, consumer video will be banned for exactly this reason. Seriously, where are we going to put all this stuff? The human race will slowly become like those hoarders living in a house that's filling up with yellowing magazines and assorted treasures. There's a narrow path between the "living" space and the bathroom. Every single item in the house has value, and is worth keeping, but is essentially useless because even if you know where it is you can't get at it. (This scenario is based on the document retention policy at my workplace - YMMV.)

I like the suggestion above about encoding humanity's archives in cockroach dna. Or pet dna: "Darn! Season 3 of Gilligan's Island was lost when Miss Piggles was cooked and eaten by those hippies next door."
posted by sneebler at 8:21 AM on February 22, 2012 [2 favorites]


Is there a data format designed specifically and explicitly for archival and extreme longevity, something that will not only be likely to work on arbitrary reading devices of the future, but which will also be amenable to relatively easy deciphering starting from scratch?

PDF/A was kinda-sorta designed to do that. Exactly how well it implements that goal is controversial, to be sure. But it at least attempts to ensure readability down the line. The biggest difference from plain-ol PDF is that it requires that the documents be totally self-contained and not refer to any external references in order to render correctly. It's not a self-documenting format by any means, though.

Are Shakespeare's plays still with us because they were written on really awesome non-degradable paper, or because people have made so many copies?

It's interesting you bring that up, because Shakespeare is an interesting example of how distributed preservation can succeed in some ways while failing in others. The problem with digital-style preservation-through-copying is that there's a fidelity problem: it's extremely difficult to know with certainty that the version of Hamlet you can buy in a bookstore today is the exact version that the Bard penned originally or was performed at the Globe first. There are multiple versions of the play floating around, some radically different than others, and it's not at all clear which version one should consider authoritative. (The oldest extant version, Q1, is very different from what most people are familiar with today, and it's not clear if it's an abridgment of the familiar text, or if the other parts were added later on.)

So while the reason we're familiar with Shakespeare is because his work was and is so widely disseminated, when scholars attempt to figure out what he actually wrote and what was added later, they typically have to go back to big monolithic library-style archives -- in the case of Q1, the Stationers' Register, an early copyright registry.

I don't really see how any digital system necessarily solves this problem. Yes, you can perform some sort of checksumming or hashing to produce a signature of the contents, but that by itself doesn't really prevent modification -- someone can just modify the contents and change the checksum/hash, and how are you to know which is the correct checksum/hash value? You don't, unless there's some big database that you can verify it against to see what the correct value is supposed to be. Perhaps that database could be distributed (although that introduces its own set of problems, e.g. what's the incentive for nodes to participate as the database gets very large? how do you ensure the integrity of the information against hostile nodes?) but it's still a central repository.
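
To make the point concrete, a small sketch assuming SHA-256: a checksum that travels with the file is trivially regenerated by whoever alters the file, while a separately held registry of known-good digests at least notices. The two strings are stand-ins for variant texts and the registry is just a dict invented for the example:

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of the contents."""
    return hashlib.sha256(data).hexdigest()


def self_check(bundle: tuple[bytes, str]) -> bool:
    """Verify data against the digest shipped alongside it."""
    data, attached = bundle
    return digest(data) == attached


def registry_check(title: str, data: bytes, registry: dict[str, str]) -> bool:
    """Verify data against a digest held somewhere else entirely."""
    return registry.get(title) == digest(data)


# Stand-in texts: one registered at deposit time, one altered later.
registered = b"To be, or not to be, that is the question"
altered = b"To be, or not to be, I there's the point"

registry = {"Hamlet": digest(registered)}      # held by the central archive
forged_bundle = (altered, digest(altered))     # modifier recomputed the hash

print(self_check(forged_bundle))                     # passes: proves nothing
print(registry_check("Hamlet", altered, registry))   # fails: caught
```

Which just moves the trust problem to whoever keeps `registry`, i.e. it's still a central repository.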

Personally, I think that while digital media is great for disseminating a whole lot of copies of something out into the wild, something like analog microfilm/fiche makes much more sense for the central repository. Even if the retrieval rate out of the microfilm is very low (i.e. most retrievals happen from digital copies) having them around would be worthwhile even if their practical purpose is generally only to validate the provenance of something when it comes into dispute. Naturally, they'd also protect against less common but more catastrophic scenarios, like Skynet, EMP war, solar flares, etc.

Like Sonascope, I've done some work with microfilm and it is truly wonderful stuff. The cost of keeping a huge offline archive on microfilm is very low -- basically it's just keeping a room at a constant temperature and humidity versus the outside ambient. (And unlike a bunch of servers, a drawer full of films doesn't produce any heat.) The reading apparatus is much simpler than what you need for any digital offline-storage system, such as tape or MO or optical, and you can't get screwed by some vendor suddenly EOLing their products -- something that's recently happened to a lot of places with big investments in tape and MO.

Digital is not going anywhere, and it certainly has advantages, but big centralized repositories filled, hopefully, with low-tech analog originals and reproductions, will still have a place indefinitely.
posted by Kadin2048 at 1:29 PM on February 22, 2012 [3 favorites]

