Digital obsolescence is more deadly than degrading film stock ever was.
March 2, 2014 10:33 AM

Film preservation 2.0 Unless the unique challenges of digital preservation are met, we run the risk of a future in which a film from 1894 printed on card stock has a better chance of surviving than a digital film from 2014.
posted by mediareport (105 comments total) 40 users marked this as a favorite
 
Our entire civilization is being written in the sand. One good gust of wind - from the Sun - and most of it is kaput.
posted by codswallop at 10:41 AM on March 2, 2014 [12 favorites]


I've been wondering what studios were going to do about keeping digital films every time a new Doctor Who tape turned up or a new hoard of old movies thought forever lost was discovered. I suspect in the (not too distant) future, we'll get some stories of digital artists doing the same kind of reconstruction with damaged data files that we see with the degraded tapes of old Who and vintage films.
posted by immlass at 10:41 AM on March 2, 2014 [1 favorite]


This is compounded by the fact that, arguably (i.e. I have no real data), far more films are produced now than even 20 years ago.
posted by psolo at 10:44 AM on March 2, 2014


The fact that we have a mostly-restored version of Metropolis now is only because of the ability for film stock to survive, ignored, in storage.

We are going to lose a huge amount of our culture because we prefer digital. Future generations will have a giant blank to look back upon. It's sad, but true.
posted by hippybear at 10:45 AM on March 2, 2014 [2 favorites]


so it goes
posted by thelonius at 10:53 AM on March 2, 2014 [6 favorites]


I know that this may be more of a question for AskMeFi, but after reading this article, I have to ask - how exactly can one get into film preservation as a career?
posted by pipian at 10:54 AM on March 2, 2014


The rapid obsolescence of the Linear Tape-Open technology seems to be a key sticking point:

Since 2000, new generations of LTO technology have been released every two years or so—new tapes and new drives—and they’re only backward-compatible for two generations...The practical result of this is that a digital film archive needs to invest heavily in data migration to maintain its assets. Every five years or so, each film needs to be copied to new media, in a constant race against magnetic-tape degradation and drive obsolescence. This requires time and money: new tapes, new drives, staff to copy and verify the data...
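The "copy and verify" step in that quote can be pictured as a checksum comparison between the old and new copies. What follows is only a minimal sketch, not anything the article describes: the function names and file names are hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination, then confirm the copy is bit-identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    return sha256_of(destination) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file names standing in for old and new media.
        old_copy = Path(workdir) / "reel_old.bin"
        new_copy = Path(workdir) / "reel_new.bin"
        old_copy.write_bytes(b"frame data " * 1000)
        print(migrate_and_verify(old_copy, new_copy))
```

A real archive would presumably also store the digests somewhere separate from the media, so a later migration can tell whether the source itself has already rotted.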

The part about studios not pushing for a "true digital archival format" is interesting, too:

[T]he motion-picture industry should push for the development of a true digital archival format, capable of surviving 100 years of benign neglect...[But] neither the manufacturers of LTO tapes nor the bulk of their clients have much incentive to develop a true archival format. Tape and drive manufacturers thrive on planned obsolescence, just like the rest of the computer industry. And companies who purchase LTO tapes for backups not only don’t need the data preserved longer than the seven years required for Sarbanes-Oxley compliance, they don’t *want* it preserved longer, because as long as the data exists, it’s discoverable in lawsuits. It’s possible that a digital format that meets these archival requirements will someday be developed—but it’s unlikely to happen anytime soon.

Not sure about that lawsuits bit, but studios banding together to create a format that didn't require constant migration for preservation seems very doable and something that should be being done, now.
posted by mediareport at 10:55 AM on March 2, 2014


Their copyrights will survive though.
posted by srboisvert at 10:57 AM on March 2, 2014 [31 favorites]


Currently your best bet to get a copy of any form of media is to hunt for it on torrent sites. Despite the enthusiasm of copyright holders for trying to shut the thing down, torrent networks are a great way to back up and massively mirror media!
posted by Mokusatsu at 11:06 AM on March 2, 2014 [14 favorites]


I work very much in this field and the LTO obsolescence bit is so frustrating because on top of the generational issues they are also not universal cross-platform, so you need the same drive AND the exact program to access any info on the tape.
posted by djseafood at 11:07 AM on March 2, 2014 [2 favorites]


It occurs to me that the stories of miraculous rediscoveries will shift from "print found in someone's basement" to "bluray rip found on someone's dropbox."

For someone who's more knowledgeable about this sort of thing, what kind of form would a true digital archival format take? As alluded to in the article, early films were submitted to the copyright office with each frame printed on card stock, so new prints could be made from those. Properly preserved acid-free paper kept away from moisture and light can last hundreds of years, but surely that's not the answer when we're talking about tera or petabytes of data?
posted by arcolz at 11:09 AM on March 2, 2014 [1 favorite]


We are going to lose a huge amount of our culture because we prefer digital. Future generations will have a giant blank to look back upon. It's sad, but true.

There is no obsolescence with bits. When I look at the amount of truly lost gaming works from the first days of computing vs the number of lost film works, digital's success rate is much, MUCH higher. When we have people dedicated, if not obsessed, with the archival and completeness of those archives, along with the easily duplicatable nature of bits, that tends to happen.

I can take any video made within the last 23 years (since we started getting digital video encoders) and slap it into software and get playback. That's the nature of bits.

The digital community will succeed in archival simply because there's a collective will and availability. The reason we have a huge blank from the 1900s through to the 1950s is because there was a lack of availability. It wasn't easy for some schmuck to say "I'll take care of it" and get a free copy of a work.
posted by Talez at 11:19 AM on March 2, 2014 [18 favorites]


I've heard tell of compression formats that were "additive", i.e. you could have multiple low-quality rips of the same song which sounded the same to human ears, but actually contained different parts of the waveform, so that if you rubbed any two together out popped a slightly better sounding version. Can you do that with video? If so, I'd think to keep just one of the low quality rips on card stock, and the rest in as many *different forms* of digital backup as you can. For the risks you don't know you don't know.

Real pie in sky stuff here, but I wonder if one day a creative industry that has fewer self esteem issues would permit the sharing of just the low quality copies in this hypothetical format, so that you have to do actual work to construct one you want to watch--and the people doing that work are doing unpaid archival work.
posted by LogicalDash at 11:21 AM on March 2, 2014 [2 favorites]


There is no obsolescence with bits. When I look at the amount of truly lost gaming works from the first days of computing vs the number of lost film works, digital's success rate is much, MUCH higher. When we have people dedicated, if not obsessed, with the archival and completeness of those archives, along with the easily duplicatable nature of bits, that tends to happen.

Sure, if you have people who are working to keep the true 2K or 4K source alive and non-degraded. But ultimately, if you are talking torrented rips from layered media, you're talking about a basic degradation away from the true material. (Just like the newly rediscovered pieces of Metropolis are from a smaller-frame and lower-quality of stock from the existing original.)

Studios aren't doing anything to back-up the actual theatrical experience of their films. (And at least one studio is no longer striking ANY film prints -- others are going to follow.)

And what about films conceived to be presented in 3D? Gravity will be lost. Life Of Pi will be lost. Many others too. The source files for these films are not going to be available for the obsessed fans to archive in any format. They will simply go away.
posted by hippybear at 11:26 AM on March 2, 2014 [4 favorites]


The fact that we have a mostly-restored version of Metropolis now is only because of the ability for film stock to survive, ignored, in storage.

The fact that we have the restored Metropolis is just dumb luck. Film stock from that era is a terrible long-term storage medium that not only degrades over the years but is horribly dangerous to store as it's so flammable.
posted by octothorpe at 11:29 AM on March 2, 2014 [5 favorites]


True, dumb luck played a lot into the discovery of the missing footage. But if it were never created and then ignored for so long, we would never have it today.
posted by hippybear at 11:32 AM on March 2, 2014


That doesn't work as an argument in favor of "the ability for film stock to survive, ignored, in storage".
posted by LogicalDash at 11:35 AM on March 2, 2014 [1 favorite]


a layered media like a CD or a DVD would never be available to give up data after as many years.
posted by hippybear at 11:38 AM on March 2, 2014 [1 favorite]


When we have people dedicated, if not obsessed, with the archival and completeness of those archives

One of Dessem's points in the article is that Hollywood studio executives are not that kind of people:

If the last decade has taught us nothing else, it’s that our system rewards executives who make horrible long-term decisions for short-term results. (See Jamie Dimon.) In the analog world, most of the cost of preservation is paid when the archival print is created. But for a digitally preserved film, the cost of migration shows up every five years. Postponing it is going to be tempting, especially during buyouts, changes in management, or any of the near-constant corporate turmoil that puts huge short-term pressure on cost-cutting. Films that continue to make money are probably safe, but for bombs—whether they were genuinely terrible or interesting failures—the incentives are all wrong. Putting a significant part of our cultural heritage in a system where a five-year gap in funding means catastrophic, irrevocable loss seems to guarantee we’ll lose some of it.
posted by mediareport at 11:46 AM on March 2, 2014 [2 favorites]


There is a lot of worthy stuff to save, but I am strangely ok with 80-90% of the movies over the last two decades or so not being available to our future selves.
posted by edgeways at 11:47 AM on March 2, 2014 [6 favorites]


I expect digital archivists would be keeping a very close eye on Tahoe-LAFS. Tahoe will deal with the problem of physical media degrading and going obsolete, because it allows you to set up a storage grid with content that remains fully available while you remove old servers and add new ones; you don't need to do anything special to keep your data moving onto new media - that just happens automatically as machines break down and get replaced.

It also allows the archivist to restrict access to authorized users, unlike the accidentally somewhat-high-availability storage grid provided by random torrent users.

Multiple terabytes of storage already cost an absolute pittance compared to what a movie costs to make. There's no reason I can see why, if it's reasonably easy to do, a studio shouldn't just archive every bit of footage shot and every bit of software used to process it and VM images of all the machines that run that software and every bit of intermediate material as well as the final distribution mixes.

So the ignored-for-years-and-found-accidentally items of the future could well be Tahoe access credentials. The underlying data will all still be there somewhere in the grid, part of that insignificant percentage of storage that happens to hold all the output of the decades previous to whichever this one is now (remember when movie files were all under fifty terabytes each? Amazing the performance they got out of those old things, and now here I am carrying sixty of them around with my car keys...)

But with the best will in the world, you know what the only piece of video that will ever actually be reconstructed five thousand years from now will be?

This.
posted by flabdablet at 11:54 AM on March 2, 2014 [4 favorites]


This can't be true because everybody knows digital = better than.
posted by entropicamericana at 12:07 PM on March 2, 2014


Dateline 2125 -- Researchers at the recently launched Quantum Cryptography Initiative announced the discovery of several long-lost keys, including the Tahoe-LAFS keys to several early digital era motion picture archives and the long-lost cold storage key which was scrambled by first generation Bitcoin broker Mt. Gox.

"These films, which include such long unseen classics as The Final Destination, Saw IV, and 300 will provide a long missing window on early twenty-first century film," an excited QCI spokesman said.

As for the bitcoin chain, once worth millions of Standard Work Days, the researchers said they were fortunate to get a bit of a discount so they could use it to order a symbolic pizza. "The legacy of the long-lost Mt. Gox is Canadian Bacon," the spokesman for the Quebec based QCI team said with a sly grin as he closed the press conference.
posted by localroger at 12:09 PM on March 2, 2014 [12 favorites]


The solution is simple: splice the digital files into the junk DNA of flora and fauna, which our children can then spread all over the earth.
By saving our past we can save their future.
posted by weapons-grade pandemonium at 12:18 PM on March 2, 2014 [3 favorites]


Sure, if you have people who are working to keep the true 2K or 4K source alive and non-degraded. But ultimately, if you are talking torrented rips from layered media, you're talking about a basic degradation away from the true material. (Just like the newly rediscovered pieces of Metropolis are from a smaller-frame and lower-quality of stock from the existing original.)

Most of the time these rips are good enough. If you can at least freeze a piece of media at a certain quality you can keep it intact. I know there's going to be a lot of music videos where the only quality you'll be able to get is a 240p MPEG-1 rip from analog TV. But a lot of natively digital media is still going to be in fairly pristine condition.
posted by Talez at 12:23 PM on March 2, 2014


Most of the time these rips are good enough.

Define "good enough".

The recent restoration of Lawrence Of Arabia which was presented in theaters last year.. that was astounding. Is there anything about those rips which could possibly match the quality of that presentation?

Or do you mean, by "good enough", that someone might be able to watch the film on a home device and get the gist of the media without actually experiencing it in the way it was meant originally?

"Good enough" is not good enough. Actual archival of modern media means being able to present it in the form it was intended for. Anything less is loss.
posted by hippybear at 12:29 PM on March 2, 2014 [6 favorites]


This topic seems to come up a lot, but can anybody point to an actual example of a work that has been lost due to a file format issue (i.e., where they have all the bits but don't know how to interpret them), or a digital work created after the internet that has truly been lost (i.e., is unobtainable anywhere)?

Or do you mean, by "good enough", that someone might be able to watch the film on a home device and get the gist of the media without actually experiencing it in the way it was meant originally?

Modern films are created with the intention that they'll be seen at home as well as theaters. Seeing one on a 'home device' is experiencing it in an originally intended way.
posted by Pyry at 12:33 PM on March 2, 2014 [2 favorites]


A few years ago I attended a lecture by one of the leading experts in audio archiving in the world. The moral of his lecture was, keep your analog reel to reel tapes. Proprietary format/codec obsolescence, evolving storage technologies (could you, easily, get stuff off a SCSI-cabled tape backup drive? How about in 20 years?), and the emerging fragility of optical discs at the time (he believed even "archival" grade optical discs were safe for no more than 25 years, no matter what the disc-makers were saying about their testing), as well as the digital degradation that happens when files are transferred and formats migrated, all pointed to a wild level of over-confidence in digital storage by the archivists of the turn of the millennium.

Scared the shit out of me.

On the other hand, why do we need to save everything? Seems like we have always exercised discrimination due to the limits and costs of media in the past. Only the apparently bottomless capacity of digital technology for cheap storage (and cheap production), and the hope that Moore's law never stops applying, gives us the hubris to imagine everything can be saved, everything should always be available instantly everywhere in the world, etc. Our ancestors had to choose carefully what they inscribed for the future. The grumpy old man in me wishes we were addressing the question of what's *worth* preserving a little more broadly in our culture.
posted by spitbull at 12:33 PM on March 2, 2014 [2 favorites]


This archival worry just seems to be a hangover from the 90's when there was a horrible mix of incompatible HDs, Zips, floppies and CDs. It is incredibly easy now to do digital backups and it is only going to get easier. 'Cloud' storage has abstracted away hardware so all you need is to store the decoding algorithm (mpeg or whatever) with the data.
posted by bhnyc at 12:34 PM on March 2, 2014 [2 favorites]


But it isn't really about hardware, and those encoding and decoding algorithms have not yet been proven durable, as the long list of now deprecated or unsupported prior codecs demonstrates. The time scale of a few decades is not nearly enough to be sure about long term sustainability. Technology has changed very fast since the 1980s, and just because personal data storage is less diverse or hardware dependent than it used to be doesn't mean someone, somewhere, doesn't have to maintain complex hardware and software configurations, at an increasing cost as the older forms become obsolete. Forward migration of formats and raw data is of course the answer, but the point of this article and thread is that migration *itself* degrades even digital data.
posted by spitbull at 12:37 PM on March 2, 2014


A 1080p rip from a Blu Ray could be projected on a big screen and you'd probably never notice the difference. Most theaters that do digital projection are still doing 2k, which is very very close to 1080p anyway.

I'm not saying it's not worthwhile to archive at higher quality, but the degradation you'd face if you still had a Blu Ray disc would be many orders of magnitude lower than what we happily accept for older films today.

As for the eternal boogeyman of "Hollywood executives" - they are businessmen worried about the short-term bottom line, most of whom will be fired or move on from any given job within a few years anyway. Of course long-term archiving should not be entrusted to them.

If film stock was actually the best method for long-term preservation, there's no reason you couldn't make film prints of movies originally projected on digital. But I don't see any real evidence of this being the case.
posted by drjimmy11 at 12:37 PM on March 2, 2014 [4 favorites]


There is a lot of worthy stuff to save, but I am strangely ok with 80-90% of the movies over the last two decades or so not being available to our future selves.

The non-academic part of my personality understands this sentiment. The academic part, which works on novels which sometimes exist in exactly one copy, thinks that there is often much to learn even from "bad" creative works, and would be sad to see even Scream XXXIV disappear.
posted by thomas j wise at 12:39 PM on March 2, 2014 [6 favorites]


Life Of Pi will be lost

If only.
posted by drjimmy11 at 12:40 PM on March 2, 2014 [4 favorites]


And what about films conceived to be presented in 3D? Gravity will be lost.

I'm not sure I understand what the issue here is - why would we not be able to preserve a stereographic 3D film? It just requires preserving two film tracks instead of one, surely?
posted by Jimbob at 12:41 PM on March 2, 2014


On the other hand, why do we need to save everything? Seems like we have always exercised discrimination due to the limits and costs of media in the past.

Not even remotely true when it comes to the history of cinema. It was almost entirely happenstance.

from Wikipedia:
Most lost films are from the silent film and early talkie era, from about 1894 to 1930.[3] Martin Scorsese's Film Foundation estimates that over 90 percent of American films made before 1929 are lost.[4]
The largest cause of silent film loss was intentional destruction, as silent films were perceived as having little or no commercial value after the end of the silent era by 1930. Film preservationist Robert A. Harris has said, "Most of the early films did not survive because of wholesale junking by the studios. There was no thought of ever saving these films. They simply needed vault space and the materials were expensive to house."[5]

...

Before the eras of television and later home video, films were viewed as having little future value when their theatrical runs ended. Thus, again, many were deliberately destroyed to save the space and cost of storage; many were recycled for their silver content. Many Technicolor two-color negatives from the 1920s and 1930s were thrown out when the studios refused to reclaim their films, still being held by Technicolor in its vaults. Some prints were sold either intact or broken into short clips to individuals who bought early novelty home projection machines and wanted scenes from their favorite movies to play for guests or family members.
As a consequence of this widespread lack of care, the work of many early filmmakers and performers has made its way to the present in fragmentary form. A high-profile example is the case of Theda Bara. One of the best-known actresses of the early silent era, she made 40 films, but only three and a half are now known to exist. Clara Bow was equally celebrated in her heyday, but twenty of her 57 films are completely lost and another five are incomplete.[8] Once-popular stage actresses such as Pauline Frederick and Elsie Ferguson who made the jump to silent films are now largely forgotten with a minimal archive to represent their careers; fewer than ten movies exist from Frederick's 1915-1928 work, and Ferguson has just two surviving films, one from 1919 and one from 1930. This is preferable to the fate of the stage actress and Bara rival Valeska Suratt, whose entire film career has been lost. Western hero, William Farnum, like Bara and Suratt a Fox player, was one of the four big Western actors rivaling William S. Hart, Tom Mix and Harry Carey. Farnum has about three of his Fox films extant.
posted by entropicamericana at 12:41 PM on March 2, 2014 [3 favorites]


The article sort of espoused LTO tapes for long-term storage, and I felt like running some numbers. I glanced at prices on Amazon and Newegg. For a given volume, LTO-5 tapes (10 for $253, 1.5 terabytes on each) cost around half of the equivalent in three-terabyte hard drives ($102). The cost of LTO drives is sort of ridiculous, though: into four figures, with the cheapest LTO-5 drive being around $1770. With that in mind, the crossover point for tapes being cheaper is around 120 terabytes.
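For anyone who wants to rerun that arithmetic, here is a rough sketch using only the prices quoted above. It assumes a purely linear cost per terabyte and a single LTO drive, with no redundancy, shipping, or replacement costs; the function names are made up for illustration. Under those assumptions the break-even lands a little over 100 terabytes, the same ballpark as the figure quoted, with the gap presumably down to rounding or to costs not itemized here.

```python
# Prices as quoted in the comment (2014, Amazon/Newegg).
TAPE_PACK_PRICE = 253.0    # ten LTO-5 tapes
TAPES_PER_PACK = 10
TAPE_CAPACITY_TB = 1.5
HDD_PRICE = 102.0          # one 3 TB hard drive
HDD_CAPACITY_TB = 3.0
LTO_DRIVE_PRICE = 1770.0   # cheapest LTO-5 drive

TAPE_PER_TB = TAPE_PACK_PRICE / (TAPES_PER_PACK * TAPE_CAPACITY_TB)
HDD_PER_TB = HDD_PRICE / HDD_CAPACITY_TB


def tape_cost(terabytes: float) -> float:
    """One LTO drive plus media, priced linearly per terabyte (assumption)."""
    return LTO_DRIVE_PRICE + TAPE_PER_TB * terabytes


def hdd_cost(terabytes: float) -> float:
    """Hard drives only, priced linearly per terabyte (assumption)."""
    return HDD_PER_TB * terabytes


def break_even_tb() -> float:
    """Volume at which tape (drive included) matches hard drives in cost."""
    return LTO_DRIVE_PRICE / (HDD_PER_TB - TAPE_PER_TB)


if __name__ == "__main__":
    print(f"tape: ${TAPE_PER_TB:.2f}/TB, hard drive: ${HDD_PER_TB:.2f}/TB")
    print(f"break-even: about {break_even_tb():.0f} TB")
```

Adding a second LTO drive for redundancy doubles the fixed cost in this model, which pushes the break-even out to roughly twice the volume.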

Pros and cons: The tapes would be offline, which would both insulate them from, say, glitches or a virus, but would also take time to retrieve them. The LTO drives are apparently pricey, with replacements for drives and media expected around every 5 years. And I'm sure that 120 terabytes is simply small potatoes for a *lot* of places.

For redundancy, I should probably add at least one LTO drive, and do RAID-Z2 on the hard drive side, but one LTO-5 drive costs as much as 17 of the hard drives, so it's *really* academic for me from here.

Oh, and re: codec obsolescence: if you're aiming at very long-term archival, you could backup a VM that has the codecs.
posted by Pronoiac at 12:47 PM on March 2, 2014


Jimbob: the people who actually have control of the 3d version have no intention or realistic plan for archiving it properly.
posted by idiopath at 12:48 PM on March 2, 2014


a layered media like a CD or a DVD would never be available to give up data after as many years.

My guess is that a mass-produced CD-ROM — not one made on a CD-R drive but one actually stamped from a glass die — is very close to archival, if properly stored. It's polycarbonate with a thin layer of aluminum, topped with some lacquer to keep the aluminum from oxidizing. It's basically a mechanical format: the data is stored via the physical configuration of the aluminum sheet. As long as you don't physically destroy it, it seems very stable. Putting it into a sealed bag with an oxygen absorber would eliminate much of the oxidation risk, or you could use a metal like gold that doesn't oxidize (although I don't know of any commercially-pressed discs that use gold).

And if you repeat the data many times over the disk (redundancy), in particular keeping it away from the disk edges, that should give further defense against damage and oxidation. Overall it seems like a pretty good choice, provided other optical-disk formats exist which are backwards compatible all the way back to good old Red Book audio CDs.
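To make the redundancy idea concrete: if the same data is written several times, a reader can recover from localized damage by taking a vote across the copies. This is a toy sketch of that idea only, with a made-up function name. It is not how CDs actually protect data (they use their own error-correcting codes), and it assumes the copies are the same length and that damage does not hit the same position in most of them.

```python
from collections import Counter


def recover_by_majority(copies: list[bytes]) -> bytes:
    """Rebuild data from equal-length copies using a per-byte majority vote."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must have equal length")
    recovered = bytearray()
    for position in range(length):
        # Count which byte value appears most often at this position.
        votes = Counter(copy[position] for copy in copies)
        recovered.append(votes.most_common(1)[0][0])
    return bytes(recovered)


if __name__ == "__main__":
    damaged_copies = [
        b"METROPOLIS",
        b"MEXROPOLIS",  # one damaged byte
        b"METROPOLXS",  # a different damaged byte
    ]
    print(recover_by_majority(damaged_copies))
```

With three copies, any single damaged copy at a given position gets outvoted; two copies damaged at the same position would not.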

CDs and DVDs have a bad reputation as an archival storage medium because of some very low-quality CD-R media produced (mostly in China) in the 90s that had dye fading issues. That's not an issue if you're talking about commercially-pressed discs (which don't have a dye layer), and it's mitigated somewhat if you do need CD-Rs by using really good discs with better dye and adhesive to bond the layers. Still, I wouldn't ever think of CD-Rs as an archival medium. But pressed discs are another story.

can anybody point to an actual example of a work that has been lost due to a file format issue

I've personally seen situations where the cost of recovering data in an old, obscure format was so high that it was not economically feasible to use it, and the business ended up recreating it instead. We're not talking about cinema here, just business data, but it's still data loss. I suspect stuff like that happens all the time, and there's no real reason why it couldn't happen to films.

The resources available to recover a particular film off of an old LTO tape are never going to be unlimited. At some point, if it's too expensive to construct a drive to read it, and then hire programmers to parse the data into something usable, it's effectively lost even if the media is still okay.

I personally think that encryption is a bigger threat to data integrity than file format issues though. You can reverse engineer most file formats by taking a look at them and trying to understand the challenges that the original creator of the format was trying to solve. But if data is encrypted (with a good encryption scheme) and the key has been lost, it's gone.
posted by Kadin2048 at 12:58 PM on March 2, 2014 [4 favorites]


Right, well I think in the early days of a new medium, people don't know what is worth keeping. But that "happenstance" reflected institutionalized values at the dawn of the era of film and recording. Books are a better example. Even the greatest library in the world does not keep every word ever written.
posted by spitbull at 1:00 PM on March 2, 2014


The solution is simple: splice the digital files into the junk DNA of flora and fauna, which our children can then spread all over the earth.
By saving our past we can save their future.




That's all well and good until the Gigliolus and the Jackassodils come for your family on their slender, clicking stalks.
posted by TheWhiteSkull at 1:09 PM on March 2, 2014 [11 favorites]


Oh, and re: codec obsolescence: if you're aiming at very long-term archival, you could backup a VM that has the codecs.

But don't you then need to have an archive of hardware to run the VMs? It will be VMs all the way down.
posted by njohnson23 at 1:22 PM on March 2, 2014


My guess is that a mass-produced CD-ROM — not one made on a CD-R drive but one actually stamped from a glass die — is very close to archival

The very first commercial CD I ever purchased, Yes - 90125, back in its first release wave in 1983, will not read in any player I try it in, and hasn't for about 5 years now. For years before that it would read with errors which introduced significant flaws during playback.

Any layered medium is not an archival medium. The simple fact that it is layered means it can and will ultimately fail.
posted by hippybear at 1:29 PM on March 2, 2014 [2 favorites]


Pros and cons: The tapes would be offline

This is a big "pro". Or at least it's one of the big selling points for the few remaining vendors of tape backup equipment.

If you need to store a lot of data — and I'm not even just talking about movies here, I'm talking about stuff like phone company billing records and the like, where you're churning out TB per day, all the time — but the retrieval rate is very low, hard drives don't look as good because they require electricity and maintenance to operate. Also, keep in mind that many places view storage media as a one-shot device: you write to it and then a few years down the road, it goes into a shredder or a furnace, it doesn't get reused. So the advantage of being able to reuse a hard drive doesn't matter to them.

Hard drive based storage systems are getting more and more popular, but they still take up power and space in a datacenter somewhere. The selling point of an LTO system is that it kicks out a tape periodically, you put the tapes in a box, send the box to Iron Mountain, and then (hopefully) never, ever see them again.

Since the same underlying technology that has allowed hard drives to get cheaper per MB also applies, broadly, to tape drives (they're both magnetic storage after all, just using different substrates), I don't expect tape to ever really die. The mass market appeal of hard drives is largely what makes them as cheap as they are; if tape weren't limited to an "enterprise" product (or if hard drives were, or if you just look at "enterprise" hard drives), tape would probably win big in cost/MB terms because of the cheap substrate.

Most businesses that use tape drives aren't buying new ones every 5 years either, so the backwards-compatibility problem outlined in the article isn't as big a problem as it appears on the surface. They know (basically) the volume of data they need to take in and archive, plus the growth rate, and they buy based on that with the idea of keeping the drives around for a while. IBM is still quite happy to sell you brand new LTO-4 drives, which will read LTO-2 tapes (introduced 2003 — so that's a 10 year read life with brand new equipment), and LTO-1 drives are still under support. The lifespan of these things is pretty long. I don't know about other vendors, but IBM didn't stop producing 9-track open reel drives (format introduced 1964) until 2003; there are probably still some of those under extended support somewhere.
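The compatibility window is easy to reason about if you write the rule down. A small sketch, assuming the "two generations back" read rule quoted from the article earlier in the thread; real drives may deviate from that rule, so treat the constant as an assumption, and the function names as invented for illustration.

```python
# Assumption: a drive reads its own generation and the two before it,
# per the rule quoted from the article. Real products may differ.
READ_BACK_GENERATIONS = 2


def can_read(drive_generation: int, tape_generation: int) -> bool:
    """True if a drive of one LTO generation can read a tape of another."""
    if drive_generation < 1 or tape_generation < 1:
        raise ValueError("LTO generations start at 1")
    gap = drive_generation - tape_generation
    return 0 <= gap <= READ_BACK_GENERATIONS


def oldest_readable(drive_generation: int) -> int:
    """Oldest tape generation a given drive can still read under the rule."""
    if drive_generation < 1:
        raise ValueError("LTO generations start at 1")
    return max(1, drive_generation - READ_BACK_GENERATIONS)


if __name__ == "__main__":
    print(can_read(4, 2))   # the LTO-4 drive and LTO-2 tape case above
    print(can_read(5, 2))   # one generation too far under this rule
    print(oldest_readable(4))
```

Under this rule an archive that keeps its LTO-4 drives running never has to touch its LTO-2 tapes; the forced migration only arrives when the old drives can no longer be bought or supported.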

If the movie studios have a problem reading tapes they created in 2006, it suggests they have a real problem in their IT acquisition process and an inability to pick a technology and stick to it; it's not a technology problem so much as it's a giving-a-shit problem.
posted by Kadin2048 at 1:40 PM on March 2, 2014 [4 favorites]


as a JPEG and the audio stored in a similarly ubiquitous format that will always be able to be read

I wouldn't be too sure of that. Are you actually familiar with formats like JPG and MP3? If we go through a period like the Dark Ages where the documentation is lost, even if some such files survive they will be indecipherable gibberish to our descendants. Hell, even WAV is likely to be indecipherable gibberish what with the headers and block links and such.

Even if we don't go through a Dark Age, file formats and media technology are evolving at a rate which makes any archival standard unworkable. I remember when the bee's knees was the SyQuest 44-then-88 MB 5.25 inch hard disk cartridge. They were a common staple in diplomats' pouches, and I only ever needed three cartridges for my own external drive. They were far ahead of their time when they came out (when the next best thing was a 1.44 megabyte floppy) and they had a good run. But today you can fit the equivalent of 2000 of those cartridges on a micro-SD chip the size of a postage stamp. Some time back I took my drive apart for parts and chucked the carts.

And one day SD and uSD will be just as obsolete, supplanted by something denser or more durable or lower power, and the readers for the old stuff will become expensive curiosities that won't work without a lot of hacking, and the wheel will turn again.

Meanwhile, you can look at a filmstrip and tell what is needed to read it. You can even look at an analog audio tape, if you have an inoperable player, and figure out what you need to know to make an operable player. But an obsolete disk medium or file format, particularly one scrambled by forgotten DRM, is just garbage.
posted by localroger at 1:41 PM on March 2, 2014 [3 favorites]


1080p is good enough for me.
posted by zscore at 1:49 PM on March 2, 2014


1080p is good enough for me.

What's it like being a poorly sighted savage?
posted by entropicamericana at 1:56 PM on March 2, 2014 [5 favorites]


The practical result of this is that a digital film archive needs to invest heavily in data migration to maintain its assets.

A big part of where this problem seems to go off the rails is that migration needs to be expensive or that permanent media forms are desirable.

From a sheer economic standpoint, this seems completely wrong and misses the point of the revolution that has happened in storage. Archivists pretty much need a filesystem and a specific-format-neutral scheme for multiple layers of data redundancy and recovery (e.g., erasure coding at the file level as well as measurements of the files to detect if and when they have been corrupted) and they need heterogeneity in terms of filesystem providers, but the idea that they need some specific physical media is a hangover of a previous era. They furthermore need to include the specifics of the format (a useful way to do this would be to include an example of source code which can read the format and produce an audio stream with timecodes and individual frames - this is rather trivial to do and no, bitmaps will never, ever become outdated). Video and audio are the _least_ complex problem in this space (in comparison, I have data files from a BBS I ran in the 1980s and the content is basically no longer accessible).
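To make the "ship the reader with the data" idea concrete, here is a minimal sketch of how small a decoder for a plain bitmap frame can be. It assumes frames stored as binary PPM (P6), which is my choice of a simple bitmap format and not something named above; it also assumes no comment lines in the header and 8-bit samples. `read_ppm_p6` is a hypothetical name.

```python
def read_ppm_p6(data: bytes) -> tuple[int, int, list[list[tuple[int, ...]]]]:
    """Decode one binary PPM (P6) frame into (width, height, rows of RGB tuples).

    Simplifications for this sketch: no '#' comments in the header and
    8-bit samples only (maxval must be 255).
    """
    tokens = []
    pos = 0
    while len(tokens) < 4:
        while pos < len(data) and data[pos:pos + 1].isspace():
            pos += 1  # skip whitespace before a header token
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1  # consume the token itself
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates header from pixels

    magic = tokens[0]
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P6" or maxval != 255:
        raise ValueError("only 8-bit binary PPM (P6) is handled in this sketch")

    raster = data[pos:pos + width * height * 3]
    if len(raster) != width * height * 3:
        raise ValueError("truncated pixel data")

    # Three bytes per pixel (R, G, B), row by row from the top.
    pixels = [tuple(raster[i:i + 3]) for i in range(0, len(raster), 3)]
    return width, height, [pixels[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    frame = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
    print(read_ppm_p6(frame))
```

A compressed codec needs far more than this, of course; the sketch only covers the uncompressed-frame case the comment treats as the easy one.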

In a lot of ways, the whole mindset seems to be mildly technophobic or techno-ignorant or at the very least deliberately creating a set of hurdles and requirements which aren't there.
posted by rr at 2:11 PM on March 2, 2014 [7 favorites]


I wouldn't be too sure of that. Are you actually familiar with formats like JPG and MP3? If we go through a period like the Dark Ages where the documentation is lost, even if some such files survive they will be indecipherable gibberish to our descendants. Hell, even WAV is likely to be indecipherable gibberish what with the headers and block links and such.

What problem are you trying to solve? Stone age people should be able to watch Hot Tub Time Machine? Total loss of knowledge of programming languages?
posted by rr at 2:13 PM on March 2, 2014 [4 favorites]


Talking about hardware is really a distraction from the two main problems affecting preservation: copyright terms and digital format obsolescence. If you have bits, you can make copies of them easily – and while there are serious questions for anyone with a great deal of physical media, hedging your bets with multiple formats is a well-understood problem. You can make copies, build infrastructure for automatically distributing them, periodically test that the bits on any one storage pool are what you expected, etc. There are challenges but they're generally manageable and they're the challenge which the tech industry is the best prepared to address.

(There's a side issue which is that often the best-preserved material will be a lower-quality access copy (e.g. a DVD) rather than the higher-quality original files because the master files are larger and thus not always retained to save space and simply because something which was distributed widely is much harder for a single natural disaster to destroy. The good news is that if something was sufficient quality to distribute, it's usually acceptable for most future uses.)

The problem gets harder for reading that data as formats fall out of popularity and thus support. The major problem here is that any format using lossy compression is generally the final step and cannot be ported to a new format except by either losing quality or dramatically increasing file sizes by transcoding to a loss-less codec. For some cases (e.g. those early SGI Indeo files) a modern lossless codec may do well enough that the bloat is acceptable but for most content you really want something which can play back the native video format. Yes, you can archive software and VMs but at some point you're going to be looking at a matryoshka VM setup with multiple layers of emulation so you can e.g. run Windows XP to watch a late 90s Real Video file. That requires a chain of software to be preserved – which usually raises issues of licensing or redistribution – and increases the barrier to entry for anyone who wants to view a particular item.

Obsolescence is a harder problem than bit-level preservation because there are more unknowns and more arcane technical work required to address them but that's still somewhat viable except for a big challenge: DRM and copyright restrictions in general. DRM complicates matters on two fronts: it's usually legally challenging for anyone other than the owner of the content to preserve it and because there isn't much of a legal economy, tools are less common or not focused on what the preservation community needs. This isn't a big deal if the rights holder is motivated and preserves the original masters but it does complicate matters considerably if the rights holder refuses or simply cannot be located to grant permission. Diversity in collections is really important because of the risk of short-term financial decisions: a company can preserve its archive competently for decades and then lose half of it overnight if they have a bad year and someone decides to cut their archival costs by jettisoning a bunch of less popular material.

These two problems are significantly compounded by a third, less specific problem: everyone's funding is shrinking. Every one of the problems above requires skilled staff and money to solve and that's been shrinking for a while. As you preserve more data, simple storage management ensures that a larger percentage of your budget will be required to maintain what you already have, reducing the amount of money available to develop tools for cumbersome formats or make access for researchers easier. In many cases, this is compounded by copyright restrictions: it's often possible, even easy, to find private donors who would fund preservation of specific items but interest plummets if you're forced to maintain a dark archive where the material cannot be viewed at all or only on-site due to legal restraints.

Looking on a broader scale, these challenges produce a long-term threat because we don't have enough people in the career pipeline: without more options to perform this activity legally, why train digital archivists if jobs are limited and half of the tools are legally dubious?

If anyone's interested in the broader preservation issues, the US Library of Congress has a great report on audio which talks about the risks: The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age. It's focused on audio and also includes the huge at-risk collection of analog recordings but there are quite a few common themes. The most important point in my opinion being the need to lobby for copyright reform to allow libraries and archives to coordinate preservation efforts and some legal way to bypass DRM for preservation purposes.

(Disclaimer: while I work for the Library of Congress, I don't work directly on digital preservation and am speaking for myself)
posted by adamsc at 2:18 PM on March 2, 2014 [15 favorites]


Speaking of the break-down of film--here's a reel in "stage 5 deterioration" that looks like it's turning into a pancake.
posted by blueberry at 2:33 PM on March 2, 2014 [2 favorites]


But don't you then need to have an archive of hardware to run the VMs? It will be VMs all the way down.

Not really?

At some point you do need to keep your retrieval technology up to date, even if not with the kind of racket that keeps the LTO tape drive industry going. I can think of some hilariously impractical exceptions if the frames were going to be turned into Microfiche or something like that, where the reproduction step is literally "shine flashlight this side". (ie. not actually a bad idea if you can afford it and I feel comfortable assuming you can't.)

So, since you're going to have computers of some description running software of some description to retrieve data in some format at some point in the future, you're going to need to port the old software that runs the old data retrieval technique.

If the software is instead written for a machine that doesn't exist (eg. Java) or that you have so much reference material for, in writing, that you can still manufacture the machine if needed (DOS), then it's easier for everyone involved if the software is written for the machine, and then whenever there's some technical upheaval or other everyone throws all their money at a porting project for the machine. It scales better that way.
posted by LogicalDash at 2:39 PM on March 2, 2014


If you need to store a lot of data — and I'm not even just talking about movies here, I'm talking about stuff like phone company billing records and the like, where you're churning out TB per day, all the time — but the retrieval rate is very low, hard drives don't look as good because they require electricity and maintenance to operate.
The challenge to this is that tape isn't particularly reliable and so what you're really talking about is making multiple copies of tapes and periodically testing them so you can recreate corrupted data before you lose more than one copy, which requires extra copies and technicians. This can still beat HDD storage on cost but it requires massive scale to balance out the cost in staffing, and great entry cost for tape drives and software. You can do this if you have enough data (currently probably 10+ petabytes) but it's a tough trend to fight, particularly with services like AWS Glacier offering tape-like pricing without the high fixed base costs.
posted by adamsc at 2:41 PM on March 2, 2014


If we stored the spec for x86, the designs for DOS, a copy of a DVD, the decryption to access the data behind the DVD's DRM, and a player that runs on DOS and can play the movie, how many patents, trademarks, copyrights, and software licenses would we be breaking? How many instances of violating the DMCA per document?
posted by idiopath at 2:50 PM on March 2, 2014 [4 favorites]


But it isn't really about hardware, and those encoding and decoding algorithms have not yet been proven durable, as the long list of now deprecated or unsupported prior codecs demonstrates.

This doesn't make sense -- it's software, there is no such thing as a "durable" codec. "Unsupported" means Windows 27.1 won't have it installed, but the format doesn't wither away like tape stock, any more than the rules behind linear algebra fade with time.

If we go through a period like the Dark Ages where the documentation is lost, even if some such files survive they will be indecipherable gibberish to our descendants.

I don't think this is true. The reason I don't think this is true is because of the 80's and 90's software cracking scene, the retro computing scene, the Free Software scene and the industrial espionage "scene" -- where you have people basically doing exactly what you describe with impunity -- right now. It's just bits -- maybe some formats are harder than others, but if our descendants can *read* the bits, then I'm pretty sure that not only will they be able to decipher them -- but they will actually have more and better techniques to do so.

Encryption is a different problem, but it also has a similar curve. If they can read the bits in 100 yrs, AES-256 may actually be trivial to brute force.
posted by smidgen at 2:51 PM on March 2, 2014


What problem are you trying to solve? Stone age people should be able to watch Hot Tub Time Machine? Total loss of knowledge of programming languages?

I know, when have civilizations ever lost or turned their back on the lights of knowledge or art? What would we even call an age like that anyway?

Plus, I mean, how likely is it for a civilization to fall? When has that ever happened on this planet? And if it did happen, why would any subsequent civilization be interested in what a previous civilization has produced?
posted by entropicamericana at 2:51 PM on March 2, 2014 [6 favorites]


idiopath: patents at least expire on a relatively manageable scale (20 years) but copyright is the real snag (95/120 years for corporate works, life + 70 for individuals) particularly because the DMCA has a huge chilling effect on the tools. These days you could probably build enough of a clone to emulate the first 1995 DVD player but the software which you'd need to break the CSS-encrypted content on your DVD image would likely be seen as illegal to build or distribute in the United States.
posted by adamsc at 2:57 PM on March 2, 2014 [1 favorite]


(and, to clarify, I'm completely aware that things like DeCSS started a wave of free software tools to play DVD content – the question is institutional liability: if you build a preservation program around a tool which is legally contested in your country, you have to worry about your reputation if you get sued, what happens if the major copyright associations successfully sue an open source project offline or whether any of your staff can contribute to it or even tell other people to use it)
posted by adamsc at 3:01 PM on March 2, 2014


Whenever I get a new hard drive I make a folder called "Old HDD" and copy the whole last one over. If I go deep enough, I still have freshman-year HS homework assignments. And I'm not even trying.
posted by save alive nothing that breatheth at 3:01 PM on March 2, 2014


save alive nothing that breatheth: I assume you don't make many movies, or even music?
posted by idiopath at 3:05 PM on March 2, 2014 [1 favorite]


save alive nothing that breatheth: out of curiosity, how often have you tried to read those old files? Silent corruption is a significant long-term risk unless you're using some sort of tool to periodically check that the content matches what you originally stored.
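For anyone wondering what "some sort of tool" might look like at its simplest, here is a minimal sketch of a checksum manifest: record a SHA-256 digest per file once, then re-run the check on a schedule. The function names (`sha256_of`, `build_manifest`, `verify`) are made up for this example, and the sketch assumes the manifest itself is kept somewhere other than the disk being checked.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

This only detects damage. Repairing it still depends on having a second good copy (or some error-correcting data) to restore from.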
posted by adamsc at 3:09 PM on March 2, 2014 [1 favorite]


Lest I be accused of being too much of an optimist -- I do worry significantly more about reading the bits than interpreting them. I copy from the previous generation of media to a new set every year, but it still feels rather fragile to me.

The cloud is not feasible yet because, even for my relatively meager needs, it would take a week to upload, and I have no good way of verifying that the company whose servers I'm uploading it to wouldn't steal it for their own purposes, corrupt the data or just simply go under. Maybe that will change in the future -- but not anytime soon, given the way the broadband market is going and what's been happening to "cloud" services as more and more bad actors realize the gold mine they represent.
posted by smidgen at 3:11 PM on March 2, 2014


Oh, and re: codec obsolescence: if you're aiming at very long-term archival, you could backup a VM that has the codecs.

But don't you then need to have an archive of hardware to run the VMs? It will be VMs all the way down.


Or implement the codec in something with a narrow spec; speed is less important than repeatability. In the past, one would have said JVM bytecode or Lisp, though these days, JavaScript has fallen into that niche (if it's good enough for the reference implementation of a transistor-level emulation of the 6502, it's probably good enough for the reference implementation of the decoder for Fast And Furious 17 3D).
posted by acb at 3:19 PM on March 2, 2014


Bits is bits. Get the files on spinning disks, forward migrate on a regular schedule.
posted by LarryC at 3:32 PM on March 2, 2014


Bits is bits. Get the files on spinning disks, forward migrate on a regular schedule.

And when the massive solar storm knocks out the electrical and magnetic systems globally, frying the circuits and zeroing the disks?

The problem we're trying to solve here, really, is how do you get those bits onto something that's not electronic - something physical. This isn't so impossible when you're archiving fairly simple, low-bandwidth data, but until we invent diamond holocubes or something it's hard to come up with a stable solution.
posted by Jimbob at 3:43 PM on March 2, 2014 [1 favorite]


And when the massive solar storm knocks out the electrical and magnetic systems globally, frying the circuits and zeroing the disks?

I would think that's a non-issue. HDD platters are Faraday-caged inside solid metal, they're all but impervious to exterior magnetic fields, and surge protectors sit between them and the grid (not that even an unprotected grid spike could manage much more than frying the driver board, unless it started a house fire).

At least one astrophysics PhD seems to concur.
posted by anonymisc at 3:57 PM on March 2, 2014 [1 favorite]


(But the geek in me would totally buy into a Rosetta Project durable artifact for the ages that stored the world's codecs and formats :) )
posted by anonymisc at 4:05 PM on March 2, 2014


Ok, well Mr. Smarty Pants, what if an ancient demon gets unleashed and destroys all media not carved into stone in cuneiform? How far will your precious bits get you then? What if in the untold eons to follow, humanity loses its knowledge of written language? That's why I'm doing my part to archive our culture in the only truly durable format-- oral tradition. Now if you'll excuse me, I have to get back to memorizing the script of Transformers: Dark of the Moon for future generations.
posted by Pyry at 4:06 PM on March 2, 2014 [14 favorites]


Or implement the codec in something with a narrow spec; speed is less important than repeatability.

Thus was the Turing Machine both the beginning and the final salvation of the Age of Bit.
posted by anonymisc at 4:16 PM on March 2, 2014


I think what people are snarking about here is that prioritizing some media for archival over other media actually does make sense. I agree with this argument, and don't really see how it works as it is apparently being applied, in rebuttal to the notion that we should try to save everything we can.
posted by LogicalDash at 4:46 PM on March 2, 2014


save alive nothing that breatheth: out of curiosity, how often have you tried to read those old files? Silent corruption is a significant long-term risk unless you're using some sort of tool to periodically check that the content matches what you originally stored.

Oh yeah I've found a messed up file or two - most fine, and as I said, not trying.
posted by save alive nothing that breatheth at 4:55 PM on March 2, 2014


And when the massive solar storm knocks out the electrical and magnetic systems globally, frying the circuits and zeroing the disks?

There won't be a civilization left to appreciate the media.
posted by Talez at 5:00 PM on March 2, 2014 [3 favorites]


Beam the bits into space. Go get the bits later when we can set up a receiver faster than the speed of light. Or trust some other civilization to archive them for us.
posted by surplus at 6:07 PM on March 2, 2014


how often have you tried to read those old files? Silent corruption is a significant long-term risk unless you're using some sort of tool to periodically check that the content matches what you originally stored.

This is what forward error correction is for. You can determine the actual corruption rate by sampling and know when your EC factor is approaching its horizon.
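A toy version of both halves of that, for illustration only. The first part uses plain XOR parity, the simplest erasure code there is (one parity block can rebuild any single lost block); real archives would more likely use something like Reed-Solomon, which this is not. The second part estimates a corruption rate from a random sample. All function names here are hypothetical, and `is_corrupt` stands in for whatever per-block checksum test you already have.

```python
import random
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """One parity block that protects against the loss of any single block."""
    return xor_blocks(data_blocks)


def recover_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


def sampled_corruption_rate(is_corrupt, block_ids: list, sample_size: int, seed=None) -> float:
    """Estimate the fraction of corrupt blocks by testing a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(block_ids, min(sample_size, len(block_ids)))
    if not sample:
        return 0.0
    return sum(1 for block_id in sample if is_corrupt(block_id)) / len(sample)


if __name__ == "__main__":
    blocks = [b"reel", b"two!", b"3333"]
    parity = make_parity(blocks)
    print(recover_missing([blocks[0], blocks[2]], parity))
```

When the sampled rate creeps toward what the code can tolerate (here, one loss per group), that's the signal to migrate or re-encode before the margin runs out.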

It is not true that all content is equally important so the correct answer is to tier it. For those truly fundamental items (Mote museum-level recover-from-TEOTWAWKI knowledge transfer events) then sure, some printed material. For the rest, appropriate migrateable digital storage with honest best-effort.

I think the limiting factor here is that archivists want something they can touch and there is a not-unappreciable fetishizing of film.
posted by rr at 6:35 PM on March 2, 2014


If there's any fetishizing going on, it's the fetishization of digital technology and newness for newness's sake.
posted by entropicamericana at 6:41 PM on March 2, 2014 [4 favorites]


The rapid change in file formats seems particularly vexing. As far as I can tell, having the full-boat Adobe CC subscription, nothing in their current suite will touch a .p65 document, which was created in one of their own, now ancient, products.

Likewise, I'm not sure I have ready access to open an old .fh7 document, if I wanted to.

I cannot imagine that this would not be a profound issue as studios switch from one proprietary format to another, especially for things like 3-d rendering and the like.
posted by maxwelton at 7:16 PM on March 2, 2014 [2 favorites]


how exactly can one get into film preservation as a career?

Not that I'm bitter, but I made a valiant attempt to get into a position as a motion picture preservation specialist at NARA just last month, and it's pretty goddamned difficult to get through the paint-by-numbers multiple-choice USAJOBS.gov application process in order to make a case to an actual human being for how my twenty-year career as a microfilm archivist and preservation specialist with strong audio skills would be a good match for the work, but alas, 'twas not to be.

Well, yeah, actually I am bitter, because I did the preservation work on the microfilm from Birkenau and I made damn sure that no name on that film was forgotten. Stupid credentialist inflexible hiring process…

Still, there's a perfectly easy way to make sure films don't get lost.

1. Print a three strip, black and white (R, G, B) copy of the film on 70mm stock. Make three or four first generation sets of prints, and store them in libraries on three or four continents.

2. Print an optical backup of the film's digital data to the same kind of film, in a machine readable code that can be enlarged with nothing more than a garden variety lens, with complete instructions for decoding printed to the film every hundred feet or so. Store it in libraries on three or four continents.

3. Hire custodians of film that are as experienced, passionate, and detail oriented as the monks that kept humanity's literature alive during the dark ages. Pay them well, make their job a valued, celebrated, and largely budget-proof occupation, and avoid anyone prone to the magical thinking that's seemingly so inherent in boosters of things digital. Archivists are by nature conservative, skeptical, and a bit jaundiced in their view of the latest and greatest, and the best of them all are downright dogged when it comes to placing longevity without intricate systems of maintenance trusted without oversight over methods that have already worked.

But what do I know?



Damn NARA. Grr.
posted by sonascope at 7:49 PM on March 2, 2014 [10 favorites]


Beam the bits into space. Go get the bits later when we can set up a receiver faster than the speed of light. Or trust some other civilization to archive them for us.


Great. That'll be some comfort when the New Beckyists and the Old Beckyists finally trace our transmissions back to their source in an effort to resolve their intractable, generations-long galactic conflict.


Not to mention what might happen when we fall under the malign influence of the Unified Tarantino Heresy
posted by TheWhiteSkull at 8:22 PM on March 2, 2014 [1 favorite]


The very first commercial CD I ever purchased, Yes - 90125, back in its first release wave in 1983, will not read in any player I try it in, and hasn't for about 5 years now. For years before that it would read with errors which introduced significant flaws during playback. -- hippybear

I wouldn't use early CDs as a point of reference. Many of the manufacturers didn't really know what they were doing at first. Some of the first CDs wouldn't even last a year, or even had failures when brand new. Thus CDs started out with a really bad reputation for reliability. These problems were fixed.

I've heard that stored pressed CDs (from reliable manufacturers) should last at least 100 years, and some say many times that amount of time.
posted by eye of newt at 9:02 PM on March 2, 2014


longevity without intricate systems of maintenance

Transcribe to emoji and inscribe on clay tablets.

Seriously though, if the inscription was a tiny facet with 9 possible orientations (a little over three bits), at one micron spacings, (tiny I know, but might be possible with a fine ceramic) then that's a few hundred gigs per square meter, far more than enough for a rough mpeg.

Bigger dimples, and a few more tablets, might still be practical. Not that much more volume than the 35mm film reels. The compression would probably be better than celluloid degradation.
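A quick check of the density estimate above. This is a back-of-envelope sketch that assumes a square grid of facets and log2(orientations) bits per facet; both assumptions, and the function name, are mine. Under them, one square meter at one-micron pitch works out to hundreds of gigabytes, so a single gigabyte needs far less than a full square meter.

```python
import math


def tablet_capacity_bytes(area_m2: float = 1.0,
                          spacing_m: float = 1e-6,
                          orientations: int = 9) -> float:
    """Rough capacity of a tablet covered in facets on a square grid.

    Assumes each facet stores log2(orientations) bits and that facets
    sit spacing_m apart in both directions.
    """
    facets = area_m2 / spacing_m ** 2
    bits_per_facet = math.log2(orientations)
    return facets * bits_per_facet / 8


if __name__ == "__main__":
    for spacing in (1e-6, 1e-5):
        gigabytes = tablet_capacity_bytes(spacing_m=spacing) / 1e9
        print(f"{spacing * 1e6:.0f} micron pitch: about {gigabytes:.0f} GB per square meter")
```

Going to ten-micron dimples costs a factor of 100, which still leaves a few gigabytes per square meter under the same assumptions.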
posted by StickyCarpet at 9:05 PM on March 2, 2014


Everyone here seems to have forgotten that printing a movie onto film was a job for artists. It involved making decisions about color, exposure, grain etc for each scene.

Printing a digital movie to film would be transformative and not archival, unless you simply print the bits as bits onto monochrome film.
posted by monotreme at 9:12 PM on March 2, 2014


It will all become irrelevant once science fully understands and duplicates the way the brain records and stores information. Using this, they will create a massive synthetic 'brain', (or perhaps an ever expanding brain farm) that stores information perfectly and efficiently, replacing all data servers on the planet, and eventually the internet. Since the mechanics of the brain's data transmissions will be understood, people will be able to access the Brain Farm directly via uplink, since all the information is already in "brain" format, an archive of all human knowledge, in addition to the minds of all people connected to it, we'll have accidentally created the Singularity.

Or the Borg, i dunno, either way it'll be pretty cool.
posted by Uther Bentrazor at 12:00 AM on March 3, 2014 [2 favorites]


The amount of information we produce each year is growing exponentially. Film is just a small part of that.

In 100 years time, the difference between:

a) All of 2014 films being available
b) just a tiny select few of 2014's films being available

... will itself be negligible. Because both (a) and (b) will be fractionally tiny compared to the amount of media/info being produced in 2114*.

* big assumption here is that we don't destroy our biosphere and hence our civilisation.
posted by memebake at 5:44 AM on March 3, 2014


Or to put it another way:

Imagine every single film made in the 20th century had been painstakingly preserved in a big warehouse. Would anyone be that interested? Probably not. The classics would get watched a fair bit, but the long tail would mostly be ignored. But on the other hand, people are prepared to try and recover old films precisely because they weren't well preserved. It's interesting.

Similarly: Trying to archive all our 2014 cinema output is a boring task. Don't bother. Let it go obsolete and get lost, and that'll give people in 2114 something interesting to do. Codec archaeology, messing with old hardware, that sort of thing. It'll keep lots of people busy and interested.
posted by memebake at 5:50 AM on March 3, 2014


Anyone have experience with M-DISC? Encoding into stone is pretty compelling.
posted by effugas at 5:52 AM on March 3, 2014


at some point you're going to be looking at a matryoshka VM setup with multiple layers of emulation so you can e.g. run Windows XP to watch a late 90s Real Video file

MPlayer understands RealAudio and RealVideo codecs (and RealMedia container files), and it runs on every modern operating system, and because it's open source it runs on a whole lot of niche operating systems too and it's not going to be too hard to port to any sufficiently open future operating systems. This hypothetical matryoshka setup can probably skip the "I have an emulator for a system that has an emulator for Windows XP" requirement so long as it manages "I have a C compiler" instead.
posted by roystgnr at 6:04 AM on March 3, 2014


roystgnr: some of mplayer's codec support comes from copied dlls from Windows systems. I think the Real codecs are among those.
posted by Pronoiac at 7:05 AM on March 3, 2014 [2 favorites]


All this talk of film, but what of Video? We still have the earliest television shows because they were filmed off of a TV screen. But what about those that went directly to tape? Even NASA lost the original footage of the 1st moon landing, but when a copy was found only one machine in the world still existed that could play it, and it was in a museum. I have open reel 3/4" video made in the early 70's and late 70's that I cannot play because I can't find anyone with an old Sony reel video deck. However Super 8mm film made at the same time, of the same subject is easily accessible.
posted by Gungho at 7:13 AM on March 3, 2014 [2 favorites]


There's also the wrinkle that classics and the Canon are affected by what's around and available.

If Hot Tub Time Machine does well as an available movie, it may be more discussed in 40 years.
posted by Lesser Shrew at 7:46 AM on March 3, 2014 [2 favorites]


I've noticed recently that some 'legacy .rm' files aren't properly being decoded by mplayer now. I should look into it more, but while mplayer is wonderful it's not a panacea.
posted by mikelieman at 8:27 AM on March 3, 2014


Good callout, Pronoiac. I had just assumed that because .rm files worked in mplayer on Linux for me that it was supported in source code, but I'd forgotten that RealPlayer had a Linux port from which binaries might have been snagged. It looks like there's Windows, Mac and Linux shared libraries for those codecs, which mplayer uses when available...

But I don't seem to have those installed, and I'm still able to play RealVideo (just downloaded a RealVideo 3.0 clip to double-check)... ah, mplayer says it's falling back on ffmpeg, whose website says it does have source code for decoding RealVideo 1.0 through 4.0. The world's archived RealVideo should be safely accessible for the foreseeable future, in the same atrocious quality as the day it was first compressed.
posted by roystgnr at 10:03 AM on March 3, 2014


here's a reel in "stage 5 deterioration" that looks like it's turning into a pancake.

The sad thing about that image is that the film has a plastic core on the reel. That implies that it was in good enough condition to be unspooled and respooled sometime in the last few decades, and that the deterioration happened relatively recently, not back in the 30s or 40s.
posted by Kadin2048 at 10:26 AM on March 3, 2014 [1 favorite]


roystgnr: it's likely that there is open source code somewhere for many of these files but there's an interesting question of whether that code will be kept around in the future. Software which isn't maintained tends to break and it's easy to imagine that a few years from now someone will make a big change and drop some older niche codecs if nobody steps up to maintain them. The code is around somewhere but building it becomes non-trivial if you have to chase down old versions of dependencies, deal with constructs which newer compilers don't accept, etc. Either path adds barriers for anyone who doesn't have access to a software archaeologist.

It's not a huge risk versus the formats which were only accessible using proprietary DLLs but this is why I wish the legal situation were clearer for preservation. I'd hope ffmpeg is in the clear on antique Real codecs, but suppose the only copy you had access to was DRM-ed: if you're in the United States you could have the data, source code and a technical specialist but still be banned from working on the problem because your institution doesn't want the legal risk of having to fight a DMCA case should someone sue.
posted by adamsc at 10:46 AM on March 3, 2014


The simple fact that it is layered means it can and will ultimately fail.

I don't get this, unless you mean that the simple fact that it is a physical thing means that it will ultimately fail. Film is layered, magnetic tape is layered, images on paper are layered.

The very first commercial CD I ever purchased, Yes - 90125, back in its first release wave in 1983, will not read in any player I try it in

eye of newt's comments about the general quality of early disc manufacture are correct, so I also wouldn't damn the medium from one anecdote.

CDs and DVDs are susceptible to different and similar types of degradation that will result in uncorrectable errors and ultimately playback problems. (Recordables are not worth discussing in a thread about archiving.) CDs (and Blu-Rays) have the data near the surface of the media, while DVDs are more like a data sandwich. Because of this CDs (BDs less so) are much more susceptible to oxidation of the reflective layer than DVDs. All formats are susceptible to physical problems including scratches and warping, although as the data density increases so does the sensitivity to parameter deviation. There are many other failure modes of optical discs, but from a purely physical standpoint I think the optical media would outlive the playback hardware. (I.e. if you packed them both in a climate-controlled cave today the hardware would fail before the disc. Probably right when you turn it on.)

And Yes, replicated discs ("pressed" is a misnomer) have been made with gold as the reflective material. Silver and aluminum are most commonly used today. A recent Michael Jackson release used copper because it looked like gold.
posted by achrise at 11:16 AM on March 3, 2014


Regarding codecs and formatting issues:

One of the things I like to do when I burn images to CDs for my personal archive is put a copy of a codec for the format in question on the disc with the images, sometimes in addition to actual runnable decoder software.
  • For JPEGs, NanoJPEG is very small and has no external dependencies AFAIK. It will do baseline (not progressive) greyscale and YCbCr, 8-bit files, which is what most cameras produce. It's pure C, takes in JPEG data and outputs PPM, which is an image format so basic it's actually human-readable (but very, very verbose and inefficient). You can toss a few copies of both the C source and the W32 executable on a CD-R and never notice the space it takes up.
  • For "camera RAW" files and DNGs, you can include dcraw. It's also a standalone C program that takes in a variety of raw-CCD camera formats and outputs either PPM or TIFF. It's platform-agnostic and can be compiled with GCC.
  • And for TIFFs, LibTIFF is a good bet, as long as you aren't doing anything stupid on purpose. As a container format, TIFF gives you a lot of rope with which to hang yourself, if you so choose. But if you stick to really basic uncompressed or LZW-compressed, 8-bits-per-sample RGB images, it's pretty damn readable. Bonus points if you put a text file describing the format alongside the images. (That, in my mind, makes up for the fact that I'm not really sure how portable LibTIFF is.)
Of course, if you have your images stored in some ultra-obscure format that dcraw won't read, then you're in trouble: your best path is to convert them to something reasonable (like DNG, which is to TIFF what PDF is to PostScript). And there is the possibility that in the future you might have trouble compiling what we regard today as platform-agnostic C code. But at some point you have to draw a line and decide what your assumptions are going to be (civilization exists, computers exist, GCC-style C compilers exist, etc.).
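To give a feel for how basic PPM is, here is a minimal Python sketch of the plain-text ("P3") variant: a magic string, the width and height, the maximum sample value, then one number per colour sample. The function names are made up for illustration, and decoders often write the binary "P6" variant instead, which keeps the same text header but stores the samples as raw bytes.

```python
def write_ppm_ascii(pixels):
    """Serialize rows of (r, g, b) tuples (0-255) as a plain-text P3 PPM string."""
    height = len(pixels)
    width = len(pixels[0])
    lines = ["P3", f"{width} {height}", "255"]
    for row in pixels:
        # One text line per image row: "r g b r g b ..."
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


def read_ppm_ascii(text):
    """Parse a plain-text P3 PPM string back into rows of (r, g, b) tuples."""
    tokens = []
    for line in text.splitlines():
        # Anything after '#' on a line is a comment; the rest is whitespace-separated.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P3":
        raise ValueError("not a plain-text (P3) PPM")
    width, height, _maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height * 3:
        raise ValueError("pixel data does not match the header")
    triples = [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]
    return [triples[r * width:(r + 1) * width] for r in range(height)]
```

A reader that small is the sort of thing someone could plausibly rewrite from a one-page description of the format, which is the whole appeal for archiving.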

My area of interest is photography but I suspect you could do similar things for video, with some forethought. Basically you just want to make sure that your video can be decoded with ffmpeg, and then put that alongside the video recording itself. If you can't decode the format with ffmpeg today, then it's probably worth converting it to another format today before storing it, while the decoding tools are all available.

ffmpeg is admittedly not nearly as portable and has a lot more dependencies than dcraw and other photographic tools, though. So for edge cases like RealVideo format, which ffmpeg plays today but may not play in the future, it probably still makes sense to convert them to a widely-accepted standard format like MPEG-4 ASP Level 5, which is unlikely to be dropped at any foreseeable point in the future.
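A rough sketch of what that conversion step could look like, written as a Python helper that only builds the ffmpeg command line. The helper name, the output directory, and the quality setting are assumptions for illustration. It uses ffmpeg's built-in "mpeg4" (MPEG-4 Part 2) video encoder and does not try to pin a particular profile or level, so check the result before trusting it as an archival copy.

```python
from pathlib import Path


def build_transcode_command(source, dest_dir="converted"):
    """Return an ffmpeg argument list that re-encodes `source` to an .mp4 file.

    Hypothetical helper: nothing is executed here, the caller decides when to run it.
    """
    src = Path(source)
    dest = Path(dest_dir) / (src.stem + ".mp4")
    return [
        "ffmpeg",
        "-n",              # refuse to overwrite an existing output file
        "-i", str(src),    # input file in the old format
        "-c:v", "mpeg4",   # ffmpeg's native MPEG-4 Part 2 video encoder
        "-q:v", "3",       # fixed quality scale; lower numbers mean higher quality
        "-c:a", "aac",     # ffmpeg's native AAC audio encoder
        str(dest),
    ]
```

To actually run it you would create the output directory and pass the list to subprocess.run(cmd, check=True), keeping the original file next to the converted one rather than replacing it.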
posted by Kadin2048 at 11:34 AM on March 3, 2014 [1 favorite]


So in conclusion:

Digital media has to survive a catastrophe of cataclysmic proportions that would wipe out all human knowledge and we assume that film will have an underground order of monks that will magically keep the film in a climate and humidity controlled vault during this same upheaval.
posted by Talez at 12:36 PM on March 3, 2014


They're called salt mines.
posted by entropicamericana at 12:59 PM on March 3, 2014


New band name: Salt Mine Morlocks
posted by achrise at 1:07 PM on March 3, 2014 [1 favorite]


Has Google ever lost a byte? Seriously, I don't think they lose bytes, do they?

So that's a whole lot of archival going on there. If they can archive the world's cat videos, surely archiving the film industry's output is a no-brainer.
posted by surplus at 1:19 PM on March 3, 2014


Has Google ever lost a byte? Seriously, I don't think they lose bytes, do they?

It's my understanding Gmail pretty regularly loses random emails in the bottom of people's email archives, so that would be a 'yes'.
posted by immlass at 1:25 PM on March 3, 2014 [2 favorites]


I'm pretty fascinated by long-term archival, and robustness in the face of bit rot. I daydream about architecting a system with redundancy across disks, then computers, then data centers, then continents.

And over time, gee, I hope x86 doesn't totally disappear. I'd rather keep Linux packages around than source. From what I understand, it seems like *compiling* isn't that robust over time either - new versions of build software and compilers and I don't even know. Maybe Gentoo would exercise those build suites and provide a decent time capsule?
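For the bit-rot side of this, the usual building block is a checksum manifest: record a hash of every file once, then re-check each copy against it on a schedule. A minimal Python sketch follows, with made-up function names and SHA-256 chosen as an assumption. It only detects damage; repairing it still depends on having another good copy somewhere.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large video files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root, manifest):
    """Return the manifest entries that are missing or no longer match their hash."""
    root = Path(root)
    damaged = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(name)
    return sorted(damaged)
```

Run build_manifest once when the archive is written, store the manifest with every copy, and run find_damaged against each disk, machine, or data center in turn.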
posted by Pronoiac at 5:28 PM on March 3, 2014


It's not the bits rotting that you need to worry about. It's understanding what the bits mean.

It's been less than 60 years since the creation of FORTRAN, and less than 80 since the very first prototype Turing-complete general purpose computers. There have already been some embarrassing high-profile cases of carts left in storage so long that the last viable readers had been scrapped, in a couple of cases spawning heroic recovery efforts. Right now the window for that sort of thing is the 1970's. Many very popular formats of that era are now impossible to read, perhaps the most ubiquitous being the "layer cake" removable platter mainframe hard disk. The hardware to read those is very mechanically elaborate and exotic and unlikely to ever be duplicated. Tapes are easier in general because they're tapes, but still not trivial even if you have a dead reader from which to salvage compatible multitrack heads.

What are your chances today to read even very popular tape or disk formats such as the Osborne I, Commodore VIC 20, TRS-80, or even the relatively popular and widely retro-supported Commodore 64 and Apple ][? Such retro support is increasingly dependent on migrations of tapes and discs to more modern digital formats, so how long will it be until it will be all but impossible to find someone to read a long-forgotten actual physical disk or tape?

And the thing is, supporting those obsolete physical formats is easier than supporting a forgotten digital chip format. At least you can see how the media physically work and maybe hack on some physically compatible reading hardware. Had a look at the comm protocols for an SD card or flash chip lately? Let 50 years pass and the part number got worn from the media, what are your chances of finding the docs or working that out by trial and error?

What worries me is the increasing complexity of even the simplest APIs for communicating with storage media. Simply keeping track of the file system on an SD card in the most basic way necessary to read linear files requires a program which wouldn't have fit in the first personal computer I owned at all. There is legacy cruft which makes the required solutions non-obvious. It's nice that SD cards have a simple SPI comm mode which can be bit-banged by relatively simple controllers, but while most micro-SD cards implement it (because why develop a different controller chip I guess) SPI isn't actually part of the micro-SD standard. Even without DRM we're rapidly approaching a state where common media will not be accessible by hardware that does not have hundreds of thousands of man-hours invested in ensuring compatibility.

It really isn't necessary to posit the collapse of civilization to see the media on today's iPods being unreadable in a generation or two. All it takes is a bit of callous indifference.

And while you might think some things are so ubiquitous they can never pass, you might want to test that by looking for an actual Atari 2600 game console. At 40 million units plus sold it remains the most popular single computing platform in all of human history. But if you can find one for sale -- they do pop up on eBay still -- you'll find it more expensive than much more recent consoles. Because of its very primitiveness most units were discarded when better alternatives became ubiquitous. Now most of them are in landfills and the survivors are a connoisseur specialty. If the original sale run had been 400,000 instead of 40,000,000 only museums and a few lucky collectors would have functional units now.
posted by localroger at 7:09 PM on March 3, 2014 [4 favorites]


The creators of the Golden Record had the right idea.
posted by sonascope at 1:49 PM on March 4, 2014


Let 50 years pass and the part number got worn from the media, what are your chances of finding the docs or working that out by trial and error?

Given the number of proprietary things, the backwards/forwards compatibility of a lot of hardware and that a lot of interfaces remain electrically the same you'd be in a pretty good position right now.

And while you might think some things are so ubiquitous they can never pass, you might want to test that by looking for an actual Atari 2600 game console. At 40 million units plus sold it remains the most popular single computing platform in all of human history. But if you can find one for sale -- they do pop up on eBay still -- you'll find it more expensive than much more recent consoles. Because of its very primitiveness most units were discarded when better alternatives became ubiquitous. Now most of them are in landfills and the survivors are a connoisseur specialty. If the original sale run had been 400,000 instead of 40,000,000 only museums and a few lucky collectors would have functional units now.

There are millions of cheap Chinese "a billion in one games" units which are basically a 6507, Stella, TIA and 6532 slapped together with a ROM chip holding every VCS game ever released. The games themselves aren't going anywhere. The real hardware on the other hand, with its "timeless" wood veneer, is what fetches the larger $$$. Besides, we've worked out emulation to the point where physical hardware has become largely irrelevant. Most of the shit is already archived and catalogued within the community at large and the problem has almost entirely been solved. Sure, if you want to split some hairs and get pissy you can argue that not every single thing will work on an emulator/ROM combo.

Keep in mind that A New Hope, god damn Star Wars, was so degraded through stupid film stock choices and storage methods that they had to recompose the entire thing digitally anyway. And that's "shine a god damn light through the film" level of technology simplicity.
posted by Talez at 6:00 PM on March 4, 2014


Keep in mind that A New Hope, god damn Star Wars, was so degraded

Star Wars, the film I saw in 1977, is lost in no small part thanks to George Lucas himself. It was not called "A New Hope."
posted by localroger at 6:13 PM on March 4, 2014 [4 favorites]


How many instances of violating the DMCA per document?

Chilling Effects DMCA Archive is “Repugnant”, Copyright Group Says
posted by homunculus at 8:18 PM on March 16, 2014 [1 favorite]



