I can't forget that I'm bereft of all the pleasant sights they see
January 26, 2009 10:41 AM

British Library warns of 'black hole' in history if websites and digital files are not preserved. "Historians of the future, citizens of the future, will find a black hole in the knowledge base of the 21st century." In addition to dead file formats and lost information from government websites, Lynne Brindley also points to the habits of individuals. "I call it personal digital disorder. Think of those thousands of digital photographs that lie hidden on our computers. Few store them, so those who come after us will not be able to look at them."
posted by cashman (63 comments total) 11 users marked this as a favorite

 
via LibraryStuff.
posted by cashman at 10:42 AM on January 26, 2009


Think of those thousands of digital photographs that lie hidden on our computers. Few store them, so those who come after us will not be able to look at them...

For most cases, I'm not so sure this is a bad thing.
posted by jquinby at 10:47 AM on January 26, 2009 [1 favorite]


On the one hand...they do have a point. But on the other...

...Okay, are we SURE web sites are worth preserving? A lot of the government files are also on paper copies, I believe (they START on paper and are uploaded to the web site only after, I think), so there's the backup. As for private files -- ....well, I think the ease in internet web page creation has just led to a change in the signal-to-noise ratio, to the point that I wonder whether someone would have to read through each web page and deem it worthy or unworthy of preservation -- and that'd take a hell of a long time.
posted by EmpressCallipygos at 10:48 AM on January 26, 2009


Is this like an 8" floppy disc? Gone, lost forever; until a move, and then it is found. Mildewed, stuck in place, covered in unknown stains; lost forever. Oh tears!
posted by buzzman at 11:00 AM on January 26, 2009


"Historians of the future, citizens of the future, will find a black hole in the knowledge base of the 21st century."

[insert goatse reference here]
posted by joe lisboa at 11:03 AM on January 26, 2009 [1 favorite]


[redact reference to "insertion," above]
posted by joe lisboa at 11:03 AM on January 26, 2009 [5 favorites]


Would these be those same British that claim you can catch the fat disease from sharing an elevator with the wrong person?
posted by mannequito at 11:10 AM on January 26, 2009 [1 favorite]


This is actually something I'm quite interested in. One of the better primary sources of information on the day to day lives of individuals has been personal letters. With the move to telephones, internet, and email, a good number of these private conversations between individuals are lost to future historians. There may be alternative means of capturing this information, but we haven't sorted it out.

And the point of preservation isn't really saving what we deem worth saving, it's to save what we can and let the future sort it out. A lot of junk lies in your local archives, but that junk might mean something to someone later on.
posted by teleri025 at 11:11 AM on January 26, 2009 [1 favorite]


Screw history. What about us right now?

When I worked at a big research university a few years back, the department of immunology had archived data to magnetic tape from some AIDS medical studies from the 1980s. Unfortunately, there is no working machine, or manufacturer of machines, that can read this sole archive copy of the data.

That data is already in the black hole.
posted by YoBananaBoy at 11:25 AM on January 26, 2009 [1 favorite]


So?
posted by notyou at 11:31 AM on January 26, 2009


Those same British that claim you can catch the fat disease from sharing an elevator with the wrong person?

The wrong class of person, the sort who might use the barbaric word 'elevator' rather than the proper 'lift.'
posted by rokusan at 11:34 AM on January 26, 2009 [2 favorites]


Evidence of big historical events is also being wiped out as websites alter content or shut down, Mrs Brindley added. A major example was last week’s handover from President George W. Bush to Barack Obama. All traces of the Texan’s rule disappeared from the White House website.

Hmm.

I'd actually make the opposite argument: we're already more or less past the black hole period, when data could easily get stranded in obsolete physical formats, and entering the networked age where it's much easier to move data into large, redundant, effectively permanent storage. Even if a particular data format goes into disuse, it's a lot easier to write new software to read an obsolete data format than to reinvent a physical device to read an obsolete physical storage format.

I have a stack of "magneto-optical" disks (which was the hot new format for about forty-five minutes, before it became completely forgotten) and 3/4" mag tape containing most of my graduate school work, which it's already financially infeasible to rescue. Gone. If the internet had existed back then, that data would still be sitting on a server someplace, handled by pros who would be smarter than I was about keeping their physical infrastructure up to date.
posted by ook at 11:36 AM on January 26, 2009 [3 favorites]


You can't simply dismiss this issue by claiming much of what is on the internet isn't worthy of being saved. Were 1970's Sears catalogs or Betty Crocker cookbooks worth saving? They both were the subject of fpps here, so there is some interest.

The fact is that it is all worth saving, because what new technology allows us to do is improve the SNR of past datasets. Maybe there was something worth preserving that was overlooked at the time, but which now can be rediscovered.

Another more pressing issue is that there is currently no internet rollback facility, at least not that I'm aware of. It would be very useful to browse the internet as it existed on a given date and time, e.g. set the clock for Sept. 10, 2001, and then browse to www.metafilter.com or news.yahoo.com and see those pages as they were on that day.

More importantly, this kind of facility lets you see what disappears or gets erased over time. From this, future researchers could study why some blogs last years and others don't, or why some ideas or memes propagated and others didn't.

Perhaps the data requirements are too large for something like that now, but it would be a very useful tool.
posted by Pastabagel at 11:38 AM on January 26, 2009 [3 favorites]


Couldn't they just pay some guys to make up history, like they do now?
posted by weapons-grade pandemonium at 11:39 AM on January 26, 2009 [2 favorites]


three guys, in a salt mine, lots a index cards, working part-time
posted by clavdivs at 11:54 AM on January 26, 2009 [3 favorites]


I'm pretty sure the NSA has this covered. Just make sure you transmit everything over the internet at least once.
posted by nanojath at 11:56 AM on January 26, 2009 [2 favorites]


I have a stack of "magneto-optical" disks (which was the hot new format for about forty-five minutes, before it became completely forgotten) and 3/4" mag tape containing most of my graduate school work, which it's already financially infeasible to rescue. Gone. If the internet had existed back then, that data would still be sitting on a server someplace, handled by pros who would be smarter than I was about keeping their physical infrastructure up to date.
posted by ook at 2:36 PM on January 26


The same could be said of countless dissertations from the 60's that are nothing but punchcards in a shoebox somewhere. And in a few years we will be saying this about VHS.

But in theory it is possible to preserve these things by keeping some units of every media reading device in some working order.

But in some respects, the problem isn't merely format obsolescence. What if the content to be preserved only exists in hardware?

In many cases, entire classes of devices are disappearing, taking along with them whole chapters in the history of science and technology. For example, if video games are expected to sit alongside film and music in the pantheon of "worthwhile" artistic media, wouldn't you say there is a need to preserve that industry's early history? Not simply the software, but the machines themselves. We all know what Pong or Space War looks like, but most of us have never seen the actual original Pong or Computer Space video arcade machine, and fewer of us still have any clue about how it works. The popularity of Pac-Man is recorded in periodicals and books from the time of its presence in arcades. But Pac-Man isn't on a disk or removable memory format. It's sitting in ROM chips soldered into the boards of the arcade machines. It also exists in PC versions or nostalgia reissues, but are those the actual game, or merely simulations of the game? Does an Adobe Flash version of Pac-Man tell you anything about how Pac-Man was written/programmed/built in 1980?

Underground projects like MAME and MESS are very important in this regard, because they seek to model the increasingly rare and rapidly deteriorating hardware of those kinds of machines in software (the fact that games run on that virtual hardware is a side effect). But in the case of Computer Space, no conventional hardware emulation is possible, because the machine uses no microprocessor or microcontroller, instead being cobbled together with TTL chips and diodes. So there is no "software" to speak of -- the circuit is the software. You either preserve the hardware (or the schematic) or you preserve nothing.

MESS is a particularly interesting project as it seeks to emulate not just arcade machines, but electronic chess games, calculators, and early computers. It is, in effect, moving the physical structures of the machines into a more permanent format that can be easily carried forward into the future.

If so much time was spent by people in that era with devices like those, it is worth the effort to preserve them. Likewise, if it is worth preserving a copy of Look magazine from 1952, it is worth preserving a blog that was moderately popular for a few years.

I suppose that in the end some things can't be saved and they disappear from everywhere save human memory, and in time from even there. In the grand scheme of things, maybe a blog that has only one reader isn't worth saving as something of cultural importance, but it is sad to think that that connection between a writer and his sole reader, which was undoubtedly very important to both of them, will ultimately disappear and be forgotten.
posted by Pastabagel at 11:58 AM on January 26, 2009 [4 favorites]


They aren't kidding. All the poetry I wrote in high school is on 5 1/4" floppies. I wrote that stuff using Deskmate - is there an emulator for that?
posted by gyusan at 11:59 AM on January 26, 2009


Donate to the Internet Archive.
Um, that's not really meant to be an imperative.
posted by Lemurrhea at 12:03 PM on January 26, 2009 [1 favorite]


Pastabagel: though archive.org is incomplete, I hope you do know that their wayback machine is designed to do what you are describing?
posted by idiopath at 12:07 PM on January 26, 2009


Frankly I'd like to see the Internet Archive spidering more sites rather than hitting the biggies daily... I can't count the number of times I've needed to go dig up an obscure historical/technical item located on Geocities or Angelfire, but the site's gone and the Internet Archive shrugs with "Sorry, no matches."
posted by crapmatic at 12:10 PM on January 26, 2009 [1 favorite]


Many early websites from before 1999 are missing from the Internet Archive. Do you know what the original WWW Project homepage looked like, or the NCSA What's New pages? Do you have some idea of where you'd want to look, if you were interested in knowing? Well, in both instances there's my website and at most one other mirror.
posted by shii at 12:11 PM on January 26, 2009 [3 favorites]


Pastabagel: though archive.org is incomplete, I hope you do know that their wayback machine is designed to do what you are describing?
posted by idiopath at 3:07 PM on January 26


I'm thinking more along the lines of setting your browser to a date, and then you just surf the web as you normally would, except that everything comes up as it would have on that date. I tried to mimic this behavior by choosing google.com in the wayback machine, but that seems to break it.
posted by Pastabagel at 12:15 PM on January 26, 2009
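
For reference, the nearest thing to "set the clock, then browse" discussed in this thread is the Wayback Machine's dated address scheme: a 14-digit timestamp placed in front of the original URL, with the archive serving whichever capture it holds closest to that moment. Below is a minimal Python sketch of building such an address. The function name `wayback_url` and the constant `WAYBACK_PREFIX` are made up for illustration; only the URL pattern itself comes from the archive. It does not check whether a capture actually exists, and it does nothing about the holes people describe above.

```python
from datetime import datetime

# Public prefix used by the Wayback Machine for dated captures.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `url` as of `when`.

    Hypothetical helper: it only formats the address and does not
    contact the archive or confirm that a capture exists.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


if __name__ == "__main__":
    # The date used as an example earlier in the thread.
    print(wayback_url("http://www.metafilter.com/", datetime(2001, 9, 10)))
```

Links inside an archived page are generally rewritten to point back into the archive, so clicking around stays roughly "in the past" for as long as captures exist.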


Software interfaces for obsolete data formats can be hacked. Depending on the format it may be more or less of a pain in the ass, but it can be done with relatively little work. Hardware differences are worse, but still not insurmountable. In the second place there's enough demand for data recovery on old formats to keep several businesses doing just that alive. Here in Amarillo, for example, there's a company specializing in business software that maintains a couple of old punch card machines because quite often that's the only format they can get the old code on.

What worries me is DRM.
posted by sotonohito at 12:18 PM on January 26, 2009


I admire what Wayback has done and it's the best we have, but the solution could be much better; Wayback has a lot of holes. It would have to be a publicly funded service because of the cost, or perhaps Google with some sort of revenue generating model like Google Books. The longer we wait, the more time goes by and the stuff just disappears. Wayback started around 1995 or so... the web started in 1993... the first couple years are the most interesting, historically, and it's mostly gone.
posted by stbalbach at 12:23 PM on January 26, 2009


Not quite what Pastabagel was looking for, but the Resurrect Pages Firefox plugin can save a few clicks.
posted by benzenedream at 12:24 PM on January 26, 2009


Hey, where's Charlie Stross? He's got a whole book that sort of indirectly rests on this premise.
posted by hifiparasol at 12:25 PM on January 26, 2009


*hits print button*
posted by disclaimer at 12:26 PM on January 26, 2009 [2 favorites]


idiopath - it doesn't work that way. I don't know how they archive pages, but it seems they don't follow links within the same site. That, or they don't map pages that way. You can, with patience, find pages by searching for the linked pages and content, but it's not as easy as plugging in a website and browsing the old version.
posted by filthy light thief at 12:26 PM on January 26, 2009


*sigh* This is the expected result of our current laws. Back when copying anything required a serious investment in infrastructure, 0, 14, or at most 28 years of copyright protection was considered sufficient.

Now that copying is trivial, but media life is *much* shorter, we have placed centuries-long restrictions on making copies.

Does that seem right to you?
posted by ChurchHatesTucker at 12:28 PM on January 26, 2009 [1 favorite]


teleri025, that's why I print all my text messages.

Actually, I did start to transcribe the interesting ones into a text file. Then I got a USB cable for my current cellphone and BitPim, and it's all archived with ease. I haven't really looked into printing these out, but that is probably an option.

My work still relies on microfiche to archive important information. Mind you, everything must be printed first, then transferred to microfiche.
posted by filthy light thief at 12:32 PM on January 26, 2009


Ugh. This is just a bunch of paranoia. The mysterious "magneto optical" drive that ook was talking about? You can get a USB 2.0 MO drive for $289. Not cheap but not outrageously expensive.

5.25" disks and 3.5 inch drives are easy to find, they still work in modern PCs. Text formats are easy to reverse engineer, etc.
posted by delmoi at 12:34 PM on January 26, 2009


maybe a blog that has only one reader isn't worth saving as something of cultural importance

Well, Pepys' diary only had one reader. The fact is, we have no way of knowing what will be of interest to future historians.

I keep meaning to get a 3.5" disk reader to make a master-backup of all my old floppies. There's a lot of writing on those. I really should stop mucking about and get on with it.

One bit of the record that we're going to lose almost 100% of: faxes. I kept some old faxes, and they were dust and illegible brown crumbly tissue when I finally threw them away. That's essentially all my letters from my Father when he was abroad: gone. (Unless he kept the originals, and he's as much a squirreler as I am so it's not impossible.)
posted by WPW at 12:40 PM on January 26, 2009 [1 favorite]


My father is obsessed with this. He's been digitizing old family photographs for years and is terribly concerned that there will come a day when they will be totally unreadable. His latest attempt to assure data immortality - he burned a DVD of his latest photo series and sent about ten or twenty copies around to members of the family, including me. He included a note explaining what was on the disk, as well as a hope that at least one recipient would maintain the data and update it to new file formats as required.

It kind of reminds me of those propeller seeds from oak trees.
posted by backseatpilot at 12:46 PM on January 26, 2009 [3 favorites]


delmoi: "5.25" disks and 3.5 inch drives are easy to find, they still work in modern PCs. Text formats are easy to reverse engineer, etc."

When I went to restore my 5.25" and 3.5" disks, about 50% were unreadable; the media had deteriorated. This is normal, I have since learned. I'm having the same problem with DVD+R: about 20% go bad within a few years. The only media that seem to last are hard drives. So now everything is backed up to hard drives (sitting un-powered on a shelf).
posted by stbalbach at 12:51 PM on January 26, 2009


The same could be said of countless dissertations from the 60's that are nothing but punchcards in a shoebox somewhere. And in a few years we will be saying this about VHS.

Absolutely. That's pretty much exactly my point -- I'd date the "black hole" from (roughly) punchcards until the widespread use of the internet. It's almost over. If (when) "cloud computing" becomes the norm, and individuals are no longer responsible for maintaining their own personal data on their own personal hard drives, that will be the absolute end of it. (My photos on flickr are a lot more likely to survive as part of the historical record than the ones on my personal hard drive, and a lot more likely than the old print ones sitting in a shoebox in my closet.) Data storage will grow cheaper and cheaper, and the data itself further and further from any specific physical manifestation.

No, we won't save all data for all time. We never did: paper burns, clay dissolves, stone tablets break. Alexandria burned. But the very fact that it's now conceivable that we could store everything...

Basically I think the British Library is dead wrong on this one. I think a far greater percentage of today's information will survive than that from any previous era.

Indexing and retrieval will be the real problems. The supply of data, not so much.
posted by ook at 12:51 PM on January 26, 2009 [2 favorites]


You can get a USB 2.0 MO drive for $289.

Holy shit. Thank you!! (Seriously, I went hunting for one of these about five years ago, and it was as if the format had never existed.)
posted by ook at 12:53 PM on January 26, 2009


These panic stories come out every so often, and they miss the point completely. Archiving anything isn't simply about boxing it up and unearthing it in 20 years. With paper records, you have to file them properly, and then actually check up on them occasionally to be sure that they're not infested with bugs or fungus.

With electronic media, it's analogous. Proper archival procedure would be to back up the data in a logical file structure, and then periodically check the disks to make sure things are still good. For personal computing, hard drives double in size faster than data formats bloat. Properly archived, your photos, papers, and videos can be copied from your old computer to a tiny sliver of the space on your new computer. Text documents can be read, photos in formats like GIF or JPG or PNG can be viewed, etc.

The problem is that instead of actually backing up the data and maintaining these backups, people copy something to DVD, and leave it in a box for years to get lost or degrade. With the networked capability of computers these days, there really shouldn't be an excuse for not having periodic backups of important information.
posted by explosion at 12:58 PM on January 26, 2009 [3 favorites]
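
The "periodically check the disks" step in the comment above can be made concrete with a checksum manifest: record a hash for every file when the backup is made, then recompute and compare on each check. What follows is a minimal Python sketch under stated assumptions, not anyone's actual archival procedure. The function names (`sha256_of`, `build_manifest`, `verify`) are invented for illustration, SHA-256 is an arbitrary choice of hash, and the manifest is assumed to be stored somewhere other than the disk being checked.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large videos need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Run `build_manifest` once when the backup is written, keep the result alongside a second copy of the data, and run `verify` every so often; anything it reports gets restored from the other copy while a good one still exists.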


I could go on a while here, but I actually have work to do here. And my job involves some of this stuff, I deal more with the public record laws end of it, nothing technical. The government org I work for is using ArchiveIt to capture government web pages.

This is a huge, huge question and problem for archivists and records managers. I know a while back the San Diego Supercomputer Center was working with the National Archives on some massive email saving project. I'm not sure where that stands.

A local university is developing an app that will convert a former governor's emails and electronic records to .txt files.

A lot of organizations are going toward PDF or PDF/A to save electronic documents long term. With the PDF readers being so ubiquitous that might be feasible.

A lot of people (news orgs particularly) want government to save EVERYTHING when it comes to email. That makes for a huge haystack to find that needle in.

I'm just kind of rambling and throwing some thoughts down, I'll shut up and get back to work (maybe).
posted by marxchivist at 12:58 PM on January 26, 2009


it is sad to think that that connection between a writer and his sole reader, which was undoubtedly very important to both of them, will ultimately disappear and be forgotten

To preserve all cultural memory for all time, we need Borges' librarian skills now more than ever.
posted by Blazecock Pileon at 12:58 PM on January 26, 2009 [1 favorite]


If the internet had existed back then, that data would still be sitting on a server someplace, handled by pros who would be smarter than I was about keeping their physical infrastructure up to date.

Until they decided to erase it because
1. not enough disk space on the current server
2. not enough money to buy new server space
3. no one has accessed the data in more than a year, so it is obviously obsolete
4. the user account has expired

need I continue?
posted by Gungho at 1:03 PM on January 26, 2009 [3 favorites]


Huh, this feels familiar.
  • Charlie Stross for open data formats, & against DRM

  • Kevin Kelly on access vs. ownership for digital goods and services

  • Jason Scott's argument against trusting everything to the cloud


  • Having a personally appendable archive.org seems like a good idea. (deja vu) "I'm curious if a DIY distributed Internet Archive or github would be better for archiving, diffing, & annotating revisions."
    posted by Pronoiac at 1:17 PM on January 26, 2009


    1. not enough disk space on the current server
    2. not enough money to buy new server space


    Nah, not in the long term. I probably have preferences files now that take up more space than that old data. Moore's law will peter out at some point, sure, but not before today's data looks trivially small.

    3. no one has accessed the data in more than a year, so it is obviously obsolete
    4. the user account has expired


    Sure. Some data will be lost through neglect or deliberate decision. But anything someone actively wants to be preserved can be preserved now, with minimal effort. That's never been true before.

    Jason Scott's argument against trusting everything to the cloud

    That's less an "argument" than a "fuck tha man" tirade. I've personally lost a lot more data to hard drive crashes and personal stupidity than I have to "technogeeks giggling at me". If we're talking about safeguarding every single twitter and cat photo on the internet, sure, he's got a point. But if we're talking about a long-term historical record, then no, he doesn't.

    Going back to the "disappearance" of the Bush whitehouse.gov site: There are probably fifty different caches of the site squirreled away by various people for various reasons. For any even semi-significant event, there are thousands of people explaining it, discussing it, blogging it, arguing over it. Not all of those records will survive, and what does will be tangled and disorganized and will require a whole new field of digital paleontology to piece back together. But there will be a hell of a lot more of it left to piece together a thousand years from now than we have now of a thousand years ago.
    posted by ook at 1:39 PM on January 26, 2009


    Even if a particular data format goes into disuse, it's a lot easier to write new software to read an obsolete data format than to reinvent a physical device to read an obsolete physical storage format.

    It may be easier, but that doesn't mean it's easy. Especially if you've no idea what format the stuff's in. I've got some PageMaker 1 and Quark v2 files that are murder to open nowadays, just 15-odd years after they were made. I'm not overconfident of getting my FinalWriter Amiga files back, either.
    posted by bonaldi at 2:12 PM on January 26, 2009


    If you live in Washington State we've got you covered.

    It is true that much will be lost, true that much of the loss will not matter, true that future historians will wail over our carelessness. But remember that we have only a very partial record of any era. We have some great 19th century letters but they represent only some small fraction of what was written. There are entire runs of hundreds of defunct newspapers where not more than a copy or two remains. The Great Library at Alexandria was burned to the ground.

    The loss of data makes the work of future historians more difficult but not impossible.
    posted by LarryC at 3:45 PM on January 26, 2009 [1 favorite]


    "All these visions will be lost to time... like tears in rain."
    posted by autodidact at 3:59 PM on January 26, 2009


    LarryC - For those Digital Archives, it's "if you're a governmental agency in WA, you're covered."
    posted by Pronoiac at 4:12 PM on January 26, 2009


    > I'd date the "black hole" from (roughly) punchcards until the widespread use of the internet. It's almost over. If (when) "cloud computing" becomes the norm, and individuals are no longer responsible for maintaining their own personal data on their own personal hard drives, that will be the absolute end of it.

    I agree with this to a certain extent: I think the situation is definitely looking better, rather than worse, as we move forward in the near future.

    However, what really is a major issue, worth discussing on both an individual and group (institutional or even national) level, is dealing with all the information that's been produced since the advent of mechanized storage/retrieval, up until very recently where it's started to seem like we've started to get a handle on some of the challenges involved.

    Right now, we have an opportunity to mitigate some of the effects of the "black hole" that's otherwise going to swallow up a lot of primary source material from the latter half of the 20th century. This is stuff that right now is sitting on punch cards, magnetic tape (whether digital tape used for computer storage, or analog tape used for audio or video recording), early floppy disks, non-archival photographic film and paper, or trapped in poorly-documented formats even if stored on modern devices.

    This is not a theoretical problem, it's one that exists right now. And almost by the day, lots of information becomes more expensive or difficult to retrieve. (And that really means it's less useful as a result. Information that is prohibitively expensive to access might as well be lost.) A few years ago, I had a devil of a time getting some old video recordings transferred from an early 1" open-reel video tape format; Betacam is already close to being that bad. VHS will take longer but it's going to get there eventually.

    All this talk of the coming dark age seems to miss the point: many of us are letting one, albeit perhaps a small one, occur right under our noses.

    Right now it's not a hard problem to avoid: photo and film scanners are cheap, video digitizers even more so. Chances are you still have a working VCR around, or know someone who does, and the tapes are probably still playable. In 10 or 20 years (maybe even less, depending on when you plan on tossing that tape deck), it will almost certainly be a lot more difficult.
    posted by Kadin2048 at 4:33 PM on January 26, 2009


    ...Okay, are we SURE web sites are worth preserving?

    Yes, actually. For one thing, we have no idea what will be interesting to future historians when they're trying to understand us.

    To offer a couple of concrete examples: We actually know very little about medieval life for ordinary people; ordinary people were not, by and large, literate. We don't even have a lot of descriptions of life in much of the medieval period from literate sources, because those had narrow ranges of interest. So, for instance, the heresy trials conducted against the Cathars are incredibly historically valuable, because they represent a large body of documents that describe, not just the Catholic view of the Cathars, but the lives of ordinary Languedoc peasants, and it's a relatively uncommon sort of resource.

    Shakespeare: there are only a handful of documents about the man. His works only survived because some of his friends and acquaintances were determined to have them recorded - and many of his plays have no real definitive version precisely because of that - he didn't write them down for posterity, and he didn't oversee the process. If we had some writing from some of his actors, or the guys who built the Globe, or theatre goers, we'd be immeasurably enriched. Likewise, we don't have most of the work of contemporary notables, because most of them didn't have friends trying to record their work for posterity.

    Warfare: What motivated ordinary people to join the crusades? Who knows? They didn't write about it. They didn't know how, for the most part. How big were medieval battles, really? The only reliable guides - as opposed to official propaganda commissioned by the principals - are quartermasters' records.

    For that matter, many sources are needed just to have a chance of surviving. So much of our knowledge of the ancient Greek states and Rome comes through a handful of sources, and historians have to do a huge amount of work to piece together the pictures we've built up; we almost lost all knowledge of Ancient Egypt, because manuscripts and buildings were being destroyed at a huge rate before we learned how to read hieroglyphs. These were all (relatively) highly literate societies with quite good record-keeping. For less literate societies we know fuck all. Who were the Sea Pirates? What do we know about ancient Germanic societies? Only what their more literate enemies told us, and what has survived.
    posted by rodgerd at 4:40 PM on January 26, 2009 [5 favorites]


    Medieval manuscripts, modern decoding. I'm an academic who works with obsolete material; I have full faith in conservation, future (re)discovery.
    posted by woodway at 4:44 PM on January 26, 2009


    DAMN this is the right link.
    posted by woodway at 4:45 PM on January 26, 2009


    > Shakespeare: there are only a handful of documents about the man. His works only survived because some of his friends
    > and acquaintances were determined to have them recorded - and many of his plays have no real definitive version precisely
    > because of that - he didn't write them down for posterity, and he didn't oversee the process. If we had some writing from
    > some of his actors, or the guys who built the globe, or theatre goers, we'd be immeasurably enriched.

    I trust Shakespeare's notion of what's worth fetishizing and preserving a great deal more than I trust his friends'. One has to let most of the past go or be overwhelmed by it. Do we imagine that, had there been a massive effort to preserve more English Renaissance drama, we would now have more Shakespeare? We would not. We would have roomsfull and roomsfull of sub-Beaumont and Fletcher. Remember Sturgeon's Law and be comforted.

    As for preserving the www, we can safely trim the job down a bit. The nineties may be memorialized by one page (bgcolor=#ff00ff) bearing one line of marquee blink text and one "site under construction" animated gif, all from AOL. Since then: bit-bucket all the pr0n sites and all the random flamewars and then apply Sturgeon's Law to what's left. The amount of work facing the archivists and curators in the crowd will seem a lot less daunting. (link to airsicknessbags.com courtesy of gman)
    posted by jfuller at 7:10 PM on January 26, 2009


    The internet black hole already exists, as has already been pointed out. I remember finding it very difficult to find old Addicted to Noise reviews, and lots of news sites and blogs I used to follow in the late 90s have disappeared without so much as a fond memorial note. (Bring the Rock, anyone? Ah, to read up on bands I never listened to like Braid and Slint...)
    posted by chrominance at 7:22 PM on January 26, 2009


    Despite the medium that carries it, I'm not too worried about file formats themselves. It seems like the formats for the things they're concerned with losing have actually remained pretty stable. Tiffs, jpegs, AIFF or WAV audio files, .doc, .txt, and now the ubiquitous .pdf etc, seem like they're not going anywhere. Also, if you really need something bad, there are still interfaces around for 5.25" floppies, Zip disks, SyQuest drives, & more. Internal hard drives have grown exponentially, which allows people to recover their files from these off-line storage mediums as they become obsolete. I feel relatively certain that my tiff scans of all my Fuji film slides will outlast the originals by quite a long time, and that the digital transfers from the fucking melting Ampex 456 2" tapes are also more secure.

    If I ever become famous enough for the Smithsonian or someone to want all my crap, I doubt they'll have any trouble reading it in 20 years, or even 50.

    Also, backup, backup, then backup again. If it's important.
    posted by Devils Rancher at 7:23 PM on January 26, 2009


    LarryC - For those Digital Archives, it's "if you're a governmental agency in WA, you're covered."

    True enough! Though I am always pushing to expand our definition of public records, we will never back up stuff like local blogs or discussion boards or non-governmental websites. Once someone stops paying their hosting bill the stuff exists only on their hard drive--and maybe if we are lucky as a bare-bones kind of thing at the Internet Archive.
    posted by LarryC at 8:30 PM on January 26, 2009


    Pastabagel: "I'm thinking more along the lines of setting your browser to a date, and then you just surf the web as you normally would, except that everything comes up as it would have on that date. I tried to mimic this behavior by choosing google.com in the wayback machine, but that seems to break it."

    Oh man, you must have missed it. Back in October Google, to commemorate their 10th birthday, resurrected their oldest reliable search index (January 2001) and hooked it up to a search page. You could see a page of Google results for any term as they were way back when, before iPods and YouTube and September 11th. Even better, the results were linked (where possible) to the Internet Archive, so you could see the preserved version of the page you wanted. It was incredible.

    But for some bizarre reason -- they claimed it was because it would be "silly" to celebrate your birthday all the time -- Google took the search page down at the end of October, and there's no indication it's ever coming back.

    But hey! At least they're still running Knol, right? So much more valuable and interesting, Knol. Yes.
    posted by Rhaomi at 10:59 PM on January 26, 2009


    Ah, here was their "reasoning":

    How long will this service be available?

    One month. It's kind of lame to be celebrating a mid-September birthday in late October, don't you think?

    posted by Rhaomi at 11:04 PM on January 26, 2009


    It is a modern-day Library of Alexandria. When the large meteor hits the Earth and society throughout the world collapses, a thousand years later when some sort of renaissance occurs, all that data stored on hard drives on various servers around the world will have turned to dust.

    Every now and then someone should print out the entire contents of Wikipedia. Really.
    posted by eye of newt at 11:32 PM on January 26, 2009


    192.168.x.x IS OFF LIMITS TO YOU GUYS, OKAY? you can have the rest of it.
    posted by davemee at 12:54 AM on January 27, 2009


    The problem, as far as I can tell, isn't that they can't archive the digital world.

    We know more than 50% of what is online is crap that could go down the memory hole and the world would be no poorer. We're just not entirely sure what the 50% we've got to keep is.
    posted by MuffinMan at 7:47 AM on January 27, 2009 [1 favorite]


    I trust Shakespeare's notion of what's worth fetishizing and preserving a great deal more than I trust his friends'. One has to let most of the past go or be overwhelmed by it. Do we imagine that, had there been a massive effort to preserve more English Renaissance drama, we would now have more Shakespeare? We would not.

    Then we wouldn't read it. On the other hand, if someone had used the same criteria and not preserved Bach's music during the times he was unpopular, we'd be significantly poorer.
    posted by ersatz at 12:50 PM on January 27, 2009


    I look at it like this: while there are undoubtedly some great things that will be lost through media type deprecation, this will lead to people completely forgetting about certain styles. This means that in a decade or two, my first web page from 1996, with its red, center-justified text, on a black background, and spinning animated gifs of flaming skulls on fire is going to look totally cutting edge and revolutionary!
    posted by quin at 2:56 PM on January 27, 2009


    It's sad how fast the web falls apart. I remembered this story recently, and was surprised to find that every. single. link. was dead. Sure, archive.org has mirrored most of the links (but not the OP), but it would be much better with the original pages in place.
    posted by ymgve at 6:31 PM on January 28, 2009




    This thread has been archived and is closed to new comments


