Internet Archive: One copy of every book ever published, in shipping containers
June 6, 2011 7:41 PM

A common refrain is "a library is not (just) a warehouse of books." Except when it is. Internet Archive, best known as the world's largest collection of digital books in the public domain, has started collecting "one [physical] copy of every book ever published" for long-term warehousing in shipping containers.
posted by stbalbach (58 comments total) 28 users marked this as a favorite

 
Cool!
posted by limeonaire at 8:11 PM on June 6, 2011


When I was a kid, I used to want to preserve every book ever published. It broke my heart to think that all could be lost someday. This is very cool.
posted by Xoebe at 8:25 PM on June 6, 2011 [1 favorite]


Sounds expensive. But if they can afford to do this while still expanding their digital collections and everything else they're currently doing, then good for them.
posted by MrFTBN at 8:30 PM on June 6, 2011


Yeah, I'm with Xoebe. In a kind of distant, unfocused way, I think doing this was kind of my dream job.
posted by penduluum at 8:33 PM on June 6, 2011


I didn't realize the Internet Archive was so well funded.
posted by jayder at 8:35 PM on June 6, 2011 [2 favorites]


I'm going to be a pedant and say that when you're warehousing books for the purpose of preservation, with limited ability to browse and check out (mostly relying on physical access by a specialist), then you're not really a library so much as you're an archive.

It is a terrific project though. I'm glad to see that stimulus money helped to fund it.
posted by codacorolla at 8:47 PM on June 6, 2011 [1 favorite]


The Internet Archive people are really some of the most awesome people on earth. They are the digital librarians of our age; and when I say the word "librarian," it has overtones of "hero."
posted by koeselitz at 8:58 PM on June 6, 2011 [4 favorites]


It's like one of those seed arks under the mountains in Norway. I feel in my bones like this instinct to put valuable things in one place and bury them, to hedge against the prevailing attitude that all things are ephemeral and replaceable (or at least surpassable), is wise. And needful. I couldn't even say why, except that as I get older I begin to resent entropy in a different, more personal way.
posted by penduluum at 9:02 PM on June 6, 2011 [3 favorites]


Maybe all of these should be digitized... you know, for back up only.
posted by underflow at 9:05 PM on June 6, 2011


It seems somehow a less expansive, lofty ideal once you've wandered through a basement filled with hundreds of thousands of westerns and Harlequin Romances. I hope they're applying some sort of metric of worth to this project, as it would be too bad if, after the Apocalypse, the few starving and half-naked survivors were to crack open the last surviving freight container only to have 50,000 Louis L'amour books spill out at their feet.
posted by Devils Rancher at 9:26 PM on June 6, 2011 [1 favorite]


it would be too bad if, after the Apocalypse, the few starving and half-naked survivors were to crack open the last surviving freight container only to have 50,000 Louis L'amour books spill out at their feet.--Devils Rancher

Why stop there? Just imagine this world 1000 years later, with its civilization based on these 50,000 books, the only written history the population has to rely on.
posted by eye of newt at 9:34 PM on June 6, 2011 [5 favorites]


The Library (of Congress) serves as a legal repository for copyright protection and copyright registration, and as the base for the United States Copyright Office. Regardless of whether they register their copyright, all publishers are required to submit two complete copies of their published works to the Library if requested—this requirement is known as mandatory deposit.[13] Parties wishing not to publish, need only submit one copy of their work. Nearly 22,000 new items published in the U.S. arrive every business day at the Library. Contrary to popular belief, however, the Library does not retain all of these works in its permanent collection, although it does add an average of 10,000 items per day. Rejected items are used in trades with other libraries around the world, distributed to federal agencies, or donated to schools, communities, and other organizations within the United States.[14] As is true of many similar libraries, the Library of Congress retains copies of every publication in the English language that is deemed significant

Worthy endeavor. I feel like I've lived in the golden age of books. About a third of my books were ruined in flooding, and the sense of loss I feel about that gives me an understanding of the motivation. Digital books are not the same at all; whatever books become, digital publishing will change them.
posted by theora55 at 9:43 PM on June 6, 2011 [2 favorites]


One of the greatest ideas I’ve ever heard. Speaking of Louis L’Amour: what we know about past civilizations is only based on the random collection of books we’ve found. I always joke that those people might be totally embarrassed if they knew. Maybe we’re reading the Danielle Steel or Theodore Kaczynski of ancient societies.
posted by bongo_x at 9:46 PM on June 6, 2011 [5 favorites]


I wonder if at some stage our great^n-grandchildren are going to curse our memory for having thrown so much money, effort and materials into monolithic storage in the face of threats to survival of physical information. The seed vault and possibly this project too both strike me as foolish gestures in the face of the uncertain spectre of social and environmental stress and collapse.

Digging deeper in the article, I do see that Internet Archive's physical storage plan appears to anticipate multiple maintainer organizations and storage locations, so my criticism (and other folks' comments in this thread making allusions to the Svalbard seed vault) may actually be largely misplaced, though I do wish the article talked more about this part of the plan. As for the suitability of the shipping containers for this task: they are very mobile and manipulable, so long as you have the accessible and affordable transportation infrastructure needed to use them efficiently and protect them in transit, and much less beneficial to the mission if those links begin to break down or the perceived value of the containers becomes greater than that of their contents. The jury will have to stay out on that for some time.

It surprises me that the fate of the Library at Alexandria hasn't had more currency as we contemplate moves like the Internet Archive plan. While central Hellenistic and Roman libraries/archives throughout the Mediterranean were destroyed or lost in political upheaval or natural disaster, it was a fraction from the distributed private collections of the Roman aristocracy, transplanted to monasteries, that survived to sow the seeds of (or, more likely, at least inform) the civilizational rebound that ultimately followed.

If Archive's plan is based on a viable long-term model for distributed responsibility, storage and self-financing, it has a much better chance of serving us well in the long term than monoliths like the Svalbard seed vault. In my view, success in convincing educational institutions to shift authority for their off-site storage to local caretaker organizations would be a much more important story than an archival system built around shipping containers.

Also, the best way to ensure the survival of materials intended for archiving is to maintain their contemporary relevancy. Are there other ways to ensure the survival of niche media in the face of digitization? Would it be more effective in the long run to promote the collectibility of endangered facets of our global bibliography, rather than boxing up one fragile ark? And what of the vast quantities of non-English language physical media? How endangered by digitization are Hindi crime fiction, or Estonian technical manuals? Are we neglecting, and thus damning, non-English bibliographies, or are they thus far more resilient to death-by-digitization by virtue of not attracting the same Silicon Valley interest and capital (and U.S. govt stimulus funding*) to digitize them?

* This, to me, is the most interesting revelation in the whole article: that much of the momentum of Internet Archive's digitization work has been provided by job-creation subsidies. I wonder if Google is taking advantage of the same programs? It is kind of like burning the scrolls to keep warm...
posted by waterunderground at 9:57 PM on June 6, 2011 [2 favorites]


It's a bit homunculus-y, but I don't care about, and am not sentimental towards, the form—be it book, ebook, etc—but the content, the story. Things, be they books, cds, or teaspoons do serve as a mental weight, something to always be worried about.

I can't help but think in light of the already existing legal repositories the Internet Archive's money could be better spent.
posted by oxford blue at 10:04 PM on June 6, 2011


The more copies of a text - in whatever form - the better the chances of survival, so I'm all for this. And I hope they do warehouse all the popular literature as well, no matter how ephemeral and terrible. You can tell as much about a society from what they furtively and guiltily read in dark corners as by what they proudly displayed to visitors.
posted by lesbiassparrow at 10:39 PM on June 6, 2011


I can't help but think in light of the already existing legal repositories the Internet Archive's money could be better spent.

Well, Brewster noticed that libraries were throwing books away after they were digitized, so there is no guarantee libraries will preserve physical book collections rather than just digitize them to reduce cost and increase accessibility. So this project is meant to address that, by being a book warehouse, not meant for human access except on a bulk basis for some future unknown reason. I'm not sure anything like it exists, other than personal storage lockers.
posted by stbalbach at 10:41 PM on June 6, 2011


LOCKSS
posted by GeorgeBickham at 11:10 PM on June 6, 2011


There's a couple in Saskatchewan whose neighbour's husband passed away and left behind his collection of 350,000 vintage and rare books. The neighbour started burning them after her husband's death. It'd be great if somebody could find a way to hook them up. Read about it here.
posted by empatterson at 11:43 PM on June 6, 2011 [1 favorite]


To me, archiving books is like archiving tcp packets. Why would I want to preserve the transmission medium? When I digitize stuff, I throw away the original.
posted by ryanrs at 11:44 PM on June 6, 2011


Sorry. Posted without finishing the story... The horrified couple rescued the books and is now saddled with the entire collection and are literally drowning in books.
posted by empatterson at 11:44 PM on June 6, 2011


To me, archiving books is like archiving tcp packets. Why would I want to preserve the transmission medium? When I digitize stuff, I throw away the original.

One difference is that born-digital objects, like TCP packets, are all alike. The notion of an 'original' TCP packet is not just redundant, but nonsensical. However, born-analog objects, like books, embody data that no digital copy can fully capture. Paper is the obvious example: without the original you have a greatly reduced ability to detect forgeries in the historical record. But paper also carries other kinds of information. The cheap yellowing paper of a Louis L'Amour novel tells a story about the economic and cultural status of the "information" within the book. Form and content are not just related, but indistinguishable.

Archiving books is also a relatively cheap backup method (where cheap is relative to loss, the cost of which is incalculable).
posted by GeorgeBickham at 12:20 AM on June 7, 2011 [3 favorites]


There's a lot of over-sentimentalization going on here. Not all books deserve to be preserved. Society will not be worse off if every book is not stored for some future unspecified purpose. We are already at the level where future generations will have unprecedented amounts of data about every aspect of our lives, culture, etc. That satisfies whatever academic duty we may owe the future.

Saving every or even most published works does not meaningfully contribute to any purpose. Knowledge for knowledge's sake must have some limits.

The cheap yellowing paper of a Louis L'Amour novel tells a story about the economic and cultural status of the "information" within the book

To what end? How will this help mankind today or tomorrow?

We do not think every child's drawing is worthy of permanency. Yet books with the same, or even less, worth (on an emotional or intellectual level) are imbued with some sort of mystical value.

A book can be a nice thing—the story it tells is an even better thing—but neither is essential. Of course that's a value judgement, but in the creative world—broadly speaking—value judgements are made constantly.
posted by oxford blue at 12:32 AM on June 7, 2011


There's a lot of over-sentimentalization going on here.

Whilst I agree that many people do tend to be rather sentimental about books and other cultural objects, I don't see this as a problem.

Not all books deserve to be preserved.

Can you provide me with a list of which ones do not?

Society will not be worse off if every book is not stored for some future unspecified purpose.

How will we know?

We are already at the level where future generations will have unprecedented amounts of data about every aspect of our lives, culture, etc. That satisfies whatever academic duty we may owe the future.

I don't get this. Because we have lots of data it doesn't matter what it is?

Saving every or even most published works does not meaningfully contribute to any purpose. Knowledge for knowledge's sake must have some limits.

The cheap yellowing paper of a Louis L'Amour novel tells a story about the economic and cultural status of the "information" within the book

To what end? How will this help mankind today or tomorrow?


It might never. But sometimes the material that we think is most ephemeral has a habit of turning out to be pretty interesting (warning: eye-bleedingly bad page design, and I'm glad that the Internet Archive preserves vintage web pages, too). I sometimes think that we learn more about a society from the things that it chooses to throw out than what it chooses to preserve, with all the pomp and pretension that memorial embodies.
posted by GeorgeBickham at 1:00 AM on June 7, 2011 [5 favorites]


Sumerian clay tablets weren't immediately valuable once the transactions they reported, or the laws they contained, were no longer immediately useful. But long after, they were a useful record of a writing system, a language, a culture, and an economy.

A non-curated archive contains what future historians will find valuable, where present curators may not.
posted by zippy at 1:26 AM on June 7, 2011 [4 favorites]


However, born-analog objects, like books, embody data that no digital copy can fully capture.

Are your books handwritten by monks or something? Modern books are digital from manuscript to printing press.

Want to capture the subtle qualities of the paper? Attach a datasheet from the paper mill: econo_pulp_9033_techdata.pdf
posted by ryanrs at 1:40 AM on June 7, 2011 [1 favorite]


"A non-curated archive contains what future historians will find valuable, where present curators may not."

Exactly! Archaeologists of today have made wonderful finds by rooting through ancient rubbish heaps and middens, not to mention all those monasteries that were the medieval equivalent of Grandad's basement full of Louis L'amour novels. It's the temporary, ephemeral stuff that is always least preserved and most mysterious to future generations.

This book warehousing sounds like a very good idea.

"To me, archiving books is like archiving tcp packets. Why would I want to preserve the transmission medium? When I digitize stuff, I throw away the original."

For you as an individual that may be fine, but for a society it's a bad strategy. Electronic storage is a very fragile thing, dependent on the smooth operation of many different systems to function. And then there's the constant evolution of technology, which necessitates the recopying of everything from time to time into more advanced formats. We don't notice the effort involved in keeping all this information accessible because it's spread out among the whole human race, but if there was a big enough disaster or war almost everything could be lost in a short span of years. Without electricity the Internet becomes a bunch of delicate discs stored inside plastic boxes. Paper books are much tougher than that.
posted by Kevin Street at 1:52 AM on June 7, 2011 [5 favorites]


Modern books are digital from manuscript to printing press.

Fair point, but we aren't so much talking about modern as historic books that were never digitally embodied. I should have made this a bit clearer. However, not even all modern books have gone through that workflow and I wouldn't want to try to reverse-engineer it solely from printed copies where they have, should I be interested in doing so (and people are and increasingly will be).

Want to capture the subtle qualities of the paper? Attach a datasheet from the paper mill: econo_pulp_9033_techdata.pdf

I'd love to see this become part of digitization practice! More metadata=better. There's still the forgery issue, though.

Just to be clear, if it's not already, I love me some digitization. Internet Archive's digital copies of books are tremendously valuable to me the whole time - just yesterday I found some material that will be saving me a tonne of work. But a digital copy shouldn't be seen as a replacement for the original. Internet Archive are simply taking responsibility for the long-term backup of books that they themselves have digitized. That humility and thoughtfulness about their own mission is rather unusual, and admirable.
posted by GeorgeBickham at 1:56 AM on June 7, 2011 [2 favorites]


Interesting. If you publish something in Germany you are obligated to submit a copy to the federal national library. As far as I know they keep at least one copy of every published (German) book.

There seems to be nothing similar in the US, except the legal deposit system.
posted by yoyo_nyc at 3:16 AM on June 7, 2011 [1 favorite]


I think the American system is basically the same as the German system you describe. Two copies of [pretty much] every book get sent to the Library of Congress.
posted by ryanrs at 3:29 AM on June 7, 2011


I think the American system is basically the same as the German system you describe. Two copies of [pretty much] every book get sent to the Library of Congress.

Ditto for the British Library, I believe. They archive magazines too, meaning that the Library holds what must be by far the country's largest collection of smut.
posted by metaBugs at 3:46 AM on June 7, 2011 [1 favorite]


Perhaps to head off a wave of 'me toos', here's Wikipedia on Legal Depository; it's a standard, rather than exceptional, practice.
posted by oxford blue at 4:01 AM on June 7, 2011


So are they cataloging ephemera like brochures, manuals, mail-order catalogs, legal decisions, courthouse records and the like? It seems like from a cultural history perspective, if we're going to be completist, that sort of stuff is more analogous to the Sumerian clay tablets that ancient historians prize, and is the most apt to get lost because it's not immediately valued, like a work of literature.

I've been looking for information about a couple of large shirt printing equipment makers that went bankrupt before the Internet was widespread, and there seems to be just absolutely nothing. A couple of obscure patent record references appear to be the sum total of our online repository about some of this stuff that's at least esoterically interesting to me.
posted by Devils Rancher at 5:11 AM on June 7, 2011 [1 favorite]


I work in the Physical Archive, in fact, I am there now.

Standing next to the wall of containers is really, truly awe-inspiring. It is the Archive's commitment to being a proper library AND archive writ large.

The thinking going on here is deep. Example: the conditions inside the containers. Because the Archive will only allow regular access to the digital copy of a book, the physical copies can be stored in conditions that preserve them for longer periods of time than other archives. In other words, there will be greater access to the digital text, and longer life for the physical.

The Archive's model also represents diversity. Sure, there are other archives, especially in the library ecosystem. There are other book seed-banks. But none of them have the funding model that the Archive does, and so none of them have quite the same abilities to move in different directions. If something terrible happened to our library funding (like Congress), the Archive would still be around. Diversity in model and intent makes a robust future for information access.

As for the Library of Alexandria comments... you're kidding, right? The Internet Archive built a copy of itself in Alexandria.

posted by fake at 6:29 AM on June 7, 2011 [9 favorites]


conditions that preserve them for longer periods of time than other archives

Well, yes, but not indefinitely. I wonder, won't most of the books still fall apart in a few decades due to the poor quality acidic paper modern books are generally made from? In a few hundred years, no matter how good the conditions, most of the containers will be full of nothing but dust, won't they? Or if they're not planning to keep them for a few hundred years, what is the plan?
posted by Segundus at 7:27 AM on June 7, 2011


In fact, on reflection I think books are inherently ephemeral. They're meant to be read, not preserved in a dungeon. If you want to preserve a text for a very long period, history suggests that the only way is to produce new copies repeatedly.
posted by Segundus at 7:31 AM on June 7, 2011


As for the Library of Alexandria comments... you're kidding, right? The Internet Archive built a copy of itself in Alexandria.

I wasn't actually. Many of these efforts look very frail. And Segundus' point builds on what I said about relevancy. The greater task is to promote an environment in which new copies of this material are being made.

It really depends, and Segundus' questions are leading towards this, on what timescale is informing Archive's actions. If the main objective is to preserve as much niche physical material as possible for the next 30-40 years until we've figured out where digitization is actually going and whether we're going to survive the coming ecological crises, then this is probably a reasonably good way to do that (assuming we manage to ride those crises out).

If this is a much longer-form effort to preserve a 1000 year physical archive, this isn't at all the way to go about it, and will ultimately fail of one cause or another. As said previously, the best way to secure some success from the specific method that Archive is right now pursuing is to support the widest and most robust distribution and redundancy possible by encouraging the transfer of institutional storage collections to independent caretaker organizations whose responsibility would be rooted in looking after that specific collection and who wouldn't be bound to the financial and organizational whims that are currently threatening so many library collections.
posted by waterunderground at 8:08 AM on June 7, 2011


If I had the resources of Bill Gates or Warren Buffett, I'd like to build a giant labyrinth out of solid granite, with the full text of as many books as possible etched in the walls, then bury it.

Preferably on the moon.
posted by General Tonic at 8:11 AM on June 7, 2011 [2 favorites]


Maybe we’re reading the Danielle Steel or Theodore Kaczynski of ancient societies.

Really, not so much. There is second and third rate classical literature, but for the most part it wasn't worthwhile to have the thing copied out unless it was really worth saving. We have Danielle Steel, they had Homer. (Or, for the low brows, Apuleius.)

Books are cataloged, and have acid free paper inserts with information about the book and its location,


Nice touch, but are they neutralizing the acid full paper of the books themselves? I think of these places and what comes to mind is that warehouse at the end of Raiders of the Lost Ark, only with containers of books, all quietly engaged in a slow burn of wood based paper that will turn 99% of this stuff to dust in due course.

Though who knows? Scholars 'n' scientists have recently begun to recapture hidden stuff from the Oxyrhynchus papyri
posted by IndigoJones at 8:20 AM on June 7, 2011


I'm still sort of boggled by your assumptions, waterunderground. Do you think that the people responsible for this haven't thought these same thoughts? Do you imagine that they embarked on this without consulting anyone, or leaning on the experience of all the other archives they are connected with? Are you really assuming this is the one and only location? The linked photograph shows approximately 10% of the total number of containers, which fill about 20% of the available space in the current Physical Archive.

The Archive is exactly about making new copies of the material. Digital copies, sourced from and inextricably linked to the actual physical book that is stored in the container. You bring up ecological concerns - producing, storing, and maintaining NEW copies of all paper books is an obviously unsustainable notion.

The short answer to your assumptions is: you have to start somewhere. Starting with a million or so books, a better-than-libraries humidity, handling, and temperature level, and a half decade of digitization and preservation experience is not bad.

but are they neutralizing the acid full paper of the books themselves?

We have researched the chemistry and there is a plan in place.
posted by fake at 9:04 AM on June 7, 2011 [1 favorite]


With regard to the TCP packet comment: because when I left college about 10 years ago, it only took a few years until I could no longer read the digital format (the proprietary email 'archive' file) and then a few more until I couldn't read the physical format (disks).

The few printouts I have are still readable today. You know, with my EYES.

Digital archives, aren't. They all require constant human intervention. Basically the re-publishing of worthy works mentioned upthread. Paper won't last forever but even cheap acid-plenty copies will last a hell of a lot longer than whatever is on your computer right now.
posted by danny the boy at 9:14 AM on June 7, 2011 [4 favorites]


I have a clipping of an article I was quoted in, about this new thing called 'Friendster'. The newspaper itself may be yellowing but it has outlasted the network it was reporting on (and all the data on it).
posted by danny the boy at 9:18 AM on June 7, 2011 [2 favorites]


I am in general agreement concerning analog vs digital formats -- I'm a particular fan of microfilm and microfiche -- but I think many of the digital-storage horror stories, where information becomes unreadable in just a few years, aren't the fault of digitization per se, but more the attitudes of the people who established the format.

E.g., if you're designing a mail program, it's easy to ignore archive concerns. Chances are you probably are more concerned with performance, meeting your deadlines, and then moving on to the next project or the next version. Software developers, commercial ones in particular, are notorious for not thinking beyond the next version, and forgetting that software can linger on for a very, very long time. This is unfortunate.

But I've also seen digital systems that are very well-designed, by people who care deeply about not losing data over the span of centuries. Computer Output Microfilm (COM) and aperture-card systems are still the gold standard, in my opinion, although they're not as popular as they once were. More modern systems require constant maintenance, but the amount of effort required per page of information stored in them is pretty small.

I know of small organizations (banks, etc.) that are maintaining, in digital form, the equivalent of large warehouses of paper, with very small budgets. Some of these organizations have been around for a century, their business models are very stable, and the maintenance of the archives costs little enough that it's unlikely to be interrupted. They've already, in some cases, rolled information from COM (silver-halide film) to COLD (typically magneto-optical), to hard drives, long before the older formats have become unreadable. That's pretty impressive and it's possible because the technology is cheap and available.

And at the same time, our ability to search information has gotten much better. If you design the retrieval system intelligently, there's no downside to having years of information available. (Typically you just design it to return newer results first, since that tends to be what people want.) I occasionally see objections to infinite-retention systems like Gmail, on the basis that having all that stuff in one's archive would be problematic ... but I've never had a problem finding something in Gmail because of the old mail stored there. There's no issue with "hoarding" digital assets, as long as you can easily search them, and we're getting better at that all the time.

So while I'm pleased that physical books are being archived as a backup, I think we need to give credit where credit is due, and blame where blame is due -- digital archival strategies can work very well, and a lot of the data loss that has occurred in the past few decades is due to poor or otherwise shortsighted design.
posted by Kadin2048 at 10:09 AM on June 7, 2011 [1 favorite]


Do you think that the people responsible for this haven't thought these same thoughts?

No, I'm not assuming anything of the sort. As I noted in my original comment, the blog article appears to indicate, in a single bullet point, that the intention is distributed location and responsibility, however that's not what it focuses on. The focus is on the whizbang shipping containers and indexing. A number of commenters here are likening this to Svalbard, which is an easy jump to make under the circumstances although I hope that that impression is fundamentally incorrect.

The Archive is exactly about making new copies of the material. Digital copies, sourced from and inextricably linked to the actual physical book that is stored in the container. You bring up ecological concerns - producing, storing, and maintaining NEW copies of all paper books is an obviously unsustainable notion.

As I said, it comes down to what the actual priorities for Archive are to be. Is it long-term cold storage? Is it a middle-term effort to protect this material until we get a better handle on the trajectory of digitization? Is it an interactive archive? To me, this point is very confused in the way the physical project is being conceptualized and promoted.

How do you ensure that in 50 years people will be celebrating our foresight in creating this thing, and won't be flying it into the sun? If the goal is a long-term ark, how do you ensure that it doesn't die, abandoned, in an industrial park when the civilizational shit hits the fan, or that much of the archive isn't lost when scrappers dump the worthless fiber bulk stores to get the sheet steel? (hint: you probably can't ensure this) Or is it just a short-term buffer, to reduce loss in translation and discourage some of the idiotic destruction currently taking place? If the intention is to aim for everything at once, what are the trade-offs and how should they be mitigated? Please don't get me wrong, I agree that a project like this is very necessary. But things like this can be very frail, and I'd be more comfortable with it if there was a clearer position being communicated on what it is, what it wants to be, and what it isn't.

I'm sure people inside the project are thinking about these things, but I don't think it should hurt for other people to raise questions about concept and governance. The less sexy aspects of the project are far and away the most important, and folks at Archive should be encouraging as much interest in these topics as possible, rather than being defensive about them.
posted by waterunderground at 10:32 AM on June 7, 2011


Oh, hey!  I was at the opening!  I met jscott for the first time, who I've signaled about this thread.

This location can store around a million books with the containers that are there now (30 containers, around 40k books each).  Then, they can fill the other half with more containers, adding another million.  They've taken steps to make the storage low-maintenance.  Their target is to archive ten million books, which requires multiple locations.  For transportation, there are nearby train tracks - the vibrations of which concern me actually, because the books will be sort of desiccated and hopefully not too brittle.


underflow: Maybe all of these should be digitized... you know, for back up only.

They digitize and OCR them before they're boxed up, put onto pallets, & put into shipping containers.  The scans are useful for the blind and dyslexic, & there was talk of loaning them out, one copy at a time, which I didn't follow.  I think they're happy to scan books borrowed from libraries and private collectors, too.


ryanrs: Why would I want to preserve the transmission medium? When I digitize stuff, I throw away the original.

They like tracking provenance, and being able to go back to the demonstrable original.


fake: I work in the Physical Archive, in fact, I am there now.

Heh.  I thought I recognized your work in the scanning equipment.
posted by Pronoiac at 11:00 AM on June 7, 2011 [1 favorite]


How do you ensure that in 50 years people will be celebrating our foresight in creating this thing

That sounds like what investors require who expect a return on their money. Rather, this is more like basic research that may or may not lead anywhere. When Brewster started archiving the web in the 90s it seemed like an interesting exercise the value of which was unclear. Today the value is becoming increasingly clear with every passing year. We won't know for a long time if archiving books this way will be celebrated by the future, but if history is a guide, most likely "yes".
posted by stbalbach at 11:20 AM on June 7, 2011 [1 favorite]


When I was a kid, I used to want to preserve every book ever published. It broke my heart to think that all could be lost someday.

It makes me think of the burning of the library in The Name of the Rose. Ack!

It's a bit homunculus-y,

And there's nothing wrong with that.
posted by homunculus at 12:10 PM on June 7, 2011


waterunderground: “As said previously, the best way to secure some success from the specific method that Archive is right now pursuing is to support the widest and most robust distribution and redundancy possible by encouraging the transfer of institutional storage collections to independent caretaker organizations whose responsibility would be rooted in looking after that specific collection and who wouldn't be bound to the financial and organizational whims that are currently threatening so many library collections.”

This has been done. The founding purpose of the Internet Archive has always been to provide digital, freely distributable copies of every text it records, with server redundancy, so that these things are freely available, and so that copies can and will be made all over the internet. Looking at the cross-pollination between the Internet Archive and YouTube (for instance), this seems to be largely successful.

“Also, the best way to ensure the survival of materials intended for archiving is to maintain their contemporary relevancy. Are there other ways to ensure the survival of niche media in the face of digitization? Would it be more effective in the long run to promote the collectibility of endangered facets of our global bibliography, rather than boxing up one fragile ark? And what of the vast quantities of non-English language physical media? How endangered by digitization are Hindi crime fiction, or Estonian technical manuals? Are we neglecting, and thus damning, non-English bibliographies, or are they thus far more resilient to death-by-digitization by virtue of not attracting the same Silicon Valley interest and capital (and U.S. govt stimulus funding*) to digitize them?”

I guess you're talking about something you refer to as "death by digitization," but I'm not sure what that means, nor am I at all confident that, whatever it means, it actually jibes with the Internet Archive's mission structure. How does digitizing a thing kill it? Current digital media is not necessarily long-lasting, but that doesn't mean it can't be made more permanent. Moreover, the technology surrounding books can and must be improved in order to make preservation of all texts a feasible goal.

At this point, the Internet Archive has 2.8 million digitized texts. I respect their goal of collecting a copy of every book they've digitized, and that may be a sensible practice moving forward, but somehow I wonder if it'll be possible to acquire all of those books for previously-digitized texts. Even moving forward, this article notes that the books will only be retained when a loaning library isn't expecting them back.

It's interesting to think about what technology could be brought to bear to create a replacement for books that combines a certain amount of permanence with important qualities like lack of necessary equipment and immediacy to the reader. One possibility is minuscule print readable with a magnifying glass, a simple technology that I think we can expect will be available in the future. Using minuscule print, those two million texts might be printed out and bound, and thereby stored in a much smaller space than the shipping-container archive being envisioned.

waterunderground: “This, to me, is the most interesting revelation in the whole article: that much of the momentum of Internet Archive's digitization work has been provided by job-creation subsidies. I wonder if Google is taking advantage of the same programs? It is kind of like burning the scrolls to keep warm...”

Again, I don't quite get what you're saying here – how are scrolls being burned to keep warm? – but the Internet Archive has created a lot of jobs in the past ten years, so it makes good sense for job-creation subsidies to be offered to it. Moreover, I want to point out, since it's tangentially related, that Google is an utterly different organization that does not and will not do many of the things that the Internet Archive will do. For one thing, Google is not at all dedicated to the preservation of texts; they're dedicated to being the chief source of texts, which is a different project. Google's deals with libraries have often been predatory and have preyed on the natural unsavviness of many librarians when it comes to the best use of their texts; for instance, in exchange for the imposition of loaning massive numbers of books to them, Google often simply promises to "digitize" those books, without promising to make the digital copies available to anyone in particular at all. In contrast, the Internet Archive has consistently worked with libraries, attempting to give a fair exchange while supporting the project of the preservation of information.
posted by koeselitz at 12:10 PM on June 7, 2011


One possibility is minuscule print readable with a magnifying glass, a simple technology that I think we can expect will be available in the future.

See: The Rosetta Project- this is how they're going to store things.
posted by BungaDunga at 12:15 PM on June 7, 2011


Good luck getting that Gutenberg Bible.
posted by spamguy at 12:23 PM on June 7, 2011 [1 favorite]


Moreover, I want to point out, since it's tangentially related, that Google is an utterly different organization that does not and will not do many of the things that the Internet Archive will do. For one thing, Google is not at all dedicated to the preservation of texts; they're dedicated to being the chief source of texts, which is a different project. Google's deals with libraries have often been predatory and have preyed on the natural unsavviness of many librarians

This is exactly the sort of thing I was alluding to. If Google (or its contractors) have been able to take advantage of similar job stimulus programs, and a by-product of their work is a weakening of long-term information access and stability, the allusion is apt, I think. I certainly wasn't accusing Archive of acting with the same kind of disregard, and am sorry if I conveyed that impression.
posted by waterunderground at 12:25 PM on June 7, 2011 [1 favorite]


...it would be too bad if, after the Apocalypse, the few starving and half-naked survivors were to crack open the last surviving freight container only to have 50,000 Louis L'amour books spill out at their feet.
That's not fair. That's not fair at all. There was time now. There was all the time I needed...! That's not fair!
posted by spamguy at 12:26 PM on June 7, 2011 [2 favorites]


...Many have a negative visceral reaction to the “butchering” of books...

I was reminded how I felt the first time I saw the scene in George Pal's Time Machine when he runs his hand through the shelf of books and they're all just powder.

The library in my hometown saved me from many wasted years, and helped me learn to educate myself. When the power goes out, so long as there's a light, I'm content.
posted by Twang at 3:50 PM on June 7, 2011


Exactly! Archaeologists of today have made wonderful finds by rooting through ancient rubbish heaps and middens, not to mention all those monasteries that were the medieval equivalent of Grandad's basement full of Louis L'amour novels. It's the temporary, ephemeral stuff that is always least preserved and most mysterious to future generations.

Empty trash. Buy milk. Forge history. To trace the great arcs of civilization, historians tap the humble list
posted by homunculus at 9:57 AM on June 8, 2011


Nicholson Baker should approve.
posted by Rarebit Fiend at 10:18 AM on June 8, 2011


Hey, you know, my comment above came off the wrong way, and I'm sorry about that - the stress of moving has *really* been getting to me.

I've only been with the Archive about a week, and so it wouldn't be fair to say anything I've said reflects badly on them, or represents them at all.
posted by fake at 12:42 PM on June 8, 2011


I'm intrigued that recent conversations about books - physical copies, digitizing, e-books - don't seem to include any mention of the First Sale Doctrine. In my humble (and, apparently, extremely minority) opinion, the threat to the First Sale Doctrine is one of the most insidious things about the growth of e-books.

I will not buy e-books or MP3s until they include First Sale Doctrine rights to resell, give, lend, or otherwise pass along as I choose. I doubt e-media will acquire those rights, so I may never own an e-book.

Of course, that's not a problem with these physical books, but the lack of ability to hand down a book (when it's an e-book) highlights, for me, the danger of losing things we think can never be lost.

The Internet Archive may well be rescuing and preserving the only copy anywhere ever of some books. With the increasing availability of digitized books, both newly-published e-books and scanned copies of physical books, it's easy to imagine that all books ever published will be readily available to us. But there are countless books in print that may never be digitized. If the last copy of one of those books is pulped or burned for heat or simply left to disintegrate in a landfill somewhere, it's gone.

I love old books. I've bought dozens and dozens of books at each of the last Big Book Sales put on by the Friends of the San Francisco Library. My very favorite book from the most recent sale (High School Subjects for Home Study) was published in 1936; it's not available online, not in Google Books nor elsewhere, as far as I can tell. I've never seen it before, despite the hours I've spent at used book stores and library sales. There are only 9 copies listed at ABE Books ... not a lot. Maybe some folks wouldn't care if this book - every remaining copy - disappeared forever, but it would sadden me immensely.

I think this Internet Archive project is an important mission. I am such a fan of the Internet Archive that I sought out their contributions page so I could send them a check. I think I'll have to send in another.
posted by kristi at 5:52 PM on June 8, 2011 [1 favorite]


Maybe IA can help this couple from Pike Lake, Sask. - they "saved" 350,000 books (not a typo) from a neighbor ... now they're overloaded.

(That's a lot of books ... a college I went to had 250,000 that took up one floor of a huge building)
posted by Twang at 4:43 PM on June 9, 2011



