Live from the Internet
December 23, 2011 7:33 PM

What is being scanned around the world: The Internet Archive, updated in real time.
posted by Sailormom (17 comments total) 15 users marked this as a favorite
Cat scan jokes will result in perma-bans.
posted by Burhanistan at 7:36 PM on December 23, 2011 [2 favorites]

Hey, they are using sparklines on the summary slide.
posted by Foci for Analysis at 7:40 PM on December 23, 2011 [2 favorites]

Go Shenzhen Scanning Center!
posted by bottlebrushtree at 8:40 PM on December 23, 2011 [1 favorite]

This is curious. The Shenzhen Scanning Center has scanned 131,895 books, more than any other scanning center. Yet there are only 328 books in the Shenzhen collection. And of those, most are off-limits, in-copyright titles for the blind only (Daisy). So where are all the other books from Shenzhen? The ones showing up from Shenzhen also appear to be late copyright. This is a mystery for Charlie Chan.
posted by stbalbach at 9:45 PM on December 23, 2011 [2 favorites]

Legends of the Holy Grail by Folklore Society (Great Britain) 1878. Boston Public Library. Cool!
posted by stp123 at 10:05 PM on December 23, 2011 [1 favorite]

These appear to be real-time updates, not because there are people scanning at 1am on the Friday before Christmas, but because the raw scans were done days or weeks ago, added to the queue for deriving the various formats such as PDF (which is very CPU intensive), and are now showing up as they come out of the pipeline.
posted by stbalbach at 10:13 PM on December 23, 2011 [1 favorite]

The fact that as soon as I clicked on this link (before reading the rest of the thread) it showed "The Gardens of Thomas Kincaide" at the Shenzhen Scanning Center made this pretty fascinating from the get-go.
posted by MCMikeNamara at 10:28 PM on December 23, 2011 [1 favorite]

I think there's something wrong with the site's JavaScript, as clicking some links does not work.
posted by Blazecock Pileon at 12:28 AM on December 24, 2011

Hi there. Jason Scott, free-range archivist and general history dude.

I work for the Internet Archive, and among my projects is increasing awareness of all the good work that place does. Sure, folks know about the Wayback Machine, and over the years a person might be directed to the Speed Run Archive or have seen something from the Prelinger Archives, which house thousands and thousands of those great government and educational films, and which seem to end up in almost every documentary of a historical nature. But the archive is just busting hump 24 hours a day to add so much more material online... like, a whole lot, much more than I think the vast majority of people who've heard of the Internet Archive even know.

So I worked with a talented developer named Gijs van Tulder and, inspired by the Panic Status Board over at Panic, Inc. in Portland, OR, we sat down and designed a "Status Board" of our own for Internet Archive's incoming scanned books.

This thing has been officially live less than a few hours, and is itself less than a week old from start of project, so there's some rough edges, but we dropped the news to the world because I found anyone at the Archive who looked at it lost productivity, so why not spread more of that for the Holiday season.

The Status Board is meant to be on a big, big screen, one of those types that you would see at an airport or near a bunch of cubicles, telling people what the Zeitgeist of the enterprise is (just like Panic's). What it's doing is a one-hour-delayed realtime portrayal of the incoming scanned books and documents being done by the dozens of Internet Archive scanning centers and partners.

If you're amazed and bewildered, that's fine. You're supposed to be. This is a firehose of information, reflecting the pure, amazing, miraculous mass of material that the Internet Archive is scanning for its partners, projects, and archives. I did an estimate some time ago and found they were adding a new book or document every 90 seconds - and that's just the scanning centers for documents. That doesn't count all the user-contributed content, or any other media type like movies, audio or web crawls. It feels like a lot going on the screen because it is a lot. That's what I'd hoped would come across.

Naturally, you'll see something intriguing go by, and the idea is that if this status board is running as a screensaver or as a decoration in the Internet Archive's lobby, you can grab the mouse and interact with it a bit - go right to the book, or browse the pages being shown, or see more about the scanning center. Again, this whole project (the status board) is less than a week old, so not everything's perfect - but I think the point is made.

As for books going by that you then can't see, that's because many of them are ending up in the groundbreaking Open Library Project, where the Internet Archive and librarians representing all fifty states (and other places too) are working to provide a massive, networked e-book checkout system based on (among other things) the physical holdings of libraries. So you might see a book go by, scanned in just the same way that Google or Amazon or any of a number of groups scan books, that then is held awaiting the right credentials to be shown (usually a checkout at a participating library). But for every book you see that you can't then open right then and there (and maybe you should sign up at the Open Library to be able to), there's going to be dozens of documents and books, some hundreds of years old, showing up for you to check out.

So there you go, a fun little project that I hope brings light to an amazing institution. Sorry for going on so long about it - imagine what it's like when people meet me in person and ask about it.
posted by jscott at 1:15 AM on December 24, 2011 [52 favorites]

One last clarification I realize I skipped in all that. As I show in this weblog entry, a book or document is barcoded (put into the system), then queued for being scanned at one of the centers, then actually scanned, then quality-checked for the image quality and completeness, and only then is uploaded into the system and set for derivation/storage. What the Status Board shows is the moment that the uploaded material is finally derived (into PDF, HTML, Daisy or other formats) and considered completely ingested. I've seen the process take a day and take 130 days, based on a massive pile of factors - just another fascinating aspect of the things the archive does that this status board helped me understand.
posted by jscott at 1:27 AM on December 24, 2011 [5 favorites]

Brewster rocks.
posted by Twang at 3:31 AM on December 24, 2011

This is much more interesting on weekdays, and not over a holiday, since it is realtime. (go jason and team).

There are 29 scanning centers in 6 countries, about 1,000 new books a day. Books coming from lots of libraries (scroll down).

The Shenzhen scanning center is rocking: this is a Chinese Dept of Education sponsored project that is a joint project between a university, a contractor, and the Internet Archive. Most of the books are modern books. All are going to the blind and dyslexic, and many to everyone via the lending library (expanding via the in-library program).
posted by brewsterkahle at 5:29 AM on December 24, 2011 [4 favorites]

Okay, the real-time Live at the Archive thing is phenomenally, phenomenally cool - really fabulous work, Jason and Gijs and Archive, and many many thanks to Sailormom for posting this.


jscott says: the Internet Archive and librarians representing all fifty states (and other places too) are working to provide a massive, networked e-book checkout system based on (among other things) the physical holdings of libraries


So here's what I just did:

* clicked over to Jason's link to read about the project and see whether by any chance my library might be participating (yep, it is)
* read this 7-year-old thread about good summary history books
* noted the recommendation for John Merriman's A History of Modern Europe
* looked that up in Open Library
* clicked the Borrow eBook link
* signed in (I joined last February so I could update bits of info occasionally ... which apparently I've done exactly once, sigh)
* chose to view the book online rather than in Adobe Digital Editions

... and POOF - there it was in front of me.

I paged through the table of contents, skimmed the list of maps, and leafed through the first chapter. Then I returned it so someone else could check it out. (This is an excellent, very important feature - I've heard that Overdrive doesn't let you return a book when you're done with it, you have to wait 2-3 weeks until your checkout expires.)

AND there's a handy link to ABE Books so I can pick up a copy if I want (which I just might do).

I am GOBSMACKED. This is an amazing, wonderful, incredible thing, and it makes me very, very, very happy. It makes me feel especially good about the tiny donation I sent in last year and has me reaching for my checkbook to add another.

Thank you Jason, and Brewster, and everyone at the Internet Archive. This is one of the best Christmas presents I've ever gotten.
posted by kristi at 4:58 PM on December 24, 2011 [3 favorites]


The Internet Archive is a relatively small group of people for the amount of effect and work that has been done, and one of my goals is for things like the Status Board and other projects to bring people into contact with these amazing efforts. They've only recently started more intense promotion of what they have, having instead focused for years on making the whole thing as well-built and far-reaching as possible, something I can totally understand and get behind.

But now it's time to spread the word. And donate, too.
posted by jscott at 5:24 PM on December 24, 2011 [1 favorite]

Open library.

Wow. Just...WOW!
posted by BlueHorse at 10:36 PM on December 24, 2011

posted by Glinn at 8:07 PM on December 27, 2011

Kristi, Overdrive has added the ability to check books back in, at least in the Android app. Amazing, I know. Goddamn Overdrive.
posted by epersonae at 10:44 AM on December 28, 2011

This thread has been archived and is closed to new comments