129,864,880 books
August 5, 2010 11:43 PM   Subscribe

How many books are there? 129,864,880.
posted by Joe Beese (68 comments total) 11 users marked this as a favorite
 
Wow, I am surprised this is so low. That's one book for every 50 people on the planet. I wonder how much this changes if you include self-published independent pieces no one's heard of. And I wonder how it compares to the number of CDs by both major labels and local bands.
posted by scrowdid at 11:54 PM on August 5, 2010


Does this mean that there are 600 million unique books in the world? Hardly. There is still a lot of duplication within a single provider (e.g. libraries holding multiple distinct copies of a book) and among providers -- for example, we have 96 records from 46 providers for “Programming Perl, 3rd Edition”. Twice every week we group all those records into “tome” clusters, taking into account nearly all attributes of each record.
Hmm, are any books more ephemeral than computer books? They're almost all completely useless just a few years after publication.
posted by delmoi at 11:55 PM on August 5, 2010 [4 favorites]
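For a rough sense of what that grouping step involves, here's a toy sketch of record clustering. The fields, the normalization, and the exact-key matching are all my own simplifications, not Google's actual algorithm, which by their description weighs nearly every attribute and has to tolerate partial disagreement:

```python
import re
from collections import defaultdict

def normalize(s):
    """Crude normalization: lowercase, strip punctuation, sort word order
    so "Wall, Larry" and "Larry Wall" produce the same key."""
    return " ".join(sorted(re.sub(r"[^a-z0-9]+", " ", s.lower()).split()))

def cluster_records(records):
    """Group provider records that agree on a normalized (title, author)
    key into one "tome" cluster."""
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["title"]), normalize(rec["author"]))
        clusters[key].append(rec)
    return list(clusters.values())

# Two providers describing the same edition differently:
records = [
    {"title": "Programming Perl", "author": "Wall, Larry", "provider": "A"},
    {"title": "PROGRAMMING PERL!", "author": "Larry Wall", "provider": "B"},
]
print(len(cluster_records(records)))  # one cluster
```

In practice exact-key matching is far too brittle (the 96 records for "Programming Perl, 3rd Edition" certainly don't normalize this neatly), which is presumably part of why the clustering gets re-run twice a week.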


There is truly nothing more important, for the promulgation of information, than unlocking the vast amount of human knowledge in published books. We need the depth of libraries back.

Wikipedia compares "OK" with Encyclopedia Britannica. That's nice. We need the rest.
posted by effugas at 11:57 PM on August 5, 2010 [1 favorite]


There are hundreds of uses for corn, all of which I'm going to tell you right now.
posted by roll truck roll at 12:01 AM on August 6, 2010 [5 favorites]


"It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries."

That's a start, and a considerable advance on where GBS started from. But the assumption that editions of Hamlet differ only in their paratexts (forewords, commentaries, etc.) suggests that Google still has a ways to go before it understands books as they actually are (some information on the history of the text(s) of Hamlet: these are major differences, but every surviving copy of the First Folio is also said to be subtly different in its typesetting).

"One definition of a book we find helpful inside Google when handling book metadata is a “tome,” an idealized bound volume."

What about unbound materials - ephemera, for example?

Then we get some not-unreasonable discussion of the differences between catalogues, although a note of exasperation creeps in: "We have to deal with these differences between cataloging practices all the time."

That's because descriptive cataloguing is partly a conjecture about the purpose behind the search for something as well as a statement about the thing in itself. Go and get me Hamlet and you'll see what I mean (cf. Bateson's 'The Mona Lisa is in the Louvre, but where is the text of Hamlet?'). Then there's straightforward human variance and/or error. There's no shortage of errors and differences of opinion in OCLC.

So after all is said and done, how many clusters does our algorithm come up with?

Now we get to the heart of the matter. Google does algorithms. That's very powerful (I should say that I use GBS for work most days of the week) but they don't do semantic, top-down, conjectural, humanistic description. Books need both: so do we.
posted by GeorgeBickham at 12:23 AM on August 6, 2010 [4 favorites]


Two things I want Google Books to make publicly available: (1) their book-grouping algorithm; (2) some way of matching existing bibliographic records to Google's own "tome" master records. Obviously they're not going to publish the recipe for their secret sauce, but libraries would benefit greatly from it.

Someone should probably mention FRBR. Google's fundamental unit is the tome, an "idealized bound volume." That's a pretty unsophisticated way of looking at how books can relate to one another; it doesn't capture the ontological connection between, say, a Spanish translation of Hamlet and the English original, or the variations that GeorgeBickham mentions. In theory, library metadata practices can capture those complexities, but you can't design a catalogue-type system around a data model that's that complex. FRBR is a compromise.

Google being Google, I won't be surprised if they eventually end up brute-forcing those sorts of relationships by taking advantage of the sheer comprehensiveness of the data at their disposal. See also.
posted by twirlip at 12:41 AM on August 6, 2010 [3 favorites]


That was awesome.
posted by Ironmouth at 1:03 AM on August 6, 2010 [1 favorite]


There is truly nothing more important, for the promulgation of information, than unlocking the vast amount of human knowledge in published books. We need the depth of libraries back.

More important than the promulgation of information? How about the collection, analysis, and presentation of information - i.e., content creation. When "promulgation" works against that, we've lost more than we've gained.

Libraries full of dusty, outdated books are called "museums."

In a progressive society, the encouragement should always be for the creation of new content. You know, progressive knowledge. Conservatives are the ones holding onto the past and standing in the way of progress.
posted by three blind mice at 1:26 AM on August 6, 2010 [1 favorite]


twirlip: Two things I want Google Books to make publicly available: (1) their book-grouping algorithm; (2) some way of matching existing bibliographic records to Google's own "tome" master records.

To be fair to Google, there are some encouraging signs in relation to your (2), even though the sums involved are rather modest.
posted by GeorgeBickham at 1:42 AM on August 6, 2010


three blind mice--

Our present situation involves the endless recycling of the same information, over and over and over. Teachers, there's got to be one of you on this thread. How homogenized have your students' papers become? I can tell you personally I saw something on Wikipedia a while back that interested me; I went searching for more and all I could find was...

That very same Wikipedia data, republished over and over and over again.

You're not wrong that authorship needs to be rewarded. But content farms and Wikipedia are absolutely drowning out any signal that might otherwise exist. Society requires a balance, which is why there are libraries in the first place -- we could just have bookstores, or private collections, or whatnot. But instead, we accept that some level of large scale access to quality content is a social good that ultimately enriches authorship because people have more knowledge to process.

Once, Google let you find anything on the Internet. Now, if it's not on the Internet, it's not found. We need to find a way to reward authors, but more importantly, we need a way to bring libraries and librarians online. The status quo is making us dumb.
posted by effugas at 1:57 AM on August 6, 2010 [6 favorites]


In a progressive society, the encouragement should always be for the creation of new content.

How are you going to be able to tell if content is new or not without checking the written record at some point? We don't always know what we know, you know.
posted by GeorgeBickham at 2:35 AM on August 6, 2010 [1 favorite]


Effugas: I can tell you personally I saw something on Wikipedia a while back that interested me; I went searching for more and all I could find was...

That very same Wikipedia data, republished over and over and over again.


Well, you could research it, and put the findings of your research online. Just firehosing information out of libraries onto the net and relying on search algorithms to make use of it isn't very elegant. A lot of bad and inaccurate books are published, and always have been; their noise-to-signal ratio probably isn't nearly as bad as that of the web, but there's still more than enough noise. The trouble is that search algorithms, immensely useful as they are, don't filter noise from the signal - that's what editors, researchers, curators and authors do. Even journalists, too, although most of them are on the side of the noise.
posted by WPW at 3:09 AM on August 6, 2010


Google still has a ways to go before it understands books as they actually are

That's where the scanners come in.
posted by ryanrs at 3:17 AM on August 6, 2010


How can I find that turkey probe in the catalogs? Thanksgiving is just months away, and I want to put in my ILL now.
posted by zippy at 3:23 AM on August 6, 2010 [1 favorite]


Google still has a ways to go before it understands books as they actually are

That's where the scanners come in.


Only if we are going to scan every single copy of every book in the world. Editions which may vary significantly include (but are by no means limited to) all early printed books of the handpress period (c.1450-c.1800). If not, we will need some a priori concept of what a book is in order to arrive at the desired granularity of description. It could be a singular object (maybe with marginalia and other interesting copy-specific information); a common setting of type; a common setting printed on fancy vellum or cheaper paper; a common setting of type issued with an entirely different title-page including new dates and an attribution of authorship; variant imprints; completely different illustrations; a number of books made up of mostly similar settings of type (hint: early printing is not like pressing Copy) or could demonstrate other common ways in which what constitutes 'a book' can be fuzzier and more interesting than you might think, not all of which can be represented by images. This isn't a matter of aesthetics, it's a matter of ontology.

Scanning every copy of every book (and then performing optical collation) would indeed be way cool: Google isn't going to do it, though. The Quartos site has some of this functionality - you can overlay transparent images of variant pages of Hamlet and see for yourself. I expect to see this tech more and more automated, and useful, quite soon. But once we have that raw optical difference, we will still want to describe it, which requires understanding.
posted by GeorgeBickham at 3:56 AM on August 6, 2010 [1 favorite]


As a point of reference, the Library of Congress has about 32,000,000 books.
posted by twoleftfeet at 4:24 AM on August 6, 2010


But the assumption that editions of Hamlet differ only in their paratexts (forewords, commentaries, etc.) suggests that Google still has a ways to go before it understands books as they actually are (some information on the history of the text(s) of Hamlet: these are major differences...

And some of the differences are really, really, really major, GeorgeBickham.
Basically, I'll use any excuse to quote The Skinhead Hamlet - by Richard Curtis - the "Four Weddings & a Funeral" guy...:)


ACT III SCENE II

(Gertrude's Bedchamber.)
(Enter HAMLET, to GERTRUDE.)

HAMLET: Oi! Slag!
GERTRUDE: Watch your fucking mouth, kid!
POLON: (From behind the curtain) Too right.
HAMLET: Who the fuck was that?
(He stabs POLONIUS through the arras.)
POLON: Fuck!
HAMLET: Fuck! I thought it was that other wanker.
(Exeunt.)
posted by Jody Tresidder at 4:43 AM on August 6, 2010 [5 favorites]


I'll wait for the movies.
posted by Halloween Jack at 4:52 AM on August 6, 2010


Haven't read the Skinhead Hamlet for a while - is that the Quarto or the Folio version, though? :)

BTW, Richard Curtis would be more complimentarily named the Blackadder guy.
posted by GeorgeBickham at 4:56 AM on August 6, 2010


Well, you could research it, and put the findings of your research online. Just firehosing information out of libraries onto the net and relying on search algorithms to make use of it isn't very elegant. A lot of bad and inaccurate books are published, and always have been; their noise-to-signal ratio probably isn't nearly as bad as that of the web, but there's still more than enough noise. The trouble is that search algorithms, immensely useful as they are, don't filter noise from the signal - that's what editors, researchers, curators and authors do. Even journalists, too, although most of them are on the side of the noise.

The point is that the raw material is just...missing now. It's not that there's "good search" and "bad search". It's that there's no search, because the vast, vast majority of human knowledge can't be sought out and -- increasingly -- tomorrow's thinkers aren't even aware of what they're missing.

We need an environment where editors, researchers, curators, and authors can gather, exchange, discuss, and ponder information from the comfort of their own home. Or they simply won't, and we'll be dumber.

Google knows this.
posted by effugas at 5:05 AM on August 6, 2010 [1 favorite]


So little time...
posted by Devils Rancher at 5:17 AM on August 6, 2010


So about 3 TB for a compressed file with all the books ever written, without formatting though.

Can't quite fit on my ebook reader, yet. But my desktop can almost manage that.

Anyone have a torrent link?
posted by Greald at 5:17 AM on August 6, 2010 [3 favorites]
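For what it's worth, here's the back-of-envelope implied by that 3 TB figure (the per-book plain-text size is my assumption, not anything from the article):

```python
BOOKS = 129_864_880
TOTAL_BYTES = 3 * 10**12            # the 3 TB estimate above

per_book = TOTAL_BYTES / BOOKS      # ~23 kB of compressed text per book
print(f"{per_book / 1000:.0f} kB per book, compressed")

# Sanity check: an average book of ~100,000 words at ~6 bytes/word is
# ~600 kB of plain text, so ~23 kB/book assumes very aggressive
# (roughly 25:1) compression -- plausible only with shared-dictionary
# tricks across the whole corpus, not with plain per-file gzip.
```

So "my desktop can almost manage that" is true only if the compressor is doing a great deal of work.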


Counting only things that are printed and bound, we arrive at about 146 million.

Interesting, but this all still seems too narrowly defined somehow, too automated-record-bound. (And I say that as someone who used to do online cataloging for big libraries.)

But, for example, in my collection I probably have on the order of a hundred or so small press / DIY / chapbook poetry collections, produced in runs of 50 to a couple hundred, that I would call real books (and so, I imagine, would the other folks who bought copies). Many or most of those have not made it into the hands of a librarian who has created an online record; they are circulated at poetry readings and sold from the counters of independent bookstores. Maybe a few hundred to a thousand independent / small press literature books are small potatoes to the Google books folks' grand total, but don't the boundary conditions test the success of a model / method?
posted by aught at 5:35 AM on August 6, 2010 [1 favorite]


So about 3 TB for a compressed file with all the books ever written, without formatting though.

Without formatting is damn right. There isn't enough data in the world to contain all the books ever written, or even the fraction that we still have. Paper and other supports, bindings, histories of ownership, marginal notes ... these things are all jumbled together, alongside whatever you want to call content, within the technology of the book. We still don't know enough about the book to be able to unpick all these strands.

Here are some of the folks who are trying to do so. They all do digitization, some of them in partnership with Google, which is what makes me optimistic that things may be heading in the right direction.
posted by GeorgeBickham at 5:36 AM on August 6, 2010


The point is that the raw material is just...missing now. It's not that there's "good search" and "bad search". It's that there's no search, because the vast, vast majority of human knowledge can't be sought out and -- increasingly -- tomorrow's thinkers aren't even aware of what they're missing.

We need an environment where editors, researchers, curators, and authors can gather, exchange, discuss, and ponder information from the comfort of their own home. Or they simply won't, and we'll be dumber.


It's not missing, it's just not online yet. But this really sounds like it's a problem with the thinkers, not with the net. If their spirit of inquiry drops dead as soon as they have to leave the house ... or they're not even aware of what might be gained from leaving the house ... that's not very impressive. In fact, that's downright dystopian.
posted by WPW at 5:37 AM on August 6, 2010


I'll wait for the movies.
posted by IndigoJones at 5:47 AM on August 6, 2010 [1 favorite]


Fact of the matter is, Google didn't scan all those books to help authors or to satisfy librarians. They scanned all those books to do statistical analysis on the text corpus, and to train their OCR engines and translation engines on the highest-quality material available. All the discussions surrounding cataloging issues, authors' rights, and copyright are a sideshow. Google's non-display uses will dominate. Computers will be the most voracious "readers" going forward.
posted by fake at 5:50 AM on August 6, 2010 [2 favorites]


There is truly nothing more important, for the promulgation of information, than unlocking the vast amount of human knowledge in published books. We need the depth of libraries back.

Wikipedia compares "OK" with Encyclopedia Britannica. That's nice. We need the rest.
posted by effugas at 2:57 AM on August 6


There are two things more important: first, that people read the books; and second, that they understand what they have read.
posted by Pastabagel at 5:57 AM on August 6, 2010


If their spirit of inquiry drops dead as soon as they have to leave the house ... or they're not even aware of what might be gained from leaving the house ... that's not very impressive. In fact, that's downright dystopian.

Put on your Hazmat suit and start taking your soma, because many students are already there.
posted by rory at 6:22 AM on August 6, 2010 [1 favorite]


Counting only things that are printed and bound

It's amazing how much a discussion of literature can sound like a discussion of an underground sex club.

Counting only things that are printed and bound, we arrive at Fifi, with her collection of tattoos and nylon rope.
posted by Astro Zombie at 6:40 AM on August 6, 2010


So why aren't they scanning serials?
posted by jb at 6:41 AM on August 6, 2010 [1 favorite]


So why aren't they scanning serials?

They get crumbs all over the scanner, ba-dum shish.
posted by edbles at 6:45 AM on August 6, 2010 [2 favorites]


Our handling of serials is still imperfect. Serials cataloging practices vary widely across institutions. The volume descriptions are free-form and are often entered as an afterthought. For example, “volume 325, number 6”, “no. 325 sec. 6”, and “V325NO6” all describe the same bound volume. The same can be said for the vast holdings of the government documents in US libraries. At the moment we estimate that we know of 16 million bound serial and government document volumes. This number is likely to rise as our disambiguating algorithms become smarter.

They're excluding them from the current total count, because they can't reliably differentiate unique instances, yet.
posted by edbles at 6:46 AM on August 6, 2010
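The quoted examples suggest the shape of the disambiguation problem: reduce free-form volume strings to comparable keys. A toy sketch — the "first integer is the volume, second is the number" assumption is mine, and would break on plenty of real records:

```python
import re

def parse_volume(desc):
    """Reduce a free-form serial volume description to (volume, number).
    Assumes the volume integer appears before the issue/number integer,
    which holds for all three examples Google quotes -- real records
    would need far more pattern knowledge than this."""
    nums = [int(n) for n in re.findall(r"\d+", desc)]
    if len(nums) < 2:
        return None
    return (nums[0], nums[1])

# The three equivalent descriptions from the article all collapse to one key:
for desc in ("volume 325, number 6", "no. 325 sec. 6", "V325NO6"):
    assert parse_volume(desc) == (325, 6)
```

Once every description maps to the same key, counting distinct bound volumes becomes a deduplication problem rather than a string-matching one — which is presumably what their "disambiguating algorithms" are slowly getting smarter at.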


They get crumbs all over the scanner, ba-dum shish.

I have no idea how these people got their cereals wedged in their scanners, or why.
posted by Devils Rancher at 6:52 AM on August 6, 2010


Is 129,864,880 books total, or unique titles?

LibraryThing has cataloged 54 million books, of which about 5.5 million are unique.

If 129 million is unique titles, only a small universe of those titles does anyone actually own and read. Since about 2005, over 1 million LibraryThing users, who represent a sizable fraction of serious readers and book owners, have cataloged "only" 5.5 million unique titles. These 5.5 million are the books anyone is most likely to own/read. (note: this includes many library holdings, not just personal accounts)

The number of potential books is vast (129+ million) but the number of books to realistically read is much much smaller. So I get anxious about reading 5 to 10 million books, not 129 million.
posted by stbalbach at 6:59 AM on August 6, 2010 [1 favorite]


Thing is, variations in editions that don't constitute significant changes to the text aren't really of interest to the majority of readers, including historians, unless they are historians of books. Most of us are just happy that Google has made one not-very-clear, low-detail scan of books which we might otherwise be unable to read in any edition in the city we live in.

I'm perfectly happy to have Google scan each text in just one version. If I wanted to do research on variations or marginalia, I would visit the rare book libraries to view the originals. But Google isn't doing this to reproduce every volume -- they are creating an online edition of a given text.

Actually, they probably should just put up one text of Hamlet -- well annotated as to the different versions of the play itself -- and then put up all intros, commentary, etc, as their own works. Partly because those are the only parts in copyright, but also because, if you are interested in the commentary, that would actually be easier to find (give it a keyword "Hamlet commentary").

But that would require more thinking than just hoovering up all the volumes -- which is, nonetheless, a useful thing: I find references by googling, and now have my own PDF of an obscure and not very well written company history from 1830, and I can drink coffee while skimming through it at home, rather than wasting archive time.
posted by jb at 7:03 AM on August 6, 2010


Without formatting is damn right. There isn't enough data in the world to contain all the books ever written, or even the fraction that we still have. Paper and other supports, bindings, histories of ownership, marginal notes .. these things are all jumbled together, alongside whatever you want to call content, within the technology of the book. We still don't know enough about the book to be able to unpick all these strands.

Please excuse me, but that really sounds like a lot of romanticism for an old form. When I read a book, I appreciate the binding work and margin space, but it's the text I'm after.

As a point of reference - the same book may be published in hardcover, softcover, or in an anthology: do these reformattings change the meaning significantly? The whole argument is really moot - Google books presents the pages as-scanned, with a searchable text layer superimposed.
posted by Popular Ethics at 7:35 AM on August 6, 2010


If [129,864,880] is unique titles, only a small universe of those titles does anyone actually own and read.

Natch.

Look at the Pulitzer Prize winners for Fiction from the 1950s and earlier. In most cases, it has been decades since they've been of interest to anyone but book collectors specializing in Pulitzer winners. And those were supposed to be among the most important novels of their year. Behind each one were thousands of potboilers as disposable as a Harlequin series romance.
posted by Joe Beese at 8:07 AM on August 6, 2010 [1 favorite]


And I own .03% and climbing. That warms my heart.
posted by FormlessOne at 8:33 AM on August 6, 2010 [1 favorite]


Wake me up when Google's metadata doesn't suck. As things stand, they're basically guessing and making it look like a ridiculously precise number.
posted by languagehat at 8:44 AM on August 6, 2010


And I own .03% and climbing.

Whoa. You own nearly 40,000 books!
posted by bumpkin at 8:46 AM on August 6, 2010


Without formatting is damn right. There isn't enough data in the world to contain all the books ever written, or even the fraction that we still have. Paper and other supports, bindings, histories of ownership, marginal notes .. these things are all jumbled together, alongside whatever you want to call content, within the technology of the book. We still don't know enough about the book to be able to unpick all these strands.

Please excuse me, but that really sounds like a lot of romanticism for an old form. When I read a book, I appreciate the binding work and margin space, but it's the text I'm after.


I don't personally appreciate the bindings (but I know some do, a lot). Margins may or may not be meaningful: since paper was by far the biggest proportion of the production costs of a book until relatively recently (last 150 years or so - too tired to look it up right now) having a wide margin is a very visible sign of the book's cost and cultural status. Same goes for the size of the page and the quality of the paper. There's no romanticism in those data points, nor in the other criteria I mentioned, although there certainly could be if you wanted to see it like that.

I'm also after the text, most of the time. I'm also a huge fan of digital editions and particularly appreciate their accessibility and searchability: they are transforming the nature of the written record. But part of their value is, ironically, in making you aware of what gets left behind when you make a copy of a work and what the current state of the art is in doing so.
posted by GeorgeBickham at 8:49 AM on August 6, 2010


Since about 2005, over 1 million LibraryThing users, who represent a sizable fraction of serious readers and book owners, have cataloged "only" 5.5 million unique titles.

I suspect that part of the issue may be that LibraryThing makes it easiest to enter books with ISBN numbers. I've thought about putting my collection into LibraryThing, and the 90% that have ISBNs would be easy; the 5 or 10% (probably less) that lack them would be really obnoxious. But it's probably the non-ISBN books that would be unique; that's definitely where the "long tail" is.

Although you'd have to worry about system-gaming,* if your goal was to build up a huge catalog, you could probably provide some minor incentive to users of a LibraryThing-type service to offset the PITAness of putting in books that aren't already in the system, and that long tail would open up dramatically.

* I.e., if the incentive was too big, you'd encourage people just to make up nonexistent books.
posted by Kadin2048 at 8:50 AM on August 6, 2010
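One reason ISBN entry is so much easier, incidentally: an ISBN-13 carries its own check digit (the thirteen digits, weighted alternately 1 and 3, must sum to a multiple of 10), so typos — and casually made-up books — tend to fail immediately. A minimal check, using an arbitrary 12-digit prefix rather than any real book's number:

```python
def isbn13_check_digit(first12):
    """Check digit that makes the 1,3,1,3,... weighted sum divisible by 10."""
    s = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - s % 10) % 10

def isbn13_valid(isbn):
    """True if the hyphen-insensitive 13-digit string has a correct check digit."""
    digits = [c for c in isbn if c.isdigit()]
    return (len(digits) == 13
            and int(digits[-1]) == isbn13_check_digit("".join(digits[:12])))

prefix = "978059600027"                     # arbitrary example prefix
good = prefix + str(isbn13_check_digit(prefix))
print(isbn13_valid(good), isbn13_valid(prefix + "5"))  # True False
```

The checksum only catches typos and lazy fabrication, of course; anyone determined to game an incentive scheme could still generate syntactically valid ISBNs, which is why the made-up-books worry above stands.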


> the 5 or 10% (probably less) that lack them would be really obnoxious.

Why? You just enter two or three keywords and the search function almost always finds it. I've had very few manual entries, and I've entered over 5,000 books.
posted by languagehat at 8:56 AM on August 6, 2010


There isn't enough data in the world to contain all the books ever written, or even the fraction that we still have. Paper and other supports, bindings, histories of ownership, marginal notes .. these things are all jumbled together, alongside whatever you want to call content, within the technology of the book. We still don't know enough about the book to be able to unpick all these strands.

This is overstated. The map is not the territory, except that in most cases it is. In other cases we have libraries. Even with Shakespeare, unless one's work turns on the differences between specific editions, one is usually allowed to choose one and stick with it. Certainly no one cares about (or pays any attention to) marginalia in individual readers' copies of Shakespeare unless they are interested in the specific reader.
posted by OmieWise at 9:17 AM on August 6, 2010 [1 favorite]


Well, in absolute terms it's not bad, but relative to recently-printed books that have ISBN/EAN barcodes, it's probably several orders of magnitude slower. I guess it's more the barcodes that make the difference rather than the ISBN itself; with a CueCat or other scanner you can input a large bookshelf in a few minutes. (Although if you can touch-type on a number pad I bet you could probably input ISBNs pretty quickly.)

So it's the difference between a manual one-book-at-a-time process and something you can do in batches of a few hundred books at a time.

But maybe it's not as much of a deterrent as I was thinking it was to typical users.
posted by Kadin2048 at 9:20 AM on August 6, 2010


Having just spent all summer trying, among other things, to computationally disambiguate permutations of "multiple people with the same name" from "one person who appears in the database multiple times (i.e., changed jobs, etc.)" in a 125,000-entry database, I feel their pain on a three-orders-of-magnitude reduced scale.

This is pretty awesome.
posted by Alterscape at 9:36 AM on August 6, 2010 [1 favorite]


It's not missing, it's just not online yet. But this really sounds like it's a problem with the thinkers, not with the net. If their spirit of inquiry drops dead as soon as they have to leave the house ... or they're not even aware of what might be gained from leaving the house ... that's not very impressive. In fact, that's downright dystopian.

*shrugs*

Let me put it another way: What if the copyright system banned open libraries? It could -- and believe me, as libraries have attempted to support multimedia collections (remember that word? It was a rather big deal in the '90s) they've really had to assert their right to operate.

The copyright system at present absolutely does ban Google from properly hosting Google Books, at any price. That's a thing that has to change. Because leaving the house is, in fact, a pretty big deal. You're being remarkably first-world, and even big-city, in thinking that's necessarily going to be an option.

A kid living in the Ozarks should have access to more than just Wikipedia. (No offense to Ozarkian librarians.)
posted by effugas at 9:43 AM on August 6, 2010


And I own .03% and climbing. That warms my heart.

Hmm... not totally implausible. If my math is correct, you'd have to buy 302 books a year to keep up.
posted by Jahaza at 9:43 AM on August 6, 2010
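I can't see Jahaza's working, but one reading of it: .03% of the current total is about 39,000 books, and staying at .03% as the corpus grows means buying .03% of each year's new titles — so "302 a year" implies an assumed million or so new books annually (the growth figure is my inference, not Jahaza's stated premise):

```python
TOTAL = 129_864_880
SHARE = 0.0003                        # .03%

owned_now = SHARE * TOTAL             # ~38,959 books: matches "nearly 40,000" above
implied_new_per_year = 302 / SHARE    # ~1.0 million new titles a year
print(round(owned_now), round(implied_new_per_year))
```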


As a point of reference, the Library of Congress has about 32,000,000 books.

This catalog is arguably of better quality than Google's catalog, and is therefore more significant (for the time being...)
posted by KokuRyu at 10:08 AM on August 6, 2010


languagehat: "As things stand, they're basically guessing and making it look like a ridiculously precise number."

Also: the Whopper doesn't look as appetizing in the wrapper as it does on television.
posted by Joe Beese at 10:15 AM on August 6, 2010


Effugas, the thing about library (and used-book stores) is that they have the (I would say full and unstinting) support of the people actually making the content. Show me an author who wants to close libraries and shut down used-book stores and I will show you a lonely crank. Making all books available in full online for free at the moment of publication does not enjoy that support. And that kid in the Ozarks, if he has access to the internet, has access to a hell of a lot more than just Wikipedia, he just doesn't have access to absolutely everything.

Also, I made a reasonable point in a civil fashion, so please don't accuse me of cultural prejudice. You need a computer to exercise the rights you're calling for, so it's a two-way street.
posted by WPW at 10:39 AM on August 6, 2010


Am I the only one that thinks the 128 million number seems low?
posted by caphector at 10:40 AM on August 6, 2010


The map is not the territory, except that in most cases it is. In other cases we have libraries.

I really don't disagree with this at all. We may not agree which cases are which, though. YMMV.
posted by GeorgeBickham at 10:48 AM on August 6, 2010


> Am I the only one that thinks the 128 million number seems low?

You might want to check out the very first comment in the thread.
posted by languagehat at 10:59 AM on August 6, 2010


> Wow, I am surprised this is so low.

It's a bit of a moving target. "C'est fini," dit-il, et la plume tombe de sa main inanimée. ("It is finished," he said, and the pen falls from his lifeless hand.)
posted by jfuller at 12:18 PM on August 6, 2010


>the 128 million number seems low?

The number is confusing. It appears, from reading the article, that it counts unique works, but including multiple editions of the same work. So in terms of unique works, the number is in the ballpark, on the high side even, of other estimates I have heard in the past. In terms of total physical books in the world, it is obviously not even close; there are probably tens of billions. Using LibraryThing as a rough guide, there is about a 10:1 ratio between total books and unique works, so just add a zero to 129 million and it's about 1.3 billion -- but that's got to be a low estimate, considering the average print run is a lot more than 10, post-1880 or so, when the book industry really took off with cheaper printing technology.
posted by stbalbach at 12:41 PM on August 6, 2010


Effugas, the thing about library (and used-book stores) is that they have the (I would say full and unstinting) support of the people actually making the content. Show me an author who wants to close libraries and shut down used-book stores and I will show you a lonely crank.

First of all, there are plenty of authors who are lonely cranks.

Second, and more importantly, of course authors are comfortable with libraries and used book stores. They helped them.

This newfangled Internet, what has it ever done but ruin content?

Making all books available in full online for free at the moment of publication does not enjoy that support.

Believe me, the media industry isn't exactly in love with libraries handing out DVDs. Good thing it's not all about the media industry.

And that kid in the Ozarks, if he has access to the internet, has access to a hell of a lot more than just Wikipedia, he just doesn't have access to absolutely everything.

No, that's what you're missing. What he has access to is crap. Tiny little soundbites pretending to be knowledge. It's a paucity, a sham, a drought of data.

Also, I made a reasonable point in a civil fashion, so please don't accuse me of cultural prejudice. You need a computer to exercise the rights you're calling for, so it's a two-way street.

I wasn't intending to insult you. I was expressing the very real fact that real differences in information access exist in the world. The digital divide is bad enough without copyright law creating a fundamental have/have not system. Clearly there is some tolerance for free access to data, or libraries couldn't exist at all. Figuring out a way to port that worldwide, online, in a way that still rewards content collation and generation is truly one of the great challenges of our time.

Your attitude was really, oh, just go outside and hit the library. I'm saying that's just not always an option. This is factually true.
posted by effugas at 4:46 PM on August 6, 2010


This newfangled Internet, what has it ever done but ruin content?

I've said nothing that's against the internet. You go on to say that the internet is "crap ... Tiny little soundbites pretending to be knowledge. It's a paucity, a sham, a drought of data." The internet that has helped authors to date is one where new books are not given away free, and it's helping authors a phenomenal amount despite being the desert you describe.

Believe me, the media industry isn't exactly in love with libraries handing out DVDs. Good thing it's not all about the media industry.

I'm not talking about the media industry objecting to libraries handing out DVDs, I'm talking very specifically about authors objecting to Google making their new book available online, in full, for free. Those authors have a right to be wary. The copyright system at the moment clearly isn't working, and is wide open to abuse - stopping libraries from lending DVDs is as stupid as authors trying to stop libraries from lending their books. But authors are right to want to defend their rights here. Google is also part of the "media industry", so why shouldn't authors be wary before a vast corporation has its way with the basis of their livelihood?

No, that's what you're missing. What he has access to is crap. Tiny little soundbites pretending to be knowledge. It's a paucity, a sham, a drought of data.

Do you really believe this? The whole internet? It's a bit better than that.

I wasn't intending to insult you. I was expressing the very real fact that real differences in information access exist in the world. The digital divide is bad enough without copyright law creating a fundamental have/have not system. Clearly there is some tolerance for free access to data, or libraries couldn't exist at all. Figuring out a way to port that worldwide, online, in a way that still rewards content collation and generation is truly one of the great challenges of our time.

Fixing copyright law won't fix the digital divide. But otherwise I agree completely - it is a great challenge of our time. I think you'd find many authors who would agree, and who want to find a way to expand free access to knowledge without destroying their livelihoods. Overall, we're probably a lot closer to agreement than you imagine.
posted by WPW at 6:07 AM on August 7, 2010


> What he has access to is crap. Tiny little soundbites pretending to be knowledge. It's a paucity, a sham, a drought of data.

What you're hearing there is the sound of someone who's followed his own argument right off the edge of a cliff. When you find yourself saying ridiculous things like that, it's time to reevaluate your debating tactics.

1) The internet has a lot of crap and a lot of valuable information.

2) Libraries have a lot of crap and a lot of valuable information.

Now, take it from there. (Hint: The task is to distinguish.)
posted by languagehat at 7:52 AM on August 7, 2010 [1 favorite]


languagehat, WPW,

OK. So, I've been doing a lot of coding lately. Problem: I'm not actually a native coder. It's something I get done through sheer force of will.

I've gone through a lot of forums. I've read some hideous documentation. I can't even describe the hideous mail archive aggregation sites I've used, just to get a clue about a particular OpenSSL API.

A single book, an actual text, an actual tome of knowledge, would have been an oasis in the desert. But I've made do with what I could find online.

The reality is that a single book on World War 2 contains more context, more depth, more data, more information, more knowledge than Wikipedia does. Wikipedia offers valuable filtering, but you know what? A book purporting to filter the same events beats the crap out of it. And let's not kid ourselves, Wikipedia is the best the web has to offer right now. It's very cool. It's very neat that it worked.

But it's like having a library that only has the full set of the Encyclopedia Britannica. You wouldn't consider such a library impoverished or incomplete?

We as a society can do better. I'm not saying first-run books should show up on Google, for free, either. But I think there's a deep denial of the low quality of Internet research. I really wish some teachers had shown up on this thread. Go research something that isn't code related -- something historical, something obscurely political, something having to do with music or dance. Realize there are tremendous books on the subject -- and fairly little online.
posted by effugas at 4:41 PM on August 7, 2010


effugas, you seem to be confusing the internet with Wikipedia. While Wikipedia is a valuable resource, it obviously must be used with extreme caution. The same is not true of, say, the Perseus Library, or the Russian Online Library, or the many, many scholarly books available through Google Books or ventures like the UC Press E-Books Collection. It's fine to complain about what's not available, I do it frequently myself, but you just look silly when you claim that the internet as a whole is a vast wasteland.

By the way, what does WPW mean? The Googles, they do nothing!
posted by languagehat at 5:15 PM on August 7, 2010


By the way, what does WPW mean? The Googles, they do nothing!

WPW is the name of someone participating in this thread.
posted by andoatnp at 5:49 PM on August 7, 2010


Hello languagehat.
posted by WPW at 5:21 AM on August 8, 2010


languagehat--

ahem

Let me try to rephrase the question: Of what's on the Internet, of what's actually answering people's questions, how much would you actually allocate space in a library to?

There is a quantitative and qualitative difference in data quality going on.
posted by effugas at 9:44 AM on August 8, 2010


And, languagehat, are you sure we disagree? My point is that the Internet needs books. You just pointed to a number of sources of books on the Internet.

I think that's great. I think we need more of that -- a lot more.
posted by effugas at 10:09 AM on August 8, 2010


> WPW is the name of someone participating in this thread.

> Hello languagehat.
posted by WPW


D'oh!
*slaps self*

> And, languagehat, are you sure we disagree? My point is that the Internet needs books. You just pointed to a number of sources of books on the Internet.

> I think that's great. I think we need more of that -- a lot more.


Well, now I'm not at all sure we disagree. Dammit, how am I going to get my daily dose of vitriol?
posted by languagehat at 2:10 PM on August 8, 2010


languagehat--

:)
posted by effugas at 6:25 PM on August 8, 2010

