Plagiarists Beware!
November 3, 2005 11:18 AM

Google Print debuts today. Working with the University of Michigan, Harvard University, Stanford University, The New York Public Library, and Oxford University, Google has scanned and made searchable at least ten thousand books, with many more to follow. NY Times story here. Meanwhile, certain politicians are trying to "reign in Google" and stop the experiment before it begins.
posted by LarryC (58 comments total)

Previously discussed here and elsewhere, but now that it is live it seemed worthy of a new post. I am eager to hear what our Mefi librarian contingent thinks of it.
posted by LarryC at 11:21 AM on November 3, 2005

What I don't understand is why they put on, for example, Tom Sawyer. Surely the benefit of this project is to make the rare and obscure easily available?

PR again, I suppose.
posted by IndigoJones at 11:28 AM on November 3, 2005

posted by brownpau at 11:30 AM on November 3, 2005

Google Print ... I'm not usually such a big fan of Google, but ... my G-d....

Politicians: GRRRR!
posted by jefgodesky at 11:30 AM on November 3, 2005

Cool. I typed in the name of an imaginary town, and found a single book that devotes a single paragraph to an actual (no longer existing) town of that name.

I must admit: I can't think of any other way to get that piece of information, and I got it in a few seconds.

Ultimately I don't know what the future of this will be, but I for one wish I lived in a society where this type of knowledge sharing was embraced by all.
posted by davejay at 11:32 AM on November 3, 2005

Interesting. It does, indeed, seem to have the entire texts of recent books, but it only lets you easily read five pages at a time. So could you "steal" a whole book through Google Print? Theoretically, yes, but by the time you'd finished Amazon would have shipped the damn thing to you. It seems to work just fine for researchers and casual browsers, but nobody's going to read whole books on it. "Certain politicians" seem to have already gotten what they're clamoring for, and therefore need to shut up now.
posted by Faint of Butt at 11:34 AM on November 3, 2005

I did a search for Victor Moscoso, who certainly won't appear in anything old enough to be out of copyright. I found plenty of books mentioning him, but the really interesting one was "Reign of George Third, 1760-1815". How a 1960s graphic designer would be mentioned in that book is a puzzler. I checked it, and it appears to be a typography book filed under the wrong title.

Beta software is always fun.

It doesn't let you read too much of a book; it works an awful lot like Amazon's search-inside-the-book feature from a couple of years ago. It's fun for discovering stuff, but they're clearly not limiting themselves to out-of-copyright info.
posted by inthe80s at 11:34 AM on November 3, 2005

Hmmm...the news stories write about works in the public domain, but pulling a few recent academic books off my shelves (published in the last ten years and definitely under copyright) and typing in a sentence gets a hit on every one. The results show a facsimile of the page of the book where the quote occurs, and I can also see two pages before and after the quote, though no more than that. This could actually be useful for research.
posted by LarryC at 11:35 AM on November 3, 2005

This is utterly amazing and unprecedented. I work, indirectly, in a part-time capacity for one of the publishers that is suing Google. I think it's time to end that relationship.

references to metafilter

Joel Garreau, I believe, noted that Google is doing this not so people can read these books but so A.I. can.
posted by craniac at 11:39 AM on November 3, 2005

I noticed yesterday (or the day before?) that there suddenly seemed to be more results, including snippets, and strange-looking, old periodicals.

For their part, Google could explain the project a little better. These pages don't really distinguish very well between the Library Project and Google Print, and it would be nice to see updates on how far along in scanning the collections they are, instead of what an awesome public service they're performing.

That said, More books = Yay!
posted by steef at 11:39 AM on November 3, 2005

While you won't be reading an entire book with this by any stretch, you will find that there are certain cases where you could easily find all the info you'd ever want from a particular book via the program.

For instance, I searched for "Duran Duran" and hit the Complete Book of British Charts (excellent book, have it on my bookshelf already). A couple of pages are protected in the search results, but for the most part, you'd see the info you want on just one page anyway (the chart positions various songs reached). It looks like they block the actual chart pages, but not the indexes with the ranks in them.
posted by inthe80s at 11:41 AM on November 3, 2005

*Very useful for research.
*Cool fingers at front and back of scans are always neat.
posted by the giant pill at 11:44 AM on November 3, 2005

Hack to extract entire books in 5, 4, 3...
posted by selfnoise at 11:45 AM on November 3, 2005

I've typed in some rather obscure names and gotten amazing numbers of results, in a variety of languages. So, I am willing to bet that they have far more than 10,000 books in there.
posted by beagle at 11:46 AM on November 3, 2005

I've been using Project Gutenberg for this by finding the book I want and searching the page for a certain word using my browser's search function. Google seems to have a different selection of works, which is nice, along with the ability to search critical editions of texts.

The fact that you can only view about three consecutive pages is a bit annoying though.
posted by benightedly_heedful at 11:57 AM on November 3, 2005

The publishers' groups trying to ban this seem, in my mind, almost completely insane. Many of the authors they "represent" aren't happy at all with their flipping out.

Most likely, they just want royalties from Google on the books that they show.
posted by delmoi at 11:57 AM on November 3, 2005

As a librarian, I, for one, welcome our new Google overlords.

Actually I do, I'm thrilled that Google is taking on print. Library catalogues are catastrophic failures. I am looking forward to seeing how students use Google Print.
posted by Hildegarde at 11:58 AM on November 3, 2005

It would be cool if Google allowed you to 'buy' rights to a book via your Gmail account for a couple bucks, with the money going to the publisher.
posted by delmoi at 11:58 AM on November 3, 2005

Sure, one day Google will morph into a force for evil that will have to be destroyed at a terrible cost, but Google Print is just about the best thing since agriculture. I love the idea that the entire library of human knowledge will one day be something that we can just have handy to beam into space upon receipt of our first radio message from an advanced civilization.
posted by slatternus at 11:59 AM on November 3, 2005

Holy shit. I am torn between "this is an awesome research tool" and "where are my fucking royalties, Google".

The obscure hits it produces are amazing. Yes, either way more than 10k books in there or else the scanners started with the letter "a".
posted by Rumple at 11:59 AM on November 3, 2005

I can't believe Wil Wheaton slagged off Metafilter in that book nobody read. This is USEFUL!
posted by fire&wings at 11:59 AM on November 3, 2005

Glad to see that the Washington Times article does not misspell "rein in". Nor do they give things "free reign". Horses do not rule.
posted by Aknaton at 12:01 PM on November 3, 2005

Ok, I don't understand how this is stealing. They are copying books from a LIBRARY and the last time I checked, it's free to read books you get in the library. How, exactly, is this stealing?
posted by aacheson at 12:02 PM on November 3, 2005

This is great. A hack to couple this with OCLC's slightly bass-ackwards 'Find in a Library' feature, to let you enter your zipcode and see local libraries that have your hits in their collections, would be a fantastic addition.

If it grows, it could kill the library catalog interface (not to mention OCLC) almost dead, which would be a very, very good thing. ILS vendors would need to offer google-compatible APIs instead of the craptacular, welded-shut, glommed together shitbags that are currently sold as web catalog engines.
posted by ulotrichous at 12:04 PM on November 3, 2005

I'm hoping commercial vendors will be just plowed over by this and google scholar, frankly. Now, if we could merge google print and google scholar into some google university edition, reflecting the contents of your particular university library system, and then add in the ILL links for books that you can't get locally, why, then life would be truly good and just.
posted by Hildegarde at 12:06 PM on November 3, 2005

selfnoise: "Hack to extract entire books in 5, 4, 3..."

2, 1, 0, blastoff!
(8 months old, so possibly no longer working)
posted by Plutor at 12:11 PM on November 3, 2005

Sort of disappointing. I was expecting something more like Project Gutenberg, which is about a million times better.

But I guess it's still pretty cool.

So could you "steal" a whole book through Google Print? Theoretically, yes,

Unless they have already thought of some clever way to make it difficult, I give it about three days before there's a program around to automatically grab whole books from it.
posted by sfenders at 12:12 PM on November 3, 2005

Cool. Who precisely is doing all this scanning, anyway? I hope there isn't a sweatshop somewhere with some scrawny eight year-old girl heaving massive tomes onto a flatbed scanner....
posted by brundlefly at 12:15 PM on November 3, 2005

IndigoJones writes "Surely the benefit of this project is to make the rare and obscure easily available?"

That's certainly not the only benefit. It's an awesome tool for finding passages in works that you already know. Especially if it's been a long time since you've read them. You only need to remember a snippet of words or a character's name and zoom, there you are. For instance, if you want to find the central passage of Huck Finn:

All right, then, I'll go to hell
posted by mr_roboto at 12:18 PM on November 3, 2005

Playing with it some more, I still just can't get my brain to accept that they've gone to the trouble of converting all these books to electronic text, but then they give you only *images* of the text. Presumably it's for some arcane legal reason, but what exactly could it be?
posted by sfenders at 12:21 PM on November 3, 2005

Aknaton: Du'oh! Thanks for the correction. I will never forget that horses do not rule.
posted by LarryC at 12:24 PM on November 3, 2005

Can anyone find some actual copyright-free results? For instance, mr_roboto's link to Huck Finn is copyrighted, as is every other copy of Huck Finn.
posted by smackfu at 12:29 PM on November 3, 2005

Searching Amazon Inside-the-Book for my name produces several hits. Google Print produces none. Therefore, I prefer Amazon.

Like Amazon's search inside the book, I do find this useful for research, though. And besides finding sources, and finding people who cite sources I already use, I like these search-inside-the-book features because I can use them to search books that I own. Once I know what page I read it on, I can just go pick up my copy of the book and read it there.

And woe be to students who still plagiarize out of books instead of off the internet.

I still just can't get my brain to accept that they've gone to the trouble of converting all these books to electronic text, but then they give you only *images* of the text.

I presume it's so you can't copy and paste.
posted by duck at 12:31 PM on November 3, 2005

A question related to smackfu's: What's the copyright status of recent editions of public domain texts? For instance, the copyright on Dickens' A Christmas Carol has certainly expired, but the text shown for this edition still has copyright notices on every page. Do the publishers of the recent edition gain some sort of copyright upon publishing a public domain text? What aspects of the publication, exactly, does this copyright cover?
posted by mr_roboto at 12:37 PM on November 3, 2005

A publisher may not own copyright of the original text, but they retain copyright on their version of that text. Their typeface and layout, for instance.
posted by Hildegarde at 12:43 PM on November 3, 2005

I presume it's so you can't copy and paste.

You mean select and copy as text, I guess. These days, you can copy and paste images just fine. They aren't any "less copyrighted" than plain text.
posted by sfenders at 12:48 PM on November 3, 2005

I believe a re-publisher gets some kind of copyright on the new physical embodiment of the material -- you can put the text of A Christmas Carol up on the web, but you can't just photocopy a new edition (or, presumably, copy their editing decisions). You can put up early Louis Armstrong 78s as mp3s for people to download, but you can't copy a CD reissue of the same. However, IANACL.
posted by uosuaq at 12:48 PM on November 3, 2005

I'm very disappointed. As for PD text, it's nowhere near as useful as the Internet Archive, which provides the full PDF/TIFF source of the entire book for download; that can then be uploaded to Kinko's or Lulu and a printed copy made for $15 (400-page softback including shipping from Lulu). For out-of-print, rare, public domain books, that is crucial. Who is going to read an entire novel of online scanned pages? It sounds good, but no one does that (that I know of).

As for the copyrighted books, a9.com and Amazon search inside have been providing this for years. Why no lawsuits against Amazon?

It's really disappointing that Google Print is not providing the full downloadable source of the public domain works. So close, yet so far.
posted by stbalbach at 12:49 PM on November 3, 2005

"I presume it's so you can't copy and paste."

Ah, wait... I bet you can't trivially copy them in MSIE. That's probably it. In Firefox you can just go to "Page Info" to copy the image, but "Save Image" doesn't work thanks to some JavaScript kludge. It's all rather stupid. There is no way this makes any difference to copyright law.
posted by sfenders at 1:05 PM on November 3, 2005

Well, I just tried to go there, and I got a 404 error - and I know that Google isn't blocked by my work firewall.

This is the error: "Not Found
The requested URL /sorry/?continue=http://print.google.com/ was not found on this server. " I really hope that this is just a momentary glitch, and not the service being taken down.
posted by spinifex23 at 1:18 PM on November 3, 2005

Joel Garreau, I believe, noted that Google is doing this not so people can read these books but so A.I. can.

Johnny 5 needs more input.
posted by NickDouglas at 1:21 PM on November 3, 2005

Welp, momentary glitch. I'm in now.

Let the book hunting begin!
posted by spinifex23 at 1:46 PM on November 3, 2005

Looks like the people (me included) who want to see library catalogue integration may get their way. The help page refers to a "Find this in a library" feature, though it doesn't look like that's implemented yet.
posted by pasd at 2:07 PM on November 3, 2005

Yeah I'm with stbalbach, I'm really waiting for the Open Content Alliance to start putting out some sweet stuff via the Open Library project. Google's contracts with the libraries had some shady "you can't share this digital copy that we give you with your competitors" language that made me feel a bit ooky in its broadness and vagueness, but yeah it's a neat tool.
posted by jessamyn at 2:08 PM on November 3, 2005

"Interesting. It does, indeed, seem to have the entire texts of recent books, but it only lets you easily read five pages at a time. So could you "steal" a whole book through Google Print? Theoretically, yes, but by the time you'd finished Amazon would have shipped the damn thing to you. It seems to work just fine for researchers and casual browsers, but nobody's going to read whole books on it. "Certain politicians" seem to have already gotten what they're clamoring for, and therefore need to shut up now."

At least not until two weeks from now when some coder comes out of his dorm room long enough to upload BOOKSTER 1.0a via BitTorrent... ;)
posted by stenseng at 2:26 PM on November 3, 2005

I wonder if print.google.com is mentioned in this book... =)
posted by joquarky at 2:36 PM on November 3, 2005

Whoa, the dude who wrote Planet Simpsons is a Mefite?
posted by Quartermass at 2:49 PM on November 3, 2005

I imagine google could not have gotten this off the ground at all if fully d/loadable books were part of the deal. Also, they are making the information available, which is in accord with their mission statement. Did Amazon try to do deals with the big institutions to fund the digitizing of their holdings? I understand your misgivings, jessamyn, but Google are the ones funding the digitizing of Oxford, for instance; although it bodes strangely for the future, it must commercially entitle them to some non-circulation agreement with respect to the resulting data.

I googlethink that this is a good googleidea and am happy to googleplay around and googleparasitize that which my googlemasters have magnanimously googleallowed.

Next, instead of seeing Michelangelo's hand of man separated from hand of God, you'll go look inside a book and the Google logo will have joined them.
posted by peacay at 3:01 PM on November 3, 2005

I imagine google could not have gotten this off the ground at all if fully d/loadable books were part of the deal.

Why not? There are many public domain books. Is there some reason Google has their corporate logo on every page and won't let you download and print it? Evil.

Luckily there are other initiatives doing the same thing that are more enlightened.
posted by stbalbach at 3:32 PM on November 3, 2005

Fortunately, now that MetaFilter has the metafilteruser Flickr tag, we can look up each other in addition to using Google Print!
posted by craniac at 3:47 PM on November 3, 2005

Even if someone produces a hack, you still won't be able to download entire books. I've run across a couple of pages already that are purposely masked (even to logged-in readers) so that there will always be missing chunks. One of these masked pages was the final page of an argument I was trying to follow. Very annoying. I wonder what percentage of pages are masked? Are the masked pages random?
posted by painquale at 4:05 PM on November 3, 2005

I still just can't get my brain to accept that they've gone to the trouble of converting all these books to electronic text, but then they give you only *images* of the text. Presumably it's for some arcane legal reason, but what exactly could it be?

One of the possible reasons is that their OCR is, unsurprisingly, not perfect. Giving you an image of the page will gloss over the fact that their text version may well be reading "arid lie saicl" instead of "and he said" (they've probably run a spellcheck to catch the 'saicl', actually). ... yep, over 600 results for "arid lie": "He is also the author of Stray Cats arid Tue Loop. arid lie wrote the book for the musical Mayor. He has written for such publications as Roiling Stone." They are no doubt developing the best OCR and automagical proofreading software ever, but they're not there yet.
posted by Lebannen at 4:44 PM on November 3, 2005

(Nobody's perfect. This is still a very cool thing. And that page I used in my example above featured weird contrast and italics, which are trickier for OCR, but that doesn't stop me being amused by "Roiling Stone".)
posted by Lebannen at 5:05 PM on November 3, 2005

Sounds like another Google idea that is both smart and unsettling.
posted by Dean Keaton at 5:59 PM on November 3, 2005

I can't believe Wil Wheaton slagged off Metafilter in that book nobody read

Where? I can't find this and I want to, for some reason. Damn that Wheaton and his crazy antics. It's irritatifying.
posted by Sparx at 6:03 PM on November 3, 2005

Score! I've been thanked in two books I didn't even know about.
posted by nev at 6:36 PM on November 3, 2005

A rather timely press release for Amazon Pages, and Amazon Upgrade (buy the book, get full access to the online text).
posted by steef at 9:21 AM on November 4, 2005

And Microsoft/British Library release a statement.
Gawd. The future may be strange.
posted by peacay at 12:52 AM on November 5, 2005

MetaFilter
