Amazon implements searching for words in books
October 23, 2003 6:24 AM

Starting today, every word (33 million) in ALL the books (270,000) sold at Amazon.com can now be searched word for word. File this under technologies used to implement more sales and better service to the end user aka marketing at work for you.
posted by omidius (95 comments total)

 
[this is goo . . . hey, the link doesn't work!
posted by cogat at 6:31 AM on October 23, 2003


The link doesn't work.
posted by LukeyBoy at 6:32 AM on October 23, 2003


A link that will work.
posted by Prospero at 6:33 AM on October 23, 2003


Just go to www.amazon.com to see the announcement.

I'm blown away - this is an awesome feature. Combined w/ one-click it will definitely add to my Amazon purchases.

Is there similar keyword searching available anywhere else? I would love to have this type of search available at the library.
posted by drobot at 6:35 AM on October 23, 2003


Hopefully they'll add it to their API too so you can plug into it from your own sites.
posted by ao4047 at 6:37 AM on October 23, 2003


sorry bout the bum link, added an extra '.
posted by omidius at 6:41 AM on October 23, 2003


A searchable digital concordance of my book collection. This is like half of everything I've ever wanted in the world.
drobot - I know from somebody who works there that Harper Collins has a searchable database of all the manuscripts it publishes. I imagine all of the other large publishing houses have something like that too, but I don't know of anything else like this available to the public.
posted by twitch at 6:43 AM on October 23, 2003


A better link.
posted by jpoulos at 6:44 AM on October 23, 2003


mon dieu!
posted by shoepal at 6:44 AM on October 23, 2003


Also, NOT every book is included in the search--just those that are "participating". It's still 120,000 books--but not 270,000.
posted by jpoulos at 6:48 AM on October 23, 2003


This is most excellent. I hope many more books will follow.
posted by Songdog at 6:53 AM on October 23, 2003


It's still a little rough.

"If music be the food of love..." received 121 results. But Twelfth Night didn't show up in the first twenty results.

No sign of Faulkner within "the human heart in conflict with itself." But it did turn up the Cliff's Notes for Absalom, Absalom! at 12.

So much for small favors.

Not sure how they created the algorithm (and, more importantly, how Amazon is tracking this information), but I have big questions about (a) what's being prioritized and (b) relying upon a commercial site that tracks my every whim for any helpful search.
posted by ed at 6:53 AM on October 23, 2003


Despite amazon.com being 'evil', they sure do some cool-ass stuff.
posted by scottq at 6:55 AM on October 23, 2003


"It was a dark and stormy night" returns many books with that text in the title, but it's a while before you get to A Wrinkle in Time, which actually opens with those words. You do get Snoopy, though, which is some compensation.
posted by Songdog at 6:58 AM on October 23, 2003


"Mrs. Whatsit" works.
posted by grabbingsand at 7:01 AM on October 23, 2003


Looks like their OCR ain't perfect:

1. on Page 483:
". . . ra«a 5raem sraam sv,tma 1211nlznols.. slssROw I:zssii'am 51~12~ 1:z9:12am 6(0012W01'.29:12M1 sfaelzaw 1 .naz rw sfoslzooo nzssz nn slmlztroo 1 .zsaz an slwlzooo 1 .n.1z an s/~fAW I.n:12wn -am sltsf2~oo1'.zs'.IZan SYSütam 01~12000 .29. 2 Am ólosf-0 '2 '.12 5mem slosl2aoo : . . ." for this page. Somehow it thought the pictures were paragraphs and it went insane.
posted by zsazsa at 7:01 AM on October 23, 2003


So does "A Wrinkle in Time," but it matches the top of all left-hand pages.
posted by Songdog at 7:04 AM on October 23, 2003


This means Amazon is scanning in most of their books and using OCR or they're getting text files directly from the publishers -- or both. I wonder what they're up to, besides word searches. Are they planning a massive sale of ebooks?
posted by grumblebee at 7:13 AM on October 23, 2003


It worked great for 'Oulipo'. It looks like they are hoping to make this better going forward based on the contest they are running in conjunction with the launch.
posted by drobot at 7:18 AM on October 23, 2003


For the record, the algorithm bit of this isn't actually very hard. You turn each document into a keyed database of words (and pointers into the actual document so you can find the excerpt) and store it in some intelligent database-y way so that searches are fast, and then you put that on enough hardware that it is reasonably fast. The algorithms have been around for a long time --

--What I am astonished by is the effort to get electronic copies of 120K books in the system. I had thought there'd be some saner method than OCR, but judging from the above comments, maybe not. Perhaps some publishers sent electronic copies at Amazon's request?

On preview, my thoughts exactly, grumblebee. :)
posted by Trombone Borges at 7:18 AM on October 23, 2003
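
The "keyed database of words" with "pointers into the actual document" that Trombone Borges describes is usually called an inverted index. Here is a minimal sketch of the idea, not Amazon's actual implementation: the function names (`tokenize`, `build_index`, `search`), the two-book corpus, and the 30-character excerpt window are all made up for illustration, and it assumes plain ASCII text so that lowercasing does not shift character offsets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and yield (word, character offset) pairs."""
    for match in re.finditer(r"[a-z0-9']+", text.lower()):
        yield match.group(), match.start()


def build_index(documents):
    """Map each word to a list of (doc_id, offset) postings."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for word, offset in tokenize(text):
            index[word].append((doc_id, offset))
    return index


def search(index, documents, word, context=30):
    """Return (doc_id, excerpt) pairs for every occurrence of a word."""
    results = []
    # .get() avoids creating an empty entry for words that were never indexed.
    for doc_id, offset in index.get(word.lower(), []):
        text = documents[doc_id]
        start = max(0, offset - context)
        end = min(len(text), offset + len(word) + context)
        results.append((doc_id, text[start:end]))
    return results


# Hypothetical two-book corpus, for illustration only.
books = {
    "wrinkle": "It was a dark and stormy night.",
    "twelfth": "If music be the food of love, play on.",
}
index = build_index(books)
hits = search(index, books, "stormy")
```

Lookups cost one dictionary access per word no matter how many books are indexed, which is why the hard part is getting the text in, not searching it. Phrase search and ranking would need more than this sketch provides.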


I don't really see the benefit of this.

I don't buy books based on phrases used in them. And even if I did, presumably, I'm going to buy books that I don't already have, and wouldn't know the phrases in it in the first place.

I mean, how useful is it to be able to find all the books that use the word lugubrious, or what-have-you?
posted by crunchland at 7:23 AM on October 23, 2003


This rules. Now I can Google and Amazon myself.
posted by VulcanMike at 7:33 AM on October 23, 2003


crunchland - what if you wanted to find books that contained a particular poem, essay, or short story? What if you want a reference book that covers a particular topic in detail? Being able to go straight to the page to be sure is a very valuable feature.
posted by Songdog at 7:33 AM on October 23, 2003


Wow. My mind is boggled (more so than usual).

Now, with clever combos of search terms, I have an index to most of my books. So ["robert wright" +cupboard +bare] takes me to Robert Wright's Nonzero and I can read those pages online.

crunchland: if your search term is "Coluccio Salutati," you would find this service very valuable.
posted by micropublishery at 7:34 AM on October 23, 2003


Mmm...the cool delicious taste of Pepsi. Pepsi Blue.
posted by stavrosthewonderchicken at 7:35 AM on October 23, 2003


crunchland: Presumably, you're not interested in bibliographies, endnotes or tracking down books that quote specific sources. Or, for that matter, tracking down that elusive quote you can't quite remember.

Actually, one of the more fascinating uses of this new tool is to keep plagiarists in check.

On preview: what Songdog said.
posted by ed at 7:36 AM on October 23, 2003


Also, Bulwer-Lytton is the source of "It's a dark and stormy night," and, while the Bulwer-Lytton Contest is referred to in those results, Sir Edward himself fails to make an appearance within the Amazon results. The big problem here is that out-of-print books (and some titles go out of print immediately) are less likely to be OCRed (if at all) than Dan Brown's latest crapfest. So if you want to find a beautiful passage from Frederic Prokosch or John P. Marquand referencing Bulwer-Lytton in H.M. Pulham, Esq., the noggin seems a better algorithm.
posted by ed at 7:42 AM on October 23, 2003


I can see a lot of utility in this. I routinely spend money on very technical texts on integrated circuit design. These are pricey books, but since there's no technical university near me I can't preview them first. So based on other people's reviews I've purchased books with costs well upwards of 100 dollars. A lot of these books are good, some are horrible. If I get advice I can search for information in that book and actually read it. How does the author present information? Is it dumbed down to the point of non-practicality etc?
posted by substrate at 7:44 AM on October 23, 2003


They had this for the menus in Amazon Restaurants in late 2002, I believe. Try "nutella" =)
Getting it to books was just a matter of time, I guess. Great stuff!
posted by XiBe at 7:49 AM on October 23, 2003


Mmm...the cool delicious taste of Pepsi. Pepsi Blue.

Y'know, that seems like pretentious bullshit in all but the most blatant cases of advertising. Here's something a company (boo hiss!) has done that's a pretty impressive technological feat. Moreover, it's kinda cool if you think of it in terms of the last thousand years: 1003 AD, I've probably got access to 0 books in my house. 2003 AD I can search 120,000 from my desk.

I get that you're not interested. I don't get why this shouldn't be posted.
posted by yerfatma at 7:51 AM on October 23, 2003


As a literary agent for a wide range of authors, I have VERY mixed views of this.
posted by twsf at 7:54 AM on October 23, 2003


I just found out that somebody with the same last name as me was named plenipotentiary of the Department of Justice in Germany in 1918.
posted by goethean at 7:57 AM on October 23, 2003


I don't get why this shouldn't be posted.

Who said that it shouldn't be posted? Not me.
posted by stavrosthewonderchicken at 8:01 AM on October 23, 2003


All 49 results for fuckwit :

And they tell me these "books" are worth looking at. Ha!

I mean, this is pretty cool.
posted by cryosis at 8:15 AM on October 23, 2003


Well, since Amazon doesn't let me do what I sometimes do in a real, live bookstore -- peruse the index of a programming text for a particular hack, or the index of a cookbook for a particular recipe -- I guess this does have some use. On the other hand, what it doesn't let me do, which I'd do in a real, live bookstore, is sit down and read it there, and save myself the expense of buying it.

I still think this is more of a "can we do it?" than a "should we do it?" sort of thing. Seems like a huge expenditure of computing power for not a lot of function, since I can't seem to recall the last time I needed to find Coluccio Salutati.
posted by crunchland at 8:15 AM on October 23, 2003


As a literary agent for a wide range of authors, I have VERY mixed views of this.

Why?
posted by mkultra at 8:17 AM on October 23, 2003


Huh. So we have to assume that they have the text of all their books digitized somewhere, right? With that information, they could boil every book down into vectors and compare one book's vector to all the other books' vectors and find "books like this one."
posted by rusty at 8:23 AM on October 23, 2003
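
rusty's "boil every book down into vectors" idea can be sketched with word-count vectors compared by cosine similarity. This is only one possible reading of the comment, not anything Amazon has said it does: the function names and the three-entry `library` (including its made-up texts) are hypothetical and exist purely for illustration.

```python
import math
import re
from collections import Counter


def word_vector(text):
    """Turn a text into a sparse vector of word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, library):
    """Return the title of the other book whose vector is closest."""
    target = word_vector(library[title])
    scores = {
        other: cosine_similarity(target, word_vector(text))
        for other, text in library.items()
        if other != title
    }
    return max(scores, key=scores.get)


# Hypothetical texts standing in for whole books.
library = {
    "ethiopia_a": "the italian army fought the ethiopian army at amba aradam",
    "ethiopia_b": "the ethiopian army retreated from amba aradam",
    "poker": "the dealer shuffled the cards at the poker table",
}
closest = most_similar("ethiopia_a", library)
```

Raw counts let common words like "the" carry a lot of weight, so a real system would likely down-weight them (for example with TF-IDF) before comparing.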


I can't believe "Asshat" isn't in any of those books....
posted by Eekacat at 8:26 AM on October 23, 2003


we have to assume that they have the text of all their books digitized somewhere

I await all_amazon_books.rar.torrent, coming soon to my favorite tracker.
posted by stavrosthewonderchicken at 8:27 AM on October 23, 2003


A correction: it's not 33 million words, but rather 33 million pages. 33 million words in 120,000 books is less than 300 words per book, which would make for very short books.
posted by talos at 8:30 AM on October 23, 2003
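
talos's sanity check is easy to reproduce. A small sketch using the two figures quoted in this thread (33 million, and the 120,000 participating books jpoulos mentions); the variable names are just for illustration.

```python
total_units = 33_000_000  # the announced figure, as quoted in the thread
books = 120_000           # participating books, per jpoulos's comment

# Average per book, whatever the unit turns out to be.
per_book = total_units / books
```

The result is 275 per book: far too short if the unit is words, but a plausible average length if the unit is pages.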


I wonder if there is or will be a limit to how many pages you can view of a given book. So far I've been able to see at least 15 pages by doing consecutive searches. Who will be the first to write a script that downloads the whole book?
posted by mcguirk at 8:31 AM on October 23, 2003


Phew. My masters degree just got that much easier! Who needs to buy a book or go to the library when it is all right here! I am presuming that this is the problem twsf has?
posted by Quartermass at 8:35 AM on October 23, 2003


"Resistojet" - good word Bezos.
posted by tomplus2 at 8:36 AM on October 23, 2003


talos: they might be taking out common words like a, an, the, and, etc.? not that this makes the "less than 300 words per book" figure less alarming, but it does put it into greater perspective.
posted by pxe2000 at 8:37 AM on October 23, 2003


My god. I happen to be reading a history of the 1930s (The Dark Valley, by Piers Brendon—an excellent book, if anyone's interested in the subject) and had just run across a reference to a battle in which the Italians kicked the (badly led and pathetically armed) Ethiopians' asses, Amba Aradam. I'd never heard of it and was curious to find out more about it, but it's a pretty specialized topic (the name of the battle isn't even in my Webster's Geographical Dictionary) and I didn't know where to look. So I typed it into the Amazon search. Boom: 12 results, the second of which was the book I was reading. I click on the link for The Italian Invasion of Abyssinia 1935-36 by David Nicolle and get:
2 references to amba aradam in this book:
1. on Page 4:
". . . of Tembien results in a draw though the Ethiopian offensive is halted. 10-15 February 1936 Battle of Enderta, Italians take Amba Aradam. 27 February-2 March 1936 Second Battle of Tembien, Italians take Worq Aruba. 31 March 1936 Battle of Maych'ew, Italians . . ."
2. on Page 30:
". . . followers. ABOVE Men of the élite Italian Alpini, probably the 5th Alpine Division Pusteri, in action at the battle of Amba Aradam on the northern front. Here, a large Ethiopian force under Ras Mulughieta took up a position from which it could . . ."
I try A History of Ethiopia Updated Edition by Harold G. Marcus and get:
". . . flank, and destroy the Ethiopian armies one by one. Within a four-week period, Badoglio's forces conquered Ras Mulugeta at Ámba Aradam, demolished Ras Kassa's army at a second battle in Temben, and defeated Ras Imru in Shire. The rapidity . . ."
This is amazing. I doff my cap to Amazon and to omidius.

And crunchland... you're weird.
posted by languagehat at 8:37 AM on October 23, 2003


ed, thanks for the Bulwer-Lytton info. I knew L'Engle didn't originate that opening, but I hadn't known the source. I just remembered from my childhood that that sentence was used in both A Wrinkle In Time and "Peanuts."
posted by Songdog at 8:38 AM on October 23, 2003


I tried the first words, first sentence and first page of Strunk & White and it found nothing but some science fiction novels. Same with the Thames & Hudson guide, Bringhurst, the AP Stylebook, and every other style guide I tried.
posted by luriete at 8:39 AM on October 23, 2003


Also, Bulwer-Lytton is the source of "It's a dark and stormy night," and, while the Bulwer-Lytton Contest is referred to in those results, Sir Edward himself fails to make an appearance within the Amazon results.

Actually, B-L is in print: he's being kept alive by Kessinger, a publisher specializing in mystical and paranormal books. Wildside, a small CA press that does a lot of fantastic/adventure reprints, also has some of B-L in print. Presumably, neither publisher supplied an electronic text file.

This looks extremely useful. I'm working on an annotated bibliography for a library acquisitions journal and already the search is turning up books which I would not have found otherwise. Cool, cool, cool. I see my wish list growing.
posted by thomas j wise at 8:41 AM on October 23, 2003


This is a total Trojan horse marketing app for Amazon.

Step One: Make lots of books full-text searchable.
Step Two: Let you upload the UPC or ISBN codes of all the books in your library, so you can search just the books you own, your friends' libraries, etc. (Everyone already has their free CueCat, right?)

Now they have a complete list of books you own, even those you didn't buy from them. Their recommendations get 100 times more eerily prescient. They start e-mailing you reminders when new editions of books you already have are released (people in the computer industry will love it). In the end, they sell a lot more books.

Amazon is the only retailer that is seriously and continuously attempting to take advantage of the retailing potential of the Internet. They're so far beyond everyone else now they don't even need to do stuff like this, but they do it anyway, which just floors me. It's a sign of a great company that is never satisfied with how great it already is.
posted by kindall at 8:47 AM on October 23, 2003


pxe2000: point taken with the common words but Amazon actually talks about 33 million pages.
posted by talos at 8:49 AM on October 23, 2003


As a bookseller, I would say one of the main uses would be to find lines from poems. In the past I've always done this by just going to the big poetry anthologies and looking through the indexes... Surprisingly, this is usually quicker than looking most things up on computer.
posted by drezdn at 9:00 AM on October 23, 2003


Let you upload the UPC or ISBN codes of all the books in your library.

Yeah, I'm gonna do that. As soon as I finish indexing all the dollar bills I've ever handled, sorted by serial number.
posted by crunchland at 9:02 AM on October 23, 2003


kindall: Factor in RFID and you've got the makings of a potential book-selling monopoly. As we speak, following up on the Wal-Mart mandate to its top 100 suppliers, RFID is being considered in (at the very least) Seattle and San Francisco libraries, as well as books sold in B&M stores. Ostensibly, the purpose is to keep track of lost books. But there's the added Orwellian component of not being able to shut off the RFID signaling device after you've purchased a book or checked out a specific book from a library.

Now say you're Amazon. What you want, ideally, is a solid indication of what your consumers want so that you can market aggressively to your base. RFIDs are applied to the latest titles, circa 2005-2006. Result: you can track all books purchased or shipped in the past couple of years and track them to one geographic location (i.e., an individual's book collection). Compare that information with the search terms that the individual is typing into Amazon and what s/he is responding to and buying. Everything from Bulwer-Lytton to an obscure Italian-Ethiopian battle. And you have privacy-invasive marketing that makes collaborative filtering look like a flimsy Logo program. You have an individual company that not only has a more astute record of your tastes than is reasonably required, but may very well use this information to sell you more worthless junk or bombard you with email updates -- all of them, of course, specifically tailored to your personality and interests.
posted by ed at 9:02 AM on October 23, 2003


kindall, I very much agree which is why I was so disappointed that, after they flew me up for in-person interviews earlier this year, no job offer was extended.

ed, factor in the recent creation of A9, Amazon's own Google-ish competitor.
posted by billsaysthis at 9:07 AM on October 23, 2003


And now let's imagine what happens when the FBI subpoenas that database.....

...just sayin'...
posted by briank at 9:12 AM on October 23, 2003


Result: you can track all books purchased or shipped in the past couple of years and track them to one geographic location (i.e., an individual's book collection).

Are you saying Amazon (or someone else) is going to be able read the RFIDs of books in my house without my consent? RFID is short-range (tens of meters at best).
posted by mcguirk at 9:20 AM on October 23, 2003


Text searching or not, Amazon still sucks as a way to buy stuff online. Their service is awful, items marked "in stock" seldom really are, delays are common and prices are high. More and more I use their site as a way to find a product, then go to pcconnection.com or barnesandnoble.com to actually buy the thing.

But judging from the enthusiasm here I guess I'm the only one.
posted by H. Roark at 9:30 AM on October 23, 2003


following up on the Wal-Mart mandate to its top 100 suppliers

Wal-Mart backed out of that. Librarians have been hotly disputing the usefulness as well as the privacy implications of RFID daily now for weeks. Since we already have online catalogs with front covers for patrons to look at [in many libraries] there are librarians wondering if full-text searching in the library is the next vendor-addled feature that comes bundled with an online catalog software package. But.... you don't think the publishers [or Amazon] would give up such a lucrative database so that people could share books, do you? While this is definitely an interesting development in the full-text searching arena, I'll be more excited when 1) I'm sure it's a level playing field w/r/t whose full-text can be searched [is it just big publishers, and public domain, or can everyone play?] and 2) it's not just being utilized to try to sell me something.
posted by jessamyn at 9:42 AM on October 23, 2003


Yeah, I'm gonna do that. As soon as I finish indexing all the dollar bills I've ever handled, sorted by serial number.

Crunchland, you know such people actually exist, right?
posted by zsazsa at 9:42 AM on October 23, 2003


Yeah, this is all okay, but can it find me the CD that has that song on it? You know, the one that goes dun dun dun-dun-dun da-dun doo dun?
posted by monkey.pie.baker at 9:45 AM on October 23, 2003


jessamyn: Okay, admittedly, I got slightly carried away with the RFID paranoia in my last post. But while the smart-shelf may be out, the pallet mandate is still on. The tracking fervor could easily apply to wholesalers, particularly places like Amazon where they cut out the middleman. Who's to say that Amazon WON'T include RFIDs with their packages? Given the amount of products they ship, simple warehouse pragmatism can easily be amalgamated with highly specific customer information. And you've been reading too much DFW. :)

mcguirk: One word. Networks.
posted by ed at 10:14 AM on October 23, 2003


monkey.pie.baker: is this your card?

(oh, and talos: score one for your reading comprehension abilities. ^_^)
posted by pxe2000 at 10:22 AM on October 23, 2003


monkey.pie.baker:

The melody search engine (java).
posted by mr_roboto at 10:22 AM on October 23, 2003


ed: One word. Networks.

I wonder how much capital it would require to create a network that will be able to capture a signal down to a range in the magnitude of 10 metres. I mean, cellphone towers can presumably be kilometres apart and I still get flaky service, but these "RFID-capturing" towers would have to be on every lamppost in every town in America.
posted by mfli at 10:43 AM on October 23, 2003


As a lexicographer, I find this to be an amazing, beat-all citation resource. A good deal of dictionary-making is about verifying usage, which involves finding it in context, usually in printed matter. While the British National Corpus and the American National Corpus (just released!) are fine for this, they only include about 10 percent of each of the works included in the corpus, and those works number in the thousands but are still relatively few overall. The ANC is only 10 million words. It really isn't enough: it's but an electron on the molecule in a drop of a wave of an ocean. The problem with finding citations on Google (excepting Google News) has always been the signal-to-noise ratio. Here, with Amazon, we largely have professionally edited texts, meaning if the word is in there, it's probably not a typo (although it could be an OCR artifact).

So, yeah, this is good for me professionally, and I think it's good for me as a reader and a book-buyer. I don't know how they prevent people from downloading an entire work a page at a time, but I think it's good for the authors, too, as the metadata provided in Amazon synopsis and reviews often is not enough to judge a work before buying. This takes the intermediation of others out of judging a work, and makes it more akin to flipping through the pages in the bookstore.
posted by Mo Nickels at 10:43 AM on October 23, 2003


Looks like the full public "behind-the-scenes" story will be in December's Wired Magazine, or so says this press release.
posted by kokogiak at 10:56 AM on October 23, 2003


A search for metafilter turns up the usual suspects, but check out the last result: holy crap, some of our comments showed up in books! (not saying that's a bad thing, just a bit amazing is all)
posted by mathowie at 11:00 AM on October 23, 2003


Wow! The Kaycee Nicole story is in an Introduction to the Internet book (I couldn't think of a better place for the story).
posted by mathowie at 11:03 AM on October 23, 2003


Step One: Make lots of books full-text searchable.
Step Two: Let you upload the UPC or ISBN codes of all the books in your library, so you can search just the books you own, your friends' libraries, etc. (Everyone already has their free CueCat, right?)


Where does "collect underpants" come in?
posted by mr_crash_davis at 11:24 AM on October 23, 2003


The magazine I work for, Wired, is publishing an in-depth article on this in its December issue. Here's an early link to Gary Wolf's story.
posted by digaman at 11:34 AM on October 23, 2003


Maybe it's just me, but it seems like Amazon's marketing is already scarily prescient. In my "Gold Box" was a monty python boxed set (I'm a fan, though I've never browsed for python paraphernalia there) and a logitech wireless mouse cover (I have a logitech wireless mouse, and I didn't get it from Amazon). In a recommended list was an album whose cover is posterized on my bathroom door -- never heard of the group, got the poster free at a record store because it looked cool.
posted by Tlogmer at 11:37 AM on October 23, 2003


I was just able to look up some technical information in a book I don't own. Spooky.
posted by SealWyf at 11:37 AM on October 23, 2003


It failed to find the two books that I am thanked in, I want my 15 bytes of fame!
posted by Mick at 11:38 AM on October 23, 2003


Holy crap, that is some seriously amazing technology. I mean a lot of times, I'll think of a quote I've heard from a poem or an essay, and I'll wonder who exactly it came from. Really I think kindall is right, beyond the competition indeed.
posted by patrickje at 11:39 AM on October 23, 2003


Unfortunately, it doesn't seem to work that well yet. I searched for a number of characters & phrases from Terry Pratchett novels & came up with nothing, except one short story anthology. I tried a couple of lines from poems by Gerard Manley Hopkins & got nothing. Either they need to scan in more books, or improve their indexing.
posted by tdismukes at 11:47 AM on October 23, 2003


It is a very small subset of books, less than 1%. Good Wired article, digaman; as Bezos says, baby steps, one step closer to a universal library.
posted by stbalbach at 12:27 PM on October 23, 2003


mcguirk, I'm reading as much of the Lovely Bones as I can, until they make it impossible to view more than a couple pages from the same ip. So far, so good.
posted by beth at 1:12 PM on October 23, 2003


Jesus! It took Project Gutenberg 30 years and they've only got 6500 books. Though (granted) they're not all bad OCR's... but still! Long live capitalism and its demon knight, Amazon.
posted by Civil_Disobedient at 1:33 PM on October 23, 2003


I've been waiting for a feature, or payoff of the promise of the Internet, like this since over a decade ago when I persevered through the DARPANet thicket and managed to dial up a remote library via 300 baud.

When I saw the "Welcome" banner appear from a university's book repository, which was geographically over half way across the country, I immediately thought, "Now..if I could only word search against that motha."

Well...this is a beginning.

*joy*
posted by Dunvegan at 1:50 PM on October 23, 2003


One word. Networks.

Do you have even the slightest idea what you're talking about? Now that I read more about it: the kind of passive RFID tag that would be used in a book has no power source of its own. It relies on the energy it receives from an incoming radio signal, and therefore can only be read with a reader held within inches of the tag. That would have to be some network.

Simply put, an RFID tag cannot be tracked to a "geographic location" unless that location happens to be in the same room as you.

I'm reading as much of the Lovely Bones as I can, until they make it impossible to view more than a couple pages from the same ip. So far, so good.

It looks like they're just letting you do whatever you want, but monitoring the usage from each account. If they decide you're abusing the service, they'll presumably cut you off at some point, and since each account requires a credit card, it would be hard to keep registering for new ones.
posted by mcguirk at 1:57 PM on October 23, 2003


crunchland: And I can't seem to recall the last time I needed to find anything about lugubrious. I think you got my point. So stop being so excessively mournful...
posted by micropublishery at 2:13 PM on October 23, 2003


Isn't this very similar to Google catalogs? Down to the same look of the highlighting.
posted by smackfu at 2:19 PM on October 23, 2003


That was exactly my thought, Rusty. Especially if they do a little SVD and fudge the eigenvalues a bit for some LSA-like effect. The possibilities are fascinating.
posted by Fezboy! at 2:43 PM on October 23, 2003


mcguirk: I wonder if there is or will be a limit to how many pages you can view of a given book. So far I've been able to see at least 15 pages by doing consecutive searches. Who will be the first to write a script that downloads the whole book?

But wouldn't that script itself have to do OCR on the images?

This is cool, though. They give you the 2 pp. before your search terms' appearance, and the 2 pp. after—which is more generous than they could have been.

I don't see myself chaining searches to read a book online for free, but I did find out that Rilke uses the word "ausgesetzt" not once but twice in his uncollected poems...
posted by Zurishaddai at 5:19 PM on October 23, 2003


But wouldn't that script itself have to do OCR on the images?

Maybe, although on one book I tried I was able to just search for a number, and it would match that page number. (On another one I tried it didn't work.) While doing OCR would be a pain in the ass, it's not impossible.
posted by mcguirk at 5:25 PM on October 23, 2003


Mo Nickels is on to the reason I think this is most exciting, and that's the ability to cite sources (via linking to) online. For those of us that like to, um, calmly discuss things on the internet, or post essays on weblogs, etc, Amazon becomes a huge repository of facts and knowledge that one can use to back up assertions or simply add knowledge to discussions.

When I first read this post this morning, this didn't occur to me. But then I was reading hawkmans' poker post, and realized that I wanted to cite something from James McManus's recent poker book, and voila!

Of course one man's "exciting" is another's "Oh great, longer and more pedantic discussions..."
posted by pitchblende at 5:36 PM on October 23, 2003


Way to go with the citation, pitchblende—that's a notable MeFi first!
posted by Zurishaddai at 6:53 PM on October 23, 2003


Aw, damn. After reading about 73 pages or so, I got this message:

Important Message

You've reached the page-view limit for this book or you've reached the monthly page-view limit for the Search Inside the Book feature. Feel free to return to the pages you've previously viewed. If you want to see more of this copyrighted material, you can purchase this book. You can also search inside other books. Click here for more information or continue shopping.

posted by beth at 7:00 PM on October 23, 2003


You know, I tried to follow mathowie's link, and I was required to create a profile, which I can live with, and give them a credit card number, which I cannot. I guess you all have accounts there already, huh?

So everyone -- with a credit card -- can text-search, huh? Fuck Amazon. I loathe them as much as I ever have.
posted by stavrosthewonderchicken at 8:19 PM on October 23, 2003


As a literary agent for a wide range of authors, I have VERY mixed views of this.

Oh? Well, I'm sorry to break it to you that in a few years you and your author clients might find yourselves facing a rather urgent threat of extinction.

The IP revolution is coming. All forms of intellectual communication will greatly liberalize and pluralize their ways of distribution, and new, probably less profitable (at least for the middle-men) methods of compensation will have to be devised.
posted by azazello at 9:38 PM on October 23, 2003


Even still facing OCR imperfections, this service is WOW! It's a brand new era in the bookstore life. I wonder if the libraries and bookstores are going to bookmark Amazon.com for the times when they have to look for passages and books inside their own places. Heh.
posted by nandop at 5:49 AM on October 24, 2003


stavros, I too was confused by how you get to this functionality. Turns out it's built in. Just put it in the main search box. The credit card stuff is for those who want to go the extra step to browse entire pages, but you can still search and identify passages without it.
posted by soyjoy at 7:41 AM on October 24, 2003


Nah. What'll be really cool is when you can search imdb.com in the same kinda way. It's owned by Amazon, so doubtless someone is thinking about this.
posted by tapeguy at 2:09 PM on October 24, 2003


The word discubitory only appears in one book in Amazon's index.

You're welcome.
posted by ceiriog at 3:11 AM on October 25, 2003


I'm so happy. Typing in "jello gantry duck firecracker" brings up one of my favorite books.
posted by borkus at 12:52 PM on October 27, 2003




This thread has been archived and is closed to new comments