

Google makes another killer app?
August 21, 2002 11:02 PM

Google makes another killer app? Rackmounted servers devoted to googling your own intranet or website. Just look at those specs and features. Google is selling 1 server, retail $28,000, and they are marketing especially for corporate intranets. But imagine the power that would be at the fingertips of archivists, students, and researchers everywhere with a dedicated, customized Google for their own website. Imagine being able to do a detailed search that would literally comb the content of every page published by Project Gutenberg. In seconds, you could call upon thousands of years of writing for any and all information on any specific subject. What kind of implications will this technology have long-term for students, researchers, and archivists?
posted by insomnyuk (21 comments total)

 
This is especially useful for any institution which has large amounts of data behind a firewall.
posted by insomnyuk at 11:08 PM on August 21, 2002


Best of all, it's yellow.
posted by swift at 11:17 PM on August 21, 2002


*faint*
posted by tweebiscuit at 11:18 PM on August 21, 2002


Google is a cadre of dark witches running on hamster wheels. Swear.

Oh err... can someone who knows things tell me if there is a reason for the price changes? "The GB-8008 is now $450,000 up from $250,000." Sounds like an interesting story.
posted by RJ Reynolds at 11:26 PM on August 21, 2002


Google is a cadre of dark witches running on hamster wheels. Swear.

Well, it does seem like black magic sometimes, but that perception is the trump card the techie has over the computer illiterate. Of course, they fight back with invective such as geek and nerd, which is in turn worn as a badge of pride. Interesting story, that.

Relevant Slashdot Interview with the Google Director of Technology (and the thread which determined the questions to be asked)
posted by insomnyuk at 11:41 PM on August 21, 2002


This is old news (coolish but old), pretty sure there was a MeFi post and I found stuff about it in metatalk dating back to February.
posted by zeoslap at 11:44 PM on August 21, 2002


Point taken, zeoslap, but this isn't *cough* NewsFilter, and I thought it was neat, and I did a search on MetaFilter (not MetaTalk) for the Google Appliance and couldn't find it as the topic of a post, and I thought it was 'something cool I found on the web', and I thought it would be fun to speculate about the possibilities, so here I am, here you are, and good night.
posted by insomnyuk at 11:48 PM on August 21, 2002


Does anyone know how Google combs the content of millions of websites and returns results so quickly? I mean, try doing a search on MeFi with its built-in engine, and if you go far enough back it can time out, and that's just one website.

Oh yeah, this here is a preemptive strike.
posted by Grod at 11:48 PM on August 21, 2002


I feel silly. I just realized it's because MeFi's search engine must physically pull each page, whereas Google has some complex way of indexing the content of sites to keywords, creating a catalogue, and that is what gets searched. It's still amazingly fast though.
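The "catalogue" idea can be sketched as an inverted index: a map from each word to the pages that contain it, built once so a query never has to reread the pages. This is only a minimal illustration under assumptions, not how Google or MeFi actually does it; the function names and the sample pages are made up for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the pages matching the first word, then narrow down.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample data, purely for illustration.
pages = {
    "thread-1": "google search appliance for the intranet",
    "thread-2": "hamster wheels and pigeons",
    "thread-3": "intranet search with grep",
}
index = build_index(pages)
print(sorted(search(index, "intranet search")))
```

The slow part (reading every page) happens once in `build_index`; each search afterwards is just a few set lookups, which is roughly why an indexed engine can answer quickly.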
posted by Grod at 11:55 PM on August 21, 2002


also google isn't run by a server in some guy's house ;)

because when a server is in a house it is lazy and watches television too much
posted by rhyax at 12:00 AM on August 22, 2002


I don't think it would be possible to use the server to comb content in the way implied by the Project Gutenberg example.
The spec page states that it uses Google's PageRank relevance sorting engine, which prioritizes relevance based on the number of links to any given page. Without converting all the text in the Gutenberg archive and then manually making links to pages and concepts, I don't see how it could be used to search for general concepts occurring in literature. I didn't see any reference to being able to analyze content, per se.

It's just their search engine in a box.
posted by rhizome23 at 12:02 AM on August 22, 2002


They have clusters of computers, numbering at least 10,000 (basically tons of redundancy), and who knows how much bandwidth they have. I know mathowie toured Google, maybe he has the straight dope. They use Linux to manage the massive distributed computing system, too. (It's all in those Slashdot links I posted.)

One thing to keep in mind is that the majority of information has yet to make it into digital form, so it will be several years before this kind of technology can really search "anything".

Also, I have been using the Google IE toolbar for 6 months now, and it is utterly indispensable. I'm coming off as a sycophant, I admit...

rhizome: I don't think the searching is completely dependent on PageRank, plus the search techniques can be customized site by site, I'm sure there is a solution to such a question.
posted by insomnyuk at 12:04 AM on August 22, 2002


I use the Google IE toolbar too. It works fine, although after the last time I ran Ad-Aware it has shown up as "broken" in IE's objects folder.

Now I'm getting curious. How are the computers organized? Something like this?

It occurs to me that PageRank is how they do the relevance thing, but they must also scan the content of each page and have some algorithm for ranking words. Whether it's simply by occurrence or something more complex I don't know, but if they didn't, Google bombing wouldn't be possible, nor would Google Groups art (can't find the link now, even using Google), where you use a string of words in a posting so that when a search is done the automatic highlighting forms an image.
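The "simply by occurrence" option could look something like the sketch below, which scores each page by how many times the query words appear in it. This is a guess at the simplest possible scheme, not a description of Google's actual ranking; `score`, `rank_pages`, and the sample pages are hypothetical.

```python
from collections import Counter

def score(text, query):
    """Count how many times the query words occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[word] for word in query.lower().split())

def rank_pages(pages, query):
    """Return ids of matching pages, highest occurrence count first."""
    scored = [(score(text, query), page_id) for page_id, text in pages.items()]
    # Sort by descending score, breaking ties by page id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for s, page_id in scored if s > 0]

# Hypothetical sample data, purely for illustration.
pages = {
    "a": "google google appliance",
    "b": "google pigeons",
    "c": "hamster wheels",
}
print(rank_pages(pages, "google appliance"))
```

A scheme this naive is easy to game by repeating a word over and over, which is presumably part of why link-based signals get layered on top.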
posted by Grod at 12:22 AM on August 22, 2002


How do they catalog page content and rank pages by relevance? How do they do it so fast? Well, let me see . . . could it be . . . Satan???

Seriously -- I too, Insomnyuk, just love the Google toolbar in IE6, and it's easily my most-used appliance/feature on the IE configurable toolbar. And no, PageRank is an optional extra; you can download & use it or not, but it's not required in order to use the full search functionality (as I understand it, anyway).

And, ya know, I'd almost be tempted to sell my soul to the Morning Star if he'd help me come up with the next Google . . .
posted by wdpeck at 1:34 AM on August 22, 2002


One utility I've found that's even more helpful than Google's toolbar is Dave's Quick Search Deskbar, which not only launches Google searches in a new browser window (Mozilla OR IE), but also launches Merriam-Webster searches on strings suffixed with a colon, and performs mathematical string evaluation on strings suffixed with an equals sign. Those are the main three I use, but there are several score more (stock lookups, anagram lookups, phone number lookups, thesauruses . . .).

It fits alongside the address bar on my taskbar, and because it is distributed under the GPL, those of you with a compiler handy can also add your own functionality. Now if I could just figure out how to get it to launch my Google searches in a new Mozilla *tab* instead of a separate window, the world would be perfect.
posted by Ryvar at 3:13 AM on August 22, 2002


I don't think the Google Appliance is the killer app people are making it out to be. What sets Google apart from its competition is its PageRank system (Google's explanation or Dennis Ford's hypothesis). The details of the system are a company secret, but the basic idea is that every link to a site is a vote for that site's popularity. Google somehow weighs the votes along with the anchor text so that if a bunch of sites link to www.espn.com like this (soccer) then it must follow that www.espn.com has something to do with "soccer" and it gets a higher ranking. This method harnesses the true nature of the web and (so far) appears to do a great job of preventing people from fooling the search engine. It's a lot tougher to convince thousands of sites to link to you with a specific word than it is to put a few hundred repeating key words at the bottom of all your pages to up your ranking in the search (this used to work back in the olden days - it still might on some search sites but I stopped keeping track of the others once I started using Google).
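To make the "every link is a vote" idea concrete, here is a rough sketch based on the simplified PageRank formula from the published description, plus a naive tally of anchor-text words. Since the production details are secret, everything here is an assumption for illustration: the damping value of 0.85, the handling of pages with no outgoing links, the function names, and the toy link graph are not taken from Google.

```python
from collections import Counter

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its vote evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

def anchor_votes(anchors):
    """Count (word, target) pairs from a list of (target, anchor text)."""
    votes = Counter()
    for target, text in anchors:
        for word in text.lower().split():
            votes[(word, target)] += 1
    return votes

# Hypothetical toy web: three blogs, all linking to one sports site.
links = {
    "blog-a": ["www.espn.com"],
    "blog-b": ["www.espn.com"],
    "blog-c": ["www.espn.com", "blog-a"],
    "www.espn.com": [],
}
rank = pagerank(links)
print(max(rank, key=rank.get))

votes = anchor_votes([
    ("www.espn.com", "soccer"),
    ("www.espn.com", "soccer"),
    ("www.espn.com", "scores"),
])
print(votes[("soccer", "www.espn.com")])
```

In this toy graph the site that collects the most incoming links ends up with the highest rank, and the anchor-text tally ties it to "soccer", which is the effect described above on a very small scale.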

So, the value of Google is its ability to figure out the relevance of a particular page based on a huge sampling of interconnected data (the entire web for argument's sake). The web, by its very nature and the way people use it, is greatly interconnected and cross-referenced. Internal company documents, on the other hand, are not, and as such I don't see the huge value of a Google search appliance. Sure, you can just turn it loose on your intranet and it will index all the documents it can find, but in the end you're not going to have much more than a very expensive keyword search, since the vast majority of the documents it finds will not contain links to other documents. Internal data, which is likely to be largely contained in Word/WordPerfect documents, PDFs, emails and HTML documents, is simply not as referential as information found on the web.

Google feeds on the fact that there is a great deal of redundancy on the web. For example, if a major event happens a handful of stories will be written for major publications. A few of those will be picked up via the AP and distributed to hundreds of sites. Subsequently, thousands of smaller sites (like metafilter) will then link to one, if not several, of those stories. When Google comes across something like this it can have a field day analyzing all the links and figuring out the appropriate rank each site should have for specific search terms. Now contrast this with information you are likely to find on a company intranet - is the creation of a single Word document outlining the sales forecast for next quarter going to spawn the creation of a bunch of documents that link back to the original? Unlikely. Sure, it is likely that documents created in response to said document will contain similar words and phrases. I anticipate that Google's search would group these documents together simply based on their similarity, but so would most any other basic indexing/searching tool that costs considerably less than a Google appliance. So, you're likely to get similar results with free tools like SWISH-E or htdig as you would with a cool looking yellow rackmount server.

Of course you'll get some bells and whistles with the Google appliance, but I don't see it as the killer app on the intranet that it is on the internet. I'd like to be wrong about all of this so maybe someone will write about it once they get their hands on one. For some more quick reading here's Google's answer to people like me.

And, of course, none of this detracts from the simple truth that I still want one!
posted by stoic at 3:40 AM on August 22, 2002


Google is a cadre of dark witches running on hamster wheels. Swear.

Actually, it's pigeons, but good guess.
posted by GersonK at 4:12 AM on August 22, 2002


I'm a researcher, spending a lot of time browsing through articles, mostly science ones, and I cannot keep up, there is too much information. I'm waiting for the day when it will be possible to have Google, or something else, do that for me and let me do what I want: build on what's already available, not wasting time reinventing the wheel.
posted by MzB at 6:32 AM on August 22, 2002


Imagine being able to do a detailed search that would literally comb the content of every page published by Project Gutenberg. In seconds

You might already be able to do this with the normal Google website. Just go to the advanced search and you can tell it to only return results from a specific domain (e.g. the Project Gutenberg website). I think it can even search .pdf files as well.
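The domain restriction can also be typed straight into the search box with the `site:` operator. As a small sketch, this builds such a query URL; the domain `gutenberg.org` and the helper name `site_search_url` are assumptions for the example, so substitute whatever domain actually hosts the pages you want.

```python
from urllib.parse import urlencode

def site_search_url(query, domain):
    """Build a Google search URL restricted to one domain via site:."""
    params = {"q": f"{query} site:{domain}"}
    return "https://www.google.com/search?" + urlencode(params)

# Assumed domain, purely for illustration.
url = site_search_url("white whale", "gutenberg.org")
print(url)
```

This only works for pages Google's public crawler can reach, which is the point made below about internal networks.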

I see the Google appliance being useful on an internal network, but if the network is public then you can already use google.com to specifically search it.
posted by jsonic at 6:44 AM on August 22, 2002


Googling your own Intranet for $28,000? Whatever happened to grep?
posted by gimonca at 9:23 AM on August 22, 2002


With regard to search engines (and the one on MeFi), I've used Lucene in a number of projects and it's very impressive. Very flexible and very fast results.
posted by zeoslap at 10:28 AM on August 22, 2002



