Any sufficiently advanced technology is indistinguishable from magic
December 2, 2004 12:50 PM

Google's sorcery: you use it, I use it some 30-40 times a day, but did you ever wonder exactly how they do it? The numbers are staggering:
# Over four billion Web pages, each an average of 10KB, all fully indexed.
# Up to 2,000 PCs in a cluster.
# Over 30 clusters.
# 104 interface languages including Klingon and Tagalog.
# One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
# Sustained transfer rates of 2Gbps in a cluster.
# An expectation that two machines will fail every day in each of the larger clusters.
# No complete system failure since February 2000.
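For a rough sense of scale, here is a back-of-envelope sketch of what those figures imply. It assumes decimal units (1 KB = 1000 bytes, 1 PB = 10^15 bytes) and reads the quoted error rate as one bad bit per 10^15 bits read; neither assumption is stated in the article, so treat the results as order-of-magnitude only.

```python
# Back-of-envelope check of the figures quoted above.
# Assumptions (not from the article): decimal units, and an error rate of
# one unrecoverable bit error per 10**15 bits read.

PAGES = 4_000_000_000          # "over four billion Web pages"
AVG_PAGE_BYTES = 10 * 1000     # "an average of 10KB"
CLUSTER_BYTES = 10**15         # "one petabyte of data in a cluster"
BIT_ERROR_RATE = 1e-15         # assumed reading of the quoted error rate
LINK_BITS_PER_SEC = 2 * 10**9  # "sustained transfer rates of 2Gbps"

# Raw size of the crawled pages, before any index overhead or replication.
raw_corpus_tb = PAGES * AVG_PAGE_BYTES / 10**12

# Expected number of bad bits when reading one full petabyte once.
expected_bit_errors = CLUSTER_BYTES * 8 * BIT_ERROR_RATE

# Time to push one petabyte through a 2 Gbps pipe, in days.
days_to_stream_cluster = CLUSTER_BYTES * 8 / LINK_BITS_PER_SEC / 86_400

print(f"raw page data: {raw_corpus_tb:.0f} TB")
print(f"expected bit errors per full read of 1 PB: {expected_bit_errors:.0f}")
print(f"days to move 1 PB at 2 Gbps: {days_to_stream_cluster:.1f}")
```

Under those assumptions the raw pages come to about 40 TB, a single pass over a petabyte hits roughly eight bad bits, and moving a petabyte at 2 Gbps takes about a month and a half. That is why an error rate that sounds negligible stops being negligible at this scale.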
Is Google God? (via /.)
posted by daHIFI (39 comments total)
 
And I always thought it was pigeons.
posted by daHIFI at 12:53 PM on December 2, 2004


Enough with the Google-stroking already. The redundancy they have built into the system is the very reason why it doesn't crash. It's called effective distributed network design.

Decidedly not magic.
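A minimal sketch of why redundancy does the heavy lifting, using only the post's own figures (two failures a day in a 2,000-machine cluster). The replica counts and the assumption that machines fail independently are illustrative guesses, not anything the article states, and the sketch ignores repair time and correlated failures such as a rack losing power.

```python
# Rough model: chance that every copy of a piece of data is lost on the same day.
# MACHINES_PER_CLUSTER and FAILURES_PER_DAY come from the post above;
# the replica counts and the independence assumption are hypothetical.

MACHINES_PER_CLUSTER = 2000
FAILURES_PER_DAY = 2


def p_all_replicas_fail(replicas: int) -> float:
    """Probability that all `replicas` independent copies fail on one day."""
    p_one = FAILURES_PER_DAY / MACHINES_PER_CLUSTER  # about 0.001 per machine
    return p_one ** replicas


for r in (1, 2, 3):
    print(f"{r} replica(s): {p_all_replicas_fail(r):.0e}")
```

Each extra copy cuts the daily odds of losing everything by a factor of about a thousand in this toy model, which is the whole point of the design.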
posted by piedrasyluz at 1:21 PM on December 2, 2004


And before this thread becomes an "AllTheWeb-indexes-more-pages-yeah-but-none-of-them-touch-the-Deep-Web-Oh-but-Check-out-this-search-engine-its-much-better-than-google" circle-jerk, let's be clear: Google is a company that provides access to information. Nothing less, nothing more. It is not god, and it isn't even your friend, especially and even if you work there.



Yet another sycophantic googly-eyed slashdot re-post...ugh
posted by piedrasyluz at 1:26 PM on December 2, 2004


"He turned to face the machine. 'Is there a God?'"
posted by DevilsAdvocate at 1:28 PM on December 2, 2004


I'm very confused by the people who seem to believe that Google is their friend.
posted by Caviar at 1:30 PM on December 2, 2004


Disclaimer: I am a software engineer for Vivísimo, which launched Clusty on 30 September 2004....

The most impressive part about Google is its servers. And they are very impressive. Weren't there some discussions around here a few months ago about the secret source of Google's power, and the GooOS posting on Kottke? I'm not sure, the search fails me (yes, Google search too), and I'm known for such memory lapses.

At any rate, the challenge of search is less the actual search algorithm itself than the computing power needed to handle that kind of load. You need to hold the entire internet in main memory, and still have the processing power left over to do all kinds of sorting. In Google's case, you also have the traffic issues of one of the world's most popular websites (not #1, though, that goes to Yahoo!--last I checked, I think Google was #4)

Actually, at this point, there's almost no way to distinguish search results from one engine or another. One report I read showed search results to people who insisted Google had the best results, but stripped the results of all images and formatting. When faced with just the results and nothing else, they couldn't tell Google from Yahoo or even Ask Jeeves.

The reason itself is simple. The search problem itself is not terribly complicated, but the space is so competitive that everyone has pretty much optimized everything that can be optimized. So what makes Google so much "better"? Simple: branding. Marketing. There once was a time when Google really was better than everyone else, and even though that time is long past, we still have all kinds of warm fuzzies associated with those bright balls. In Google we trust.
posted by jefgodesky at 1:35 PM on December 2, 2004


Not only that but remember that Google operates on a good-enough philosophy. That is, if it loses the index of lots of web pages - who cares? Most queries to Google are looking for results and pages that are "good enough." This gives them a lot of wiggle room in designing their architecture.

Contrast that with the distributed, heterogeneous systems operated by large financial companies or even by eBay, where you can't afford to lose any transaction and people querying your system are looking for a specific thing which needs to be consistent and available 24/7 or people go crazy. Much more impressive, to my mind.
posted by vacapinta at 1:36 PM on December 2, 2004


My grandchildren will never believe that there was no internet when I was a child.
posted by Specklet at 1:37 PM on December 2, 2004


including Klingon and Tagalog

There are more than 20 million speakers of Tagalog worldwide, more than those who speak Hungarian or Czech. Substantially more than speak Klingon and hardly an obscure language....now Bork, Bork, Bork on the other hand.

heeeeeure a leetle cheeken cheeken cheeken.
posted by m@ at 1:38 PM on December 2, 2004


I'm just surprised it hasn't yet gained sentience. Seriously, is the resurgence in Google-worship a response to MSN's efforts to re-enter the search engine arena?
posted by FormlessOne at 1:42 PM on December 2, 2004


But most Tagalog speakers don't have computers, whereas computer usage is much higher among Hungarian and Czech speakers ... and obsessive computer usage is found among 99.9% of Klingon speakers.
posted by jefgodesky at 1:46 PM on December 2, 2004


Is Google God?

Google doesn't know all and see all yet, but it’s working on it.
posted by Fuzzy Monster at 1:49 PM on December 2, 2004




obsessive computer usage is found among 99.9% of Klingon speakers.

jefgodesky, is that original? At minimum I'm tempted to put it in my fortune cookie file.
posted by George_Spiggott at 2:01 PM on December 2, 2004


i'm very confused by people that think that i think that any website, company or brand could be my friend.

but be that as it may, i find those numbers impressive. so do my employers. the lsst (large synoptic survey telescope, who are not my employers) will generate 300-600Mb/s. if we archive that, we will reach several petabytes pretty quickly. the head of the data management team is rob pike (that pike) from google. he's not the only smart guy working there. so does norvig.

a pile of brilliant people also work for microsoft. people like simon peyton jones. i admire them too. it's not worship, it's professional respect.

finally, why on earth would you expect it to be sentient just because it's big?
posted by andrew cooke at 2:01 PM on December 2, 2004


I'm very confused by people who think I'm being literal when I say "Google is your friend," when in fact I'm being metaphorical and mean something along the lines of "Google is a very useful tool--not just in general, but particularly in regard to the question at hand."
posted by DevilsAdvocate at 2:30 PM on December 2, 2004


Is Google God?

Not yet.

I'm just surprised it hasn't yet gained sentience

It will become part of something sentient, and probably by accident. Think virus-meets-grid-computing-meets-Google.
posted by ZenMasterThis at 2:35 PM on December 2, 2004


You need to hold the entire internet in main memory, and still have the processing power left over to do all kinds of sorting.

i'm no search engine sycophant or google groupie or anything but something about that is somehow, well, god-like.

godish?
posted by jellybuzz at 2:44 PM on December 2, 2004


George Spiggott: If nothing else, it's original in the Elisha Gray sense.

It is interesting to speculate what might happen with a neural network of sufficient complexity .... but even Google doesn't have the kind of network complexity found in the human brain.

Yet.
posted by jefgodesky at 2:48 PM on December 2, 2004


Google is not your friend?

When I was 17 (1997) I got a job at an ISP. I remember walking into the server room and being awestruck at all the cables and racks and such. When I read the part that said up to 2000 computers in a cluster I was awestruck again. The sheer logistics of the whole operation are amazing.

As far as I'm concerned Google is my friend. As a computer consultant for several firms I have to know the answer to anything and everything that might come up (say, a rare error message from a specialized software program that I've never used before). It amazes me how many of my clients have no idea how to type a few keywords into a search box, but then again that's why I charge them $60/hour. Google is the first place I look for answers and I have learned that if I can't find it there then I'm probably not going to find it on any other search engine either. At that point it's message boards and telephone calls to look for answers.

Sure there are plenty of web pages that aren't indexed and not linked to it, but my experience has shown that it is superior to everything else out there. Yes, I remember when Alta Vista was the only search engine out there, and no, I don't touch myself when I'm using Google. Yes they might be turning into Big Brother, and yes there is some danger to being totally reliant on it.

Maybe I'll change my Firefox search bar over to Yahoo for a couple days and let you know if I see any difference.

(tongue-in-cheek)I for one welcome our new Googlebot overlords, but then again I'll be the first one signing up for intra-eye monitor implants when they come out. (/tongue)

..on preview: Thank you Jellybuzz, that is exactly what I was trying to say.
posted by daHIFI at 2:55 PM on December 2, 2004


DevilsAdvocate, that short-short story reminded me of Asimov. As in, overwhelmingly so. Thus, I enjoyed it.

And jefgodesky, that was a pretty sweet bit of writing too. Googlebot, indeed.
posted by jenovus at 3:05 PM on December 2, 2004


Neat trick: Try typing "(3x8)+15" into the search field. Then try "15 oz in ml".
posted by rushmc at 3:08 PM on December 2, 2004


That's called a "keymatch," rushmc. The engine is smart enough to recognize certain types of queries and put something contextually useful up front. Pretty standard for all the search engines these days. I think it's one place where metasearch can really shine through if done intelligently, since it really is a metasearch problem, not a search problem. Exhibit A.
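To make the idea concrete, here is a minimal sketch of that kind of query recognition: check whether a query looks like arithmetic or a unit conversion and, if so, answer it directly instead of running a normal search. This is an illustration only, not how Google or any other engine actually does it; `special_answer`, the regular expressions, and the one-entry conversion table are all hypothetical.

```python
import ast
import operator
import re
from typing import Optional

# Hypothetical conversion table; 1 US fluid ounce is about 29.5735 ml.
UNIT_FACTORS = {("oz", "ml"): 29.5735}

# Queries made only of digits, whitespace, operators, parentheses and "x".
ARITHMETIC = re.compile(r"^[\d\s+\-*/x().]+$")
# Queries shaped like "<number> <unit> in <unit>".
CONVERSION = re.compile(r"^(\d+(?:\.\d+)?)\s*(\w+)\s+in\s+(\w+)$")

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node):
    """Evaluate a parsed expression, allowing only numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def special_answer(query: str) -> Optional[str]:
    """Return a direct answer for recognized query types, else None."""
    q = query.strip().lower()
    m = CONVERSION.match(q)
    if m:
        amount, src, dst = float(m.group(1)), m.group(2), m.group(3)
        factor = UNIT_FACTORS.get((src, dst))
        if factor is None:
            return None  # unknown units: fall back to ordinary search
        return f"{amount:g} {src} = {amount * factor:.2f} {dst}"
    if ARITHMETIC.match(q) and any(ch.isdigit() for ch in q):
        try:
            # Treat "x" as multiplication, as in "(3x8)+15".
            value = _eval(ast.parse(q.replace("x", "*"), mode="eval"))
        except (SyntaxError, ValueError, ZeroDivisionError):
            return None
        return f"{query.strip()} = {value:g}"
    return None


for example in ("(3x8)+15", "15 oz in ml", "google bombs"):
    print(repr(example), "->", special_answer(example))
```

Anything that returns `None` falls through to the regular results, so the special cases sit in front of the search rather than replacing it.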

daHIFI, Google is excellent for finding an answer to a specific question. That's what it's built for, and it shines through the entire design. Its main goal is to put the best answer to your question first; that's epitomized in the whole "feeling lucky" thing.

For exploring a topic, not so good. If you want to know the answer to a specific question, Google is perfect. If you're trying to find out more about, say, "Google bombs," wading through a long list of links one at a time may not be the best way to do that. You'll spend hours on all those links before you begin to develop an idea of what's out there, and all that time you're bound by Google's criteria of relevance, which may or may not be entirely suited to your ideas of relevance (see bomb, Google). Something like clustering lets you explore a subject you may not be intimately familiar with by providing you a bit of a concept map up-front.

Right tool for the right job and all that.
posted by jefgodesky at 3:29 PM on December 2, 2004


This post brought back to mind a subject that intermittently nags me - i.e. why doesn't everybody use vivisimo? From the logs of my site I can see that 90%+ of hits come from Google, with Ask Jeeves and Yahoo the only significant others. I've never had a hit from Vivisimo. Yet - for me, Viv knocks anything else into a cocked hat. Is it badly marketed - or is it just that it suits me and next to nobody else?
posted by James_in_London at 3:49 PM on December 2, 2004


Awww, shucks, James ... but please, we're trying to get people to use Clusty these days. Easier to spell, easier to say, easier to remember, with less emphasis on being a cool enterprise software demo, and more stuff for Joe Q. Websurfingpublic.

I'd say it's that whole branding thing I mentioned before. Google's technology is the same as everyone else's at this point (well, they do have a lot more servers than anyone else, I'll give them that), but they have that ephemeral "trustworthiness." People are creatures of habit, and Google's not really "broken," so why suffer the trauma of changing routine when all you're getting is something better? That's a big challenge for us, and if you have any good answers, well ... we're hiring.
posted by jefgodesky at 3:58 PM on December 2, 2004


the goolgorithm gives more weight to words in boldface? that smidgen of knowledge alone will allow me to conquer the earth! i will see you all driven before steef! thank you, dahifi!
posted by steef at 4:04 PM on December 2, 2004


why doesn't everybody use vivisimo?

And there it is.
posted by linux at 4:04 PM on December 2, 2004


Uhm, I just talked to my friend who works for Yahoo!, and they only stopped using Google results for their search 6 months ago, when they acquired their own engine.

So I would be curious, jefgodesky, when that article/study you read was conducted... maybe they couldn't tell the difference because they were essentially the same engine.
posted by danny the boy at 4:07 PM on December 2, 2004


there is no doubt, google is god!
posted by clubmedia at 4:39 PM on December 2, 2004


I got offered a dj gig for a Google party - does this mean I'll be playing for God?
posted by iamck at 4:46 PM on December 2, 2004


hmm...i just never think about using anything other than google just because it's the default search engine for dave's quick search toolbar.
posted by juv3nal at 5:51 PM on December 2, 2004


You know what this means? This means the dyslexic agnostic insomniac now lies awake wondering if leg Goo exists.
posted by weapons-grade pandemonium at 7:16 PM on December 2, 2004


Heh. Sweet.

Power density: "There is an interesting problem when you use PCs," said Hölzle. "If you go to a commercial data centre and look at what they can support, you'll see a typical design allowing for 50W to 100W per square foot. At 200W per square foot you notice the sales person still wants to sell it but their international tech guy starts sweating. At 300W per square foot they cry out in pain."
posted by A dead Quaker at 8:09 PM on December 2, 2004


My grandchildren will never believe that there was no internet when I was a child.
posted by Specklet at 3:37 PM CST on December 2


My children don't believe there wasn't an internet when I was a kid. Aahh... the failing of history class.

(Trust me, there wasn't)
posted by kamylyon at 8:34 PM on December 2, 2004


Is Google God? If you've got 8 minutes, these guys have an entertaining but cautionary tale..
posted by paulsc at 11:36 PM on December 2, 2004


I am a Googlebot!

I AM A GOOGLE BOT.
I AM HERE TO PROTECT YOU.
I AM HERE TO PROTECT YOU FROM THE TERRIBLE SECRET OF GOOGLE.

posted by blacklite at 4:11 AM on December 3, 2004


Vivisimo is ugly, not to mention the fact that they have gone to lengths to make it difficult to distinguish between advertising and search results.

Yes, it was ugly; that was one of the reasons we made Clusty. As for the ads, we've actually gone to lengths to distinguish them and had long discussions on how best to do that. You're right, we don't put them on the right, because we already have clusters on the left--and a three column layout on anything less than 1024x768 is a convoluted, congested mess. If you have a better idea, we're hiring. Or, just email us.

All that about asking google questions and you didn't try it for your bomb? If you need to know what a google bomb is, your best bet is to go to google.com, and type in "what is a google bomb?", and push "I'm feeling lucky". This nice explanation is what you'll get.

Which illustrates my point very nicely. If you just want a quick answer to a specific question, Google is great. If you just want to explore a given subject, and don't really have any specific question in mind, considerably less so.

It takes a good bit of searching to be able to "just know" when typing a question in quotes on Google will get you the answer you need and when it won't, but after you are pretty good at searching, you find that you "just know" almost every single time. And hey, once you know, you know!

This is a very common phenomenon among programmers that never ceases to amaze me. We glory in our knowledge of arcane keyboard commands and macros. Just look at the tools we've made for ourselves--vi, EMACS, Unix, etc. All of these put the onus on the user, with the basic philosophy that it's the user's job to learn to use the program properly.

This, to me, is the very definition of a poorly designed program. If a user can't figure out how to use a program properly, this should not be taken as the user's fault, but the program's fault for having a poor interface and overall design. A well designed program is self-explanatory and intuitive. The onus should always be on the programmer to meet the user's need, I think, not on the user to decipher what the programmer had in mind.

The amount of refinement and arcane knowledge of modifiers, conjunctive symbols and Boolean logic required to use a normal, one-dimensional search engine like Google is defended with the same logic with which programmers defend the Impenetrable Mysteries of EMACS: it's the user's own fault for being a dumbass. But maybe if we as programmers took our job to provide a useful tool to the world a little more seriously, we would make tools that people could, well, use!

Disclaimer: I'm also an open source fanatic. I have SuSE 9.1 at home, Mozilla everywhere and even when I'm forced to use Windows, I still use OpenOffice.org. And yes, I very often use vi.
posted by jefgodesky at 5:26 AM on December 3, 2004


Neat trick: Try typing "(3x8)+15" into the search field. Then try "15 oz in ml".

Oh, you can do a lot better than that. See this thread.
posted by DevilsAdvocate at 7:52 AM on December 3, 2004


Okay, so I'm late to the party.
posted by rushmc at 8:20 AM on December 3, 2004

