"You've got...WTF?"
August 7, 2006 8:33 AM

AOL releases 3-months of queries from 500k users. AOL, either fairly or unfairly, is sometimes considered the internet with training wheels. So while parsing this data, keep that in mind. Some of these queries seem like spam email subjects, don't they? Don't forget, this is the same demographic that brought you the September that didn't end. AOL tried to retract the data, but it's of no use - it's out there, on the web.
posted by rzklkng (89 comments total) 10 users marked this as a favorite

 
Oh dear/no/boy.
posted by Dr-Baa at 8:41 AM on August 7, 2006


This is apparently the search history of a disturbed individual who enjoys looking at corpses and is planning to kill his wife. But in the middle of staring at rotting flesh and planning murder, he apparently got a mite peckish and decided to go out for some "steak and cheese".

Steak and Cheese - The choice of an axe murderer near you.
posted by Justinian at 8:42 AM on August 7, 2006


Maybe AOL can borrow some lawyers from Verizon and AT&T.
posted by Blazecock Pileon at 8:45 AM on August 7, 2006


In the digg comments (digg users have gotten MUCH better and more MeFi like, but more prone to FARK-ish behavior. I personally find the comments features to work perfectly by punishing the stupid) it seems some OB/GYN was ego-surfing, and his query betrays his name and address. Do AnonIDs translate to usernames for AOL...
posted by rzklkng at 8:49 AM on August 7, 2006


Need a link to the data please.
posted by The Jesse Helms at 8:50 AM on August 7, 2006


One point that shouldn't be overlooked: AOL search is just a rebranded GOOGLE search. So this is equivalent to AOL taking it upon themselves to release a metric buttload of Google's search data. The data that Google went to the plate to fight the US government over in court? The data that marketers would have sold their firstborn for? That spammers would sell other people's firstborn for?

That data.
posted by Justinian at 8:50 AM on August 7, 2006 [1 favorite]


Jesse Helms:

List of mirrors

Knock yourself out.
posted by Justinian at 8:51 AM on August 7, 2006


You know, this is horrible, and outrageous, etc. That being said, it's like a stolen celebrity sex tape. "I'm outraged that it happened, that poor person, her privacy is so violated - hey, can you email it to me?" This so needs to be datamined.
posted by rzklkng at 8:52 AM on August 7, 2006 [1 favorite]


Wow. Have they not been reading the newspapers? Court cases? This seems really bad. (I didn't see the searches for spam-type subjects, maybe I'm dense.)
posted by ClaudiaCenter at 8:53 AM on August 7, 2006


That ServerBench teases me by starting out at 4500KB/sec then quickly dropping. I demand faster download mirrors!
posted by geoff. at 8:57 AM on August 7, 2006


Man if I hadn't stopped using AOL 13 years ago, I'd be worried.
posted by StrasbourgSecaucus at 8:57 AM on August 7, 2006


A (purported) related white paper from AOL (pdf):
ABSTRACT: We survey many of the measures used to describe and evaluate the efficiency and effectiveness of large-scale search services. These measures, herein visualized versus verbalized, reveal a domain rich in complexity and scale. We cover six principle facets of search: the query space, users' query sessions, user behavior, operational requirements, the content space, and user demographics. While this paper focuses on measures, the measurements themselves raise questions and suggest avenues of further investigation.
posted by rzklkng at 8:58 AM on August 7, 2006


(Trumpet Winsock AOL 4L)
posted by StrasbourgSecaucus at 8:59 AM on August 7, 2006


There's torrents out there as well.
posted by rzklkng at 8:59 AM on August 7, 2006


Good thing I know the Heimlich, or I woulda choked on techcrunch's hypocrisy.

someone there had the sense to realize how destructive this was... Either way, the data is now out there for anyone that wants to use (or abuse) it.

AOL has released very private data about its users without their permission...the ability to analyze all searches by a single user will often lead people to easily determine who the user is, and what they are up to. The data includes personal names, addresses, social security numbers and everything else someone might type into a search box.

If you are an AOL customer, I feel sorry for you.

versus!

The data is available here (this link is directly to the file)

The direct link to the data is still live. A cached copy of the page is here.

Sometime after 7 pm the download link went down as well, but there is at least one mirror site.
posted by soma lkzx at 9:01 AM on August 7, 2006


Steak and Cheese - The choice of an axe murderer near you.

I think that's the name of a site for gorehounds. Similar to ogrish.
posted by pieoverdone at 9:01 AM on August 7, 2006


This is just a colossal screw-up. AOL's core demographic is probably not savvy enough to realize the implications (if they even hear about the release of data). Everyone else has just been given a very good reason to consider a different service provider.

I'm also really bothered by the number of mirrors that immediately popped up. I know the mirrors will help to publicize AOL's actions, but their poor defenseless users are the ones who are really getting screwed.
posted by Galvatron at 9:05 AM on August 7, 2006


Well, come on I only keep AOL around to make sure my AOL PUNTERS!!! web site remains up to date and pertinent.
posted by geoff. at 9:07 AM on August 7, 2006


Yes, they're AOL users. Haven't they been screwed enough already?
posted by Justinian at 9:07 AM on August 7, 2006


From the readme.txt (don't blame me if the format's borked):

500k User Session Collection
----------------------------------------------
This collection is distributed for NON-COMMERCIAL RESEARCH USE ONLY.
Any application of this collection for commercial purposes is STRICTLY PROHIBITED.

Brief description:

This collection consists of ~20M web queries collected from ~650k users over three months.
The data is sorted by anonymous user ID and sequentially arranged.

The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.

The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}.
AnonID - an anonymous user ID number.
Query - the query issued by the user, case shifted with
most punctuation removed.
QueryTime - the time at which the query was submitted for search.
ItemRank - if the user clicked on a search result, the rank of the
item on which they clicked is listed.
ClickURL - if the user clicked on a search result, the domain portion of
the URL in the clicked result is listed.

Each line in the data represents one of two types of events:
1. A query that was NOT followed by the user clicking on a result item.
2. A click through on an item in the result list returned from a query.
In the first case (query only) there is data in only the first three columns/fields -- namely AnonID, Query, and QueryTime (see above).
In the second case (click through), there is data in all five columns. For click through events, the query that preceded the click through is included. Note that if a user clicked on more than one result in the list returned from a single query, there will be TWO lines in the data to represent the two events. Also note that if the user requested the next "page" of results for some query, this appears as a subsequent identical query with a later time stamp.

CAVEAT EMPTOR -- SEXUALLY EXPLICIT DATA! Please be aware that these queries are not filtered to remove any content. Pornography is prevalent on the Web and unfiltered search engine logs contain queries by users who are looking for pornographic material. There are queries in this collection that use SEXUALLY EXPLICIT LANGUAGE. This collection of data is intended for use by mature adults who are not easily offended by the use of pornographic search terms. If you are offended by sexually explicit language you should not read through this data. Also be aware that in some states it may be illegal to expose a minor to this data. Please understand that the data represents REAL WORLD USERS, un-edited and randomly sampled, and that AOL is not the author of this data.

Basic Collection Statistics
Dates:
01 March, 2006 - 31 May, 2006

Normalized queries:
36,389,567 lines of data
21,011,340 instances of new queries (w/ or w/o click-through)
7,887,022 requests for "next page" of results
19,442,629 user click-through events
16,946,938 queries w/o user click-through
10,154,742 unique (normalized) queries
657,426 unique user ID's


Please reference the following publication when using this collection:

G. Pass, A. Chowdhury, C. Torgeson, "A Picture of Search" The First
International Conference on Scalable Information Systems, Hong Kong, June,
2006.

Copyright (2006) AOL
posted by rzklkng at 9:10 AM on August 7, 2006
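
For anyone wanting to work with the format the README describes, here is a minimal Python sketch of turning one line into a record and telling the two event types apart. It assumes the fields are tab-separated (the README only says "columns/fields") and the `Event` and `parse_line` names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    anon_id: int
    query: str
    query_time: str
    item_rank: Optional[int] = None   # only present for click-through events
    click_url: Optional[str] = None   # only present for click-through events

    @property
    def is_click(self) -> bool:
        return self.click_url is not None


def parse_line(line: str) -> Event:
    # Assumption: tab-separated fields; query-only lines may have the last
    # two fields empty or missing altogether.
    fields = line.rstrip("\n").split("\t")
    rank = fields[3] if len(fields) > 3 and fields[3] else None
    url = fields[4] if len(fields) > 4 and fields[4] else None
    return Event(
        anon_id=int(fields[0]),
        query=fields[1],
        query_time=fields[2],
        item_rank=int(rank) if rank is not None else None,
        click_url=url,
    )
```

If the files carry a header row, skip it before calling `parse_line`, since `int()` will reject a non-numeric AnonID.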


While this case may be confined to AOL users, this same type of data could easily be stored for anyone. Search google, and notice that little cookie your browser stored. Since you probably weren't anonymized, that cookie and its searches can now be easily tied to your IP Address. Got a gmail account with info about your identity? Check your mail with that same browser, and now your gmail address can be tied to your search queries.

Not picking on google, this could happen with any search engine. I also don't know that google actually correlates this info, but they easily could.

tor.eff.org ftw!
posted by jsonic at 9:13 AM on August 7, 2006


So it contains social security numbers and credit card numbers. I guess I should not be surprised that people are stupid enough to enter their card numbers into a search engine to see what they can find.

The link about the person searching for "dead people" and "decapatated photos" had a good point - the government has new reasons and some nice new ammunition in the fight to get access to search information. It will be especially useful as a scare tactic to win over public opinion on the matter.

"Look at this, this man trying to find ways to kill his wife and looking for photos of murdered people. This is why we need this information. These are the people we are after, not you innocent folks."
posted by weretable and the undead chairs at 9:19 AM on August 7, 2006


whoever created this tar archive must not've been too bright:

tar -zxvf ../AOL-data.tgz
AOL-user-ct-collection/U500k_README.txt
AOL-user-ct-collection/user-ct-test-collection-01.txt.gz
AOL-user-ct-collection/user-ct-test-collection-02.txt.gz
AOL-user-ct-collection/user-ct-test-collection-03.txt.gz
AOL-user-ct-collection/user-ct-test-collection-04.txt.gz
AOL-user-ct-collection/user-ct-test-collection-05.txt.gz
AOL-user-ct-collection/user-ct-test-collection-06.txt.gz
AOL-user-ct-collection/user-ct-test-collection-07.txt.gz


Yeah, let's gzip some gzip's.
posted by StrasbourgSecaucus at 9:26 AM on August 7, 2006
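
The gzip-inside-a-tgz layout in that listing can be read without unpacking anything to disk. A rough Python sketch, assuming the member names end in `.txt.gz` as shown above (`iter_lines` is a made-up helper name):

```python
import gzip
import tarfile


def iter_lines(tgz_path):
    """Yield (member_name, line) for every .txt.gz member inside the archive."""
    with tarfile.open(tgz_path, "r:gz") as tar:        # outer gzip + tar layer
        for member in tar:
            if not member.name.endswith(".txt.gz"):
                continue                                 # e.g. the README
            raw = tar.extractfile(member)                # still gzip-compressed
            # Inner gzip layer, decoded as text; bad bytes are replaced
            # rather than raising, since the encoding is not documented.
            with gzip.open(raw, "rt", encoding="utf-8", errors="replace") as text:
                for line in text:
                    yield member.name, line.rstrip("\n")
```

Each member is decompressed as a stream, so memory use stays flat however large the inner files are.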


Yeah, let's gzip some gzip's.

I have the Internet on my laptop. I just zipped it up over and over until it was a single byte. It's "Ü".
posted by Plutor at 9:31 AM on August 7, 2006 [4 favorites]


my fav so far:

2178 inducing dog vomiting 2006-05-26 08:42:31 2 http://dogs.about.com
posted by StrasbourgSecaucus at 9:32 AM on August 7, 2006


I'm intrigued by Greg Linden's take. He initially saw it as a boon for academic researchers, and bemoans the fact that AOL removed it after "inflammatory blog posts" made "outlandish claims" about privacy violations. He continues:

Nevermind that no one actually has come up with an example where someone could be identified. Just the theoretical possibility is enough to create a privacy firestorm in some people's minds.

I am as concerned about privacy as any tech geek, but most of my concern is focused on things like millions of credit cards being leaked and millions of social security numbers being lost.

If someone comes up with a clear example of a privacy violation from this AOL data, I would be convinced. Until then, this looks to me like the mob of the blogosphere getting distracted in the shadows and missing the big privacy picture.

Unfortunately, the research community now will be denied a tool that could have helped push forward the state of information retrieval. Research that could have been accelerated will now be stalled. We all will suffer from the loss.


Seems to me he's overlooking the many ways spammers/data miners will be able to pull identifying info from such rich search information, but I'm curious if others feel the same way.
posted by mediareport at 9:32 AM on August 7, 2006


This is apparently the search history of a disturbed individual who enjoys looking at corpses and is planning to kill his wife.

Um, or a writer doing research for a mystery novel. You know, to not be so scare-mongering about it.
posted by mediareport at 9:39 AM on August 7, 2006


Has anyone tried loading this into ms access? should I maybe install mySQL to handle it?
posted by joecacti at 9:52 AM on August 7, 2006
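
On the database question: one option that needs no server install is SQLite, which ships with Python. This is a sketch only. SQLite is swapped in here for convenience and is not what the poster proposed, and the table and column names are invented to mirror the README's five fields.

```python
import sqlite3


def load_events(conn, rows):
    """Create the table if needed and bulk-insert 5-tuples of
    (AnonID, Query, QueryTime, ItemRank, ClickURL)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               anon_id    INTEGER NOT NULL,
               query      TEXT    NOT NULL,
               query_time TEXT    NOT NULL,
               item_rank  INTEGER,   -- NULL for query-only events
               click_url  TEXT       -- NULL for query-only events
           )"""
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
    # Index so one user's history can be pulled in time order quickly.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_events_user ON events (anon_id, query_time)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")   # pass a file path instead for a real load
load_events(conn, [
    (2178, "inducing dog vomiting", "2006-05-26 08:42:31", 2, "http://dogs.about.com"),
    (36723, "a bathing ape", "2006-05-14 22:43:53", None, None),
])
clicks = conn.execute(
    "SELECT COUNT(*) FROM events WHERE click_url IS NOT NULL"
).fetchone()[0]
```

`executemany` accepts any iterable, so the rows can be fed from a generator and the whole file never has to sit in memory.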


Thanks Justinian.
posted by The Jesse Helms at 9:53 AM on August 7, 2006


I use a program called "LTF Viewer" which does the job well enough for me not to try to get another text viewer. It'd be nice if someone were to format this in an easier to view mode. I think I found the recent influx in AskMetafilter questions:

42590 not my daughters jeans 2006-03-25 11:36:41 4 http://ask.metafilter.com
76810 chemosphere house from body double 2006-03-23 01:38:30 7 http://www.metafilter.com
163942 benny hinn's wife holyghost enema 2006-05-11 10:12:22 4 http://www.metafilter.com
258435 teen guys circle jerks jacking off 2006-03-31 15:17:16 5 http://www.metafilter.com
528269 sexual movies porn 2006-05-11 23:36:12 206 http://ask.metafilter.com

Well it keeps on going ... I have yet to come across any sensitive information -- I do not doubt it is there.
posted by geoff. at 9:59 AM on August 7, 2006


even if he was planning on killing his wife -- theres not much anyone could do about it until she was dead

you can arrest someone for looking up "how to kill your wife" -- so the internet changes nothing right? ;)
posted by Satapher at 10:02 AM on August 7, 2006


cant, jeez
posted by Satapher at 10:02 AM on August 7, 2006


"Unfortunately it's also trivial to pull SSN's and tel #'s from these files:

example - count SSN's:

$ grep (...)

(191 SSN's found)"

(the (...) part of the code is in the comment.)
posted by merelyglib at 10:05 AM on August 7, 2006


This is apparently the search history of a disturbed individual who enjoys looking at corpses and is planning to kill his wife.

Um, or a writer doing research for a mystery novel. You know, to not be so scare-mongering about it.


And he's looking at pictures of dead people so as to accurately describe them in the story. Pretty plausible I think.
posted by bob sarabia at 10:08 AM on August 7, 2006


a good grep expression to use for SSNs is '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'. i am finding boatloads! some even associated with names!
posted by StrasbourgSecaucus at 10:33 AM on August 7, 2006
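
The same pattern shape is just as useful for scrubbing as for finding. Below is a small Python sketch that masks SSN-shaped strings before a log is shared. The lookarounds are an addition that is not in the grep expression above, and `mask_ssn_shaped` is a made-up name.

```python
import re

# Same shape as the bracket expression: 3 digits, dash, 2 digits, dash, 4 digits.
# The lookarounds (an addition) stop it matching inside longer digit runs.
SSN_SHAPED = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def mask_ssn_shaped(text: str) -> str:
    """Replace every SSN-shaped token with a fixed placeholder."""
    return SSN_SHAPED.sub("XXX-XX-XXXX", text)
```

This only catches the dashed form. A bare nine-digit run is left alone, and anything that merely looks like an SSN gets masked too, so treat it as a coarse filter.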


let's gzip some gzips

This isn't that uncommon. You don't gain any additional compression, but you get it all into one package. The CPU overhead of the unproductive attempt to compress is not that important, and it's more than offset by the server-side gain: the MIME type tells (intelligently written) web servers that the file isn't compressible, so they don't try to compress it on the fly for browsers that have said they can take compressed data.
posted by George_Spiggott at 10:37 AM on August 7, 2006


What makes someone fire up their AOL search and say "Hmm, I know what I'll do today! I'll search for my social security number." Seems a little odd.
posted by reklaw at 10:42 AM on August 7, 2006


"Man if I hadn't stopped using AOL 13 years ago, I'd be worried."

One reason you should be worried: the ease with which random web surfers can use a person's Google searches to figure out their real-world identity. We keep hearing from those seeking and releasing this sort of data, "We just want the anonymous data. There is no identifying information included." That claim is now invalid since we can demonstrate how easy it is to turn such anonymous data into a name and address.
posted by Wizzlet at 10:45 AM on August 7, 2006


Who in their right mind would search for their social security number?
posted by caddis at 10:46 AM on August 7, 2006


What a fun data set to play with. Thanks, AOL! By the same token, you'll never get a red cent of my business because you absolutely cannot be trusted.
posted by Fezboy! at 10:48 AM on August 7, 2006


This isn't going to end well. It's not just fodder for nerds, it's fodder for PACs and the like. 1.5% of all searches from AOL users for 3 months. Cheesey pete.
posted by cavalier at 10:48 AM on August 7, 2006


a good grep expression to use for SSNs is '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'. i am finding boatloads! some even associated with names!

That presupposes that people with SSNs that begin with 0 actually reference the beginning zero.
posted by thanotopsis at 10:50 AM on August 7, 2006


[thanotopsis, why wouldn't you reference the beginning zero? It's part of the nine-digit number. Most folks born in northeastern US states have SSN's that start with zero.]
posted by mediareport at 10:59 AM on August 7, 2006


a good grep expression to use for SSNs is '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'. i am finding boatloads! some even associated with names!

That also presupposes that you'd put dashes in your web search
posted by clearlynuts at 11:02 AM on August 7, 2006


plenty of people seem to.
posted by StrasbourgSecaucus at 11:17 AM on August 7, 2006


Well, AOL is laying off a few thousand people. Maybe they're trying to provide them exciting new careers as identity thieves.
posted by Justinian at 11:32 AM on August 7, 2006


I just searched for my SSN... I really didn't expect a result, and there was none, but I used dashes... Without the dashes, the most interesting result was Venelles, France. I've got to get my hands on some mashed potatoes and make a sculpture of the area...
posted by joecacti at 11:41 AM on August 7, 2006


Why would they save this?

Seriously, what can be gained (by aol/google) from logging the searches?

The value of a search engine is in the searching algorithm itself. The search terms have no value after the results have been delivered.

It's not like google will email 3 days later and say "oh, i found some other results for you".

Hmm.

*scribbles idea*

It seems like a huge PR risk for basically no return.

Everyone says this is so valuable to spammers or to marketers.

The only thing valuable to spammers is a valid email address.

Today I have done google searches for a few things, among them: FICA percentages, an area code I didn't recognize, and quotes from the movie Airplane. I think some people vastly overestimate the mad scientist potential of marketers. "OMG! Someone not in area code 802 is interested in federal withholdings and bad movies! PROFIT!!!!"
posted by Ynoxas at 11:54 AM on August 7, 2006


Why would they save this?

advertising
posted by notswedish at 12:03 PM on August 7, 2006


Who in their right mind would search for their social security number?
posted by caddis at 10:46 AM PST


Not necessarily their own.

People might be putting in other people's SSNs, looking for names, etc.
posted by dontoine at 12:08 PM on August 7, 2006


I just started looking through results from metafilter. Favorite so far "raw chicken smells very bad".
posted by bob sarabia at 12:24 PM on August 7, 2006


I was pretty pissed off with that search for "weretable sucks." From an AOLer! The nerve.
posted by weretable and the undead chairs at 12:35 PM on August 7, 2006


My wife has a theory that this is just AOL's way of complying with the government's demands that they turn over this data while being able to maintain that it is a mistake and that they did not intentionally turn over any data to the government.
posted by nlindstrom at 12:50 PM on August 7, 2006


Hmmmm...I may go to the library -- not saying which township's -- and put in somebody else's SSN to see what turns up.
posted by pax digita at 12:56 PM on August 7, 2006


My wife has a theory that this is just AOL's way of complying with the government's demands that they turn over this data while being able to maintain that it is a mistake and that they did not intentionally turn over any data to the government.
posted by nlindstrom at 2:50 PM CST on August 7


Oh my. I like how her mind works.

*sends flowers to nlindstrom's wife, looks away innocently*
posted by Ynoxas at 1:05 PM on August 7, 2006


"My wife has a theory..."

Actually, if they were being privately coerced by the gov't, putting it out in public would be a way of giving the gov't the info they asked for while also levelling the playing field, but if that were the case they'd come forward and say they were being coerced, right? I think they just goofed. Though it might be nice for people wanting to use this as another excuse to diss the current administration? I don't see enough evidence for that. I just see somebody goofed. Accidents happen among people.
posted by ZachsMind at 1:29 PM on August 7, 2006



36723 a bathing ape 2006-05-14 22:43:53
36723 a bathing ape 2006-05-14 22:54:50
36723 a bathing ape 2006-05-14 22:56:20
36723 a bathing ape 2006-05-14 22:57:19
36723 a bathing ape 2006-05-14 22:59:12
36723 a bathing ape 2006-05-14 23:00:46
36723 a bathing ape 2006-05-14 23:01:42
...
96077 chased by dinosaurs 2006-03-16 21:57:14


This is pretty funny, privacy issues aside.
posted by cmonkey at 1:51 PM on August 7, 2006 [1 favorite]
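
Per the README quoted upthread, a request for the next page of results shows up as the same query repeated with a later timestamp, which is probably what the "a bathing ape" run is. A Python sketch of collapsing such runs, using only the rows shown above (the elided ones are left out, and `summarize` is a made-up name):

```python
from itertools import groupby

rows = [
    (36723, "a bathing ape", "2006-05-14 22:43:53"),
    (36723, "a bathing ape", "2006-05-14 22:54:50"),
    (36723, "a bathing ape", "2006-05-14 22:56:20"),
    (36723, "a bathing ape", "2006-05-14 22:57:19"),
    (36723, "a bathing ape", "2006-05-14 22:59:12"),
    (36723, "a bathing ape", "2006-05-14 23:00:46"),
    (36723, "a bathing ape", "2006-05-14 23:01:42"),
    (96077, "chased by dinosaurs", "2006-03-16 21:57:14"),
]


def summarize(rows):
    """Collapse consecutive identical (user, query) rows into
    (anon_id, query, first_time, repeat_count)."""
    out = []
    for (anon_id, query), run in groupby(rows, key=lambda r: (r[0], r[1])):
        run = list(run)
        out.append((anon_id, query, run[0][2], len(run) - 1))
    return out
```

A repeat could also be someone simply running the same search again, so the repeat count is an upper bound on next-page requests rather than an exact figure.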


Ynoxas - With search query logs you can see how people use your service, and you can use the information to try to improve it. You might be able to do things like find patterns of sequential search terms that people are using and notice how the first sets of results aren't giving them what they want. So Google won't email you in three days with better results, but they might be able to update their algorithms (the important part, as you say) using the information in these logs. So that next time you or anyone else searches for something in a similar way, they can get the better results.

And Greg Linden's take is either disingenuous or he's not thinking very hard. We've already mentioned here an ego surfing OB/GYN in the data. I'm pretty sure entering your name into a search engine is common practice for a lot of people. A search for Ignatius P. Aldersquiggle isn't guaranteed to come from the person himself, but in many cases it will, and it won't be too hard to bring in other information you might have about the person to find their particular search queries. Know where they live and have linked access to {local/maps}.searchengine.com queries? You'll know without question.

Once you have a link from a uniqueID/IP/username/whatever to a person's name or other identifying information, you then have a log of all sorts of things that person is interested in. And even if none of it is illegal to enter into a search engine, a lot of that information is still very private. Just for example, how many of you use search engines to find information on medical conditions you'd rather not tell the world about? Financial issues? Job searches while employed? (And yeah, porn.)

So I wouldn't be so worried about spammers, marketers, or even the government - the issue is individuals learning things about other individuals. I promise you I'd be at least embarrassed if someone I knew looked at my complete search query log, and there's a reasonable chance someone I didn't know could cause me some trouble as well.
posted by whatnotever at 2:47 PM on August 7, 2006


According to a Nielsen NetRating presentation in today's Search Engine Strategies Conference in San Jose, AOL search is used mainly by elderly singles. They skew dramatically to the 50+ crowd. I wonder if that explains some of the crazy searches we are seeing....

Under 50? Keep using Google and Yahoo. Everyone else is.
posted by Dantien at 2:51 PM on August 7, 2006


After skimming one of the test sets one thing is clear: An alarming number of AOL users are Pete Townshend.
posted by justkevin at 3:00 PM on August 7, 2006 [1 favorite]


torrent link.
posted by delmoi at 3:03 PM on August 7, 2006


I wonder the total number of people who search their own name or a personal name. Hmm.

Is there a place I can download a list of common first and last names to search for?
posted by delmoi at 3:08 PM on August 7, 2006


It's not like google will email 3 days later and say "oh, i found some other results for you". Hmm. *scribbles idea*

Oh god, this would be a nightmare. Looking at my Search History, I've made 167 searches in the past week. If they emailed me about every single one.. ugh..
posted by Plutor at 4:09 PM on August 7, 2006


(a) There's got to be some percentage of these users who inadvertently submitted these searches due to the browser auto-forward to search when a non-URL is entered into the browser. For example, I'm on a bank website, I get an IM, I switch back to the window in some way that ends me up in the URL bar instead of the form. One normally makes two assumptions when they find themselves in such a situation -- security through obscurity and protection of privacy. There goes that concept... People should start demanding the default disable of this feature. It's one thing to submit a search knowingly, with certain basic understandings. It's another to do so accidentally because of "helpful" browser features.

(b) I'd assume that the release of this information provides reasonable suspicion to subpoena the actual user information. In the grand scale of "low hanging fruit" this certainly jumps to the front regardless of what AOL has already released under narrow or broad subpoenas. So... good luck, AOL folks.

(c) Criminal negligence. These are the research folks... if anyone knows what data was in there before it was released, it was the research folks.
posted by VulcanMike at 4:53 PM on August 7, 2006


Ooh. A lot worse, per Wired.
posted by VulcanMike at 5:37 PM on August 7, 2006


Who in their right mind would search for their social security number?

Until this little fiasco, I would have, just to see if it somehow got leaked all over the damn internet.
posted by dirigibleman at 5:47 PM on August 7, 2006


Holy shit. This is someone's search history (edited down slightly):

"revenge tactics"
"marblehead massachusetts library"
"the woman's book of revenge"
"how to torment someone"
"alt.revenge"
"dirty tricks for chicks"
"21st century revenge"
"stories or samples of revenge"
"encyclopedia of revenge"
"voice changer"
"underground help for revenge"
"real-time spy"
"how to drive someone crazy"
"how to get revenge on an old lover"
"i hate my ex boyfriend"
"how to really make someone hurt for the pain they caused to someone else"
"free email addresses"
"lolitampegs.com"
(Street address removed)
"map quest"
"names of dogs"
"makehimsweat.com"
"makehimsuffer.com"
"a desert terrain where fifteen different salts crunch underfoot where is this place"
"mailman who delivered in the clutch."
"sounds of different voices"
"voices i can download"
"anonymous sms text messenger"
"hurting from an old lover"
"to send anonymous text"
"how to report child neglect in the state of new hampshire"
"free articles on gay life that can be mailed to me"
"messages on gay life that can be emailed to me"
"sites for men haters"
"things to send to your old lover via email"
"gay phone numbers"
"free christian things"
"free gay articles i can get in the mail"
"free mature old people things"
"how to move on from a broken heart"
"how to send junk mail to someone else"
"catchhimandkeephim"
"free stuff for contractors"
"how to send things anonymously"
"how to ruin someone's credit"
"re booting a computer"
"how to permantlly delete information from your hard drrive"
"permanently erase hard drive data"
"how to send alot of junk mail to someonne without them knowing where it came from"
"how to send email anonymously"
"how to stop loving someone"
"www.match.com"
"the worst thing to send someone via email"
"tnt.com"
"lolitampegs.con"
"love letters"
"hate letters"
"hate love letters"
"what can i do to an old lover for revenge"
"free gay information that can be mailed to me"
"free gay information that can be mailed to someone's home"
"tnt in salem new hampshire"
"evil things to do to someone"
"disgusting things people do to an old lover"
"finding people absolutely free"
"licence plate numbers who do they belong to"
"new hampshire licence plate ending in (REMOVED)"
"licence plates numbers in new hampshire can i pick them out"
"where to register in new hampshire for a vanity plate"
"director of home contractor registration"
"home contractor background check in masssachusetts"
"contractors phone numbers in plaistow new hampshire"
"nextel phone numbers"
"harley davidson rentals in new hampshire"
"buildingtrades.com"
"a list of massachusetts contractors with licence number (REMOVED)"
"mean revenge tactics"
"divorce pranks'"
"deaths in hampstead new hampshire"
"death records in hampstead new hampshire"
posted by anonymous_k at 6:55 PM on August 7, 2006


anon_k: Holy fuck indeed. WTF was AOL thinking? (as if that hadn't been said enough).
posted by delmoi at 7:04 PM on August 7, 2006


Someone should write a novel based on those searches, anonymous_k.
posted by Jimbob at 7:06 PM on August 7, 2006


Those might be some of the stupidest searches I've ever seen. You're not going to get anything near useful if you type fifteen random words in Google. What do you think it is, eliza?

Eliza: Hello. I am ELIZA. How can I help you?
You: how to send alot of junk mail to someonne without them knowing where it came from
Eliza: Does that question interest you?
You: a desert terrain where fifteen different salts crunch underfoot where is this place
Eliza: Please go on.
You: how to really make someone hurt for the pain they caused to someone else
Eliza: Does that question interest you?
You: god you're an idiot

posted by Plutor at 7:25 PM on August 7, 2006


anon_k: I was just looking at that, pretty fucking scary. #2708 in collection 2.
posted by bob sarabia at 7:56 PM on August 7, 2006


There are some really interesting stories in the logs. Accounts where it's clear that a child is doing his homework earlier in the evening and a parent is doing health research later in the evening. Most heartbreaking is a query about one's husband being unable to urinate -- the query is phrased in a way that shows desperation.

For those keeping track for their legislative agendas, so far it's:
Queries about healthcare, including the diagnosis of health issues and the identification of affordable healthcare services:
Many

Queries about families being destroyed by gay marriage:
None
posted by VulcanMike at 8:13 PM on August 7, 2006 [2 favorites]


Regarding the Wired article, there is positively no way the government would fine AOL 2/3 of a billion dollars for a log file of anonymous search terms.

The person in anon_k's post clearly has no idea whatsoever of the proper way to get search results.

Plutor: I was thinking more along the lines of newly spidered info that would match highly on your previous request. Also, it would have to be a toggle on/off, etc. I don't work for a search engine, so there's precious little chance my idea would ever assault you, heh.

whatnotever: I see what you're saying, but I think getting through the garbage to anything worthwhile would be prohibitive. I think the info gathered by, say your credit card company, is almost infinitely more valuable. This strikes me as very "bottom of the barrel" desperation type marketing. Which perhaps is fantastically lucrative.
posted by Ynoxas at 8:44 PM on August 7, 2006


To all the naysayers out there I say this:
-Within 15 minutes of downloading this, I located 300 searches made by my best friend. I am certain it was him.
-Within 30 minutes I had located the search strings of the lady who was the interviewer for my Harvard interview.

This is really quite dangerous. And incredibly fun.
posted by matkline at 10:02 PM on August 7, 2006


on a related note:

I did a Google Trends search on "kill wife."
Actually, I did dozens of searches, adding the phrase "kill wife" to other phrases to try to get some perspective on this epidemic of awful "kill wife" searches.

Well..."kill wife" was much more popular than "kill husband" everywhere and almost as popular as "itchy anus" in New York, but much less popular than "itchy anus" in Toronto and London (although "kill wife" did beat "itchy anus" in Chicago, but only by a hair.)

However, "kill wife" was less popular than "kill self" (they almost tied in NY).

"kill wife" was around 1/2 to 2/3's as popular as "kill children" and all of the kill options (wife, husband, self, children) were less popular than "kill bush." (except in Toronto, where "kill children" won, but barely).

"killing joke" came close to "kill bush" in Chicago and New York, and beat "kill bush" in Toronto, but (predictably) dwarfed all others in London.

Finally (and I had no time for the work I was supposed to be doing tonight, go figure)...to get even more perspective...every one of these searches was made to look impossibly tiny by the volume of searches for "britney spears."

So. What does this mean?
I have no idea.
posted by mer2113 at 10:10 PM on August 7, 2006


"makehimsweat.com"
"makehimsuffer.com"


I like the fact that someone searched for these rather than just typing the URLs in.
posted by mazola at 10:27 PM on August 7, 2006


"deaths in hampstead new hampshire"
"death records in hampstead new hampshire"


If someone ever audited my search records, they'd find a lot of similar queries to the above...seeing as I'm doing genealogy research. Of course, why I was looking for "leather chaps for horses" can't be so easily explained away.
posted by thanotopsis at 11:00 PM on August 7, 2006


The thing I find most interesting is that sometimes the search histories are like little tiny stories.
270228: 672368 you're pregnant he doesn't want the baby
270325: 672368 can christians be forgiven for abortion
270355: 672368 abortion clinic charlotte nc
posted by borkencode at 11:20 PM on August 7, 2006 [1 favorite]


I like the fact that someone searched for these rather than just typing the URLs in.

The Google search bar is the new address bar.
posted by caddis at 11:39 PM on August 7, 2006


How come no web interfaces yet?
posted by runkelfinker at 2:19 AM on August 8, 2006


runkelfinker, you ask, the internet provides: aolsearchdatabase.com. The site is of unknown ownership (spam site?) Same problems and caveats - do not put your name in, your VISA number, your SSN, etc.
posted by rzklkng at 5:41 AM on August 8, 2006


Hmm, there doesn't seem to be a database behind that site.
posted by runkelfinker at 6:18 AM on August 8, 2006


quick perly frequency analysis (first thing that popped into my head; boring I know).

of 1125468
in 936272
the 837298
for 698426
and 692358
to 470245
free 449102
a 367171
google 363020
new 270158
http 251125
on 250902
pictures 236473
county 231574
yahoo 217750
how 208600
lyrics 188800
my 186861
school 182564
myspace 176714
sex 166225
ebay 160174
florida 159871
com 155790
sale 145346
with 144472
city 144242
home 140896
american 138907
state 136993
www 127214
is 124918
.com 121237
what 120478
games 119897
texas 118366
music 117860
york 115674
yahoo.com 110711
bank 109857
black 108839
beach 108607
nude 108228
i 106995
high 104911
online 102669
aol 102016
by 101968
news 101424
map 101226
pics 101138
girls 98942
college 96254
you 96063
2006 92866
car 92701
real 91862
mapquest 89954
from 89768
university 88975
jobs 87936
center 87283
google.com 86572
at 86512
myspace.com 86083
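
The script behind these numbers wasn't posted, so what follows is only a rough sketch of how a count like this could be done, in Python rather than Perl. It assumes each log line is tab-separated with the query text in the second column (that layout, the `word_counts` name, and the sample lines are all assumptions for illustration), and it splits queries on whitespace, which may not match the tokenizing used for the list above.

```python
from collections import Counter


def word_counts(lines, query_column=1):
    """Count lowercased, whitespace-separated words in the query column.

    Assumes tab-separated lines with the query in `query_column`
    (an assumed layout, not confirmed by the thread).
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= query_column:
            continue  # skip short or malformed lines
        counts.update(fields[query_column].lower().split())
    return counts


# Made-up sample lines for illustration; not taken from the released logs.
sample = [
    "1\tpictures of florida\t2006-03-01 10:00:00",
    "1\tmap of florida\t2006-03-01 10:05:00",
    "2\tfree music\t2006-03-02 09:00:00",
]

counts = word_counts(sample)
for word, n in counts.most_common(3):
    print(word, n)
```

To run it over a real file, pass an open file handle instead of `sample`, since the function only needs an iterable of lines. A header row, if the file has one, would be counted like any other line unless it is skipped first.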
posted by primer_dimer at 7:07 AM on August 8, 2006


mediareport writes "or a writer doing research for a mystery novel. You know, to not be so scare-mongering about it."

That was my guess, a guy planning to kill his wife doesn't need to see pictures.

reklaw writes "What makes someone fire up their AOL search and say 'Hmm, I know what I'll do today! I'll search for my social security number.' Seems a little odd."

How else are you going to find out if your SIN is on the internet somewhere?

mazola writes "I like the fact that someone searched for these rather than just typing the URLs in."

A fundamental disconnect on how a web browser works.

primer_dimer writes "myspace 176714
sex 166225"


Christ, MySpace is more popular than sex.
posted by Mitheral at 8:07 AM on August 8, 2006


mazola writes "I like the fact that someone searched for these rather than just typing the URLs in."

Mitheral replies "A fundamental disconnect on how a web browser works."

Probably, but it might be worth noting that links you click won't show up in the immediate pull-down addressbar history, unlike URLs you type in. I did that too, sometimes, when I was still using the family computer. *ahem* Of course, they'll still be in the browser history, and you have to turn off auto-complete. Ah, the illusion of secrecy.
posted by mumble at 9:14 AM on August 8, 2006


Back before MS bought hotmail I had a user who got to hotmail by:

1) typing a specific search into altavista
2) navigating to a specific result
3) scrolling to the bottom of the result and then clicking a link to hotmail.

It was like some weird six degrees of separation game every time she checked for mail.
posted by Mitheral at 10:03 AM on August 8, 2006


Back before MS bought hotmail I had a user who got to hotmail by:

That was in a similar era where one had to click the icon for Winsock in order to surf the web. Customers could sometimes get a little crazy with what they put in the recycle bin.

Back then, it wasn't uncommon for customers to call the support line for the ISP at which I worked and say: "I deleted the Internet. What do I do?"
posted by thanotopsis at 2:30 PM on August 8, 2006


And now, in new and improved, easy to search format!
posted by dejah420 at 12:01 PM on August 9, 2006


how to tell your 5 year old daughter your pregnant and giving the baby up for adoption 2006-05-07 02:58:46
free government money 2006-05-05 09:08:31
used clothing stores in tucson az 2006-05-05 06:07:03
emergency care for stray animals in tucson az 2006-05-05 09:26:58
what sells at swap meets 2006-04-21 15:29:07
cheap decorating ideas for large rooms 2006-04-22 05:58:48
arizona state prison inmate data research 2006-05-22 18:52:22

God. It's like looking right into the trailer.
posted by GooseOnTheLoose at 8:14 AM on August 11, 2006




This thread has been archived and is closed to new comments