80 Million Tiny Images
January 15, 2009 5:46 PM

A visualization of all the nouns in the English language arranged by semantic meaning. [NSFW words included!]
Currently computers have difficulty recognizing objects in images. While practical solutions exist for a few simple classes such as human faces or cars, the more general problem of recognizing all different classes of objects in the world (e.g. guitars, bottles, telephones) remains unsolved. Computer Vision researchers are currently investigating methods that can recognize and localize thousands of different object categories in complex scenes. A key component of these algorithms is the data used to train the computers' model of each object. Current approaches use collections of images gathered by hand. Our research explores how the billions of images available on the Internet can be used to train models for object recognition. With overwhelming amounts of data, many problems can be tackled with simple algorithms. We gathered from the web 79 million images. We are using this massive dataset to train a computer to recognize objects within an image and to understand the scenes depicted in photographs.
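
To make the "simple algorithms" remark concrete, here is a rough sketch of one such algorithm: labelling a tiny image with the label of its nearest neighbour in a labelled collection. This is an illustration only and not necessarily what the researchers do. The function names, the squared-difference distance, and the toy 2x2 "images" are all assumptions made up for the example.

```python
# Hypothetical sketch: 1-nearest-neighbour labelling of tiny images.
# Images are flat lists of pixel intensities; all data below is made up.

def squared_distance(a, b):
    """Sum of squared per-pixel differences between two equal-length images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(query, labelled_images):
    """Return the label of the stored image closest to `query`."""
    best_label, _ = min(
        ((label, squared_distance(query, image)) for label, image in labelled_images),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy 2x2 "images" (4 grey values each), standing in for web-scraped thumbnails.
training_set = [
    ("night sky", [0, 0, 10, 5]),
    ("snow", [250, 255, 245, 240]),
    ("grass", [90, 120, 100, 110]),
]

print(nearest_label([240, 250, 250, 235], training_set))  # prints "snow"
```

The point of the sketch is that the method itself stays trivial; any gain in accuracy would have to come from how densely the labelled collection covers the space of possible images.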

You can help... get better training data for computer vision algorithms by labeling some of the images.
WordNet previously
posted by carsonb (40 comments total) 24 users marked this as a favorite
 
That is really cool. I tried to find "sex" but couldn't. I assume it's one of the pink parts :)
posted by sevenyearlurk at 6:06 PM on January 15, 2009


Oh, here we go... top right, about a third of the way down. In the early 16000s.
posted by sevenyearlurk at 6:10 PM on January 15, 2009


"Sorry! This page only works with Mozilla Firefox and Internet Explorer. We may support other browsers in the future." - Message when one clicks on the images

...Computer Vision researchers are currently investigating methods that can recognize and localize thousands of different object categories in complex scenes browsers such as Safari, Opera, and Chrome.
posted by terranova at 6:10 PM on January 15, 2009


I started randomly clicking and the first one I hit had a bunch of pictures of spread eagle women. Now I clicked away before I had time to realize what I saw (so I may have been mistaken), but I think it's in there.
posted by Midnight Rambler at 6:11 PM on January 15, 2009


Go fig: the internet search-based visual dictionary is sort of inherently NSFW. Sorry!
posted by carsonb at 6:13 PM on January 15, 2009


Turns out Spock is next to otologists...cause they're both doctors, right? Oh dear.
posted by Sova at 6:17 PM on January 15, 2009


On the first random click, I hit "ex-spouse", which is precisely between "important person" and "Handel." It only stands to reason.
posted by ricochet biscuit at 6:17 PM on January 15, 2009


(PS: The sex pictures are down near the bottom left corner, between Wales and bracken.


Indeed, that is where sex is usually found.)
posted by Sova at 6:21 PM on January 15, 2009 [1 favorite]


We tessellate the poster using the hierarchy so that the proximity of two tiles is given by their semantic distance.

Without reading the WordNet link: Bullshit. "Mammary gland" is immediately adjacent to "fjord"? But forget anecdotes, how would you even do this in two dimensions especially with words that have multiple meanings?
posted by DU at 6:21 PM on January 15, 2009 [1 favorite]


My first click landed on "molester" (lovely) and when I clicked the image box it took me to this. Absolutely SFW, I promise, but does anyone have any idea what the fuck it is? A speaker on a 9 volt? Huh?
posted by barnacles at 6:21 PM on January 15, 2009


Didn't see sevenyearlurk's discovery of sex, but did find: 16742 Organ Grinder --> 16743 Mata Hari --> 16744 Talent Agent --> 16745 President Ford. Ah.
posted by ricochet biscuit at 6:23 PM on January 15, 2009


After noticing the proximity of "grey hen" and "black cock," I'd say computers still have a long way to go in semantic recognition.
posted by neroli at 6:25 PM on January 15, 2009 [1 favorite]


This is really cool to play with, but shouldn't they be looking at relationships between points on edge boundaries, the number of T, Y, and X intersections, or something more informative than simple color averages of often unrelated images? And the words they use! Bostonian, nonbeliever, bugger, ophthalmologist, Aconitum Napellus?

I think I would train a vision system on a few simple images to begin with (e.g. hand, face, square, tree, pen, table, etc.) and build from there. Are they really laboring under the delusion that a heap of different images with some vague relation to often nebulous terms is going to uncover some useful piece of the puzzle for computer vision research?

Okay. That said, the "Label Me" project seems like it would produce some useful material on which to train vision systems. So kudos there.
posted by Avelwood at 6:34 PM on January 15, 2009


I think I would train a vision system on a few simple images to begin with (e.g. hand, face, square, tree, pen, table, etc.) and build from there.

You're not alone.
posted by carsonb at 6:40 PM on January 15, 2009


The 39000s are strange. You got your Moron, Dunderhead, Schlemiel, Scatterbrain, Nebbish, Loggerhead, Space Cadet, Goof, Tomfool, Schnook, Cuckoo, Flibbertigibbet, Numskull, Ninny, Putz, but you also have Vegan, Cigarette Smoker and Snacker. Shit Head and Goofball, on the other hand, are in the 40,000s.
posted by Secret Life of Gravy at 6:45 PM on January 15, 2009 [2 favorites]


homosexual --> cocksucker --> philatelist

hmm.
posted by Durn Bronzefist at 6:55 PM on January 15, 2009


homosexual --> cocksucker --> philatelist

Collecting stamps blows?
posted by maxwelton at 7:04 PM on January 15, 2009


I'm just thinking his stamp-licking technique is all wrong.
posted by Durn Bronzefist at 7:09 PM on January 15, 2009 [1 favorite]


Very cool, although I don't understand how they decided on semantic meaning or relationships regarding the same.
posted by wastelands at 7:19 PM on January 15, 2009


PS: The sex pictures are down near the bottom left corner, between Wales and bracken.

Except somehow "spunk" (which was my third totally-random click, after "man of the cloth" and "thickener") landed in the lower-center right.

Oh, crap, I've set myself up for a pun.
posted by kittyprecious at 7:33 PM on January 15, 2009


Bad HTML, wouldn't even flow the text to fit my 1024 wide screen. Buh bye!
posted by intermod at 7:59 PM on January 15, 2009


I assume someone just misspelled fellatelist.
posted by graventy at 8:18 PM on January 15, 2009 [2 favorites]


My first click landed on "molester" (lovely) and when I clicked the image box it took me to this. Absolutely SFW, I promise, but does anyone have any idea what the fuck it is? A speaker on a 9 volt? Huh?

Barnacles, that device is called "molester". It's a small circuit attached to a piezo speaker, that will emit a short, annoying high-pitched BEEP every few minutes.

The idea is, you hide it in someone's house. The beep is too short and intermittent for them to be able to locate the sucker. And the 9V battery means it lasts for ages. Cheap way to send someone batshitinsane.
posted by Jimbob at 8:22 PM on January 15, 2009 [1 favorite]


Google -> molester "9 volt" -> http://gadgetfind.com/mindmolester.html
posted by CaseyB at 8:58 PM on January 15, 2009


First click? Gonad.
posted by six-or-six-thirty at 9:30 PM on January 15, 2009


Jimbob, CaseyB: many thanks! That's fascinating -- I'd never heard of such a thing before and yet I ... I think I need one now!
posted by barnacles at 9:30 PM on January 15, 2009


The semantically significant colours seem to be chlorophyll, water and blood.
posted by jouke at 11:02 PM on January 15, 2009


15076 is Sex. Sexual Intercourse is 15732. Bestiality is 17369. Now if you will excuse me, I'm going to go scrub my brain with bleach.
posted by robtf3 at 11:35 PM on January 15, 2009




First click led me to 'bacon rind'. I approve of this.
posted by slimepuppy at 2:39 AM on January 16, 2009


I got sphincter first go. Welcome to my life.
posted by nthdegx at 3:13 AM on January 16, 2009


On the first random click, I hit "ex-spouse", which is precisely between "important person" and "Handel." It only stands to reason.

My brain parsed that sentence as precisely between "important person" and "Grendel", which actually makes more sense.

My first click was 13239 Chauvinist, by the way.
posted by Enron Hubbard at 6:12 AM on January 16, 2009 [1 favorite]


I got "cubs" and there was no picture of Ernie Banks. I call bullshit.

"Welcome to Wales!" "Thank you. Er, could you tell me where I could find some... bracken?"
posted by languagehat at 6:17 AM on January 16, 2009 [1 favorite]


I love stuff like this. But the images themselves seem to be dominated by two things: the average colour and how well framed the picture is.
posted by Nelson at 7:11 AM on January 16, 2009


I'm disturbed by how often my random-clicking hit "beastiality."
posted by COBRA! at 7:35 AM on January 16, 2009 [1 favorite]


Which is not a word, even!
posted by kittyprecious at 8:24 AM on January 16, 2009


"Mammary gland" is immediately adjacent to "fjord"?

Hmm... how about "fjord" is close to "mountain" is close to "Grand Teton" is close to "Mammary Gland"?
posted by sevenyearlurk at 8:33 AM on January 16, 2009


homosexual --> cocksucker --> philatelist

Collecting stamps blows?


Nah. Blowing stamp collectors.
posted by Guy_Inamonkeysuit at 9:04 AM on January 16, 2009


Speaking of molesters, did I ever mention my Great Idea?

Which is, namely, to hide a half-dozen of them around an airport. Imagine the chaos!
posted by five fresh fish at 6:35 PM on January 16, 2009 [1 favorite]


The first item in the visual dictionary is "blind."
posted by flatluigi at 9:18 PM on January 16, 2009




This thread has been archived and is closed to new comments