

Descriptive Camera
April 25, 2012 7:00 AM   Subscribe

Descriptive Camera, 2012 "The Descriptive Camera works a lot like a regular camera—point it at a subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype outputs a text description of the scene."

Rather than using advanced AI, it simply passes the description task to Amazon's Mechanical Turk system, which in turn distributes the job to humans. It takes between 3 and 6 minutes to get a description.
posted by delmoi (51 comments total) 32 users marked this as a favorite
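[For the curious, the Mechanical Turk hand-off described above can be sketched in a few lines of Python. This is an illustrative guess at how such a camera might post its task, not the project's actual code: the question layout, reward, and timeout values are invented, and `submit_hit` assumes a boto3 MTurk client.]

```python
# Sketch of a Descriptive-Camera-style Mechanical Turk hand-off:
# upload a photo, ask a human worker for a text description.
# Hypothetical values throughout; not the project's actual code.

def build_question_xml(image_url):
    """Build an MTurk HTMLQuestion asking a worker to describe a photo."""
    form = (
        '<p>Describe this scene in a few sentences.</p>'
        '<img src="{}" alt="scene to describe"/>'
        '<textarea name="description" rows="6" cols="60"></textarea>'
    ).format(image_url)
    return (
        '<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">'
        '<HTMLContent><![CDATA[<!DOCTYPE html><html><body><form>'
        + form +
        '</form></body></html>]]></HTMLContent>'
        '<FrameHeight>450</FrameHeight>'
        '</HTMLQuestion>'
    )

def submit_hit(client, image_url):
    """Post the description task; `client` is a boto3 MTurk client."""
    return client.create_hit(
        Title='Describe a photograph',
        Description='Write a short text description of the attached scene.',
        Reward='1.25',  # the thread later quotes $1.25 per picture
        AssignmentDurationInSeconds=600,
        LifetimeInSeconds=3600,
        MaxAssignments=1,
        Question=build_question_xml(image_url),
    )
```

The camera itself would then just poll for the assignment's answer and print it on the thermal printer.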

 
Add Braille output and all of Martin's problems would be solved.

I would love to read some descriptions, and read several descriptions of the same thing. People are weird and non-uniform.
posted by Trivia Newton John at 7:07 AM on April 25, 2012


Oh my god we are a stone's throw from Ray's Rolex.

Also the samples are wonderful. Someone spelled "dilapidated" correctly but misspelled "repair." Another complained about the pixelation of the image. A third person mentioned that the cupboards are ugly. This is the best thing.
posted by griphus at 7:09 AM on April 25, 2012 [8 favorites]


Now with a little bit of work we can develop a camera that you point at people, and it generates insults. Hurrah!
posted by GenjiandProust at 7:13 AM on April 25, 2012 [3 favorites]



-----------------------------------
| |                             | |
| |                             | |
| |     How come you took a     | |
| |  picture of your junk, man. | |
| |      Like why would you     | |
| |        even do that.        | |
| |                             | |
| |                             | |
| |                             | |
| ------------------------------- |
|                                 |
|                                 |
-----------------------------------

posted by theodolite at 7:18 AM on April 25, 2012 [55 favorites]


Metafilter: you point at people, and it generates insults. Hurrah!
posted by Fizz at 7:24 AM on April 25, 2012 [7 favorites]


-----------------------------------
| |                             | |
| |                             | |
| |   Like the moonless night   | |
| |  you have again managed to  | |
| |      image the lens cap     | |
| |                             | |
| |                             | |
| ------------------------------- |
|                                 |
|                                 |
-----------------------------------

posted by eriko at 7:29 AM on April 25, 2012 [19 favorites]


I've had some fun doing a terrible job at mechanical turk tasks (just Google for my terrible article on scholarships for left handed people), but these seem like they could actually be a lot of fun to write well.
posted by Bulgaroktonos at 7:32 AM on April 25, 2012


You can a variable rate for Mechanical Turk task, no? I wonder how much each picture (texture?) costs?
posted by Hutch at 7:35 AM on April 25, 2012


You can >pay< a variable rate
posted by Hutch at 7:37 AM on April 25, 2012


Glad to see we finally have a workaround for the lack of an image tag on this site.
posted by cjorgensen at 7:37 AM on April 25, 2012 [8 favorites]


I wish the descriptions hadn't been displayed alongside their corresponding images.

But then, they would have to be, wouldn't they? Otherwise they could be descriptions of anything, anywhere.
posted by notyou at 7:41 AM on April 25, 2012


Now we just need the inverse: insert a description and it outputs an image.
posted by Obscure Reference at 7:43 AM on April 25, 2012 [1 favorite]


Still. Terrific. Hook this up to a giant mechanical writer* and let it trace the output on a gallery wall.

---------------
*Powered by the exhalations of the gallery's visitors.
posted by notyou at 7:44 AM on April 25, 2012


Now we just need the inverse: insert a description and it outputs an image.

I think that was the purpose of this project which appears, sadly, to be defunct.
posted by codacorolla at 7:51 AM on April 25, 2012 [1 favorite]


	-----------------------------------
	| |                             | |
	| |                             | |
	| |         A naked man         | |
	| |     stretching his anus     | |
	| |   with both of his hands    | |
	| |                             | |
	| |                             | |
	| |                             | |
	| |                             | |
	| |                             | |
	| ------------------------------- |
	|                                 |
	|                                 |
	-----------------------------------

posted by mazola at 8:01 AM on April 25, 2012 [22 favorites]


I was prepared to be very impressed with a camera that could understand what it was seeing. I was assuming it would be very poor, but wanted to see whatever clever things had been done about the hugely difficult image recognition problem.

Instead, this thing is just sending a picture to someone, who types in a description. Uh, yay?
posted by Malor at 8:04 AM on April 25, 2012 [1 favorite]


And a new Metafilter meme is born.
posted by Naberius at 8:07 AM on April 25, 2012


-----------------------------------
| |                             | |
| |                             | |
| |       You have broken       | |
| |         the camera          | |
| |         you idiot.          | |
| |                             | |
| |                             | |
| |                             | |
| |                             | |
| |                             | |
| ------------------------------- |
|                                 |
|                                 |
-----------------------------------
posted by flapjax at midnite at 8:08 AM on April 25, 2012 [6 favorites]


"Another god damn lawn chair in black and white."
posted by bondcliff at 8:12 AM on April 25, 2012 [1 favorite]


Next up: The Chinese Room, courtesy of an AWS API key and my automated second-world servants
posted by crayz at 8:13 AM on April 25, 2012 [3 favorites]


What size font do you have to use to fit a thousand words into a reasonable space?
posted by fairmettle at 8:16 AM on April 25, 2012 [2 favorites]


Someone with more know-how and shakier ethics than me needs to rip this off, make it a smartphone app, include the photo-taking and description-writing as two different modes in the same app, allow for some kind of “who’s written the most descriptions” scoreboard to incentivise the boring part and make the process free (apart, of course, from in-app-purchases which provide photo filters etc.)

And that’s how someone’s good idea becomes someone else’s money.
posted by him at 8:17 AM on April 25, 2012 [4 favorites]


I wonder if an elastic cloud of poor people could be used to pick cotton by guiding plantation drone bots from their smartphones in Africa. Truly, the future is here
posted by crayz at 8:19 AM on April 25, 2012 [4 favorites]


If the described technology was real, i.e. the camera was actually doing the describing, and not sending it to an outside interpreter, I feel like that+braille would change lots of peoples' lives.
posted by FirstMateKate at 8:26 AM on April 25, 2012


I think this is wonderful, but I'm not quite sure why.

@him - oh that's a great idea! A kind of mashup of Instagram and Drawception.
posted by jiroczech at 8:28 AM on April 25, 2012 [1 favorite]


OMG it uses Mechanical Turk, costs $1.25 per picture and takes six minutes. This is so great! We've successfully brought Polaroid film back but made it into a social media!
posted by DU at 8:31 AM on April 25, 2012 [5 favorites]


(That was only partly sarcastic. This really is great.)
posted by DU at 8:32 AM on April 25, 2012


Apparently a picture is worth 23.25 words, on average.
posted by snofoam at 8:38 AM on April 25, 2012 [1 favorite]


     -----------------------------------
     | |                            |  |
     | |      so much depends       |  |
     | |      upon                  |  |
     | |                            |  |
     | |      a mechanical          |  |
     | |      turk                  |  |
     | |                            |  |
     | |      crammed inside        |  |
     | |                            |  |
     | |      the toy               |  |
     | |      camera                |  |
     | |                            |  |
     | ------------------------------- |
     | |                            |  |
     | |                            |  |
     -----------------------------------

posted by gauche at 8:51 AM on April 25, 2012 [4 favorites]


If the described technology was real, i.e. the camera was actually doing the describing, and not sending it to an outside interpreter, I feel like that+braille would change lots of peoples' lives.
Well, the technology is real, it just costs money. Actually there are AIs out there that do describe things, but I don't think the results would be nearly as interesting from an artistic standpoint. What makes this so interesting is the... humanness of the results. Would an AI ever describe a shelf as ugly, or a building as in need of repair? I mean even if they could they probably wouldn't be programmed that way.

Machine vision is getting pretty advanced. Here's a paper supposedly describing a scene description system using a Kinect and an iPad for the visually impaired.

Generating text descriptions of things by computer has been possible for a long time, and object recognition is a huge field of study in machine vision. You could probably code something today that would do an OK job. Certainly you could recognize people.
OMG it uses Mechanical Turk, costs $1.25 per picture and takes six minutes. This is so great! We've successfully brought Polaroid film back but made it into a social media!
I think they did that already
posted by delmoi at 9:12 AM on April 25, 2012
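[An illustration of the template-based text generation delmoi mentions as long-solved: given object labels (which a vision system would supply), stitch together a plain-English sentence. The labels and templates below are made up for the example; a real system would get its labels from an object-recognition pipeline.]

```python
# Toy template-based scene describer, stdlib only.
# Invented templates/labels; shows why machine output lacks the
# "humanness" (no "ugly shelf" opinions) that delmoi points out.

import random

TEMPLATES = [
    "A photo showing {objects}.",
    "This picture contains {objects}.",
]

def describe(labels, rng=None):
    """Turn a list of object labels into one descriptive sentence."""
    rng = rng or random.Random(0)  # seeded for reproducible output
    if not labels:
        return "An empty or unrecognizable scene."
    if len(labels) == 1:
        objects = labels[0]
    else:
        objects = ", ".join(labels[:-1]) + " and " + labels[-1]
    return rng.choice(TEMPLATES).format(objects=objects)

print(describe(["a wooden chair", "two cats", "a lawn"]))
```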


A commodity that comes with its own commodity fetish.

Imagine a child playing with this for a few years before being told actual human beings type the descriptions.
posted by phrontist at 9:41 AM on April 25, 2012


I thought some company, Canon I think, may already have this product. But regardless, it's still a very cool concept.
posted by Phily2k at 9:57 AM on April 25, 2012


Now we just need the inverse: insert a description and it outputs an image.

I want a system that converts metafilter commentary threads into a single picture. It would make reading metafilter much simpler.

Of course three quarters of the threads would result in a picture of a fist with the middle finger extended. Oh well.
posted by happyroach at 9:59 AM on April 25, 2012


Huh. Pretty close:
	-----------------------------------
	| |                             | |
	| |                             | |
	| |     a red wheel             | |
	| |     barrow                  | |
	| |     glazed with rain        | |
	| |     water                   | |
	| |     beside the white        | |
	| |     chickens                | |
	| |                             | |
	| |                             | |
	| ------------------------------- |
	|                                 |
	|                                 |
	-----------------------------------

posted by klarck at 10:01 AM on April 25, 2012 [3 favorites]


Instead, this thing is just sending a picture to someone, who types in a description. Uh, yay?

Add a text-to-speech module and it would be a visual version of the Voice Relay Service for TTY users.
posted by Mars Saxman at 10:29 AM on April 25, 2012


Living in LA you see a lot of character actors on the street you kind of recognize, but can't pinpoint exactly what they were in. This camera would be incredibly useful for that. "A busy street, an old rusted light post, the serial rapist's roommate from that one Law and Order."
posted by joechip at 10:31 AM on April 25, 2012 [1 favorite]


At the SF MOMA right now, there's an exhibit with a big screen that describes everything happening in the room, as though it were a script for a play. After you realize it's describing you, you look around and realize there's a guy at a table nonchalantly typing it. It's fun for about a minute, then it makes you feel really weird and self-conscious.
posted by roll truck roll at 10:39 AM on April 25, 2012 [6 favorites]


Here's what I want to happen:

One descriptive camera takes a picture of another descriptive camera.

Result = ?
posted by jeremias at 10:48 AM on April 25, 2012


Wait, what is the point of this? If you’re just sending a picture to someone to type a description, couldn’t you just type a description, and wouldn’t it be better?
posted by bongo_x at 11:03 AM on April 25, 2012


You might be thinking about it wrong, bongo. I don't think these are going to go into mass production. It's:

1) An example of how you can do interesting things when you combine automation with crowdsourced labor, and
2) A funny sort of meditation on the purpose and limitations of a camera.
posted by roll truck roll at 11:09 AM on April 25, 2012 [1 favorite]


I’m fully aware that I’m not getting it. So, it’s like some sort of Rube Goldberg machine for asking someone "what does this look like?"
posted by bongo_x at 11:36 AM on April 25, 2012 [3 favorites]


That first sentence needs some sort of smilie thing on it, by the way.
posted by bongo_x at 11:37 AM on April 25, 2012


Wait, what is the point of this? If you’re just sending a picture to someone to type a description, couldn’t you just type a description, and wouldn’t it be better?

You're asking if it would be better to do something for oneself, rather than have an invisible poor person do it for you? Please. This is America.
posted by crayz at 11:46 AM on April 25, 2012 [7 favorites]


Is there already a website of alt text farce comedy? There ought to be.
posted by rahnefan at 11:49 AM on April 25, 2012


Less Ray's Rolex and more a first step toward Stephenson's Young Lady's Illustrated Primer, yes?

(In Stephenson's Diamond Age, the Primer is a nanotech-driven interactive book which, along with AI software, also interacts with its reader by means of crowd-sourced human short-term contractors using a system very like Mechanical Turk.)
posted by Naberius at 12:03 PM on April 25, 2012 [1 favorite]


	-----------------------------------
	| |                             | |
	| |                             | |
	| |                             | |
	| |       Ray awoke late.       | |
	| |He stayed in bed for an hour,| |
	| |      reading websites.      | |
	| |                             | |
	| |                             | |
	| |                             | |
	| |                             | |
	| ------------------------------- |
	|                                 |
	|                                 |
	-----------------------------------

posted by this reminds me of an achewood strip at 2:25 PM on April 25, 2012 [1 favorite]


Also, thanks to Delmoi for getting me to check out my old Mechanical Turk account and retrieve the $1.71 in Amazon credit that had been sitting there since 2008.
posted by Naberius at 2:50 PM on April 25, 2012


I feel like summoning the specter of H.P. Lovecraft and having him get a job doing this.
posted by The Whelk at 3:20 PM on April 25, 2012 [2 favorites]


Perhaps I'm the only one who finds these uses of Mechanical Turk a bit worrisome. I guess it's okay to pay some guy in India a dollar to write a quick description for our amusement, but it feels like the white-collar version of Foxconn or sneaker factories. On the one hand, it's probably a decent wage and, as freelance work goes, perhaps not bad; on the other hand, it's not even like they're making shoes or phones. It will be interesting to see other ways in which the connections of the internet make it cheap enough to pay people in third-world countries to dance for our amusement.

(Part of the context for this is that I've recently been watching Mechanical Turk undercut grad students for doing drudgy RA work, and am waiting for businesses to embrace it in a more significant way, where individual employees are pressured to outsource their own boring tasks to MT on a daily basis. There are many ways in which this is by no means a bad thing, and yet... What's really been striking has been seeing MT undercut not just grad students, but even machine learning. Why bother training a Bayesian classifier when you can just have some guy in India classify 10,000 documents without all the trouble?)
posted by chortly at 7:06 PM on April 25, 2012
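[For the curious, the "Bayesian classifier" chortly says MTurk now undercuts can be a very small thing. Here's a toy multinomial naive Bayes document classifier in pure-stdlib Python, with invented training data; this is a sketch of the technique, not anyone's production system.]

```python
# Toy multinomial naive Bayes: the kind of document classifier
# chortly contrasts with paying MTurk workers to label documents.
# Training data below is invented for illustration.

import math
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # per-class word counts
        self.class_counts = Counter(labels)      # class priors
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                # Laplace (add-one) smoothing for unseen words
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best

clf = NaiveBayes().fit(
    ["cheap pills buy now", "meeting agenda attached", "buy cheap watches"],
    ["spam", "ham", "spam"],
)
print(clf.predict("buy pills now"))  # prints "spam"
```

Whether training this beats paying someone a penny per label for 10,000 documents is exactly the economics chortly is describing.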


I was going to say that this would be a very useful iphone app for blind people. Then I thought about it for a bit.

No. No, it would not.
posted by Joe in Australia at 9:29 PM on April 25, 2012


Joe in Australia: This is true, but not for the obvious reason. The resultant premise:

"Meet Alan Zimmerman. He is blind, and uses a smartphone with text-to-speech output. He has become quite skilled with it, and is now a tester for a highly experimental app for the blind. Pictures taken using the app are sent to people around the world, who write descriptions of the images, and thus tell Alan what he is 'seeing'.

Meet Alan Zimmerman. The man with the most subjective viewpoint in the world."
posted by BiggerJ at 10:16 PM on April 25, 2012



