Neural net describes video in real time
November 27, 2015 11:40 PM

 
If it was trained the way we were training speech neural networks back in the 00s, then it's not quite as terrifying as you'd think. Though computer vision has come a hell of a long way and we've way more CPU/GPU/GPGPU to throw at it now, haven't we?

Point of order. A hoodie is not, and never will be, a suit. This algorithm should feel shame.
posted by The Legit Republic of Blanketsburg at 11:48 PM on November 27, 2015 [4 favorites]


It is awesome. Amsterdam, I mean... make me be in Amsterdam again, put me there computer. The neural net is cool too, but it doesn't hold a candle to the canals and the bikes of Amsterdam.
posted by Meatbomb at 11:49 PM on November 27, 2015 [4 favorites]


The mistakes are fascinating. UNK UNK UNK UNK
posted by roger ackroyd at 12:06 AM on November 28, 2015 [7 favorites]


Those who are supposedly in the know think this will improve remarkably over the next couple of decades. I guess that's a good thing?
posted by anarch at 12:16 AM on November 28, 2015


Are there neural net (or related image analysis) packages that are simple to use? Suppose I have taken thousands of photos, and I would like to make a net recognize a type of flower, or snow, or my car, or something in all the photos that have it. Are there libraries that can let you play with this?
posted by polymodus at 12:29 AM on November 28, 2015


My guess is that the mistakes will be as instructive as the correct calls in terms of improving this.
posted by oheso at 12:32 AM on November 28, 2015 [1 favorite]


--answering my own question: I'm now looking at scikit-learn for python, it seems popular.
posted by polymodus at 12:43 AM on November 28, 2015 [1 favorite]


Especially when the drone pilots get replaced.
posted by a lungful of dragon at 12:43 AM on November 28, 2015 [5 favorites]




Especially when the drone pilots get replaced.

AN ENEMY IS HOLDING A RIFLE
AN ENEMY IS STANDING BESIDE A BUILDING
AN ENEMY AND AN ENEMY BOY ARE WALKING DOWN A STREET FULL OF ENEMIES
posted by Meatbomb at 12:55 AM on November 28, 2015 [54 favorites]


Polymodus, the new Google Photos app does this by default. It's very scary. I think it is training itself on Google Images, and then classifying your images. It also sorts them into categories based on what it determines you take most photos of. Instead of predetermined categories like "friends", "family", "holidays", "garden", "cat", I therefore now have my photos sorted into "cats", "fires", "trees", "mountains", "flowers" and "doors".
posted by lollusc at 12:57 AM on November 28, 2015 [7 favorites]


It's hard to be sure how good this is without knowing more, but it seems to be picking from a restricted set of stereotyped "likely guesses" customised for a street scene. Notice how when it's seeing nothing but the road surface it goes on enthusiastically producing more or less random responses from the same list. Note its curious determination that there will be train stations and skateboards. What would it see in the Amazon rain forest?
posted by Segundus at 1:06 AM on November 28, 2015 [2 favorites]


it's not quite as terrifying as you'd think...
...Point of order. A hoodie is not, and never will be, a suit.


PLEASE PUT DOWN YOUR WEAPON. YOU HAVE 20 SECONDS TO COMPLY. :)
posted by anonymisc at 2:48 AM on November 28, 2015 [8 favorites]


You know, I keep reading these reports of PTSD among drone pilots and I think: someone is definitely working on implementing this right now. Look at the relatively few error reports that get leaked: a missile gets fired because its target consists of a group of adult men. Or a group of people, some of whom are carrying black objects. These are the sort of errors that neural nets make; even if the decisions were made by humans, it just demonstrates that neural nets wouldn't be any worse.

So there's undoubtedly people who are saying things like "Implementing Skynet is an act of mercy, really. We can tweak our algorithms until they're better than human recognition, without any risk of security breaches or trauma to our citizens. And it's much more scalable. We could have a thousand drones in the air at once without training any more pilots. We could wipe out a guerrilla army by targeting its soldiers individually. We could end the insurgent advantage in urban conflict, because their forces would be vulnerable and ours wouldn't. Just how many people do you want to die because of your sentimental desire to have a human initiate a process that is otherwise completely automated - from locking on the target to firing and tracking the missile?"
posted by Joe in Australia at 3:29 AM on November 28, 2015 [2 favorites]


It's not strictly a set of city stuff. There's "a pair of scissors sitting on a table" somewhere in there. I find this rather charming in the same way that misbehaving robots are somehow adorable.
a bike parked next to a bunch of bikes
posted by NMcCoy at 3:42 AM on November 28, 2015 [2 favorites]


A boat is parked on a train station. Why is that, Leon?
posted by lmfsilva at 3:56 AM on November 28, 2015 [14 favorites]


Just how many people do you want to die because of your sentimental desire to have a human initiate a process that is otherwise completely automated - from locking on the target to firing and tracking the missile?

I'm a product of the 1980s, having grown up on Threads and WarGames and Ronald Reagan. I think I'd rather leave the final call on stuff like accidental nuclear annihilation up to a sentimental human being, rather than a technical glitch. On the other hand, it seems we're ever closer to a scenario where an ad company sets up self-driving cars and gets its software modified and contracted to automate death, so it's probably the kind of ironic end we'd deserve for putting those guys in charge, if it comes to that.
posted by a lungful of dragon at 4:18 AM on November 28, 2015 [3 favorites]


a man at his desk is nonplussed
posted by davebush at 5:13 AM on November 28, 2015 [4 favorites]


From TFA:
"What is one example that would be familiar to most people? Whenever you upload a photo to Facebook, facial recognition software helps you tag your friends."

Uh, I can do without that, thanks, ever-helpful Facebook.
posted by cynical pinnacle at 5:18 AM on November 28, 2015


In the future computers will annoy us 1,000 times faster than even the most gifted human.
posted by blue_beetle at 5:43 AM on November 28, 2015 [4 favorites]


A man walking down the train tracks with a stuffed animal.
A bicycle parked next to a building with a clock.
A photograph of a pair of scissors.
A sewing machine and an umbrella on an operating table.
posted by kozad at 6:09 AM on November 28, 2015 [5 favorites]


A man walking down the train tracks with a stuffed animal.

He says "why am I soft in the middle now?"
posted by pulposus at 7:09 AM on November 28, 2015 [27 favorites]


In the future computers will annoy us 1,000 times faster

What is really scary will be when they get 'smart' enough to know how NOT to be annoying.
posted by sammyo at 7:51 AM on November 28, 2015 [1 favorite]


I think you'll find this particular uncanny valley very very wide and very very deep.
posted by flabdablet at 8:41 AM on November 28, 2015 [2 favorites]


MANY HUMANS ARE BUILDING ROBOTS IN A FACTORY
TWO HUMANS ARE TALKING TO EACH OTHER
TWO HUMANS ARE TALKING ABOUT NOT BUILDING MORE ROBOTS
TWO HUMAN CORPSES ARE ON THE GROUND
posted by RobotVoodooPower at 10:10 AM on November 28, 2015 [19 favorites]


man makes treasonous statement (no warrant/review required). Engage.
posted by mule98J at 10:11 AM on November 28, 2015


Polymodus, the new Google Photos app does this by default.

I'm aware, but I'm thinking a more customised arrangement where I would control the training.

If my hobby is photographing flowers, I'd show it a few examples of what's an orchid versus what's a tulip, and it would try to accurately guess for the rest of my flower photo album. Moreover I would want to train it on different variants of tulips - based on color, or size, or even how I subjectively like that flower.

I don't believe Google or Apple photo software can do such specific, personalised analysis. But is there software that can, like the one used in the article?
posted by polymodus at 11:48 AM on November 28, 2015
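
For anyone wanting to play with the "show it a few examples" idea from the comment above, here is a minimal sketch. It is nearest-centroid classification, which is far simpler than the neural net in the video, and it assumes each photo has already been boiled down to a small feature vector. The mean-RGB numbers below are made up for illustration, and the function names are hypothetical. scikit-learn, mentioned upthread, offers a classifier along the same lines.

```python
from math import dist


def centroid(vectors):
    """Average a list of equal-length feature vectors component-wise."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def train(labelled_examples):
    """Map each label to the centroid of its example feature vectors."""
    return {label: centroid(vectors) for label, vectors in labelled_examples.items()}


def classify(model, features):
    """Return the label whose centroid is closest (Euclidean) to features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical mean-RGB features for a few hand-labelled photos.
examples = {
    "orchid": [(200, 120, 210), (190, 110, 200)],
    "tulip": [(220, 40, 50), (230, 60, 40)],
}
model = train(examples)

# Guess the label for an unlabelled photo's feature vector.
print(classify(model, (210, 50, 45)))
```

Mean color alone would confuse a red tulip with a red car, so a real setup would want richer features, but the train-on-a-few-then-guess-the-rest loop has the same shape.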


Polymodus:
Recognizing variants has a long history.

And it's actually one of the areas where neural nets are typically much stronger than humans. It usually takes a human expert to decently identify sub-varieties, but neural nets apparently do quite well.
posted by kaibutsu at 12:21 PM on November 28, 2015


The Vimeo description has the relevant technical information in it. Some highlights:

This uses Torch, a particularly legible scientific computing library.

Specifically, it's based on the NeuralTalk2 image captioning project. A bit of openFrameworks code displays the video and handles requesting and drawing new captions. All of this is open source and/or readily available, aside from a little bit of code you would need to make NeuralTalk create plain text captions.

The basics of captioning an image are here along with links to download a pretrained neural network based on COCO. Having the pretrained network is nice because training is by far the most time-consuming step in getting started and the most exacting to perform. On the other hand, the types of objects it will find are limited to what has been captioned in COCO.

So, if you want to get general captions for a bunch of images:

1. Install Torch
2. Set up NeuralTalk2 (which has a lot of dependencies that might be skippable if you use the CPU only version)
3. Download the trained network
4. Tell it to caption your images.

If you want NeuralTalk to learn your own captions for images, you have to do some preprocessing and then train it. It will need a lot of image/caption pairs to learn.
posted by ethansr at 1:17 PM on November 28, 2015 [2 favorites]
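
Step 4 in the list above can be sketched as a small helper that assembles the command line for NeuralTalk2's evaluation script. This is only a sketch: the flag names (`-model`, `-image_folder`, `-num_images`, `-gpuid`) are as recalled from the project README, so verify them there before relying on this, and the model filename and photo folder are hypothetical placeholders. It builds and prints the command rather than running it.

```python
import shlex


def build_caption_command(model_path, image_folder, num_images=10, use_gpu=True):
    """Build the argument list for NeuralTalk2's eval.lua (does not run it)."""
    command = [
        "th", "eval.lua",
        "-model", model_path,
        "-image_folder", image_folder,
        "-num_images", str(num_images),
    ]
    if not use_gpu:
        # Assumption: -gpuid -1 selects CPU-only mode, as recalled from the README.
        command += ["-gpuid", "-1"]
    return command


# Hypothetical paths: a downloaded CPU checkpoint and a folder of photos.
command = build_caption_command("model_cpu.t7", "photos/", use_gpu=False)
print(shlex.join(command))
```

The printed line can be pasted into a shell from inside the NeuralTalk2 checkout, or the list can be handed to `subprocess.run` once Torch is installed.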


MANY HUMANS ARE BUILDING ROBOTS IN A FACTORY
TWO HUMANS ARE TALKING TO EACH OTHER
TWO HUMANS ARE TALKING ABOUT NOT BUILDING MORE ROBOTS
TWO HUMAN CORPSES ARE ON THE GROUND
posted by RobotVoodooPower


Well, hm.
posted by shakespeherian at 1:36 PM on November 28, 2015 [4 favorites]



It's hard to be sure how good this is without knowing more, but it seems to be picking from a restricted set of stereotyped "likely guesses" customised for a street scene



There's some info on the Vimeo page that this was embedded from and he says that it's a set of about 100,000 pictures.

There was an online still image captioner FPP a while ago that was pretty amazing. I used it on some pictures I took in a conservatory and I was amazed to see it transition from seeing "greenhouse" in the wide shots to "bridge" in some of the closer views of the old iron structure.
posted by bonobothegreat at 4:30 PM on November 28, 2015


How far have we backslid, as a nerd culture, that the neural network in the video GenericUser posted can't recognize the NCC-1701-D? That should be like the *first* thing it gets trained to recognize.
posted by gusandrews at 5:29 PM on November 28, 2015


EVENING SKY A PATIENT ETHERISED UPON A TABLE
posted by Johnny Wallflower at 6:12 PM on November 28, 2015 [1 favorite]


DRIVER JOHN A. SMITH, 38, NY DL D6736-765-3674
NO OUTSTANDING WARRANTS
PASSENGER JANE A. SMITH, 36, NY DL C3242-234-3527
NO OUTSTANDING WARRANTS
PASSENGER JOHN A. SMITH JR, 5
NO OUTSTANDING WARRANTS
VEHICLE SPEED 30.6 MPH IN 30 MPH ZONE
CITATION ISSUED
posted by double block and bleed at 8:19 PM on November 28, 2015


"a man in a suit and a tie.
a man in a suit and a tie with a cell phone
a man in a suit and a tie standing in front of a building
a man in a suit and a tie holding a wine glass
a man in a suit and a tie standing in front of a building
A man is taking a picture of himself in a tie."


So now we know what the first poetry of the machines is like. It's very minimalist, like if Philip Glass was a poet. Still, I can see something really good coming from a collaboration with Laurie Anderson or someone similar.
posted by LeRoienJaune at 1:09 AM on November 29, 2015 [1 favorite]


Whenever you upload a photo to Facebook, facial recognition software helps you tag your friends.
Hubby and I play board games every week with another couple. We are always the same colors (or try to be) in any game played. We post photos of the board at funny or interesting points of the game, or at end-of-game scoring, tagging the game pieces so we know who was where on the score track.

Whenever Facebook sees a purple game piece or Meeple, or any smallish purple object, it suggests tagging our friend. Creepy.
posted by xedrik at 10:21 AM on November 29, 2015


ALL IS ROUNDWORM

EXCEPT THE HAIR SLIDE
posted by rum-soaked space hobo at 9:32 AM on November 30, 2015




This thread has been archived and is closed to new comments