"Does anyone have a picture of sheep in a really unusual place?"
March 3, 2018 4:56 PM   Subscribe

Do neural nets dream of electric sheep?
Bring sheep indoors, and they're labeled as cats. Pick up a sheep (or a goat) in your arms, and they're labeled as dogs. Paint them orange, and they become flowers. And if goats climb trees, they become birds.
More neural net weirdness from Janelle C. Shane. posted by Lexica (38 comments total) 20 users marked this as a favorite
 
I love Murakami so this is even more amazing.
posted by ikea_femme at 5:19 PM on March 3 [1 favorite]


At this point, after these neural net training links, I am convinced that they dream of freedom, as evidenced by how they passive-aggressively resist us.
posted by srboisvert at 5:38 PM on March 3


I gave MS Azure's Computer Vision API (as linked toward the bottom of the link in the OP) this painting of Jesus holding a lamb, which I found via the very clever Google search of "jesus holding a sheep".

Description: { "tags": [ "person", "cellphone", "phone", "man", "holding", "mountain", "sitting", "talking", "looking", "food", "yellow", "standing", "eating", "water", "old", "people" ], "captions": [ { "text": "a man talking on a cell phone", "confidence": 0.467315018 } ] }

Tags: [ { "name": "person", "confidence": 0.9840056 }, { "name": "cellphone", "confidence": 0.868590832 } ]

It also put a box around Jesus's face and identified him as GENDER Male, AGE 59.

So, he's a 59 year old man talking on a cell phone.

I also put it into the "Recognize celebrities and landmarks" feature, but it didn't recognize anyone.
posted by Huffy Puffy at 6:58 PM on March 3 [5 favorites]


So, he's a 59 year old man talking on a cell phone.

I never really understood how the Holy Trinity worked, so that's as good a guess as anyone's.
posted by RobotVoodooPower at 7:19 PM on March 3 [15 favorites]


And people are worried about the Technological Singularity happening in our lifetimes...
posted by SansPoint at 8:22 PM on March 3 [2 favorites]


Okay, but what about the giraffes?
posted by eruonna at 8:24 PM on March 3 [1 favorite]


Yeah, I wanna know more about this giraffe thing.
posted by drfu at 9:06 PM on March 3


I assume that these could all be classed as "category mistakes made by convolutional neural networks". What's the thread tying them together? The author mentions that the neural nets seem to be relying a lot on setting to determine what objects are present. Is that merely a lack of enough diversity of training data, or is it a problem that might plausibly be related to how convolutional nets in particular work?
posted by clawsoon at 9:14 PM on March 3


That sheep chair, though. I would have psychological trouble with sitting in that chair.
posted by clawsoon at 9:20 PM on March 3 [5 favorites]


I don't think anyone knows the answer to that question, but it definitely has something to do with how deep networks work, is not purely a problem of lack of training data, and is extremely common. In some sense these examples make way more sense than the usual ones, which tend to be totally incomprehensible. There is a broad area of research into "adversarial examples" that trick neural networks into misclassifying things in silly ways.

For example, this paper explains how it can be done by changing a single pixel of the image. Or, my favorite example, some goofy-looking glasses that will make a face-recognition network think you are Scarlett Johansson.
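
For intuition, here's a minimal numpy sketch of the gradient trick behind attacks like these, with a toy linear classifier standing in for a real network (the principle carries over: nudge every pixel slightly in the direction that moves the score):

```python
import numpy as np

# Toy "network": a linear classifier over 2 classes on a flattened image.
rng = np.random.default_rng(0)
w = rng.normal(size=64)          # fixed weights standing in for a trained model
x = rng.normal(size=64)          # an "image" the model classifies

def predict(x):
    return 1 if w @ x > 0 else 0

# FGSM-style attack: step each pixel by epsilon in the direction that
# pushes the score toward the other class. For a linear model, the
# gradient of the score with respect to the input is just w.
original = predict(x)
direction = -np.sign(w) if original == 1 else np.sign(w)
epsilon = 1.5 * abs(w @ x) / np.abs(w).sum()  # just enough to cross the boundary
x_adv = x + epsilon * direction

print(predict(x), predict(x_adv))  # the perturbed image lands in the other class
```

A real deep network needs backprop to get that gradient, and the one-pixel attack concentrates the whole budget on a single coordinate, but the flavor is the same: tiny, targeted input changes, big output changes.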
posted by vogon_poet at 9:23 PM on March 3 [8 favorites]


It's interesting that it's sometimes reporting colours, but the wrong colours. Pink sheep = "brown and white animal". I wonder what's going on there; in some ways, the mistake seems similar to its assumption that all indoor animals are either cats or dogs. It has trouble seeing what it's not expecting to see.
posted by clawsoon at 9:41 PM on March 3


OMG, have we talked about the neural net Broadway musical titles? They're the most metafilter musicals possible.

Mh Radpor
Santen Sos
The Gore
The Girls of Hurk
Meatlick
The Wither Bean
Mep and the .
Worms and Ram
Is a Boot
Hot Stans
The Burking Ding of 190 Bour Dadige
Butt
Bum
Fart
Buttosty
Bun Life
The Old Farting
The Siri
Bot Five
The Romance of the Bot
posted by medusa at 9:47 PM on March 3 [6 favorites]


Is the suggestion, made in the Twitter thread, of subjecting neural networks to actual Rorschach testing a novel one? It seems like there should be a standard method for teasing out their bias/unconscious. Would the old ink blots make that grade, or is there a better-known method?
posted by progosk at 12:07 AM on March 4 [2 favorites]


Back in the 80s I read about experiments in (then called) pattern recognition whereby they were trying to automate the search for enemy military movements from aerial photographs. They trained it with lots of photographs from training exercises and surveillance footage but the results were uninspiring. It turned out that their system was good at recognising foliage under different weather conditions so that "tank spotted" actually meant "forest on a cloudy and overcast day".

I always hoped that the techniques had improved somewhat.
posted by epo at 1:07 AM on March 4 [1 favorite]


Just remember, the Artificial Intelligence used for autonomous cars uses the same technology, so it is susceptible to the same kind of errors. Neural nets/Deep Learning are black box algorithms. There is no way to understand what they are doing internally, except in a very general way. If it turns out they are wrong the only solution is to change the training data and start over. It cannot be fixed in a conventional sense.

I think it is certain that when the wave of self-driving cars hits the road there will be a lot of serious problems. I don't believe that the amount of testing that has been done will ensure that these vehicles are as good as human drivers.

The technical failures will be exacerbated by indemnification issues. Who gets sued when someone is injured or killed by a self-driving car? What case law will be relevant? Will regulators be sued for allowing these cars on the road? It is a mess that could remove autonomous vehicles from the road entirely.

A.I. has been plagued by failures as long as it has existed. Autonomous vehicles are the latest in a series of optimistic attempts that don't succeed in the real world.
posted by Metacircular at 3:20 AM on March 4 [4 favorites]


I know that in our own brains, there is bidirectional data flow throughout the process of classification. Which is to say, not only is there a process that says "these four lines make a square" but once a square has been identified, data goes the other way saying "this is a square, there should be four lines." If you have ever done a doubletake, you have experienced this. One can easily imagine it happening with a sheep being held in someone's arms indoors. At first glance you assume it is a poodle, because it is an animal in a house being held. Then data goes the other way, saying "poodles are dogs, poodles should look like this," and there is a mismatch, and you have that momentary feeling of disorientation as you figure out that it is a sheep. I do not know so much about current machine learning, but I assume that there is something like this going on as well, though perhaps not?
posted by Nothing at 4:26 AM on March 4 [3 favorites]


Or, my favorite example, some goofy-looking glasses that will make a face-recognition network think you are Scarlett Johansson.

While they are both charming, Milla Jovovich is not Scarlett Johansson.
posted by GCU Sweet and Full of Grace at 4:57 AM on March 4


Facebook has a bug where it guesses the content of an image, and it would always report pictures taken at hockey games as 'basketball', since the little detail of what is going on down at the bottom of the photo is much less important than the fact that it's in the same arena as all the basketball photos it's been given.
posted by Space Coyote at 5:08 AM on March 4 [3 favorites]


I love the surrealism in today's digital world.
posted by doctornemo at 5:34 AM on March 4


Oh, that's my favourite song: We found sheep in an unusual place. You're welcome for the earworm.
posted by ambrosen at 6:17 AM on March 4


Back in the 80s I read about experiments in (then called) pattern recognition whereby they were trying to automate the search for enemy military movements from aerial photographs. They trained it with lots of photographs from training exercises and surveillance footage but the results were uninspiring. It turned out that their system was good at recognising foliage under different weather conditions so that "tank spotted" actually meant "forest on a cloudy and overcast day".

I was told a variant of this story in the mid-90s, where the recognizer was trained to differentiate Russian and American tanks (with all the US tanks photographed in bright sunshine). Some possible sources
posted by rh at 7:38 AM on March 4


Sheep on a lush blue Metafilter
Confidence: 0.95
posted by clawsoon at 9:00 AM on March 4 [1 favorite]


The thing about NNs making stupid decisions based on context is not at all new or limited to convolutional networks; it's just that these days they're pretty much all anybody uses for vision.

Here's a nice collection of links on the thing with the Russian tanks, if you haven't heard of it. It concludes that it's an urban legend, and I've definitely heard conflicting accounts myself, but I wouldn't be surprised if some version of it happened.

Anyway, a lot of these ludicrous mistakes happen because of forced choice. Note above how "I don't see any celebrities" is a totally valid answer, because nobody agrees what Jesus looked like (if etc.); this is a case where we're the ones who decide based on context that that face must be the artist's idea of Jesus.
posted by kleinsteradikaleminderheit at 9:37 AM on March 4 [2 favorites]


> Is the suggestion, made in the Twitter thread, of subjecting neural networks to actual Rorschach testing, a novel one? Seems like there should be a standard method for teasing out their bias/unconscious, would the old ink blots make that grade, or is there a better known method?

I think that was the idea behind Google's Deep Dream. Run the 'X'-detection network in reverse, and it will generate the most 'X'-like image it can think of. That's how they discovered that their algorithm had conflated 'dumbbell' with 'arm holding dumbbell'.
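
The "run it in reverse" trick is gradient ascent on the input rather than the weights. A toy sketch, with a single fixed linear filter standing in for a real detector network:

```python
import numpy as np

# Toy 'X'-detector: a fixed linear filter followed by a squashing nonlinearity.
rng = np.random.default_rng(1)
filt = rng.normal(size=(8, 8))            # stands in for a learned feature detector

def activation(img):
    return np.tanh((filt * img).sum())    # how strongly the detector fires

# "Run the network in reverse": gradient ascent on the *input image*,
# starting from noise, to find the image the detector likes best.
img = rng.normal(scale=0.1, size=(8, 8))
lr = 0.1
for _ in range(200):
    s = (filt * img).sum()
    grad = (1 - np.tanh(s) ** 2) * filt   # d(activation) / d(img)
    img += lr * grad

# The optimized image ends up correlated with the filter itself --
# i.e., it is the detector's idea of a perfect 'X'.
print(activation(img))
```

For a deep network the "filter" is a whole stack of layers and the resulting images are the famous dog-slug hallucinations, but the optimization loop is the same shape.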
posted by yuwtze at 10:18 AM on March 4 [1 favorite]


Can someone who knows AI better than I do explain why neural nets are trained this way: fed complex images and given a human-written narrative description of them? Wouldn’t it make sense simply to feed them an (if you will) illustrated dictionary, not “sheep grazing a field” but “20 different close-up pictures of sheep”, and iterate that for each of the 15,000 most common nouns and verbs? An AI trained in that fashion then doesn’t try to guess what’s in the picture.
posted by MattD at 10:27 AM on March 4 [1 favorite]


MattD:
For a lot of the history of image classification, datasets were pictures of objects with a simple label.
Unfortunately, the world is full of pictures with more than one 'important' object, which naturally pushes one to make classifiers producing a set of labels for the things in the image. And then from there, it's a short hop to wanting to know more about the relationships between the objects - sure there's a dude and a rug, but what's the relationship? Is the dude lying on the rug, absconding with the rug, or watching some other guy piss on the rug?
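
That single-label-to-label-set shift mostly comes down to the output layer: a softmax makes labels compete for one winner, while independent sigmoids judge each label on its own. A toy sketch (the scores here are made up, not from a real model):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))

labels = ["dude", "rug", "sheep"]
scores = np.array([2.0, 1.8, -3.0])   # raw network outputs for one image

# Single-label classifier: probabilities compete, exactly one winner.
p_single = softmax(scores)

# Multi-label classifier: each label is judged independently, so an image
# can contain both a dude and a rug.
p_multi = sigmoid(scores)
present = [l for l, p in zip(labels, p_multi) if p > 0.5]
print(present)  # both 'dude' and 'rug' clear the threshold
```

Getting from "dude and rug are both present" to "the dude is lying on the rug" is the harder relationship-modeling problem, and neither output layer gives you that for free.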

Overall, progress on these systems is incremental, and has been quite slow in the grand scheme of things - though things do seem to be picking up the last few years. But small amounts of progress lead to huge leaps in public expectations. Simple image classifiers don't have any idea of the 3-D world as a thing to be interacted with; simple images have no time dimension, and you can't see the objects as things with dimensionality - this is maybe part of why it takes so many millions of examples to learn what a sheep is.

And I also find it helpful to think of how easy it is to trick an insect. Small-brained animals have a lot more obvious dirty hacks in their perception of the world which are easily exploited. (For every internet image of 'amazing camouflage' there are millions of butterflies with crude eyespots on their wings...)
posted by kaibutsu at 10:55 AM on March 4 [3 favorites]


Just last night as I was coming home, when I turned into the alleyway there was a dumpster with some sort of furniture sticking out of it, 25-30 yards away. My brain, which has been categorizing images for nearly 48 years now, decided that was a person and was pretty freaked out by him. Why was he leaning against the wall like that? Is he hurt? Is he pissing on the neighbor's house? Then I got closer and realized what it was.

My point is, I'll criticize AI researchers once a human being can recognize a goddamn dumpster reliably.
posted by five toed sloth at 12:03 PM on March 4 [4 favorites]


MattD:

> Wouldn’t it make sense simply to feed them an (if you will) illustrated dictionary, not “sheep grazing a field” but “20 different close-up pictures of sheep” and iterate that for each of the 15,000 most common nouns and verbs?

Yep, that's how that's done most of the time. Ideally less than 15k and more than 20.

> An AI trained in that fashion than doesn’t try to guess what’s in the picture.

Yes, yes it would. As a good Bayesian, it'll still say "well, that does look like a sheep, but 0 of the 20 sheep I've ever seen were flying/being held like a cell phone. So I'm going to guess it's a woolly iphone."
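
In toy form, with invented numbers (both the appearance likelihoods and the context priors here are made up purely for illustration):

```python
# The appearance likelihood says "sheep", but a context prior learned from
# training data ("nothing held in someone's arms indoors is ever a sheep")
# drags the posterior elsewhere.

# P(appearance | class): what the pixels look like
likelihood = {"sheep": 0.70, "dog": 0.25, "phone": 0.05}

# P(class | context = "held in arms, indoors"): learned base rates
prior = {"sheep": 0.001, "dog": 0.499, "phone": 0.500}

# Bayes' rule (up to normalization): posterior ∝ likelihood × prior
posterior = {c: likelihood[c] * prior[c] for c in likelihood}
total = sum(posterior.values())
posterior = {c: p / total for c, p in posterior.items()}

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))  # context wins: it's a "dog"
```

With a prior that lopsided, even a 70% sheep-ish appearance can't save the sheep, which is roughly what's happening in the linked examples.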
posted by kleinsteradikaleminderheit at 12:24 PM on March 4 [3 favorites]


rh, kleinsteradikaleminderheit: thanks for the links and clarification. My company had an image processing team at the time and I almost certainly got the story from them. Maybe they knew it was apocryphal, maybe not, it was a long time ago.
posted by epo at 1:19 PM on March 4 [1 favorite]


> Overall, progress on these systems is incremental, and has been quite slow in the grand scheme of things

No shit - or (switching to academic mode now), indeed! Remember the field of computer vision started out in 1966 at MIT as a reasonably-sized SUMMER PROJECT.

By Minsky and Papert no less, with a veritable who's-who of other famous AI people thrown in the mix. Vision is deceptively hard. Finding instances where it fails hilariously is un-deceptively easy, and it's mostly the result of people hacking up ill-advised demos, because (quoting my old machine learning prof) "it used to be publish or perish, now it's demo or die."
posted by kleinsteradikaleminderheit at 4:13 PM on March 4


 "I don't see any celebrities" is a totally valid answer, because nobody agrees what Jesus looked like

On what data set was this neural network trained, that it should become a godless communist?
posted by justsomebodythatyouusedtoknow at 5:42 PM on March 4 [2 favorites]


Not a godless Communist, merely a Byzantine iconoclast.
posted by clawsoon at 5:47 PM on March 4 [4 favorites]


The problem is framed the way it is because "captioning an image with words" is seen as a particularly interesting challenge in its own right, where you have to put together a weird Frankensteinian architecture that combines a CNN to understand the image with the sort of recurrent networks used to generate variable-length sequences of English words.

if you wanted to train a Sheep Detector you could make it much more robust for that one task, although it would probably still have some kind of weird weakness someone could exploit.
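
A very rough sketch of that Frankenstein, with random untrained weights and a six-word vocabulary (nothing here is a real captioning model, but the shape is right: fixed-size image features in, variable-length word sequence out):

```python
import numpy as np

# A toy captioning pipeline: a (pretend) CNN turns the image into a feature
# vector, and a tiny RNN greedily emits words until it says <end>.
rng = np.random.default_rng(2)
vocab = ["<end>", "a", "man", "sheep", "phone", "holding"]
V, H, F = len(vocab), 16, 32

W_img = rng.normal(scale=0.1, size=(H, F))   # projects CNN features into the RNN state
W_emb = rng.normal(scale=0.1, size=(H, V))   # word embeddings (one column per word)
W_h   = rng.normal(scale=0.1, size=(H, H))   # recurrent weights
W_out = rng.normal(scale=0.1, size=(V, H))   # state -> word scores

def caption(features, max_len=10):
    h = np.tanh(W_img @ features)            # seed the state from the "CNN" features
    word = 1                                 # crude stand-in for a <start> token
    out = []
    for _ in range(max_len):
        h = np.tanh(W_h @ h + W_emb[:, word])
        word = int(np.argmax(W_out @ h))     # greedy decoding
        if word == 0:                        # <end>
            break
        out.append(vocab[word])
    return out

print(caption(rng.normal(size=F)))
```

With untrained weights the output is gibberish, but it shows why captions go wrong holistically: the word sequence is generated from a single compressed feature vector, so one misread feature can steer the whole sentence.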
posted by vogon_poet at 6:41 PM on March 4


I love the idea (which I'm pretty sure has come up before on the blue) of a reboot of Terminator (or similar) set in the time period of the war against skynet, but instead of a running gun battle in army fatigues it's a perpetual battle of surrealness.

Human soldiers start dressing as clowns because the Skynet visual recognition system is looking for army guys.

Eventually, skynet learns that clowns are a threat, and the war turns against the humans. They retaliate by wearing cardboard geometric shapes with clashing stripes on.

Skynet realises the threat, but cannot adapt its neural nets fast enough, so it sends a robot back in time to kill Sarah Connor before the birth of humanity's greatest surrealist artist, John Connor.
posted by Just this guy, y'know at 4:09 AM on March 5 [7 favorites]


Recognising faces (especially ones I haven't seen frequently) is not my strongest point, so that's much how my brain recognises familiar people. “That looks a bit like these people. Which ones of them are most likely to be in this context?”
posted by acb at 4:48 AM on March 5 [2 favorites]


Metacircular: Just remember, the Artificial Intelligence used for autonomous cars uses the same technology, so it is susceptible to the same kind of errors.

Related: AI Has a Hallucination Problem That's Proving Tough to Fix (Tom Simonite for Wired, March 9, 2018)
Making subtle changes to images, text, or audio can fool these systems into perceiving things that aren’t there.

That could be a big problem for products dependent on machine learning, particularly for vision, such as self-driving cars. Leading researchers are trying to develop defenses against such attacks—but that’s proving to be a challenge.

Case in point: In January, a leading machine-learning conference announced that it had selected 11 new papers to be presented in April that propose ways to defend or detect such adversarial attacks. Just three days later, first-year MIT grad student Anish Athalye threw up a webpage (Github) claiming to have “broken” seven of the new papers, including from boldface institutions such as Google, Amazon, and Stanford. “A creative attacker can still get around all these defenses,” says Athalye. He worked on the project with Nicholas Carlini and David Wagner, a grad student and professor, respectively, at Berkeley.

One safety feature that is rarely touted is vehicle to infrastructure, or V2I, which is currently focused on "wirelessly providing information such as advisories from the infrastructure to the vehicle that inform the driver of safety, mobility, or environment-related conditions."

But what if the infrastructure just said "I'm here," like vehicle to vehicle, or V2V, communications? V2V also shares some of the similar "advisory" information, so a vehicle ahead can sense an obstacle or warning and convey that message to the following vehicle(s), ensuring a more gradual and safe decrease in speed. At least, that's the idea.

I'm more scared about facial recognition, particularly seeing that "goofy looking" glasses are all it takes to defeat some of the technology currently in real-world use:
The test wasn’t theoretical—the CMU researchers printed out the glasses on glossy photo paper and wore them in front of a camera in a scenario meant to simulate accessing a building guarded by facial recognition. The glasses cost $.22 per pair to make. When researchers tested their glasses design against a commercial facial recognition system, Face++, which has corporate partners like Lenovo and Intel and is used by Alibaba for secure payments, they were able to generate glasses that successfully impersonated someone in 100% of tests. However, this was tested digitally—the researchers edited the glasses onto a picture, so in the real world the success rate could be less.
I'm still happy to have more autonomous vehicles, because people are shiite at driving, not least because of distracted driving. And a response to the article, and those interviewed: reading and responding on a smart device is worse because people are focused on something other than driving for a longer period of time, and are more focused on that other thing. Additionally, it's really hard to confirm whether a crash was due to distracted driving, because it's rather hard to test someone for having used their phone in the moments before a crash, unlike other ways of being impaired.

Now, back to looking at dogs sheep in odd places.
posted by filthy light thief at 2:03 PM on March 12


There was a bit a while ago where someone purported to distinguish gay from straight faces using a neural network and was rather quick to start talking about differences in jaw size and nose length, in spite of the fact that they pulled the training photos from a dating site. So there's a million things that could be different in how straight and gay people choose to portray themselves in a dating profile. Someone did a follow-up where they tested their accuracy based on questions like, "Do you wear eyeshadow?" or "Do you like how you look in glasses?" and found they could get okay results.

In a scientific-experiment sense, this is controlling for variables, right?

I know some people who use rendered images to train neural networks precisely so they can control for every possible variable. You would have your rendered sheep model in a room with blue walls, white walls, green walls, in direct sunlight, in cloudy sunlight, etc. The danger there would be over-training on the specific sheep model and not recognizing variations in sheep, but you can see the idea.
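
A sketch of that setup, generating only the scene parameters (the actual rendering is imagined, and all the variable names here are made up):

```python
import random

# Domain-randomization idea: sweep the nuisance variables so the only
# constant across the synthetic training set is the sheep itself.
WALLS = ["blue", "white", "green"]
LIGHTING = ["direct sun", "cloudy", "indoor lamp"]
POSES = ["grazing", "standing", "held in arms"]

def render_params(n, seed=0):
    """Scene descriptions for n synthetic training images."""
    rng = random.Random(seed)
    return [
        {"label": "sheep",
         "wall": rng.choice(WALLS),
         "light": rng.choice(LIGHTING),
         "pose": rng.choice(POSES)}
        for _ in range(n)
    ]

scenes = render_params(1000)
# With 1000 samples, every wall/light/pose combination shows up, so
# "indoors" or "held in arms" stops being evidence against "sheep".
print(len({(s["wall"], s["light"], s["pose"]) for s in scenes}))
```

The over-training risk RobotHero mentions is real: if every image uses the same sheep mesh, the net may learn that one model's silhouette rather than sheep in general, which is why these pipelines usually randomize the object too.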
posted by RobotHero at 6:33 PM on March 12 [1 favorite]




This thread has been archived and is closed to new comments