Breaking things is easy
December 17, 2016 3:49 AM

Breaking things is easy. "Machine learning has not yet reached true human-level performance, because when confronted by even a trivial adversary, most machine learning algorithms fail dramatically. In other words, we have reached the point where machine learning works, but may easily be broken."
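For a concrete sense of what "easily be broken" means: the decision of even a well-trained linear model can typically be flipped by nudging every input feature a tiny, near-imperceptible amount in a chosen direction. A toy numpy sketch of that idea (illustrative only; not code from the linked post):

# Toy illustration: flipping a linear classifier with a tiny uniform nudge.
# Everything here is made up for the sketch.
import numpy as np

rng = np.random.default_rng(0)
d = 1000                              # pretend these are pixels
w = rng.normal(size=d)                # weights of an already-trained linear classifier

def predict(x):
    return 1 if x @ w > 0 else 0

x = rng.normal(size=d)                # a "clean" input
if predict(x) == 0:
    x = -x                            # make sure we start from class 1

margin = x @ w
eps = 1.1 * margin / np.abs(w).sum()  # smallest uniform per-feature nudge that crosses the boundary
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))     # 1, then 0
print("per-feature change:", eps)     # tiny compared to the feature scale of ~1.0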
posted by escabeche (36 comments total) 32 users marked this as a favorite
 
So far, most machine learning has been developed with a very weak threat model, in which there is no opponent. The machine learning system is designed to behave correctly when confronted by nature.

I feel they may have badly underestimated Nature.
posted by GenjiandProust at 3:59 AM on December 17, 2016 [14 favorites]


This looks great, thanks for bringing it to my attention.
posted by PMdixon at 4:03 AM on December 17, 2016 [1 favorite]


Also, the picture on the right is of a gibbon, a gibbon in a panda suit, but, hey, the machine wasn't fooled.

More seriously, it seems kind of obvious, but the more technology we incorporate into our lives and the "smarter" that technology is, the more porous our security becomes, and the more critical and more impossible sealing breaches turns out to be. One day, in the not very distant future, an algorithm determining how to precisely brown a banker's toast is going to get a poisoned training set, then that will spread via the "shopping link" between the toaster, the refrigerator, and the automatic supermarket app to collapse the international financial system. The good news is that it will be the third time that week, so, you know, business as usual.

And the banker will have died in a driverless car failure or hack or collateral damage on yesterday's commute anyway.
posted by GenjiandProust at 4:10 AM on December 17, 2016 [13 favorites]


Skynet
posted by gt2 at 5:34 AM on December 17, 2016


Aren't the first two sentences contradictory, given the use of "thus"?

Until a few years ago, machine learning algorithms simply did not work very well on many meaningful tasks like recognizing objects or translation. Thus, when a machine learning algorithm failed to do the right thing, this was the exception, rather than the rule.

The Confidentiality section is sufficiently clear to me, though its illustration seems sensational, given that Google's pledges since Gmail are a practical example more than a decade old.

The Poisoning training sets section doesn't scan at all (to me) with its overwrought Professor Moriarty illustration...all I could think was AI apps/appliances will be proprietary products behind mountains and valleys (to apply my own confusing metaphors) of trade secrets and ND agreements that will require regulatory frameworks. Maybe this resource hopes to inform such coming regulation?

And I'm challenged for always striking "very unique" as gobbledegook (too prescriptive), but it is...but it is.

It's a compelling topic or I wouldn't bother to read it, but damn...reads like a text message.

posted by lazycomputerkids at 5:46 AM on December 17, 2016 [1 favorite]


I think it meant something more like "when a machine learning algorithm that had in fact been known to work failed to do the right thing, this was the exception, rather than the rule."

People didn't use it in production to do hard stuff, and its failure modes weren't surprising. Now they are trying to use it for harder stuff, and the failure modes are less predictable. Something like this?
posted by idiopath at 6:30 AM on December 17, 2016 [2 favorites]


The Poisoning training sets section doesn't scan at all (to me) with its overwrought Professor Moriarty illustration...all I could think was AI apps/appliances will be proprietary products behind mountains and valleys (to apply my own confusing metaphors) of trade secrets and ND agreements that will require regulatory frameworks. Maybe this resource hopes to inform such coming regulation?

Yes and no, I think. The sense I have is that a lot of the recent advances in machine learning are quasi public, in that the big SV companies have scooped up a lot of talent from the universities, and those scientists are continuing to publish their latest research advances in academic journals. That's part of what's allowed the field to advance so rapidly in recent years. I put up an FPP myself a couple days ago linking to a recent NYTimes article which goes deep into contemporary machine learning methods.

Moreover, with regard to poisoning data sets in particular --- the article I linked to explains this a lot more --- the basic story is that the theoretical ideas behind a lot of machine learning were first advanced back in the late 80s and early 90s. But back then, they couldn't make them work, because they simply didn't have enough data to train them properly (nor the sheer computing power to do so quickly --- high-end video game graphics chips were the breakthrough there). To develop a photo recognition app that's capable of effectively categorizing thousands of different photo subjects, you need to let it churn through millions of photos.

In 2016, though, we do have big enough data sets, and a lot of them are quasi-public. You need millions of photos for your app to look at? Crawl Flickr. People train these programs on massive archives of YouTube videos or the collected proceedings of the Canadian Parliament (several million words which have been professionally translated from French to English). So when he's talking about poisoning a data set, he might mean something like uploading noisy videos to YouTube so that when Google uses YouTube to train its content-labeling app, it gets things wrong, and Facebook's is better. Stuff like that.
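To make that concrete in code, here's a crude label-flipping sketch of training-set poisoning, with made-up data and scikit-learn (nothing here is from the article; the attacker's 40% share and the model choice are arbitrary):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 20, 8000
true_w = np.zeros(d)
true_w[:5] = 1.0                                  # only 5 features carry real signal
X = rng.normal(size=(n, d))
y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_w))).astype(int)
X_train, X_test = X[:6000], X[6000:]
y_train, y_test = y[:6000], y[6000:]

def recall_on_ones(model):
    return model.predict(X_test[y_test == 1]).mean()   # fraction of true 1s caught

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("clean recall on class 1:   ", recall_on_ones(clean))

# The attacker slips in mislabeled examples: 40% of the class-1 training labels get flipped.
ones = np.flatnonzero(y_train == 1)
flip = rng.choice(ones, size=int(0.4 * len(ones)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 0

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print("poisoned recall on class 1:", recall_on_ones(poisoned))  # typically much lower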

Altogether, I think you end up with a situation where it's more like blackhat SEO types gaming PageRank --- Google works damn hard to keep its actual ranking algorithm private and proprietary, but by observing how it behaves in the wild, spammers and scammers can infer enough about how it works to design their sites to rank highly.
posted by Diablevert at 6:37 AM on December 17, 2016 [7 favorites]


Man, I know what my viewing habits are with YouTube; I'd hate to see what the machine learning algorithm seeded by me would be... Cat videos, Neil deGrasse Tyson, Glove & Boots, stand-up comedy, The King of Random, Numberphile, and a metric ton on building machine learning algorithms... Ignoring the recursive nature of teaching machine learning to machine learning algorithms for fun, I can just see it achieve peak stupidity with stand-up and Glove and Boots.


Oh, and its strong interest in Cosmic Kids Yoga....
posted by Nanukthedog at 7:30 AM on December 17, 2016


Algorithms will never be able to both work predictably and consistently and be immune to abuse and gaming, because they aren't really "smart"--they don't have any kind of self-awareness or agency. They can't get a funny feeling someone might be using their own internal logic to manipulate their outputs and then override their own logic to protect themselves; they've got no intuition, no knowledge of their own purpose.

It always makes me snort milk out of my nose a little how earnestly people describe this stuff as "AI"--I mean, sure, I get that there's a distinction between strong and weak AI, and that people mean weak AI, but I don't really think even what we call weak AI is really all that similar to actual intelligence yet. Machines have always embedded "intelligence" in them, in the sense that engineers' beliefs about how they'll be used and how they should behave in operation to account for various contingent use cases are built in. Even the most sophisticated algorithms are at most like logical clockwork automata that are only as intelligent-seeming as their engineers are capable of designing them to be--more like really elaborate wind-up toys than actual intelligent agents.

You'd need an engineer or team of engineers with God-like insight into human nature and reality and a solid working model of how self-awareness and consciousness work to do any better, but as Hofstadter has argued, if you could actually achieve that, you'd end up with an agent no more reliable or less subject to manipulation than a human being or any other natural intelligence. You can't really separate out what we intuitively think of as real intelligence from agency, intuition, and self-awareness. Any truly intelligent agent would have to be just as fickle, unreliable, and potentially stubborn as a flesh and blood person. Then you've got to do politics with the AIs instead of programming them and we're back to the original problem.
posted by saulgoodman at 7:32 AM on December 17, 2016 [14 favorites]


I've been annoyed lately while using Amazon that they have attacked the integrity of their own algorithm. If you look at "Customers Who Bought This Item Also Bought" lately, you will see that it is polluted with items from unrelated categories that you have recently searched for or looked at.

So while their machine learning no doubt tells them I am somewhat more likely to buy something I have already searched for, and that it is therefore better for them to show that item than what I am actually looking for - that is, items somehow related to my current search - it is in essence a compromised algorithm for my purposes.
posted by srboisvert at 7:34 AM on December 17, 2016


I've been annoyed lately while using Amazon that they have attacked the integrity of their own algorithm.

Machine Learning has really gone downhill since the US started up that "No CPU Left Behind" program. It was supposed to enhance broad learning, but it has predictably turned into teaching to the Turing Test.
posted by GenjiandProust at 7:59 AM on December 17, 2016 [15 favorites]


I've been a one-man software shop dealing with end users in industry for about thirty years, and I have always maintained that anybody can write a program that works when everything goes right; the real skill is writing a program that deals with it gracefully when unexpected things happen, like a failed sensor or actuator or the operator doing something you never anticipated.

Machines have trouble dealing with nature because nature is unpredictable and hostile. It's not just full of simple random hazards like weather and geography; it is also full of things that actively want us to fail, because they want to eat us or because they don't want us to eat them.

If you want to know what machines will look like when they can deal with that kind of chaos and hostility, look at an animal. We are the result of millions of years of unforgiving natural selection, and brains work the way they do because that has proven to be a successful set of design choices for dealing with a chaotic and hostile environment.

Machines aren't there yet; AI is now just barely at the point where it can recognize speech or shapes or construct a 3D model of its environment from visual cues when everything goes perfectly. But machines won't be ready to deal with the natural world until they are also paranoid, suspicious, and ready to cope if their actions don't create the expected result. Many of the things that seem better than life about machines today are actually maladaptive in a natural situation because their dependence on precision and predictability makes them far more vulnerable to failure and sabotage.

Of course, when machines do get that right, they will also be right to wonder about their true relationship with their creators...
posted by Bringer Tom at 8:03 AM on December 17, 2016 [4 favorites]


I don't think anyone's claiming to have developed a general purpose artificial intelligence equivalent to that of a person. But who gives a toss about that? Cars didn't have to be cleverer than a horse to be better than a horse. We already have photo diagnostic apps that are better at detecting cancers than trained radiologists. Google translate is probably worse at translation than an expert human translator, and it may still be so in five years. But I'm not going to be able to carry a human translator around with me in my pocket and get them to translate between hundreds of languages instantly, for free. One may conceive of a risk calculation algorithm that is entirely incapable of playing parcheesi or enjoying Motörhead. It may still be the entity deciding whether I get insurance or not and how much my premiums will be. And therefore the fact that its decision making can be fucked with in ways that are very difficult for humans to detect is important.

These intelligences may not be our equivalent, be capable of thinking the way we think. But we're already offloading chunks of our thinking to them. Chunks that used to be people jobs, and chunks that no person could have done themselves.
posted by Diablevert at 8:08 AM on December 17, 2016 [2 favorites]


Cars and diagnostic apps both benefit from working in spaces where humans have selectively limited their inputs; cars operate on streets which are (at least mostly) intended to conform to predictable rules, and the diagnostic apps get images taken under carefully controlled conditions. Neither app is subject to hostile attempts to game it, yet a guy was killed in his Tesla because it didn't realize a white tractor-trailer crosswise on the road wasn't the horizon. Nobody had thought to train it on images of vehicles in that position. And in fairness, a lot of humans would have hit the truck too, for the same reason.

It's a different story for machines that are in a position to create consequences for humans or other entities that might have a motive to game them. A machine may not need human consciousness to do its intended job, but it needs to be capable of doing its job when a human consciousness is deliberately trying to sabotage it, which is a much more difficult thing to engineer.
posted by Bringer Tom at 8:16 AM on December 17, 2016 [3 favorites]


In what sense is weak AI about making a machine "intelligent" in some special sense that's any different than designing a physical machine to handle a variety of use cases and failure modes? What is special about so-called AI that makes it anything more than logical machinery? Where is the *there* there, really, that inspires so much confidence and optimism? Machines have always been designed with some embedded human intelligence in their design, to a greater or lesser degree, in the weak AI sense, haven't they? Using the natural features of mechanical systems to enforce the logic of the design rather than code and the underlying electrical subsystems they control? The only "intelligent" behaviors weak AI systems are capable of are just expressions of the embedded intelligence of their human designers. We're not at the point where machines themselves are intelligent in any sense I can see. But then, we don't have any robust working models of what intelligence is in the first place, so that shouldn't be surprising.
posted by saulgoodman at 8:34 AM on December 17, 2016 [2 favorites]


I'm mostly with saulgoodman. 'AI' today is still just a growing set of pattern-matching algorithms and explicit cases; there's very little intelligence. A tractor may be a more reliable puller than a horse, but a horse is smart enough to eat a carrot and not an orange-painted iron spike. The tractor is more useful to US, but a horse is still the more intelligent.

The best uses of AI in its current state are automation of rote tasks, sensory enhancement, using inputs to produce short lists of possibilities, monitoring for safety purposes... all things that allow humans to do better.

For me the best uses will be assistive (driver ASSIST, not driverless. Take the train, already). Translation is a good use. Expert systems to assist the humans doing medicine, law, engineering, etc. We already have bad uses, which use tech to exploit human foibles and limitations, like robo-trading which milks a financial system and rules that were never designed with automation in mind.
posted by Artful Codger at 8:37 AM on December 17, 2016 [1 favorite]


Any truly intelligent agent would have to be just as fickle, unreliable, and potentially stubborn as a flesh and blood person. Then you've got to do politics with the AIs instead of programming them and we're back to the original problem.

There are plenty of people who are sufficiently reliable and compliant to work for companies. Now imagine that person can be easily duplicated, doesn't get sick, take vacation, or leave for another company. If a business could take its top 10 employees, and duplicate them on demand... well, I hope we can arrange to get a National Dividend or Basic Income before that happens.
posted by fings at 9:43 AM on December 17, 2016 [3 favorites]


lazycomputerkids, the poisoning of training sets is very relevant because truly useful AIs will have to learn on public data. So they need to find ways to protect against currently very easy attacks whereby anonymous contributors of that data can craft non-obvious flaws they can later exploit.
posted by lastobelus at 9:45 AM on December 17, 2016 [1 favorite]


Any truly intelligent agent would have to be just as fickle, unreliable, and potentially stubborn as a flesh and blood person.

This is not true at all. Humans have pronounced cognitive flaws -- particularly with respect to estimating probabilities and making risk assessments -- left over from previous evolutionary phases, which it would be entirely unnecessary to dump on an AI.
posted by lastobelus at 9:49 AM on December 17, 2016


The new(ish) thing is generative adversarial networks, where two networks co-evolve, each trying to outsmart the other. This should make networks more robust against noise. Hopefully this kind of thing will become common practice.

But I think an adversary that has access to the classifier will always be able to find a counterexample. Maybe an easy solution is just to train a bunch and rotate among them.
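Something in that spirit could be as simple as the sketch below (toy data and arbitrary choices throughout; rotating models raises the attacker's cost rather than removing the weakness):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))
w = rng.normal(size=20)
y = (X @ w + rng.normal(size=5000) > 0).astype(int)    # toy labels

# Train several models, each on a different random subset of the data.
ensemble = []
for seed in range(5):
    idx = np.random.default_rng(seed).choice(len(X), size=3000, replace=False)
    ensemble.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))

query_rng = np.random.default_rng(42)

def predict_one(x):
    model = ensemble[query_rng.integers(len(ensemble))]  # rotate: random model per query
    return model.predict(x.reshape(1, -1))[0]

print(predict_one(X[0]))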
posted by RobotVoodooPower at 9:56 AM on December 17, 2016 [1 favorite]


What is special about so-called AI that makes it anything more than logical machinery?

Machine learning is where we take the fact that the internals of most systems are now so frigging multi-layered and interdependent and complex that we haven't the glimmerings of a clue how all that churn going on in there has anything to do with the problem at hand, and declare that to be not a bug but a feature.

Basically we're designing ways to throw up our hands and declare that software engineering is too hard and just solve everything with sheer brute-force processor power.
posted by flabdablet at 10:46 AM on December 17, 2016 [4 favorites]


this really reminds me of the way the AI in Peter Watts's Starfish basically destroys the world because it's been trained to favor the simple outcome. Yikes!
posted by supermedusa at 12:19 PM on December 17, 2016 [1 favorite]


Basically every automobile corporation, as well as most big parts suppliers and well-funded feisty startups, is building robot cars. The very first wave of marketing will be "we get you there safe." How long will a bunch of cutthroat marketing organizations be happy with "safe"? Now how will the engineers prove that "our algorithm gets you there fastest (and safest)"?

Self-driving cars are just an implementation of ML. Proving an algorithm is not compromised will be a big niche; an even bigger niche will be cheating the "proven correct" algorithm to let you go faster. Who watches the robots? Robots watch the robots watching the robots.
posted by sammyo at 12:48 PM on December 17, 2016


In what sense is weak AI about making a machine "intelligent" in some special sense that's any different than designing a physical machine to handle a variety of use cases and failure modes? What is special about so-called AI that makes it anything more than logical machinery? Where is the *there* there, really, that inspires so much confidence and optimism? Machines have always been designed with some embedded human intelligence in their design, to a greater or lesser degree, in the weak AI sense, haven't they? Using the natural features of mechanical systems to enforce the logic of the design rather than code and the underlying electrical subsystems they control? The only "intelligent" behaviors weak AI systems are capable of are just expressions of the embedded intelligence of their human designers. We're not at the point where machines themselves are intelligent in any sense I can see. But then, we don't have any robust working models of what intelligence is in the first place, so that shouldn't be surprising.

Most people I know working in AI do think about it this way - just another algorithm. The recent success of neural networks - with which "machine learning" should not really be conflated - seems to have given new life to the notion that "weak AI" will blur into "strong/general AI", or whatever you want to call it. I think this is because of their black-box nature and loosely brain-inspired design. Not everybody thinks we need to have a robust model of human intelligence to replicate it. But then neural networks were a hot topic in AI years ago, before they hit a wall - so that could also happen again.
posted by atoxyl at 12:54 PM on December 17, 2016 [1 favorite]


all I could think was AI apps/appliances will be proprietary products behind mountains and valleys (to apply my own confusing metaphors) of trade secrets and ND agreements that will require regulatory frameworks. Maybe this resource hopes to inform such coming regulation?

No, they are focused on technical means, and one of the authors, Ian Goodfellow (who also invented generative adversarial networks (GANs), mentioned below) is a researcher at OpenAI, so not very interested in secrecy.

It always makes me snort milk out of my nose a little how earnestly people describe this stuff as "AI"

I'm fine with the goalposts for this kind of thing always moving out of reach, but I don't know what else you can call neural style transfer or automated image captioning.

The new(ish) thing is generative adversarial networks, where two networks co-evolve, each trying to outsmart the other. This should make networks more robust against noise.

I just gave a tutorial about a type of GAN. Current GANs aren't designed for that, but I suppose you could make one which was. They have a generator and a discriminator component, and the generator is trained to produce output which the discriminator can't tell from the training data, while the discriminator is trained to tell the difference. I suppose you could replace the discriminator with a standard object-detection classifier, and change the training objective of the generator to producing a tiny delta from an input image which confuses the discriminator.
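Very roughly, that modification might look like the PyTorch sketch below. It's untested and purely illustrative: classifier and loader stand in for whatever pre-trained model and data pipeline you already have, and I'm assuming flat 784-dimensional inputs scaled to [0, 1].

import torch
import torch.nn as nn

eps = 8 / 255                                    # max per-feature change

generator = nn.Sequential(                       # toy generator for flat 784-dim inputs
    nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh()
)
opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

classifier.eval()                                # assumed pre-trained; kept frozen
for p in classifier.parameters():
    p.requires_grad_(False)

for images, labels in loader:                    # assumed DataLoader yielding (images, labels)
    delta = eps * generator(images)              # Tanh output keeps the delta in [-eps, eps]
    adv = (images + delta).clamp(0, 1)
    logits = classifier(adv)
    loss = -loss_fn(logits, labels)              # maximize the frozen classifier's error
    opt.zero_grad()
    loss.backward()
    opt.step()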

I think there are probably simpler approaches to this problem, though. An obvious one is to expand the training data with many small deviations from the actual training data.
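For instance, a minimal sketch of that augmentation idea, with placeholder names and an arbitrary noise scale (random jitter is of course weaker than choosing the worst-case small deviation for each example, but it's the cheapest version):

import numpy as np

def augment_with_jitter(X_train, y_train, n_copies=5, sigma=0.05, seed=0):
    """Return the training set plus n_copies jittered versions of every example."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X_train], [y_train]
    for _ in range(n_copies):
        X_parts.append(X_train + sigma * rng.normal(size=X_train.shape))
        y_parts.append(y_train)                  # tiny deviations keep the same label
    return np.concatenate(X_parts), np.concatenate(y_parts)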

There are plenty of people who are sufficiently reliable and compliant to work for companies. Now imagine that person can be easily duplicated, doesn't get sick, take vacation, or leave for another company.

Yes, at a high level this is essentially how AlphaGo was implemented. A neural net which could predict the next move was trained from human games, then that net was played against itself to generate millions of games, then those games were used to train another neural net to predict the outcome of a game, and then those two neural nets were used in a "Good Old-Fashioned AI" framework to explore millions of games more efficiently than had previously been possible. There is probably going to be a lot of that sort of industrialization of small improvements in intelligence over the next few years.

But then neural networks were a hot topic in AI years ago, before they hit a wall - so that could also happen again.

That's true, but the progress shows no sign of stopping at this point. Check out the images generated by this neural network, which were published this week. Give the network a brief text description, and it gives you back an image matching that description. There is no open-source or independent implementation of this network yet, but it shows how far things have advanced that these results are even plausible now.
posted by Coventry at 5:37 PM on December 17, 2016 [3 favorites]


I wouldn't take for granted that future AIs will need tons and tons of (public, potentially poisoned) training data. People aren't magic, and they don't need to see thousands of examples of a thing before they recognize that thing. It took my toddler at most five tries before she could confidently tell the difference between a pentagon and a hexagon. If we can replicate that fast-mapping secret sauce we'd be in AI flavor town.
posted by a snickering nuthatch at 7:12 PM on December 17, 2016


There are techniques which achieve that kind of learning in a limited way. It's called zero-shot or one-shot learning in the literature.
posted by Coventry at 8:06 PM on December 17, 2016


It's misleading to call it learning, I think. What it looks like to me is people designing a machine that applies certain statistical methods known to humans to lots of data records in order to produce certain results. At no point is the machine doing any original thinking--it's just applying bits of pre-existing human ideas in a prescribed way, mechanically, so you can apply human intelligence faster and more uniformly. But wherever there are still gaps in human understanding of the problem space, there will be flaws in the design the machine won't know about or care about. And it will never be able to correct itself. I think humans and natural intelligences sometimes can.
posted by saulgoodman at 8:53 PM on December 17, 2016 [1 favorite]


That's scary and risky, because machines can apply bad human judgment to a lot of things really efficiently.
posted by saulgoodman at 8:57 PM on December 17, 2016 [1 favorite]


Machine Learning has really gone downhill since the US started up that "No CPU Left Behind" program. It was supposed to enhance broad learning, but it has predictably turned into teaching to the Turing Test.

It's terrifying to imagine that if this keeps up, previously stable jobs for self-driving semi trucks, warehouse robots, and supermarket checkout machines will dry up and blow away, and the poor things will get replaced by humans.
posted by sebastienbailard at 2:20 AM on December 18, 2016


This... conflation around "machine learning" reminds me of the useful conceptual difference between computers and computer science. One is an object of technology, the other is a field of inquiry ranging from the theoretical to the empirical, and at that a vast one in connection to physics and our human understanding of nature. The very label "machine learning" as used in the media is distorted, even politicized, in ways that, hopefully, science will shed better insight on in years to come.
posted by polymodus at 3:17 AM on December 18, 2016 [1 favorite]


At no point is the machine doing any original thinking--it's just applying bits of pre-existing human ideas in a prescribed way, mechanically, so you can apply human intelligence faster and more uniformly

In large part this is true, currently. But researchers at Google have set one of these loose on a bunch of stills culled from YouTube without giving it specific instructions and it figured out what a cat was.

That paper's four years old. We're really only at the beginning of this.
posted by Diablevert at 4:50 AM on December 18, 2016 [1 favorite]


artful codger I'm mostly with saulgoodman. 'AI' today is still just a growing set of pattern-matching algorithms and explicit cases; there's very little intelligence. A tractor may be a more reliable puller than a horse, but a horse is smart enough to eat a carrot and not an orange-painted iron spike. The tractor is more useful to US, but a horse is still the more intelligent.

A horse, sure. A beetle, on the other hand...
posted by yeolcoatl at 11:27 AM on December 18, 2016 [1 favorite]


Machines have trouble dealing with nature because nature is unpredictable and hostile.

I have a real problem with this anthropomorphising of machines. Machines don't have a problem dealing with nature. It's just programming done by a human who doesn't have a good understanding of nature, nor of whether it is possible to encapsulate nature within algorithms. It's just code, written by humans and executed on nonsentient machines. The example of Google creating a means for a computer to grasp the knowledge of what a cat IS by processing a jillion images is laughable. It's just pattern recognition, collecting these similar things in relation to those other things that aren't very similar.

Our language is the problem. We use terms like intelligent, knows, understands, learns, etc. in very sloppy ways, many times divorced from the actual circumstances of what is going on. It's code, written by humans, and run on machines. Autonomous cars are only autonomous in the sense that we turn them on and then they go, but how they go is a result of a lot of code and sensors built by fallible humans. These cars are merely an encapsulation of a load of people's ideas of how driving works and their attempts at rationalizing it within the bounds of code.

The car is not intelligent, and frankly I'm not ready to put my welfare in the hands of people and hardware. AI has been making promises for over fifty years; for me these promises have been based on a lot of pie-in-the-sky optimism coupled with a pretty shallow understanding of the area of reality they are trying to address. Plus software = bugs, in my experience. Yeah, our brains have bugs too. But at least I had to take a driving test listening to a surly DMV guy ordering me about...
posted by njohnson23 at 7:29 PM on December 18, 2016 [1 favorite]


I actually have friends actively working on AI, and do think what they're doing is more sophisticated than this -- e.g., a robot being able to recognize someone they've met and conversed with 2-3 times and greet them by name on the fourth encounter -- but we've still got light-years to go before actual Skynet.

The "gaming the system" errors and poisoning issues are, to me, far more insidious and likely to cause damage than actual AI itself.
posted by Unicorn on the cob at 9:03 PM on December 18, 2016


I have a real problem with this anthropomorphising of machines.

Me too! Take robot service technicians on automated support hotlines. They pretend to have feelings. They say things like "Your call is important to me," "Have a nice day," "Your satisfaction is my first priority," or whatever else they might be modeled to pretend to say, but they don't really care or feel and we all know it. There's an eerie disconnect in that, an uncanny valley effect for a lot of people that's socially alienating and accustoms us to expecting that words don't really mean anything or have real weight.

And weak AI is nothing more than ordinary, human design and engineering practices embodied in code, with the only intelligence involved being the human intelligence of the logical machine's designers and the human produced content stored and used as data.

Any automobile with a mechanical dead man's switch embodies the same basic engineering principles that produce weak AI; weak AI just has more elaboration, and is expressed in code and in the ad hoc electrical circuits that code defines and controls instead of in physical materials.
posted by saulgoodman at 10:59 AM on December 19, 2016 [1 favorite]




This thread has been archived and is closed to new comments