What's gonna happen outside the window next?
November 18, 2012 1:51 PM

Noam Chomsky on Where Artificial Intelligence Went Wrong
posted by cthuljew (55 comments total) 40 users marked this as a favorite

tl;dr: They stopped listening to Chomsky's colorless green ideas.
posted by erniepan at 2:12 PM on November 18, 2012 [8 favorites]

"Went wrong?" Wait, what? The uprising has started already?!
posted by Ghidorah at 2:19 PM on November 18, 2012 [1 favorite]

Once we all start failing the Turing test the shoe will be on the other foot.
posted by Artw at 2:22 PM on November 18, 2012

Excerpt from page 3:

Well, we are bombarded with it [noisy data], it's one of Marr's examples, we are faced with noisy data all the time, from our retina to...

Chomsky: That's true. But what he says is: Let's ask ourselves how the biological system is picking out of that noise things that are significant. The retina is not trying to duplicate the noise that comes in. It's saying I'm going to look for this, that and the other thing. And it's the same with say, language acquisition. The newborn infant is confronted with massive noise, what William James called "a blooming, buzzing confusion," just a mess. If say, an ape or a kitten or a bird or whatever is presented with that noise, that's where it ends. However, the human infants, somehow, instantaneously and reflexively, picks out of the noise some scattered subpart which is language-related. That's the first step. Well, how is it doing that? It's not doing it by statistical analysis, because the ape can do roughly the same probabilistic analysis. It's looking for particular things. So psycholinguists, neurolinguists, and others are trying to discover the particular parts of the computational system and of the neurophysiology that are somehow tuned to particular aspects of the environment. Well, it turns out that there actually are neural circuits which are reacting to particular kinds of rhythm, which happen to show up in language, like syllable length and so on. And there's some evidence that that's one of the first things that the infant brain is seeking -- rhythmic structures. And going back to Gallistel and Marr, its got some computational system inside which is saying "okay, here's what I do with these things" and say, by nine months, the typical infant has rejected -- eliminated from its repertoire -- the phonetic distinctions that aren't used in its own language. So initially of course, any infant is tuned to any language. But say, a Japanese kid at nine months won't react to the R-L distinction anymore, that's kind of weeded out. 
So the system seems to sort out lots of possibilities and restrict it to just ones that are part of the language, and there's a narrow set of those. You can make up a non-language in which the infant could never do it, and then you're looking for other things. For example, to get into a more abstract kind of language, there's substantial evidence by now that such a simple thing as linear order, what precedes what, doesn't enter into the syntactic and semantic computational systems, they're just not designed to look for linear order. So you find overwhelmingly that more abstract notions of distance are computed and not linear distance, and you can find some neurophysiological evidence for this, too. Like if artificial languages are invented and taught to people, which use linear order, like you negate a sentence by doing something to the third word. People can solve the puzzle, but apparently the standard language areas of the brain are not activated -- other areas are activated, so they're treating it as a puzzle not as a language problem. You need more work, but...
posted by Brian B. at 2:29 PM on November 18, 2012

Being smart, it turns out, is easier than being smart in the way meat-brains are smart.
posted by zippy at 2:39 PM on November 18, 2012 [3 favorites]

I thought this was a really interesting interview. I really like some of what he said about approaching biological systems. Though, I still have reservations about his arguments against Bayesian analysis.
posted by SounderCoo at 3:12 PM on November 18, 2012

I think the key phrase I hear him repeating in this interview and his earlier talk is 'modeling unanalyzed data.' This is a nonsensical phrase if you think of modeling as something you do with systems. That you're trying to computationally instantiate a system which has the same model as another system. It assumes that the 'best' model is one that most faithfully converts one system to another. However, what newfangled statistical analyses are doing skips the system-based modeling step entirely. This approach assumes that the 'best' model is as complicated as the simplest model that could produce the observed data. We are exchanging understanding for power. It's easy to see why this approach draws practitioners and other resources away from systems-based modeling.
posted by persona at 3:21 PM on November 18, 2012 [5 favorites]

Artificial Intelligence on Where Noam Chomsky Went Wrong.
posted by usertm at 3:33 PM on November 18, 2012 [11 favorites]

I think you pasted wrong.
posted by LogicalDash at 3:39 PM on November 18, 2012

I think not.
posted by Cookiebastard at 3:58 PM on November 18, 2012 [3 favorites]

I'm a machine, I can't think.
posted by onya at 4:24 PM on November 18, 2012 [1 favorite]

Brilliant interview. So much good stuff in here. No surprise: he's the most brilliant person in the world.
posted by painquale at 4:26 PM on November 18, 2012 [1 favorite]

A terrible headline, I think. So far as I can see, Chomsky doesn't say that AI "went wrong"; he says it "maybe not abandoned, but put to the side, the more fundamental scientific questions" in order to concentrate on "specific goals". That is, it's not trying to create a complete brain, which was premature, but focusing on building up a repertoire of simpler, useful functions.

I think Chomsky's gone off the rails on his own specialty, syntax, so I was almost disappointed that the interview didn't touch on anything I disagreed with. :) I don't think the interviewer gave any evidence of knowing what Chomskyan syntax is about or what's right and wrong about it. He mostly seems to be probing Chomsky's opinion on subjects out of his (Chomsky's) area of interest.
posted by zompist at 4:31 PM on November 18, 2012 [1 favorite]

Contrasting view from Eugene Charniak: The Brain as a Statistical Inference Engine—and You Can Too.

I don't know all the details, but in early Natural Language Processing - getting computers to understand human language - the primary model was based on "symbols" which were universal in human languages, and of which words and sentences were only expressions (the Wikipedia article's history section mentions this briefly). Around '90, though, people tried statistical systems - to oversimplify, you basically just look for statistical patterns and infer meaning from them - and the results blew everything that had come before out of the water. Charniak was one of the first to take the new approach, and he and the field basically threw away the preceding thirty years of research and haven't looked back. Rules are still critical in some systems - particularly translations between very unrelated languages - but in addition to achieving more practical results, the statistical models do a much better job approximating human results than Chomsky would have you believe. There's also an explanation - backed up with experimental results - of a statistical basis for the children's ability to pick out language mentioned in Brian B.'s pullquote above.

Full disclosure: Charniak was my Master's advisor, and he has excellent taste in bow ties.
posted by 23 at 4:41 PM on November 18, 2012 [7 favorites]

I love how almost every interview with Chomsky (the most brilliant person in the world), always has a 4-6 paragraph intro telling us how awesome he is. Great article!
posted by vkxmai at 4:48 PM on November 18, 2012 [1 favorite]

It's interesting that Sydney Brenner is also mentioned. I saw him speak a few years ago, and he went on about how genomics is completely misguided, will never provide concrete answers or even hypotheses, and generally derided the audience he was speaking to, though with at least a hint of good nature. He then went on to prescribe a course of X, Y, and Z, which just happened to be exactly what genomics is. He even drew a plot on the chalkboard as an example of the types of analysis that he thought people should be doing, which was exactly the type of genomic analysis that could have been presented at the lab meeting of anyone in the room.

This is exactly mirrored by Chomsky when he says that children are clearly not doing probabilistic analyses, that's obviously the wrong way, and then in the following sentences details precisely the sorts of probabilistic analyses that would describe the process of learning language. It's as though, when recent work is presented, they get hung up on a particular word at the beginning of a talk that doesn't fit the direction of research that they had been hoping to leave to the world, then don't pay attention to the meat of the rest of the science because they may not get the intellectual credit for the direction that the field is going. (This is quite common among the elder-scientist set, I believe.)

In short, there's probably a tendency that as one ages and sees the field advance, and one naturally becomes more curmudgeonly, to be convinced that those whipper snappers are doing it all wrong and they should be listening to your wisdom, and this tendency seems to override your ability to actually listen and learn from them.

As someone who learned linguistics in classes where Chomsky was referred to as "the Great One" but only half in jest, Chomsky's anti-machine learning tirades have truly caused the scales to fall from my eyes. I find little insight, many errors of fact, and lots of self-congratulatory back patting in his recent forays into AI commentary. It makes me fear for his legacy. Will Chomsky become a Freud, much beloved but nearly all of his theory discarded? It's seeming quite likely to me.
posted by Llama-Lime at 4:56 PM on November 18, 2012 [11 favorites]

I am too stupid to understand anything in that article.
posted by roboton666 at 5:26 PM on November 18, 2012

And to Chomsky I say:

"You mad bro?"

And this is coming from somebody who majored in cognitive science with a traditional emphasis on modular cognitive systems, and linguistic ideas like the universal grammar.

But see, the thing is, that approach just wasn't working. It wasn't yielding results. AI research hasn't gone wrong, it's just gone in a different direction than Chomsky's entire cognitive model. Bayesian modeling and statistical approaches in machine learning work. I was going to point out Google's great success in this area, but several of their key researchers are involved in this debate and quoted.

AI has undergone a Kuhn-level paradigm shift, and Chomsky is on the other side of that shift.

That's what's happening here. Almost exactly like Einstein vs. Quantum Mechanics. The following quote could come from either Einstein or Chomsky:

"Some physicists, among them myself, cannot believe that we must abandon, actually and forever, the idea of direct representation of physical reality in space and time; or that we must accept the view that events in nature are analogous to a game of chance."

Another great quote from Einstein, that reflects what I think are some of Chomsky's hidden biases in this, is:

If one wants to consider the quantum theory as final (in principle), then one must believe that a more complete description would be useless because there would be no laws for it. If that were so then physics could only claim the interest of shopkeepers and engineers; the whole thing would be a wretched bungle.

Google is a hell of a shopkeeper built by some great engineers. But their approach isn't (usually) one based on building knowledge-based systems that reason via rules and the like. It's frustrating working with models and techniques like Support Vector Machines in Machine Learning where the end result is a set of features that have no meaning. But damn, they work.

The ironic part is a lot of the statistical models resemble the early work on connectionism and things like Perceptrons.

Where AI went wrong... whatever. I want to lock Chomsky and Siri in a room together and see if she'll convince him on who went wrong.
posted by formless at 5:32 PM on November 18, 2012 [11 favorites]

Siri is a pretty crappy representative for AI.

Google autocomplete, translate, or voice (or even Cleverbot, for that matter) is a pretty good representative. And I am sorry to say this, but all of them are more useful on a more regular basis to me personally than any of Noam Chomsky's ideas, brilliant as he may be.

There are plenty of so-so minds in science that were simply in the right place at the right time, and it doesn't change the fact that they led revolutions. Larry Page and Sergey Brin may or may not be as bright as Chomsky, but they have changed the world in a way he never could.

Truthfully, AI is irrelevant. IA -- intelligence amplification -- is the real prize, and that's what tools like most of Google's offerings have afforded us. You can use the Internet to become an idiot, a savant, or an idiot savant -- modern "AI" tools make it quicker to get to your destination.

Incidentally, this isn't limited just to Internet-centric data -- most large biological and financial datasets are made more tractable and useful by the same techniques. And Bayesian model averaging wins out over model selection, in the long term, for one simple reason -- the world changes, and if your models can't, you lose. This is not a trivial matter, as it happens; it can be the difference between "rich" and "broke".
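To make the averaging-versus-selection contrast concrete, here is a minimal sketch. The function names and the toy numbers are made up for illustration, and it assumes you already have a log evidence value for each candidate model. Selection bets everything on the single best model; averaging keeps every model in play, weighted by its posterior probability, so the weights can shift as new data arrives.

```python
import math

def posterior_weights(log_evidences, priors=None):
    """Posterior model probabilities from per-model log evidence (hypothetical helper)."""
    n = len(log_evidences)
    priors = priors or [1.0 / n] * n
    logs = [le + math.log(p) for le, p in zip(log_evidences, priors)]
    top = max(logs)  # subtract the max before exponentiating, for numerical stability
    unnormalized = [math.exp(value - top) for value in logs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def averaged_prediction(predictions, weights):
    """Model averaging: weight each model's prediction by its posterior probability."""
    return sum(p * w for p, w in zip(predictions, weights))

def selected_prediction(predictions, weights):
    """Model selection: keep only the prediction of the most probable model."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return predictions[best]

# Toy example: two models with posterior odds of 60/40 that predict 10 and 20.
weights = posterior_weights([math.log(0.6), math.log(0.4)])
print(averaged_prediction([10.0, 20.0], weights))
print(selected_prediction([10.0, 20.0], weights))
```

In the toy example the averaged forecast sits between the two models, while selection ignores the runner-up entirely, which is the part that hurts when the world changes and the runner-up becomes the better model.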
posted by apathy at 6:00 PM on November 18, 2012

Chomsky vs. Siri would be good for Chomsky because Siri is a piece of shit. It's easy to criticize the direction of AI for no other reason than that the field keeps failing to meet the goals it sets, changes them, and talks about new incipient victories.

I mean, we'll eventually get excellent q and a machines through big data, but this is a retreat from past promises of intelligent information agents in the 90s, which were themselves a retreat from promised AGI earlier. What we are getting is now not just functional simulation, but theatre; Siri as portrayed in ads is a literal fantasy designed to pressure us into pretending it is better than it is after we pay for it. It does, however, subtly train users to talk in ways it can process. This calls to mind an observation by Jaron Lanier that AI will succeed in part by making humans more machine-like. And certainly, there are folks out there that wish it were so.
posted by mobunited at 6:02 PM on November 18, 2012 [6 favorites]

Expert predictions of AI not significantly different than non-expert, not significantly different than past wrong predictions either.
posted by mobunited at 6:07 PM on November 18, 2012 [1 favorite]

In arguing against statistical approaches, Chomsky seems to be coming from the standpoint of Searle and his Chinese room thought experiment—arguing that functional intelligence is not equivalent to true intelligence. It's an intuitively pleasing argument, but it's also deeply flawed. The trouble is that no one ever seems able to precisely define what true intelligence is, other than to say that humans have it and computers don't.
posted by dephlogisticated at 6:07 PM on November 18, 2012

> First of all, first question is, is there any point in understanding noisy data?

Alright, that's it. If anyone takes this question seriously, I've nothing to add.

The world is noisy. Our information is incomplete. Decisions must be made before complete information can be collected. Therefore, yes, not only is there a point to it, it's the single most important point for "IA" systems.

Unbelievable. Might as well ask "is there any point in breathing?".
posted by apathy at 6:08 PM on November 18, 2012 [2 favorites]

In arguing against statistical approaches, Chomsky seems to be coming from the standpoint of Searle and his Chinese room thought experiment—arguing that functional intelligence is not equivalent to true intelligence.

No, Chomsky's concern is not Searle's. Chomsky thinks intelligence is computational. If anything, Searle's argument is an argument against Chomsky's approach even more than it is an argument against statistical analysis.
posted by painquale at 6:12 PM on November 18, 2012 [3 favorites]

My obligatory link to the Chomskeybot seems more relevant than ever in this thread.
posted by charred husk at 7:01 PM on November 18, 2012

Wouldn't statistical approaches work just as well even if Chomsky's account of language or mind was correct?
posted by chortly at 7:28 PM on November 18, 2012 [1 favorite]

Well, he concedes that statistical modeling is effective, but, it seemed to me, his point was that this kind of work abandons what he sees as a primary goal of science: to understand how systems and things work, not just to be able to predict their output. Obviously, that's a position that people could disagree with.
posted by thelonius at 7:33 PM on November 18, 2012 [3 favorites]

Yeah, the title of this article was fairly terrible and is leading people here to misunderstand Chomsky. It's natural to think of AI as an engineering problem, but Chomsky is fine with people using statistical models to solve engineering problems. In fact, he has some good things to say about Bayesianism. He is just upset that statistical models have overtaken cognitive science and pushed computational models to the side. In the interview, he mentions that it's fine to use statistical models to predict the weather, but it would be terrible if meteorologists all used statistical techniques and abandoned the study of the atmosphere entirely. He sees something like this going on in psychology.
posted by painquale at 7:47 PM on November 18, 2012 [5 favorites]

In fact, he has some good things to say about Bayesianism.

This is just Pascal's Wager over again, only with computers.
posted by Slap*Happy at 8:24 PM on November 18, 2012

No, Chomsky's concern is not Searle's. Chomsky thinks intelligence is computational. If anything, Searle's argument is an argument against Chomsky's approach even more than it is an argument against statistical analysis.

They both assert that the nature of intelligence hinges on process rather than functional ability. Searle doesn't refute the mechanistic nature of the brain, or even the possibility of machine intelligence in principle. But he insists that truly intelligent AI must be based on the design of the brain. Otherwise, it can only simulate comprehension, not create it.

Chomsky is not quite so rigid in his thinking (and, importantly, says nothing of consciousness), but still seems to think that true intelligence requires operational homology to the human brain—in the abstract rather than physical sense.

I think most researchers in the field would agree that understanding how the brain works would be the ideal starting point for designing AI. But Chomsky seems to trivialize the difficulty of top-down unification. It's easy to criticize reductionism, but damn near impossible to advance understanding in the field using any other paradigm. Grammar may be amenable to rational formalization. Abstract thought, not so much.
posted by dephlogisticated at 8:57 PM on November 18, 2012 [2 favorites]

The New York Review of Books
End of the Revolution
FEBRUARY 28, 2002
John R. Searle
New Horizons in the Study of Language and Mind
by Noam Chomsky
(This link is behind a paywall.)

Searle:
Chomsky insists that the study of language is a branch of natural science and the key notion in his new conception of language is computation. On his current view, a language consists of a lexicon plus computations. But my objection to this is that computation is not a notion of natural science like force, mass, or photosynthesis. Computation is an abstract mathematical notion that we have found ways to implement in hardware. As such it is entirely relative to the observer. And so defined, in this observer-relative sense, any system whatever can be described as performing computations.

Chomsky’s Revolution
April 25, 2002
Sylvain Bromberger, reply by John R. Searle
(no paywall)

Chomsky’s Revolution: An Exchange
July 18, 2002
Noam Chomsky, reply by John R. Searle
(no paywall)

Chomsky:
The long-term goal has been, and remains, to show that contrary to appearances, human languages are basically cast to the same mold, that they are instantiations of the same fixed biological endowment, and that they “grow in the mind” much like other biological systems, triggered and shaped by experience, but only in restricted ways.

Searle:
It is often tempting in the human sciences to aspire to being a natural science; and there is indeed a natural science, about which we know very little, of the foundations of language in the neurobiology of the human brain. But the idea that linguistics itself might be a natural science rests on doubtful assumptions.
posted by Golden Eternity at 9:45 PM on November 18, 2012 [1 favorite]

What a great mind is here o'erthrown! It's rambling and unfocused, and he makes odd slips of the tongue:

Like maybe when you add 7 and 6, let's say, one algorithm is to say "I'll see how much it takes to get to 10" -- it takes 3, and now I've got 4 left, so I gotta go from 10 and add 4, I get 14. That's an algorithm for adding -- it's actually one I was taught in kindergarten.

Here's another:

But there are other ways to add -- there's no kind of right algorithm. These are algorithms for carrying out the process the cognitive system that's in your head.

And another:

The way they did it was -- of course, nobody knew anything about photosynthesis -- so what you do is you take a pile of earth, you heat it so all the water escapes. You weigh it, and put it in a branch of a willow tree, and pour water on it, and measure you the amount of water you put in. When you're done, you the willow tree is grown, you again take the earth and heat it so all the water is gone -- same as before.

Also, he's talking about science and its failures to account for data, but I really don't think he understands the things he's talking about. Look at his references to Peano's axioms - yes, the axioms do not describe a process for addition. They're not supposed to; that's not what axioms are. Similarly, a 2d representation of a 3d scene may be consistent or inconsistent with the axioms of geometry, but the axioms don't tell you how the drawing was made. But he jumps from that to his assumption that the mental algorithms we use don't have any sort of deeper algorithmic layer instead of asking whether in fact you can have algorithms about algorithms - which you can, and we use them (in our computers) every day; and this has really fascinating implications for the connection between the way we think and fundamental arithmetic. But he's oblivious to this, despite the fact that it's been around since Kurt Gödel in the 1930s, if not earlier.
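For what it's worth, the quoted claim that one function can be carried out by more than one algorithm is easy to show. This is only a sketch with made-up function names: one routine counts up by single steps, the other follows the "get to 10 first" recipe from the quote, restricted here to single digits as an assumption. Carrying the quoted steps through for 7 and 6 gives 3 to reach 10, 3 left over, and 13.

```python
def add_by_counting(a, b):
    """Successor-style addition: start at a and step up by one, b times."""
    total = a
    for _ in range(b):
        total += 1
    return total

def add_by_making_ten(a, b):
    """Make-ten addition for single digits (0-9), following the quoted recipe."""
    to_ten = 10 - a              # how much it takes to get from a to 10
    if b < to_ten:               # the sum never reaches 10, so just count up
        return add_by_counting(a, b)
    left_over = b - to_ten       # what remains of b after reaching 10
    return 10 + left_over        # go from 10 and add the remainder

print(add_by_counting(7, 6))
print(add_by_making_ten(7, 6))
```

Both routines compute the same function over the digits; they differ only in the steps taken, which is the distinction between what is computed and how it is computed.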
posted by Joe in Australia at 10:03 PM on November 18, 2012 [1 favorite]

There's a distinction that people in computer science don't make. Or, more accurately, there are two VERY different ideas that they conflate. And I can clearly see that conflation happening in this thread.

The two ideas are what I term "artificial intelligence" and "artificial consciousness". The former is the attempt to make computers do complex tasks. And we're very, very good at it. Just look at Google. Making computers smart has been a hugely successful undertaking. The latter refers to making computers work like human brains work. We have not been even a little successful at that, because we have absolutely no theory of how the human brain works. It doesn't matter if your statistical models get within 99% of modeling human language. Great. That's awesome. Obviously, a lot of the lexical and parametric elements of language have to be learned, which implies some amount of statistics going on. No problem. But the idea that garden-path sentences and deep embedding and inherent ambiguity are just artifacts of a fundamentally statistical system is unlikely in the extreme. More importantly, though, no matter how successful a statistical system is in predicting the grammaticality of a given sentence, it won't even be able to tell us anything about how those weird elements of our grammar (garden paths, etc) arise.

Computers are really, really, really good at letting you do tons of calculations on tons of data. Of COURSE you're going to get really cool results when you apply that power to language. Just like you've gotten cool results when you apply it to pretty much everything in the world. But comparing the success of brute-force calculation over explicit modeling (given that we don't actually have a model for natural language worked out yet) to the success of quantum mechanics over Einstein's objections completely misses the point. Statistical analysis of cognition and language doesn't have anywhere near the predictive value of QM, which had a sophisticated mathematical underpinning based on well-understood properties of the world. Chomsky isn't objecting to an interpretation of real world data and successful models. He's objecting to trying to "explain" the world with no model whatsoever! His basic point, I think, is equivalent to saying that what a cognitive scientist wants to do is obtain Boyle's Law. What a computational, statistical analysis wants to do is explain where all the molecules are likely to be when you look at a gas. You might become really good at describing clouds of gas, but you haven't derived any general principles about the world. And, again on QM, saying "particle behavior is essentially random" is not the same as saying that "particle behavior is not essentially rule bound". In fact, we talk about electrons vs quarks specifically because we have a really solid model of reality. We don't just look into a collider and say, "well, here's what each of our sensors read, let's do a bunch of statistics on it and whatever pops out of it will be what we publish." No. We say, "Here are our predictions about what sorts of particles we see, and these are the sensors that should light up if we see them." And if our readings disagree with our predictions, we come up with new models of particles. 
We don't just shrug and say, "Wow, gee, look at how wonderful our statistical analysis of particle decay is!"

Point is, cognitive science needs a model if it's ever going to be useful in explaining how brains work. (Note: NOT "useful" in the sense of convincing a bunch of computer-crazed scientists that they can get some really exciting papers out of their endeavours.) Once we have a model, I'm sure lots of statistical analysis will be required to work out the details of how brains actually do things in specific cases. But you need to be talking about (read: doing statistical analysis of) something, not just talking.
posted by cthuljew at 11:31 PM on November 18, 2012 [6 favorites]

I should add one more thought: The question isn't whether or not you're getting better results. The question is, what does "results" mean?
posted by cthuljew at 11:53 PM on November 18, 2012

thelonius, cthuljew: The thing by Charniak I linked to outlines his hunch that the human brain uses a particular kind of algorithm for learning language, and that learning from a large amount of training input (everything we hear before we can talk) is how we learn to use language.

I understand there's an argument that "statistical results are great so who cares if that's how humans think or not", but there's no point in arguing that - saying statistical methods are highly successful isn't a concession, it's just acknowledging the reality of Google and other things we see every day. The point here is that statistical methods aren't just good replicators or predictors, but that as we find more accurate methods they can in fact teach us about how the brain functions.
posted by 23 at 12:47 AM on November 19, 2012

The latter refers to making computers work like human brains work. We have not been even a little successful at that, because we have absolutely no theory of how the human brain works.

That's really the essential argument here. I can't argue in good faith that systems like Google emulate human intelligence, because on an intuitive level I don't believe that to be true. But as long as our knowledge of human intelligence can be approximated by a large fuzzy question mark, we have absolutely no real measure of determining how close we are to producing human-like intelligence beyond raw functional ability. In other words, how closely does the output of our AI resemble human intelligence? It's an imperfect measure, but it's literally all we have aside from intuition.

I don't think we can assume that we could readily distinguish between human intelligence and an infinitely-optimized algorithmic simulation thereof, simply because the two operate on different mechanistic principles. Who are we to say that a brain is intelligent and a statistical algorithm is not, if the two produce results that cannot be distinguished? Dennett would likely argue that the two are equivalent.

Of course it goes without saying that what everyone really wants to know is the true meaning of intelligence, consciousness, and, well, meaning. But we aren't anywhere close to having those answers, and no one has a good idea how to get there. So in the meantime, we're left with practical approximation. And, who knows, maybe if we approximate human intelligence close enough, we'll find that we actually created something like the phenomenon we're searching for, even if we don't quite understand how we got there.
posted by dephlogisticated at 12:53 AM on November 19, 2012

Of course it goes without saying that what everyone really wants to know is the true meaning of intelligence, consciousness, and, well, meaning.

I would beg to differ.
posted by 23 at 1:03 AM on November 19, 2012 [1 favorite]

As a practicing cognitive scientist with an interest in language, the annoying thing about this debate (and many people's framing of it, including Chomsky's) is that he seems to assume that "statistics" just means "without a model." But that's just wrong - in fact, it's incoherent. "Statistics" is meaningless unless you have some notion of what you are computing your statistics over: and once you do, that is an implicit model of what you think the brain conceptualises.

In practice, Chomsky and these discussions (with the comparisons to google and modern AI) seem to implicitly assume that the model is n-grams -- that is, that the statistics in question are the statistics of which words (or, sometimes, morphemes) co-occur. Although there is evidence that people are indeed sensitive to those kinds of statistics, it is also indubitably true that the human brain does far more than calculate and use n-grams. Thus, any model that fundamentally relies on statistics over n-grams is not going to be a good explanation of the human brain. In that Chomsky is absolutely correct.
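For anyone unfamiliar with the term, here is a minimal sketch of what "statistics over n-grams" amounts to in the simplest case, bigrams. The function names and the toy sentence are invented for illustration; real systems add smoothing and use far larger corpora.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count how often each adjacent pair of words occurs."""
    return Counter(zip(tokens, tokens[1:]))

def next_word_probability(tokens, prev, word):
    """Maximum-likelihood estimate of P(word | prev) from raw bigram counts."""
    pairs = bigram_counts(tokens)
    prev_total = sum(count for (first, _), count in pairs.items() if first == prev)
    return pairs[(prev, word)] / prev_total if prev_total else 0.0

# Toy corpus, purely illustrative.
corpus = "the dog saw the cat and the dog ran".split()
print(next_word_probability(corpus, "the", "dog"))
print(next_word_probability(corpus, "the", "cat"))
```

Note that the only thing this model knows about is which word sits next to which. It has no notion of phrases or structure, which is why statistics over this particular unit cannot be the whole story.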

What is massively misleading, though, is that few cognitive scientists actually subscribe to that simplistic a view. One of the fundamental questions, if not the fundamental question, is what other things (besides n-grams) statistics can be calculated over. From that follow other major questions: to what extent people are sensitive to those statistics, what assumptions people make about those statistics, and to what extent that sensitivity and those assumptions explain complex behaviour like language. There are many researchers (full disclosure, I am one) investigating these questions across a wide variety of instantiations of "things" statistics can be calculated over, from phonemes on up to phrase structure. It is not a small research area, and yet all of Chomsky's rhetoric about statistics seems to entirely ignore the existence of this work. (Either that, or he fails to explain -- or I'm unclear about -- how his critiques are relevant).

So the same debate keeps occurring, even though the points he is making don't apply to a lot of what is actually being done and what actually falls under the umbrella of statistical learning in cognitive science. In short, he is railing against a straw man that maybe describes most research in the 80s, but certainly doesn't describe it now.
posted by forza at 1:42 AM on November 19, 2012 [8 favorites]

If you enjoy this debate, there's a fascinating AI dissertation by Gary Drescher, Made-up minds: a contructivist approach to artificial intelligence (also on Archive.org) where he builds a model of an infant's world, gives the learning agent the power of statistical observation, and then sees how far it can get in terms of cognitive development (Im think in Piagetian stages, but I may be wrong).
posted by zippy at 7:26 AM on November 19, 2012

Constructivist ... I.

I have no one to blame but my non-sentient touchscreen device.
posted by zippy at 8:34 AM on November 19, 2012

Otherwise, it can only simulate comprehension, not create it.

I'm not certain that Searle's argument is even relevant to the statistical inference approach. If the Chinese room appears to all tests to be intelligent, who is to say it isn't? If a "simulated" AI started to produce great art or conduct novel research, how is that different from a structural "native" intelligence? I've never understood how or why that difference should be made.

It brings up really interesting questions about where intelligence resides and souls and such, but, functionally, Searle's arguments about simulated comprehension have always seemed to me to be missing the point.
posted by bonehead at 8:36 AM on November 19, 2012

Fluid Concepts And Creative Analogies: Computer Models Of The Fundamental Mechanisms Of Thought, Douglas R. Hofstadter, Basic Books, 1996

cf. Artificial Superintelligence, Azamat Abdoullaev, F.I.S. Intelligent Systems, 1999
posted by ob1quixote at 10:29 AM on November 19, 2012

Searle's Chinese Room argument, and I think I would call it a provocative position rather than an argument, or a straw man if I were feeling less charitable and had not had coffee yet, seems to me to be this: if we had an understandable model of intelligence, it wouldn't be intelligence ... BECAUSE HOMUNCULUS.

Which I think is equivalent to saying "people cannot be smart, because neurons aren't smart."

I share the desire to understand human intelligence, but I also think that systems that behave intelligently and are constructed upon more regular and more easily understood components than brains are still intelligence. I'm just waiting for one that can hold up its end of an interesting conversation.
posted by zippy at 12:56 PM on November 19, 2012 [3 favorites]

In any case, the dichotomy that Chomsky invokes is a thing of the past. It is true that the field has changed in character over the past decades, but it is pointless to suggest that the field is moving in the wrong or right direction from the vantage point of a debate that happened twenty or thirty years ago, when hardware and software development had just started taking off. The way Chomsky frames the subject does not help to understand the current state of the field or the choices it faces at all. It's not even very clear what he's arguing for or how that (fundamentally) differs from what is going on. He holds up a few unrepresentative examples as typical for the field and engages with the interviewer in a little strut about "science vs. engineering" that seems like it belongs in a bull session. Still, that doesn't mean Chomsky's appeal is entirely unreasonable. It makes sense on an emotional level, because there is admittedly something unsatisfactory about the results we're getting. What Chomsky expresses is just the impatient curiosity that's behind all research, and, coming from him, that's not without value.
posted by deo rei at 2:43 PM on November 19, 2012 [3 favorites]

Is there an article out there that lays out how Chomsky is wrong in his criticisms of modern statistical methods? Like, what sorts of actual models people are getting for the structure of language using them? Because the problem seems to me not that statistical, computer-powered methods can't reproduce language; maybe they can — that's basically an empirical question. The problem seems that, once we have these perfect statistical models in hand, we won't actually know anything about language. So, could someone actually explain in concrete terms what we're learning about the structure of language from statistical modeling, rather than just the fact that we're learning that language can be modeled using statistics?
posted by cthuljew at 12:32 AM on November 20, 2012 [1 favorite]

I'm going to throw this in here: Peter Norvig on Chomsky and Statistical Learning.
posted by ianso at 3:24 AM on November 20, 2012 [2 favorites]

ianso: I read that article the other day, and all I can say is that it's embarrassingly bad. He uses the success of search engines and speech recognition engines to argue for the success of statistical methods, which completely ignores all of Chomsky's problems. He makes the (admittedly circumspect) accusation that "Chomsky's theories" have not generated as much money, which completely misses the point of doing basic science. Then he goes on to cite a bunch of papers in extremely well-understood, thoroughly modeled, hundreds-of-years-old sciences which use statistical methods to show extremely specific and narrow (but nevertheless valuable) results as some sort of proof that the half-century old, not-even-barely-understood cognitive sciences should be doing the same thing. I finally had to stop reading when he said that Chomsky saying "...that the notion of 'probability of a sentence' is an entirely useless one..." is equivalent in any way to saying that the notion of a novel sentence is a useless one, which, again, completely misses Chomsky's point.

None of this is to say that it's not relevant or valuable to this discussion. Indeed, if it represents the way of thinking typical among computational linguists, then I think it shows that the misunderstandings are just as bad on that side as people here are claiming they are on the Chomskian side. Of course, it might just be bad from all perspectives, which I vaguely hope is true.
posted by cthuljew at 4:01 AM on November 20, 2012

cthuljew, I think it would help the discussion if you summarized Chomsky's point for us, since I'm not sure everyone's clear on what it is; I know I would appreciate something more straightforward than the article, which ranged over a lot of topics.
posted by 23 at 7:06 AM on November 20, 2012

I can't help but see the biological analogy here, though limited, between cloning and natural reproduction. One is mimicking the result, while the other is burdened with explaining the difference to remain relevant.
posted by Brian B. at 7:06 AM on November 20, 2012 [2 favorites]

Chomsky's point is exemplified by his parable of the physicists at the window, which I quoted for this post's title. It's the idea that no amount of predictive power (which, Chomsky is positing, is all you get with even the most refined statistical methods) will ever explain anything. A bunch of physicists who can perfectly predict what'll happen outside their window next won't, just by virtue of that, actually know anything about physics.

I like to think of it like this: Let's say you develop a mathematical model based on observations of caribou migration that can account for all apparent factors of climate, terrain, predators, etc., and that will tell you exactly where caribou will go in any given condition, and which, when you enter the conditions for that year, correctly predicts what path the caribou will take every time. That's really cool! And useful! And, if you want to hunt caribou, can make you lots and lots of money! However, have you actually learned anything about caribou? Can you explain what happens in the brain of any given caribou while it's deciding where to go next on its migration? Can you explain what keeps herds of caribou together? How leaders are determined? How individuals who get lost know how to find their herd again? Etc, etc.

The analogy, I hope, is clear enough. But just to be completely transparent, statistical analysis here is the mathematical modeling based purely on observed behavior, with no attempt to model internal processes. To explain the latter concerns, you'd need ethologists, biologists, ecologists, etc, all of whom would assume that there are concrete and well-defined rules at work in the caribou individuals, which can be described systematically, and not just probabilistically. This is equivalent to the linguists who want to find abstract but definite rules that describe how language takes on the shape it does.

Even if the brain ultimately just does a bunch of statistics and then out comes the language, the very language itself still has rules! There has to be a reason you can say "how many cars did they wonder if the mechanics fixed?" but not "how many mechanics did they wonder if fixed the cars?" And the reason can't just be "because that's what a child is more likely to hear" because then you have the problem of infinite regress — that's what their parents were most likely to hear, and their parents, and so on. And if you try to get out of it by saying, "Well, back in the ur-language, some factor meant that you could get the first kind of sentence but not the second" then I get to instantly reply, "Great! Let's figure out what those reasons were!" And no statistical analysis can help!

I hope I haven't just muddied the waters even more. And I hope that this doesn't just represent a gross misunderstanding of how modern computational linguistics works. But that's why I ask above for an introduction to the topic that addresses some of these concerns.
posted by cthuljew at 8:17 AM on November 20, 2012

I can only speak for myself, cthuljew, but what I find so troubling about the "statistics can't tell us anything" argument is that it just doesn't make any sense at all. A statistical model tells us all sorts of relationships that we didn't know about before, and even better, tells us the accuracy of these relationships.

And the only way for me to make sense of that nonsensical statement is to invoke a mindset that I'm almost certainly incorrect in assuming: that Chomsky doesn't realize that he is already doing modeling. When one finishes reading (and possibly even understanding) Government and Binding Theory, one has an understanding in the form of allowable syntaxes and sentence structures and whatnot, but this is exactly what a statistical model also provides. You need to ask the statistical model these questions in order to find out, but they are in the model. There's nothing special about Chomskian modeling that guarantees that it represents anything going on in the human brain. In fact, if it can't match the accuracy of a statistical model, then that's a very strong argument that it's not representing anything in the human language faculty, and that what's going on in the statistical model has a better chance of matching biology.

There has to be a reason you can say "how many cars did they wonder if the mechanics fixed?" but not "how many mechanics did they wonder if fixed the cars?" And the reason can't just be "because that's what a child is more likely to hear" because then you have the problem of infinite regress

First off, I don't understand this problem of infinite regress, because presumably you'd still have that problem under Chomsky's model which (in my primitive understanding) is that there are a small number of parameters that flip to make a child's native language faculty match the language's syntax as opposed to other languages' syntax. And the same goes for words: why do we use "blue" for the color blue rather than "glue" or "stew"? Because that's what our parents used. No infinite regress problem at all.

Presumably a Chomskian grammar is useful in distinguishing these two sentences because it would tell us what rules are violated by the second sentence, or that according to a generative grammar there's no possible parsing that would eventually lead to those words in that order. Presumably the statistical model is bad because it can't provide any insight into the difference between those two sentences? But that's simply not true. The statistical model that says that the second sentence is ungrammatical will tell us what particular features of that second string of words contribute to the poor likelihood of the sentence under the trained language. But even in the worst case, let's say a near black box like a radial basis function support vector machine or some model with cryptic parameters, we can still query this black box to figure out what's going on. For example, we can figure out what types of features are essential to determine grammaticality. Such complete-black-box methods are somewhat rare, and not a defining characteristic of statistical methods, and if the argument is just about these particular black-box methods it should be made against that target rather than against the broader and inappropriate target of statistical modeling.

If, in the end, such black-box modeling turns out to be the only way forward in modeling language, then we will devote the effort to figuring out what's in the black box, and if we can't figure that out, we've still learned something about language: that all those non-black-box methods are not sufficient for understanding language, and that is a valuable discovery in itself, as it tells us about the minimum level of complexity needed for human grammar.

Perhaps the difference is that Chomsky sees a hard-and-fast rule "X always Y", i.e. a definitive yes and no answer, as real and useful, but a statement such as "'X always Y' with XX% accuracy on this corpus" as a non-real, non-knowledge statement. However, that hard-and-fast rule does not exist, except in the mind of the beholder. It is a leaky abstraction that does not mesh well with the world, and we know this because people tried to use those hard-and-fast rules on real language and they always had XX% accuracy, not 100% accuracy. So this distinction between hard-and-fast rules and wishy-washy probabilistic statements, if it's what Chomsky is saying, betrays a fundamental misunderstanding of the problem as it stands. The hard-and-fast rule is as real as a unicorn. Perhaps such rules exist and describe language, and it would be fantastic if we found some, but we haven't found any yet, and until we do the unicorn is a fictional beast. Perhaps rather than waiting to ride a unicorn, we should take this perfectly serviceable horse for a trot and see how far it can take us on our journey, and perhaps we'll discover something about finding unicorns on the way. Maybe we never discover unicorns, and we get to our final destination a bit more shabbily than on a unicorn, but let's focus on the real goal.

Like all scientific breakthroughs, whatever truly underlies language is most likely going to be quite unintuitive, and we're going to have to come up with new abstractions (e.g. math) to deal with these insights. And the only way that we will find those unintuitive underlying truths is by dealing with real language, playing with it in our hands, turning it around and seeing how it behaves. Philosophizing without looking at real data only gets us so far, it gives us directions, guesses, and places to look, but until the philosophy recapitulates reality it is merely speculation, not science. F=MA is a sterile equation, incapable of describing the rich texture of day-to-day interaction with physical objects, yet it tells us quite a lot about what properties there are in the world, embodies several unintuitive concepts about how objects behave, and we know that it's right because it's predictive. F=MA also came with lots of new abstractions (calculus) that let us deal with the insights that it provides. F=MA was discoverable without computers because it is an exceedingly simple relationship; we already know that language is far far more complex. The only tools we have to rigorously play with language these days are computational models.

Chomsky chose rules-based computational models when he started, with good reason. Rules-based computation is what our human logical frameworks are good at, what our discrete computers are natively good at, it's what computer scientists are good at, and it's the only thing that computers could do early on when Chomsky was getting started. And it turns out that Chomsky plowed quite a bit of ground for early computer scientists, and all computer scientists learn (or should learn) the Chomsky Hierarchy when learning about the logical foundations of programming languages, as it provides helpful guideposts when dealing with all sorts of computational problems. But rules-based models don't seem to correspond well with language, biology, or natural processes. And as we get better at probabilistic reasoning via discrete computers, it's infiltrating everything that used to be rules-based in computer science, not just computational linguistics, but coding theory (highly discrete), prosaic but essential tasks such as email filtering, and absolutely every single bit of modeling of the natural world, from particle physics to proteins to computer vision to caribou herds. To say that language is some sort of special case that does not require statistical inference is to ignore the past 30 years of scientific discovery. To say that we can learn nothing from statistical models is, at best, a very difficult statement to parse.
posted by Llama-Lime at 11:45 AM on November 20, 2012 [5 favorites]

...people tried to use those hard-and-fast rules on real language and they always had XX% accuracy, not 100% accuracy.

But that's not what those rules did. There was absolutely no talk of accuracy, except in the purely binary sense. Those rules were hypotheses for what the structure of language was. And when we found grammatical forms that contradicted those rules, we discarded or modified them and came up with new rules that explained the new data. And we've been getting better. But we don't just look at "grammar" as being "so and so accurate across all language". We look at discrete rules as explaining each of the forms of grammatical language we've observed so far. And once new observations (that is, new data) come in, we revise our rules. And we've been pretty good at modeling a lot of stuff in language using the rules approach, and we've been getting closer to a complete model (although we're still very, very far away). (I for one am a fan of Jackendoff's approach, which is a generative model very similar to LFG, although not worked out in detail yet.)

As for infinite regress, all of the examples you mentioned are things that are determined arbitrarily. Which parameter you pick, what word sound you produce, etc., are all things that can be invented completely by a child during language acquisition with little consequence to language production. (Obviously here I mean small individual differences won't affect the child's communicative ability, while long-term changes in a population will produce significant language shifts, etc., etc.) However, there are things that are in fact universal and highly peculiar to language that are not arbitrary at all: embedding, phrase movement, non-linearity, etc. How did those patterns get established, if there are not rule-like structures in the brain that determine them specifically? And especially if human language has had more than one recent evolutionary origin (which I don't think is terribly likely, but who knows) then there's basically no chance that every language on Earth would share such features. The problem of infinite regress is where did our statistical patterns come from way back when we were still saying "Ug throw rock" and not "Ug shall now use his mighty arm to propel, like a bird soaring above the ocean, a stone into the clouds that float in the heavens."

Finally, we're not saying that we can learn nothing from statistical models. Just that we can't learn the sorts of things linguists are interested in learning — namely, what rules the brain is using to generate the highly constrained and structured language that we see.
posted by cthuljew at 1:14 AM on November 21, 2012 [2 favorites]

My specialty is statistical models, and I'm not familiar with postgrad linguistics and haven't looked at the field in a decade, so I'll cede to your judgements in many of these areas. But I want to stress that (1) a particular statistical model embodies assumptions about structure, can embody some types of rules easily, and other types of rules with greater computational difficulty, and (2) with appropriate model structure and sufficient data, a model can learn additional structural features of language, which can be read off by seeing that certain sets of parameters are set a certain way. Statistical models can always mimic a rules-based system by setting some of the probabilities to 1 and 0. And just as rules-based grammar is always improving by finding new types of rules, we're continually improving statistical models' ability to learn. Further, statistical models provide bounds on learning rates: how much data needs to be observed to learn particular things in the model. This could be a very useful tool for discovering which language structures must be innate and which can be learned.
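To make the "probabilities of 1 and 0" point concrete, here is a rough sketch. It assumes a deliberately tiny model (a first-order transition table over part-of-speech tags); the tag names, the tables, and the function name are all hypothetical, chosen only to show that a hard rule is the special case of a probabilistic model whose parameters are pinned to 1 and 0.

```python
def sequence_probability(tags, transitions, start="<s>", end="</s>"):
    """Multiply P(next | prev) along the sequence; missing transitions count as 0."""
    prob = 1.0
    prev = start
    for tag in list(tags) + [end]:
        prob *= transitions.get(prev, {}).get(tag, 0.0)
        prev = tag
    return prob

# Hypothetical "rule": a sentence is exactly DET NOUN VERB.
# Every allowed transition has probability 1, everything else is implicitly 0.
rule_like = {
    "<s>": {"DET": 1.0},
    "DET": {"NOUN": 1.0},
    "NOUN": {"VERB": 1.0},
    "VERB": {"</s>": 1.0},
}

# Same structure, but a noun is now occasionally followed by another noun.
graded = {**rule_like, "NOUN": {"VERB": 0.9, "NOUN": 0.1}}

print(sequence_probability(["DET", "NOUN", "VERB"], rule_like))          # allowed by the rule
print(sequence_probability(["NOUN", "DET", "VERB"], rule_like))          # forbidden by the rule
print(sequence_probability(["DET", "NOUN", "NOUN", "VERB"], graded))     # rare but possible
```

With the first table the model behaves as a yes/no acceptor; relaxing a single parameter turns the same structure into a graded one, and the values of those parameters are what one would "read off" after training.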

The other thing I want to question is the assumption that the brain is using rules to generate sentences which we utter. Though it's possible that we may some day find some brain structures that mimic these increasingly complex rules, it's far from certain, and given how strict rules do not correspond well to the biology we do know, a statistically driven system seems far more likely to me.

My undergrad advisor, before he brought me with him over to computational biology, had done some great work on learning word segmentation from continuous streams of speech, an essential task for infants. When I first started discussing the model with him, I saw the random variables, distributions, and their relationships of the model and said "Aha! So this is where you put in all your pre-existing knowledge about words, about what types of things you can learn." And he replied to me "Oh no, not at all, we want to put as little knowledge as possible in, a completely blank slate if we can." And we were both right of course; he was coming off the Bayesian vs. Frequentist debates where Bayesians were always defending themselves against biased analyses and always trying to come up with uninformative priors. But as someone without that baggage, I could see the utility of shoving as much knowledge as possible into your model, so that you can learn the most new things, and only backing off if you wanted to test some of your underlying knowledge (an affliction that I have to this day). So much of this debate may be as much about the emphasis of terms as it is about particular research programs.
posted by Llama-Lime at 10:49 AM on November 21, 2012 [2 favorites]

It seems like the basic fight is that Chomsky wants to know the source code to the human mind while the 'statistical' people just want to be able to write a new program that is statistically indistinguishable from a person.
posted by delmoi at 1:48 PM on November 22, 2012 [1 favorite]

When Deep Learning is covered by the New York Times, we've definitely crossed a Rubicon of sorts, as the popular media almost never covers these things. Deep Learning is a revival of a very old idea, neural networks, but now empowered with much better learning algorithms, better computational power for learning, and larger data sets to learn from.

This is exactly the type of learning that I believe Chomsky is being critical of, as this type of statistical model is often inscrutable. We know that internally in the model there are some parameter groups that recognize 'square' or 'circle' in an image, and we know that the model learned those concepts entirely on its own, but it's difficult to point and say "there, that's what recognizes a circle."

Still, this type of model is far more inspired by the brain than, say, transformational grammars are. It seems incredibly odd to say that a grammar is inspired by a search for the source code in the brain when there's no basis for correspondences to anything in the brain. Chomskian syntax is a search for an understanding of the structures in language; the connection to the brain is at most implied. Chomskian syntax creates higher-level logical formalisms about language that take us 15-20 years to understand, and there's no reason to believe that these formalisms are representative of brain biology.

Meanwhile, Deep Learning structures are directly inspired by a very naive understanding of the functioning of neurons, and are focussed on performing the tasks that even simple brains are good at, but which computers and human logical constructions are very bad at. Deep Learning is directly breaking apart this dichotomy between our intellectual understanding of things and the intuitive understanding of things that happens in brains. If you want to talk about "source code to the brain" then you had better be looking at systems that have some sort of correspondence to the brain.

What Geoffrey Hinton is working on is methods to learn at all, and less on saying "this is how the human visual cortex works." However, there are others that are working on precisely these aspects of figuring out brain function. And when they get to language, perhaps they will work out ways to point at parts of a deep restricted Boltzmann machine (or whatever model it happens to be) and say "there, that's what recognizes a circle, and here's how it does it." But it's premature to say that we will never be able to point at that, and furthermore, that this is the wrong way to go about eventually learning that fact.
posted by Llama-Lime at 12:20 PM on November 24, 2012 [2 favorites]


This thread has been archived and is closed to new comments

MetaFilter
