Alien Knowledge
April 19, 2017 7:14 AM

When machines justify knowledge. The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
posted by 00dimitri00 (27 comments total) 27 users marked this as a favorite
 
"No one denies the many benefits of metahuman science, but one of its costs to human researchers was the realization that they would probably never make an original contribution to science again. Some left the field altogether, but those who stayed shifted their attentions away from original research and toward hermeneutics: interpreting the scientific work of metahumans."
posted by Iridic at 7:27 AM on April 19, 2017 [15 favorites]


The article's link to the Mississippi River Basin model was also interesting.
posted by Brian B. at 7:29 AM on April 19, 2017 [2 favorites]


I was going to post the same link :) Now I'd just point out that the story's title was originally The Evolution of Human Science, later mangled by the editor of Nature.

About this article, well it's quite long, and the question raised is a good one. Still I don't see why this scenario, where "science can advance even without coherent models, unified theories, or really any mechanistic explanation at all," has to happen. It certainly might, and there's always the risk of bad science (machine-generated or not), but bad science is the daily reality of scientists. Scientists eat bad science for breakfast. They've been dealing with bad science since long before the so-called "post-scarce computing" age, and they'll be dealing with it in the future.

There's a lot to be worried about in the abuse of machine learning, but I don't think pure science is on the high-priority list. I'd be even more worried by the abuse of AI-assisted scientometrics in building an incentive structure for researchers.
posted by runcifex at 7:32 AM on April 19, 2017 [2 favorites]


The article makes a good point, but hides it under a lot of breathy nonsense.

The good point it makes is this: there is a connection between the following two things: (a) the lack of interpretability of neural network models and (b) the biased (and sometimes horrific) results we've seen lately. On the face of it, the latter stem from human biases that enter into the preparation of training sets and the oversight of the model - they don't seem to have anything to do with the lack of interpretability of the model. However, interpretable models would show the incorrect bias term very clearly to a human observer, whereas a neural network might not, meaning that the mistake may not be discovered, or discovered only after a long while.

tl;dr: Neural networks are not fundamentally different from other scientific models in "our long Western tradition" (puke), but their opaqueness may foment bad models.
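To make that concrete, here's a minimal sketch (invented data, scikit-learn assumed, purely illustrative): a linear model exposes a discriminatory signal as a single coefficient a human can read and question, while the equivalent neural net just hands you stacks of weight matrices.

```python
# Sketch: interpretable vs. opaque models on invented data.
# Assumes scikit-learn; the "protected" feature and the labels are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 2000
income = rng.normal(50, 15, n)          # legitimate feature
protected = rng.integers(0, 2, n)       # feature that should carry no weight
# Biased labels: the outcome is partly driven by the protected attribute.
y = (income + 20 * protected + rng.normal(0, 10, n)) > 60
X = np.column_stack([income, protected])

linear = LogisticRegression().fit(X, y)
print("linear coefficients:", linear.coef_)   # the bias shows up as one visible number

net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000).fit(X, y)
print("MLP weight matrices:", [w.shape for w in net.coefs_])  # nothing directly readable
```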
posted by splitpeasoup at 7:37 AM on April 19, 2017 [13 favorites]


tl;dr: gigo
posted by tspae at 7:44 AM on April 19, 2017 [5 favorites]


Yeah, as someone who teaches research, I often run into people who seem to think that as long as the fact turned up is correct, the research was successful, even if the source is otherwise biased, incomplete, or just largely wrong. If you don't know how the model works, how do you know when the result is an error? I guess you can wait for catastrophic obvious failure, but that's not a super workable approach…
posted by GenjiandProust at 7:45 AM on April 19, 2017 [3 favorites]


It's not just GIGO. A fairly significant part of science is identifying all the implicit I's in GIGO, where hidden, possibly fallacious assumptions may lie. Statistical techniques can in this regard be a force for good or evil, depending on how you approach them. You can lie or debunk lies with statistics, just as you can bury or uncover insights with data.
posted by runcifex at 7:49 AM on April 19, 2017 [3 favorites]


...whereas a neural network might not, meaning that the mistake may not be discovered, or discovered only after a long while.

My worry is that neural networks will start being seen and used in circumstances where poor training data (or synaptic weights or whatever) will start getting people killed. The issue is that, unlike the Therac-25, we may not ever know that it's happening, because the underlying rule set for decisions is completely opaque.

Say a diagnostic neural net is consistently prescribing incorrect treatment for a subset of the population that it serves. It may be serving the overall population at a rate equal to or better than another person or expert system, at the cost of consistently harming that subset. How would you ever know?
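A back-of-the-envelope sketch with entirely invented numbers shows how easy it is to miss: aggregate accuracy looks great while one subgroup is quietly getting the wrong treatment.

```python
# Sketch with invented numbers: overall accuracy hides subgroup harm.
subgroups = {
    "majority": {"patients": 9500, "correct": 9120},  # 96% correct
    "minority": {"patients": 500,  "correct": 300},   # 60% correct
}

total = sum(g["patients"] for g in subgroups.values())
correct = sum(g["correct"] for g in subgroups.values())
print(f"overall accuracy: {correct / total:.1%}")          # ~94%, looks fine

for name, g in subgroups.items():
    print(f"{name}: {g['correct'] / g['patients']:.1%} correct")
```

Unless someone thinks to slice the results by subgroup, the headline number never gives the game away.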
posted by leotrotsky at 7:50 AM on April 19, 2017 [6 favorites]


We need neural networks that can decide whether or not to believe the hype.
posted by thelonius at 7:53 AM on April 19, 2017


Hasn't science worked with completely incorrect models during most of history, anyways? This just seems like the equivalent of pre-quantum-mechanics chemistry, with quantum phenomena essentially performing, in the course of experiments, calculations which the scientists were completely unaware of. Just more layers of abstraction to get through before "true" "understanding" is arrived at and you realize that aether and vortex atoms were a dumb idea.
posted by XMLicious at 7:58 AM on April 19, 2017


Here's some half-bakery that I've been thinking about along these lines: At one point, all knowledge was encoded and passed on via DNA. Then some DNA figured out how to make neurons, and the accumulation and transmission of knowledge was gradually taken away from DNA and given to bundles of neurons. All that was left for DNA to do was to build the neuronal machinery and give it crude hormonal directions now and then.

We're at the edge of a similar transition; if it keeps going, all that'll be left for us to do is to build the computational machinery and give it crude instructions now and then.

On the other hand... I've been to a bunch of bioinformatics seminars, and it seems that we've got a long way to go. Mostly in data collection. There'll be one or two clear-ish results - or at least "this method is slightly more efficient than that older method" - but there's always a large area of results where the machine says "dunno". Or it says, "I know for sure!", but then another machine says, "No, you're definitely wrong." The problem is that both machines are making what amount to educated guesses based on insufficient data. It might seem that we have a lot of data - billions of nucleotides per cell, millions of SNPs! - but those are independent(-ish) input variables, and the more input variables you have the more samples you need to make sense of things.

And those are just the variables which it has suddenly become cheap and easy to collect. There are still a ridiculous number of environmental variables which are too expensive to collect in large enough numbers, over a long enough time period, to give the machines something useful to chew on. So mostly the machines have to torture the data, hoping that the rack and the wheel will produce a useful confession.
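For a sense of scale, here's a toy sketch (pure random noise, nothing biological about it): hold the number of samples fixed, keep adding input variables, and the best-looking correlation with a completely random outcome keeps getting more impressive.

```python
# Sketch: spurious correlations grow as you add input variables.
# Pure noise, invented sizes -- no real SNPs involved.
import numpy as np

rng = np.random.default_rng(42)
n_samples = 50                               # "patients"
outcome = rng.normal(size=n_samples)
outcome = outcome - outcome.mean()

for n_vars in (10, 1000, 100_000):
    variables = rng.normal(size=(n_vars, n_samples))      # random "SNPs"
    variables = variables - variables.mean(axis=1, keepdims=True)
    corrs = variables @ outcome / (
        np.linalg.norm(variables, axis=1) * np.linalg.norm(outcome)
    )
    print(f"{n_vars:>7} variables -> best |correlation| = {np.abs(corrs).max():.2f}")
```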
posted by clawsoon at 8:00 AM on April 19, 2017 [1 favorite]


Remember that it's all math, and mostly math that's been around for decades and centuries. The recent explosion of ultra-cheap, fast computer memory has allowed models that used to be averaged to now be attacked more discretely. But given that it's math, some folks (probably at a whiteboard as we type) will do math magic: find theorems and lemmas and reductions that transform some currently hard stuff into new graspable patterns.

Also, all the "big data" models seem to have intense human "tuning" of parameters, and there are likely vastly more failed neural nets that are swept into the bit bucket, never to be tallied in the success/failure game. A single net that shows incredible insight may be less impressive if there were tens of thousands of failures.

So new incredible tools, not the Singularity.
posted by sammyo at 8:03 AM on April 19, 2017 [4 favorites]


neural networks will start being seen and used in circumstances where poor training data (or synaptic weights or whatever) will start getting people killed

To the extent such systems are already being used to (attempt to) predict likelihood of recidivism to assist in determining sentence length in our dangerous prison systems, they surely already have.
posted by praemunire at 8:13 AM on April 19, 2017 [5 favorites]


"Nothing emerges from this mass of contingencies, except victory against humans."

Oh, good.
posted by alrightokay at 8:24 AM on April 19, 2017 [2 favorites]


Wasn't this a WIRED editorial a few years ago?
posted by Going To Maine at 8:39 AM on April 19, 2017


The article's link to the Mississippi River Basin model was also interesting.

There was (of course) an amazing 99% Invisible episode about that. Totally worth a listen.
posted by The Bellman at 8:43 AM on April 19, 2017 [1 favorite]


So I build mathematical models for predictive purposes, and my experience leads me to be substantially skeptical of these approaches, at least in some domains and data sets. The model building process I go through is slow and deliberate because data correlations produce a number of unreasonable effects that may improve the fit in the modelled data, but will lead to obviously incorrect answers in potential future scenarios.

One example is trying to predict what mode of transport a person travelling from point i to point j will take. There are a number of elements in that decision, but they are tightly correlated - the time it takes to walk, to cycle, to drive and to take transit are all highly correlated with the physical distance. The money cost of driving, and in some cases, of transit are also correlated with distance. It's very easy to produce models where at least some of these parameters have the wrong sign - where the best fit, instead of saying you don't like paying money and don't like spending time, is a model where time overpredicts and is corrected by saying you like paying money. This may produce better results in the current data, but will be disastrously wrong if somebody proposes a new toll road, for example. Maybe bigger data will improve that, although I'm not sold yet.
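Here's a rough sketch of the wrong-sign problem (invented data, scikit-learn assumed): when time and cost are both nearly proportional to distance, the fitted coefficients become unstable, and refitting on resampled data can happily hand you a model that says people enjoy paying money.

```python
# Sketch: collinear time/cost variables make coefficient signs unstable.
# All data invented; assumes scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
distance = rng.uniform(1, 20, n)
time = distance + rng.normal(0, 0.1, n)      # nearly proportional to distance
cost = distance + rng.normal(0, 0.1, n)      # ...and so is cost
utility = -1.0 * time - 1.0 * cost + rng.normal(0, 2, n)
choice = (utility > np.median(utility)).astype(int)    # 1 = takes the "fast" mode, say

X = np.column_stack([time, cost])
for trial in range(5):
    idx = rng.integers(0, n, n)              # bootstrap resample
    model = LogisticRegression(max_iter=1000).fit(X[idx], choice[idx])
    print(f"trial {trial}: time, cost coefficients = {model.coef_[0].round(2)}")
# The true preferences are "both negative", but with this much collinearity
# the fitted signs can flip from resample to resample.
```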

If anyone out there is interested in the Mississippi River Basin model, you may also be interested in the Bay Area Model, which is still operating and is a pleasant day trip if you're within day trip range of Sausalito.
posted by Homeboy Trouble at 8:55 AM on April 19, 2017 [7 favorites]


Observing correlations is the first step before the long hard job of figuring out causality. Neural nets just allow you to shrug and absolve yourself of responsibility.
posted by benzenedream at 9:00 AM on April 19, 2017 [1 favorite]


Hasn't science worked with completely incorrect models during most of history, anyways?

No. Science works with models that are not completely correct, which is not the same thing as being "completely incorrect".

For example, classical (e.g. Newtonian) mechanics is still the state of the art regarding the dynamics of macroscopic hunks of matter. Its failings only become apparent if one is modeling phenomena at very small scales, or near very intense gravitational fields, or moving close to the speed of light.

Good scientific theories, when they fail, fail at the margins in a predictable way.
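To put a number on "at the margins": the relativistic correction to a Newtonian prediction is the Lorentz factor, and at everyday speeds it is indistinguishable from 1. A quick sketch:

```python
# Sketch: how far classical mechanics is off, as a function of speed.
import math

c = 299_792_458.0  # speed of light, m/s
for label, v in [("car, 30 m/s", 30.0), ("orbital speed, ~7800 m/s", 7.8e3), ("0.9c", 0.9 * c)]:
    gamma = 1 / math.sqrt(1 - (v / c) ** 2)
    print(f"{label:>25}: Lorentz factor = {gamma:.12f}")
```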
posted by mondo dentro at 10:15 AM on April 19, 2017 [4 favorites]


I'd like to expand on sammyo's comment, which is bang-on. The article repeatedly commits to claims like the following:

humans simply cannot comprehend the model the computer has built for itself.

The article offers no justification for this whatsoever besides the empirical observation that we don't "understand" these models yet. This is deeply unfortunate, because there is actually a concerted effort to put together (quoting from sammyo):

theorems and lemmas and reductions that transform some currently hard stuff into new graspable patterns.

What could such theorems and lemmas look like? A neural network is just a large mathematical object with a bunch of parameters. We have plenty of tools in mathematics for pulling apart complicated objects to reveal simpler structure hidden inside. For example, in Boolean Circuit Complexity, we can already take some kinds of simple neural networks of constant depth and "decompose" them into simpler objects, specifically low-degree polynomials. See these lecture notes for a technical explanation. Low-degree polynomials are (comparatively) well-understood mathematically, and can be thought of as depth-2 circuits. So, we already have results that take weak neural networks of whatever constant depth you like, and "crush" them down to depth-2. Intensive research is going into generalizing these techniques to decompose the neural networks used in practice, and honestly I don't think we're too far off.
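None of that machinery fits in a comment, but here's a toy numerical flavour of "crush a shallow net into a low-degree polynomial" (an invented one-dimensional network and a plain least-squares fit, not the actual circuit-complexity theorems):

```python
# Toy flavour only: approximate a small 1-hidden-layer network on [-1, 1]
# by a low-degree polynomial. Invented weights; not the circuit-complexity results.
import numpy as np

rng = np.random.default_rng(7)
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=8)   # hidden layer, 8 units
w2, b2 = rng.normal(size=8), rng.normal()

def shallow_net(x):
    """One hidden tanh layer, scalar input and output."""
    return np.tanh(np.outer(x, W1[:, 0]) + b1) @ w2 + b2

xs = np.linspace(-1, 1, 500)
ys = shallow_net(xs)

poly = np.polynomial.Polynomial.fit(xs, ys, deg=5)     # the simpler stand-in object
print("max error of degree-5 polynomial:", np.abs(poly(xs) - ys).max())
print("spread of the net's output:      ", ys.max() - ys.min())
```

The real results are about Boolean circuits rather than least-squares fits, but the spirit is the same: a complicated-looking object collapses into something we already know how to reason about.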

I'll continue to expand on sammyo's comment, about another issue that the article seems to ignore:

Also, all the "big data" models seem to have intense human "tuning" of parameters

It's a lot worse than that. All the "big data" models have undergone intense human "tuning" of the structure of the network. Clever humans have to tailor the network to the task it is being trained for, discarding enormous classes of designs if they don't perform well. So, there is a deeply non-trivial human element in the loop of these new machine learning algorithms, to the point where I think it is appropriate to consider the neural-network methodology as an algorithm that, currently, requires a human to come up with candidate network structures in an adaptive way. It is a major open problem to develop an algorithm that successfully searches all reasonable network structures.
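For flavour, this is roughly what that human-in-the-loop search amounts to in practice (a sketch assuming scikit-learn; the candidate shapes are ones a person wrote down, not ones any algorithm discovered):

```python
# Sketch: the "search over structures" is a human-written list of candidates.
# Assumes scikit-learn; data and candidate shapes are invented.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

candidate_structures = [(32,), (64,), (64, 32), (128, 64, 32)]  # chosen by a human
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid={"hidden_layer_sizes": candidate_structures},
    cv=3,
)
search.fit(X, y)
print("best structure a human thought to try:", search.best_params_)
```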

So, it isn't as if our impressive success and "alien understanding" even come from a procedure that can discover neural network structures for itself. Additionally, the fact that we're achieving such predictive success with constrained network structures arrived at by painful manual search and testing gives us additional leverage in finding "decompositions" of these structures; we don't even need to decompose every possible network structure, just "the ones that we've found that do OK."

What I liked about the article was discussion of the following question: "how should we treat the 'knowledge' that comes out of 'black-box' predictors?". This question has also seen a great deal of study in theoretical computer science, mostly focused on attempts to verify such knowledge to weaker computational entities. One of the ways we could cope is to force real-world neural networks to use an interactive proof system, so that even if we can't decompose them (yet), we can force them to "explain" themselves in some sense.
posted by kitten_hat at 10:51 AM on April 19, 2017 [5 favorites]


This is all a little bit woo.

For example, the article suggests that AlphaGo is a relatively straightforward application of deep learning principles to the game of Go. The title of the Nature article about AlphaGo, "Mastering the game of Go with deep neural networks and tree search", should tip one off that there is at least one other thing that went into building it.
posted by ethansr at 10:58 AM on April 19, 2017 [2 favorites]


kitten_hat: "What I liked about the article was discussion of the following question: "how should we treat the 'knowledge' that comes out of 'black-box' predictors?"."

That was the whole point of the article, which you just spent your entire comment refuting. How does this question have any interest if you are so incredulous about the idea that these things can be black boxes?
posted by TypographicalError at 11:30 AM on April 19, 2017


TypographicalError: Because we might eventually construct predictors that are really black-box (ie, maybe we can prove that no decomposition is possible for some class of predictors), or we might want something useful to do in the interim while we don't yet have appropriate decompositions for deep nets.

Basically, I think the article has a kind of unfortunate dichotomy that ignores work such as feature visualization being done on deep nets currently. It sets up current nets as black boxes, where really they are varying shades of grey, and I expect them to become greyer in the future. Here's a quote from the article:

If knowledge includes the justification of our beliefs, then knowledge cannot be a class of mental content, because the justification now consists of models that exist in machines, models that human mentality cannot comprehend.

What if I have models that explain how the models work, or even some insight into features via empirical procedures? So, I think the epistemological claims made by the article are at best dubious. Comprehension isn't a dichotomy like this.

On the other hand, I really do like the discussion of foreswearing knowledge, because it addresses the important question of how we should behave as these models evolve, bracketing the epistemology. Essentially, we need to come up with a set of criteria that relate our current best understanding of any given model (how grey is the box?) to how appropriate it is to use in a variety of situations (credit scoring, crime, medicine). We need to develop a nuanced picture of justification vs. predictive power, and what we're willing to accept in different situations.
posted by kitten_hat at 11:55 AM on April 19, 2017


What if I have models that explain how the models work, or even some insight into features via empirical procedures? So, I think the epistemological claims made by the article are at best dubious. Comprehension isn't a dichotomy like this.

I am curious about what you mean by this, and would wish to understand better. When you talk about "decomposing" networks and crushing them into depth-2, what does that mean? Does it mean being able to predict how a visual neural net will classify a given image? To determine what traits or characteristics of the image it is using to classify it? To determine how it weights a given trait or characteristic in classifying an object? Can these things be understood in this way, since all the decision making being made by the net is via a sort of weighted voting system?

In other words, are you talking about a procedure in which you can correctly predict, "the neural net will score this image at +0.53" or are you talking about a procedure in which you can correctly predict, "the neural net will fail to recognize Asian faces in group photos because its training set did not include humans with Epicanthic folds"?

From my limited understanding, it seems that problems of the latter type are the potential concern. Like the dumbbell problem --- nobody noticed that just about every photo of a dumbbell in the training set also included an arm, and so the net was trained to only recognize dumbbells in the presence of arms. Such a flaw can easily be discovered by a human, glancing at the generated images. Something like a neural net that only issues credit card offers to households with incomes over $70K in majority white zip codes, though, I don't know if that kind of flaw is something that can easily be diagnosed by a human from looking at a reverse-engineered sample set.

There may be some business incentive for, say, Google, to seek and uncover flaws in a neural net powered image search feature. There may be far less incentive for a credit card company to delve into similar flaws in a credit rating app. The law may bar conscious and deliberate discrimination, but it has far less to say about unconscious and unintentional discrimination. They will not want to open the black box for fear of increasing their liability and breaking their shiny new toys. A method of algorithmically analyzing neural net output may improve the understanding of scientists, but I wonder if it would lessen any of the dangers the applications of such technologies pose unless such analysis results in something you can explain in words, not in math, to show cause and effect....
posted by Diablevert at 1:03 PM on April 19, 2017 [3 favorites]


I am perhaps tipping my hand as an experimentalist here rather than a theoretician, but I think the reliance on overarching theories is overstated. We have theories which appear to correctly predict outcomes within certain domains virtually 100% of the time (e.g. classical Newtonian mechanics), but there are gaps between the domains of various rigorous theoretical models, particularly when it comes to explaining the discontinuities between them. But that's not really anything new.

Even in what we call 'prescientific' eras, people developed predictive models based on empirical data. They don't qualify as rigorous theories today, but some of them were not unreasonable given the experimental data available at the time. E.g. aspects of Aristotelianism, or the "corpuscular" conceptualization of light by Descartes et al. But those theories weren't overarching, and in order to get anything close to being an overarching / unified theory, they had to move pretty far over the line we now draw between science and philosophy/religion and into experimentally unverifiable territory.

Manufacturing predictive models based on experimental data is what people do all the time (heck, I strongly suspect my dog does it too), but having that theory mesh with other theories in a way that is traceable all the way to first principles, particularly with coherent mathematical models and experimental data along the way, is the exception rather than the rule.

Hell, it's only very recently that the scientific community has been able to bridge the gap between the models used by chemists and physicists. Chemistry ticked along just fine throughout the 19th and into the early 20th century with a model of chemical bonding that worked fine, but was full of seemingly-arbitrary values derived empirically with little in the way of theoretical backing. And while it's great that we have quantum chemistry now, if you're a rubber chemist or something, it probably doesn't matter. The 1789 Higgins model of valence bonds works fine at the "everyday chemistry" scale, much like classical mechanics being fine for an artilleryman.

The stuff we're likely to get out of new datamining / big data analysis techniques (if it doesn't just give us a lot of garbage and false leads, which I think is likely as well) wouldn't really be any different than the Higgins model, in that it would construct isolated puzzle pieces that could be very useful on their own. Fitting those puzzle pieces into the big picture, and getting them to interlock with other theories at the edges, will be a challenge, but it's similar to how science has moved forward for most of history.
posted by Kadin2048 at 1:12 PM on April 19, 2017 [1 favorite]


Diablevert: When you talk about "decomposing" networks and crushing them into depth-2, what does that mean? Does it mean being able to predict how a visual neural net will classify a given image? To determine what traits or characteristics of the image it is using to classify it? To determine how it weights a given trait or characteristic in classifying an object?

This is exactly the hope: that decomposition of neural nets into simpler mathematical objects will allow us to answer these kinds of interpretive questions about the models in a fundamental and principled way. A decomposition would not immediately give ways to answer such questions, but it would give us a place to start. To take your example:

"the neural net will fail to recognize asian faces in group photos because its training set did not include humans with Epicanthic folds"?

We have a hope of identifying this sort of pathology if Asian faces are well enough represented in the test data. When testing simple computational models that are related to neural nets, there are ways to "zoom in" on families of examples that the model does poorly on (see: boosting algorithms). If we have boosting algorithms to "zoom in" on errors and also a decomposition that allows us to understand why the model is giving a positive or negative result, these could be combined to describe a "missing feature" in the model.
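Here's roughly what that "zooming in" looks like in the simplest boosting setup (a hand-rolled AdaBoost-style sketch on invented data, with scikit-learn supplying the decision stumps): examples the current model keeps getting wrong accumulate weight round after round, so the hard cases come to dominate training.

```python
# Sketch: AdaBoost-style reweighting concentrates attention on hard examples.
# Invented data; assumes scikit-learn for the decision stumps.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=3)
weights = np.full(len(y), 1 / len(y))

for round_ in range(5):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
    wrong = stump.predict(X) != y
    err = weights[wrong].sum()
    alpha = 0.5 * np.log((1 - err) / err)
    weights *= np.exp(alpha * np.where(wrong, 1, -1))   # up-weight the mistakes
    weights /= weights.sum()
    top_share = np.sort(weights)[-10:].sum()
    print(f"round {round_}: weighted error {err:.2f}, "
          f"weight share of the 10 hardest examples {top_share:.2f}")
```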

If the class of examples that the model will do poorly on is not well represented in the test data, then we can still hope to couple a decomposition with adversarial techniques to generate examples that our network does poorly on, and reasons why it fails on those examples. But there's no guarantee that these synthetic examples will reflect real features of real human populations!

So coming up with powerful definitions of what it means for a dataset to be representative of different human sub-populations in a fair way is incredibly important. There's a good conference on this kind of thing. I think many of the talks listed there also have video of the presentations.
posted by kitten_hat at 1:27 PM on April 19, 2017


I read the description, thought "Huh, this sounds similar to a crappy Chris Anderson article in Wired from ages ago." Clicking on it, that seems to have been not-a-bad-first-reaction.

I admit after the first few paragraphs I only skimmed*. So I'm not saying there aren't interesting questions. But there's one thing I think is a fundamental error for one class of problem: Getting a computer to do something that already has a known solution has smaller impact than other types of machine learning.

If I feed a computer a bunch of images and tell it "these are the number 8" (an example in TFA) or give Google millions and millions of words' worth of translated texts (from the Wired article IIRC) it is successful because I can already solve the problem. There's a huge class of clear "right" answers. The computer has gotten up to human level of competence (maybe) but it hasn't discovered anything new. It's now a productivity tool. But the main thing you've "learned" beyond that is something about practical applications of machine learning algorithms. You haven't really discovered anything.

I do see the implications of this creep into proposals rather a lot, where not having discovered anything is used as evidence that you can discover something. People are impressed by something like Watson on Jeopardy (lots and lots of clearly defined questions and answers pre-existed) and want it to work on cutting-edge biomedical literature (lots of messy inputs, few clear questions and fewer answers.)

*This decision really ties into the "hate reading" thread from yesterday for me.
posted by mark k at 7:59 PM on April 19, 2017




This thread has been archived and is closed to new comments