the most important part of learning is actually forgetting
October 7, 2017 12:34 AM   Subscribe

New Theory Cracks Open the Black Box of Deep Learning - "A new idea called the 'information bottleneck' is helping to explain the puzzling success of today's artificial-intelligence algorithms — and might also explain how human brains learn."
posted by kliuless (52 comments total) 56 users marked this as a favorite
first posting to say that this whole concept of being able to take hypotheses back and forth between silicon and dura mater is one that needs to stop before it causes significant challenges on both the researcher side and the regular old human side.

tl;dr - discomfited by assumptions that what happens inside a computer can help you understand the human brain
posted by infini at 12:35 AM on October 7, 2017 [3 favorites]

I've had colleagues who complained about human-vs-computer comparisons, but I was interested in cognitive psychology and felt their dismissals weren't justified by anything other than a sense that they found the comparisons offensive, reductive, etc. I think there's a formal background that informs this sort of research, and it is a legitimate field of scientific inquiry; it just attracts a lot of people who make various lay mistakes (both for and against, i.e. overstating the ideas or being uninterested/defensive) due to a lack of familiarity with modern psychology and theoretical computer science. It sucks as a kind of subtly privileged, classist intellectual gatekeeping, but those fields are prerequisites to these sorts of questions.
posted by polymodus at 12:47 AM on October 7, 2017 [5 favorites]

Why Neuroscience Is the Key to Innovation in AI
posted by kliuless at 12:56 AM on October 7, 2017

As a lay person lucky to know people deeply embedded in this field, I don’t think it’s possible to overstate how quickly breakthrough discoveries are occurring.
posted by andrewdoull at 3:17 AM on October 7, 2017 [4 favorites]

Whoa, that presence may have something to do with absence. Whodathought. I'll try to keep this in mind
posted by rudster at 3:53 AM on October 7, 2017

As I understand it, deep learning and artificial intelligence are very different things.
posted by nofundy at 5:11 AM on October 7, 2017 [1 favorite]

Watson may disagree.
posted by sammyo at 6:02 AM on October 7, 2017 [1 favorite]

So, if the two-phase learning model is correct, optimizations around this knowledge could make neural networks much less CPU intensive.

Sounds great, carry on.
posted by MikeWarot at 6:15 AM on October 7, 2017 [2 favorites]

This sounds a lot like the general pattern of recursive algorithms, where you take a data structure, pull out the parts that are important, and pass those parts to the next call.
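A toy sketch of that pattern in Python (the predicate and numbers are made up for illustration): each recursive call keeps only the elements deemed important and passes the survivors to the next call.

```python
def distill(values, is_important, depth=0, max_depth=3):
    """Recursively keep only 'important' elements, passing the survivors down."""
    if depth == max_depth or not values:
        return values
    kept = [v for v in values if is_important(v, depth)]
    return distill(kept, is_important, depth + 1, max_depth)

# Each level applies a stricter notion of "important":
# depth 0 keeps multiples of 2, depth 1 multiples of 3, depth 2 multiples of 4.
result = distill(list(range(20)), lambda v, d: v % (d + 2) == 0)
print(result)  # [0, 12]
```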
posted by ikea_femme at 6:55 AM on October 7, 2017

Of course, in that case a human writes the code that determines what those parts are!
posted by ikea_femme at 6:56 AM on October 7, 2017

this whole concept of being able to take hypotheses back and forth between silicon and dura mater is one that needs to stop before it causes significant challenges on both the researcher side and the regular old human side.

As someone with actual degrees in both theoretical CS and cognitive neuroscience, this is unfairly reductive. There's important work being done in the overlap between how computers process information (especially via deep learning and AI) and how humans process information. There's also an ocean of CS that people without any background in neuroscience or psychology take, hoist over their heads, and shout "BEHOLD, I HAVE SOLVED BRAINS". The latter is, obviously, bullshit, but claiming that the whole concept needs to stop is throwing the baby out with the bath water.

A good rule of thumb is that if a theory is being put forward by someone with an academic background in both fields, or a cross-functional team of researchers across disciplines, it's probably worth at least listening to. As TFA points out pretty early on, the researcher presenting this theory is both a neuroscientist and a computer scientist, and so presumably is going into this with the right kind of background to avoid it being a shitshow.
posted by Itaxpica at 7:46 AM on October 7, 2017 [16 favorites]

The heart is not a pump, it's a magical feat of musculature and homeostatic control and feedback mechanisms that propels a fluid through a vastly complex vascular network.

But at some level, it's a pump.

Brains retain and process data, appear to have some built in capacities and some that develop in response to stimuli, memory works (with varying degrees of modification and falsifiability), inference can be taught and generalized... At some point yes the brain has been an arrow, a crossbow, a steam engine, and a dozen other inapt metaphors.

But this isn't that, and claiming there's nothing to cross between the disciplines seems almost willfully short sighted.
posted by abulafa at 8:14 AM on October 7, 2017 [6 favorites]

One facet of the explosion of AI is the huge increase in hardware speed/memory/network interconnects and the re-purposing of GPUs, which are specialized chips that only became economical via the economies of scale of the gaming community. Many of the ideas behind "Big Data", "ML", and neural nets are decades old but were impractical even with huge military research budgets. The world's rate of change in computation is increasing. Moore's law only covers the standard CPU chip; GPU "transistor" counts are way past doubling yearly. Intel is shipping an AI thumb drive (which may be partially analog, another possible optimization). The Google "TPU" pushes the GPU component count/functionality over another threshold. And data: a terabyte used to be a huge deal, and now exabytes are available at AWS, just click. (An exabyte is a million TB.)

A lot of the "advancements" are "tweaks" to well-known math that are found to work for certain data. One approach to solving a "big data" problem is to throw a bunch of different models at it; one works, and yeah, you're an ML genius. Just a few years ago a single model was too expensive, let alone trying dozens. Not to minimize the very smart scientists and mathematicians doing amazing work; there are just tools fairly recently available that make it practical to test some ideas.

My other point is that it took gaming to provide economies of scale for some of the chip tooling; a new chip design is insanely expensive. The next driver coming up is self-driving cars, which will put demand and price pressure on ML-style hardware that is just hard to imagine. Will phones have TB or PB of local memory in a few years?
posted by sammyo at 8:57 AM on October 7, 2017 [5 favorites]

That's a great point: a lot of people outside the field don't realize that the science behind deep learning dates back to the 80s, but it was pretty quickly written off as "a cool theory but way too resource-intensive to ever be useful for anything real", and the field moved on to other things. It's just over the last few years that the kind of resources considered comically, unrealistically massive thirty years ago have become entirely commonplace, and the deep learning boom is a direct result of that.
posted by Itaxpica at 9:39 AM on October 7, 2017 [3 favorites]

A question: What is the difference between this and the commonly-used exercises of dimensionality reduction and feature extraction? Aren't those the same processes of figuring out what's important and throwing the rest away? What makes this new?
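To be concrete about what I mean by dimensionality reduction, here's a minimal PCA sketch in NumPy (the data here is random and purely illustrative): keep the few directions that carry the most variance, throw the rest away.

```python
import numpy as np

def pca_reduce(X, k):
    """Project data onto its top-k principal components (highest-variance directions)."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back ascending
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top_k                       # keep k dimensions, discard the rest

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, 5 features
Z = pca_reduce(X, 2)            # reduced to 2 features
print(Z.shape)  # (100, 2)
```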
posted by clawsoon at 9:41 AM on October 7, 2017

(Obviously, if Geoffrey Hinton says it's an original idea in machine learning, then it's an original idea. I'm curious to know what makes it original.)
posted by clawsoon at 9:43 AM on October 7, 2017

Thanks for the links! Planning to watch the vid and read the paper this weekend.

I've been thinking a lot about the importance of internal representations, and what we can do to help them along. The perspective for a very long time was to ignore the internal rep and let backprop figure it out; batch normalization shows that this is fundamentally wrong. If we can help the process along, all for the best. Variational autoencoders are also super interesting from this perspective, explicitly encouraging the net to learn a rich internal representation with an approximately normal distribution. (Unfortunately, training multiple variational layers seems to be hard...)

All of which is to say, I'll be very excited to learn ways to train up better internal representations.
posted by kaibutsu at 10:09 AM on October 7, 2017

Forgetting, not just learning, is basically the only way we have a natural sense of time. It's easier to remember what happened yesterday vs. last Friday vs. 17 Fridays ago, and it's easy to see how our memories "bottleneck away" the boring stuff.
posted by lubujackson at 10:24 AM on October 7, 2017 [1 favorite]

I wonder at what stage the human brain performs the filtering? Do we start with all the information and then filter?

How long do AIs take to train?
posted by 92_elements at 10:34 AM on October 7, 2017

How long do AIs take to train?

It depends on so many factors (the task in question, how powerful your computers are, how well-modeled the problem space is, etc.) that it's kind of like asking "how long does it take a human to learn something?". On one end of the spectrum, I can train a basic AI system for, say, numerical pattern recognition in a few minutes or hours on consumer hardware. On the other end, AlphaGo apparently takes four to six weeks to train, though that could maybe be sped up.

Note also that this covers just the actual training step, assuming that building your ML system is already done, which can be a massive undertaking unto itself.
posted by Itaxpica at 12:31 PM on October 7, 2017 [1 favorite]

The teams could be from a dozen related and important disciplines. Until I start to hear about considering the human brain as part of a real person, and about addressing the significant yet intangible, subjective, intuitive, emotional, and visceral part of how inputs get digested in order to emerge as outputs (a broader attempt at framing the debated term "processing"), I'll watch and wait for these rational, logical, evidence-driven outcomes to manifest. Data in, synthesis of an approach to analysis out: that strips out the multidimensionality of human intelligence.
posted by infini at 12:55 PM on October 7, 2017 [1 favorite]

it sucks as a kind of subtly privileged, classist intellectual gatekeeping but those are prerequisites to these sorts of questions.

posted by infini at 1:03 PM on October 7, 2017

Ok, so you're willing to write off... basically every single branch of cognitive, behavioral, and structural neuroscience, including neurology, for not approaching their fields of study the way you want them to? Cool.
posted by Itaxpica at 1:08 PM on October 7, 2017 [1 favorite]

(To say nothing of the huge amount of work on neuroscience that does, in fact, "consider the human brain as part of a real person" - which is to say basically the entire field of modern cognitive neuroscience, which is essentially built on that)
posted by Itaxpica at 1:11 PM on October 7, 2017 [2 favorites]

I watched the video about backpropagation and thought, "This doesn't seem biologically plausible." I Googled, and sure enough, backpropagation isn't biologically plausible. Like you said upthread, Itaxpica, it's premature to say that we've SOLVED BRAINS; we haven't even modeled a single chain of neurons very well. Still interesting research, though.
posted by clawsoon at 1:39 PM on October 7, 2017

It's worth noting that huge algorithmic breakthroughs keep coming down the pipe which cut training time dramatically. Batch normalization often speeds up training by 10x, and was only discovered in 2015.
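The core operation is small enough to sketch. This is a hypothetical NumPy version of the batch-norm forward pass (training-time statistics only, no running averages; the example inputs are made up):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply a learned scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize each feature
    return gamma * x_hat + beta              # gamma/beta are learned parameters

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
out = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
# each column of `out` now has mean ~0 and variance ~1
```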
posted by kaibutsu at 3:12 PM on October 7, 2017

So I'm probably going to be seeing a guy tonight whose current job for Big F***ing Deal Search Engine outfit is " ... to map the human brain."

Anything I should ask him?
posted by philip-random at 4:09 PM on October 7, 2017

he's not a neuroscientist
posted by philip-random at 4:10 PM on October 7, 2017

You know, I don't think that the idea of AI developments mapping backward to the human brain is so invalid. Since AI learning is based on the model of learning we understand from our own minds, it would make perfect sense that any developments would be applicable to human minds.

Now, when we have AIs writing AIs, not so much.
posted by Samizdata at 4:16 PM on October 7, 2017

Anything I should ask him?

Last I heard (which was quite some time ago), we couldn't even predict the output of a single biological neuron with complex dendritic structure, in vitro. Is that still the case, and if so how does he expect mapping the human brain to be useful?
posted by Coventry at 4:33 PM on October 7, 2017

My understanding of the neurology of memory, a lay understanding to be sure, is that most memories, or more exactly the pathways that represent memories, fade if the memory is never accessed, unless the memory has a reason to be maintained, like being associated with a particularly strong physiological response (see PTSD, or Proust's In Search of Lost Time). Basically, the more you recall something, the stronger the memory becomes; if a memory is not accessed, it fades over time. Backpropagation seems like a way of simulating memory decay in a digital system, so really an analogous process. Of course, you rewrite a memory every time you access it, so that is a layer of complication that computer neural networks seem to be avoiding so far.
posted by Ignorantsavage at 4:34 PM on October 7, 2017

Do ANNs even have memories? That's something I've never heard of. General concept learning, yes, but specific recollection of incidents or items?
posted by clawsoon at 4:43 PM on October 7, 2017

Some architectures have memories. My favorite is the Lie-access Neural Turing Machine.
posted by Coventry at 4:49 PM on October 7, 2017 [1 favorite]

philip_random: "What would you do if you learned the main application for your work was military?"
posted by rhizome at 5:30 PM on October 7, 2017 [2 favorites]

So this post and the subsequent discussion covers a lot of ground that more or less constitutes my exact area of research. I'd like to chime in with my opinion on a couple things.

i) What was Tishby's talk actually about? I only watched it once, and am not deeply familiar with the information bottleneck method, but essentially what Tishby has shown is how to analyze what's going on when you train a neural network. It's not a new algorithm for training the networks, nor is it a method of its own for machine learning.

This is actually still really important. Optimization is essentially about trying to minimize or maximize a number based on some parameters which can change that number. In deep learning, the parameters are the weights of a neural network, and the number is usually something along the lines of "the average probability that the model gives to the data". If a model is trained on some data, then it should change its parameters to assign that data a higher probability. This is sometimes called "explaining" the data.
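To make that concrete, here's a toy stand-in in Python: logistic regression trained by gradient ascent on the average log-probability of the labels. The data and learning rate are invented; a deep net just puts many more parameters between input and output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.5, steps=200):
    """Adjust weights w so the model assigns the observed labels higher probability."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)              # model's probability for label 1
        grad = X.T @ (y - p) / len(y)   # gradient of average log-likelihood
        w += lr * grad                  # step uphill: "explain" the data better
    return w

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias + feature
y = np.array([0.0, 0.0, 1.0, 1.0])
w = train(X, y)
p = sigmoid(X @ w)  # probabilities now track the observed labels
```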

Even though we can mathematically define the numbers that we want to optimize, we can't necessarily calculate them. For the complex models that are studied by deep learning, this is more or less always the case. So deep learning is in the funny position that it produces techniques to train these extremely complicated models, but it's really hard to rigorously state that the model is improving. It's a very experimental field for this reason, and requires a lot of intuitions about how to train a network and how to tell if it's doing anything.

So clear ways of interpreting how well a network explains data are extremely important. In Tishby's case, he considers multilayer neural networks which learn input-output relations (e.g. the input is a hand-drawn digit, and the output is the digit category), and he considers two quantities: how much information each layer of the network has about the input, and how much information each layer has about the output. What he finds is quite interesting.

Early on during training, the layers of the network learn to accumulate a lot of information about the input, and the output of the network better and better matches the target output until it saturates. Since the network is only trained to compute the output, you might think that it would stop learning once this happens. Instead what happens is that the earlier layers of the network start to lose information about the input, and lose more and more information as the network is further trained. Apparently, the successive layers of the network learn to throw out as much information as they can, until the output contains only information about the correct output, and not the input. This is why he calls this compression.
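My understanding (an assumption on my part about the exact details) is that these per-layer quantities are estimated by discretizing each layer's activations and computing mutual information from the joint histogram. A plug-in estimator for paired discrete samples is only a few lines:

```python
import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in bits) from paired samples of two discrete variables."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * np.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Sanity checks: a fair coin carries 1 bit about itself, 0 bits about a constant.
coin = [0, 1] * 500
print(mutual_information(coin, coin))        # 1.0
print(mutual_information(coin, [0] * 1000))  # 0.0
```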

ii) As for the interplay between deep learning and neuroscience, it's complicated. It's not really disputed that neuroscience has regularly inspired great advances in deep learning. Even this fact alone is remarkable, and I feel like it's often understated. The most famous figures in the field of deep learning have regularly taken very simple intuitions about biological neural networks, transferred them to artificial neural networks in the most obvious way, and then produced stunning results. This tells you there's something pretty fundamental to the way that the brain processes information, since even a cursory understanding of its functioning can result in revolutions in computer science.

So what about the other way? Often the assessment here is pretty negative. This is an oversimplification, but one can say that the purpose of theory in science is to predict what's going to happen next. To the best of my knowledge there hasn't really been a model from deep learning which has done this. That is, no one has taken one of these models, trained them on some neurological data, and then used that trained model to say something about data we haven't collected yet.

Some scientists take this as enough evidence to completely discount the value of deep learning in modelling the brain. There are two reasons why I think that's unfair. Firstly, machine learning in general has done a lot to explain the functioning of the brain, at least post hoc. You can look at what certain neural circuits in the brain are doing and say, "hey, that's like this algorithm", and "hey, that's learning these kinds of features of the data". These claims don't make new predictions, but they do help us organize the vast amounts of information we have collected.

Secondly, it's only really now that our microscopes and experimental methods are getting sophisticated enough to validate these models. After all, to say whether an artificial neural network models real neurons in the brain, we have to record the activity of these neurons. Lots of these neurons. In real time. While they're computing things. Without killing the animal. So it's a tricky business, but it's starting to happen now, and it's an exciting place to be.
posted by Alex404 at 7:15 PM on October 7, 2017 [19 favorites]

Great explanations, Alex404, thanks!
posted by clawsoon at 9:01 PM on October 7, 2017

Apparently, the successive layers of the network learn to throw out as much information as they can, until the output contains only information about the correct output, and not the input. This is why he calls this compression.

More precisely, it constructs a representation of the information which is relevant to inference about the output. It probably happens because of the stochasticity in stochastic gradient descent: each time, the network is trained on a different set of examples, and the more parsimonious the internal representation is, the less opportunity there is for the network to learn spurious associations between the example inputs and outputs. This paper gave a new perspective on a still somewhat surprising and crucial aspect of deep learning: despite deep networks being essentially massively over-parameterized statistical models, they can often be trained in such a way that they don't overlearn (i.e., fixate on intricate spurious correlations between the inputs and outputs in the training data).

In a sense, the batch training of stochastic gradient descent acts as a different kind of information bottleneck, by only allowing the net to see a few examples at a time and forcing the net to fixate on the relevant correlations. Other perspectives on SGD as a regularization method were around before this, though, such as described in Bayesian Learning via Stochastic Gradient Langevin Dynamics.
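A minimal sketch of that mechanism (toy linear model, invented data): each update only gets to see a small random batch, so the weights can only latch onto correlations that persist across batches.

```python
import numpy as np

def sgd_linear(X, y, batch_size=4, lr=0.1, steps=500, seed=0):
    """Fit y ~ X @ w by least squares, one small random batch per update."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]                        # a few examples at a time
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=64)
w = sgd_linear(X, y)  # recovers roughly [2, -1] despite never seeing the full data
```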
posted by Coventry at 9:10 PM on October 7, 2017 [3 favorites]

philip_random: "What would you do if you learned the main application for your work was military?"

that was actually one of his previous gigs. As he put it, working for the US military was one of the best gigs he ever had -- total professionalism and diligence all down the line ... until he started to notice that some of the stuff he'd been working on in an abstract way started getting applied to stuff like targeting. So he quit.

I guess the Big-Deal-Search-Engine stuff is best thought of as post-military.
posted by philip-random at 1:29 AM on October 8, 2017 [1 favorite]

he's not a neuroscientist
posted by philip-random

but his son is ...
posted by philip-random at 1:36 AM on October 8, 2017

philip-random: At BigNameSearch, does his work simply focus on "the brain" or are there attributes already in place for demographic segmentation, such as conventionally applied to marketing, consumer advertising et cetera, eg. age groups, ethnic backgrounds, gender?
posted by infini at 5:16 AM on October 8, 2017

philip_random: "What would you do if you learned the main application for your work was military?"

This actually caused me grief coming up in the mid-80s. I ended up changing majors because there were plenty of jobs, sure, but few that didn't involve designing shit I would prefer not to.
posted by mikelieman at 7:07 AM on October 8, 2017 [2 favorites]

does his work simply focus on "the brain" or

Yes, but he's not naive about who's paying him. As with the military, the stuff will get applied in ways that ... well, I wouldn't say he can't imagine, because he's a pretty damned smart guy ... but which aren't immediately apparent.

One interesting thing he said to me was, "1984 was a long, long time ago." And then he elaborated that, even with all the tech available to make for the most horrific of Orwellian dystopias (and it really is all here), he saw other, bigger, more likely stuff to be concerned about.

"Like what?" I said.

"Have you noticed who's in the White House?" he said. And then he went on to suggest that it was a nation with way too many ill-educated people that put him there, not technology evolving at an astronomical rate. Which will, of course, continue to happen. Because that's been the story of his whole life. He built his own computer from scratch at age eleven in 1970 or thereabouts, and it's been a slowly but surely ever-accelerating rocket ride ever since.

tldr: we ain't seen nothin' yet.
posted by philip-random at 10:34 AM on October 8, 2017 [1 favorite]

The Third Party Doctrine means any inferences about actual people can be used as government surveillance at any point.
posted by rhizome at 11:32 AM on October 8, 2017

tldr: we ain't seen nothin' yet

I know. That's why I've been such a pain in the ass in this thread. There's questionable disregard for ethics in user testing in most of the labs, is what I've been hearing from the boffins on campus.
posted by infini at 12:11 PM on October 8, 2017 [1 favorite]

Whoa, that presence may have something to do with absence. Whodathought. I'll try to keep this in mind

Think of anything except a neural net.
posted by sjswitzer at 2:10 PM on October 8, 2017

In medicine and research, for the longest time the effects of drugs, treatments, therapies, and states of illness all centered around the wellness of the 60-year-old white guy. This may seem oblique, but I copied a paragraph from the article that reminded me of my Dad, who was a criminal investigator. One of his favorite sayings was "Don't try to confuse me with the facts." He was describing an attitude he encountered in fifties and sixties law enforcement. So anyway.

"Then learning switches to the compression phase. The network starts to shed information about the input data, keeping track of only the strongest features — those correlations that are most relevant to the output label."

I am reading this and it sounds like artificially created, in-the-box thinking. I wonder if consciousness, intellect, and creativity are still measured as the way the male mind operates. If they want deep learning to function just like the minds that made the world as it is, are we not doomed?
posted by Oyéah at 2:42 PM on October 8, 2017 [1 favorite]

It seems like you're saying goal-oriented attentional processes are a masculine trait. If so, why do you believe that?

In any case, consciousness, intellect, creativity... these systems have none of those characteristics. The nets in the paper the lead article's based on are being trained on a simple classification task.
posted by Coventry at 6:47 PM on October 8, 2017 [1 favorite]

goal-oriented attentional processes with winnowing of what is deemed external considerations is a process that serves those, serving narrow focus, goals. So when designing machine tasks this is reasonable. When trying to design AI to interface with humans then this narrow, goal oriented, process is how we create obdurate systems.

Goal oriented attentional processes are a short term, problem solving phase of human endeavors. They often substitute for the whole ball game.
posted by Oyéah at 2:52 PM on October 9, 2017 [1 favorite]

Yeah, don't believe the hype. Major breakthroughs have been achieved in automated reasoning, but we're a long, long way from human-level intellect. There are maybe systems as smart as a puppy, and when you can industrialize their inferences, as with AlphaGo, you can get super-human performance in very limited domains. Also, 18 months ago I said "maybe as smart as a lizard...", so things are coming along fast.
posted by Coventry at 8:39 PM on October 9, 2017


This thread has been archived and is closed to new comments