DeepMind "solves" protein folding
November 30, 2020 8:55 AM

 
My love affair with DeepMind (many previous posts on MeFi) continues.

One side effect of this breakthrough is likely that Foldit is dead in the water.

As always with this kind of thing, "wait for the paper" applies. Having said that, DeepMind's record on this stuff a la AlphaGo is pretty solid. They don't seem in the habit of overhyping early results for press releases.
posted by lazaruslong at 9:11 AM on November 30, 2020 [2 favorites]


Holy fuck, it's crazy how much better AlphaFold is than its competition. I had no idea single-domain protein structure prediction was anywhere near having a practical solution.
posted by RichardP at 9:14 AM on November 30, 2020 [4 favorites]


As always with this kind of thing, "wait for the paper" applies.

This is a little bit "future-tech, jet-packs for all my friends!" level news. I mean, this doesn't sound like "...in five years we'll have fusion reactors powering our toaster-ovens."

If this is actually functional, this is a follow-up to CRISPR, or puts it in a new category of usefulness, doesn't it?
posted by From Bklyn at 9:15 AM on November 30, 2020


What a beautiful contribution to humanity from Google!

Sorry, apparently this is made by Alphabet, whoever that is
posted by East Manitoba Regional Junior Kabaddi Champion '94 at 9:19 AM on November 30, 2020 [4 favorites]


And for your next Nobel Prize, go the other direction: given a target structure (e.g. a receptor), predict the amino acid sequence that results in a protein that binds to that target. That would practically solve biologic drug design.
posted by jedicus at 9:24 AM on November 30, 2020 [14 favorites]


Protein folding has been a pretty heavy-duty problem. One of the PIs behind Foldit gave a talk a few years back, noting that the algorithms had hit a wall (at that time) and that no amount of raw computation power in our lifetimes would really help in getting an accurate answer within a usable timeframe, unless there were algorithmic improvements. Drug and catalyst design could be helped immensely by a fast, accurate system.

As with our CASP13 AlphaFold system, we are preparing a paper on our system to submit to a peer-reviewed journal in due course.

That's going to be the real test, I think. Publication and independent reproduction of results are what will meet the actual gold standard. And as much as Google relies on public (publicly funded) datasets to train their networks, they do not so far have a good track record on reproducibility where machine learning intersects with academia:
Scientific progress depends on the ability of independent researchers to scrutinize the results of a research study, to reproduce the study’s main results using its materials, and to build on them in future studies.
Publication of insufficiently documented research does not meet the core requirements underlying scientific discovery [2,3]. Merely textual descriptions of deep-learning models can hide their high level of complexity. Nuances in the computer code may have marked effects on the training and evaluation of results [4], potentially leading to unintended consequences [5]. Therefore, transparency in the form of the actual computer code used to train a model and arrive at its final set of parameters is essential for research reproducibility.
Validation from peer-reviewed journals will require that other people are able to run this.
posted by They sucked his brains out! at 9:26 AM on November 30, 2020 [11 favorites]


Along the dotted sequence?
posted by Thorzdad at 9:31 AM on November 30, 2020 [1 favorite]


I doubt Foldit will become irrelevant. The problems thrown at the players will become more specialized, targeting proteins where the algorithms are still failing. New heuristics from players can be mined by the algorithms. Some humans have an instinctive ability to visualize proteins that can be used for creating algorithms.
posted by Xoc at 9:34 AM on November 30, 2020 [3 favorites]


I work in pharma, with computers, and I get e-mails from vendors trying to convince me to be really excited about AI advances, with the implication that I will be missing out on things if I don't pay them to guide me through the thicket.

So I got an article much like these about a year ago from a sales side guy, which included some professor making dismissive quotes about the dinosaurs in industry. He had accidentally included the internal e-mail chain though, so I saw that their technical guy had already succinctly made the point that, while impressive, protein folding is not really the limiting step in drug discovery.

And for your next Nobel Prize, go the other direction: given a target structure (e.g. a receptor), predict the amino acid sequence that results in a protein that binds to that target. That would practically solve biologic drug design.

The article says you can get as good as cryo-EM or X-ray. It's worth remembering that you can't predict anything about potency from that--the difference between nanomolar and micromolar is sub-angstrom and can't be deduced from looking at a structure. (It can sometimes be modeled when you have a structure, but you're still talking 10x or more uncertainties.)
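To put a number on that (a back-of-envelope sketch; the constants are textbook values, and the Kd's are just illustrative):

```python
import math

# Binding free energy from the dissociation constant: dG = R*T*ln(Kd).
R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.15     # roughly room temperature, K

def binding_dg(kd_molar):
    """Binding free energy in kcal/mol for a given Kd (in molar)."""
    return R * T * math.log(kd_molar)

# A 1000x potency difference (1 uM vs 1 nM binder) is only ~4 kcal/mol,
# small enough to hinge on sub-angstrom geometry.
print(binding_dg(1e-6) - binding_dg(1e-9))  # ~4.09
```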

I don't mean there is zero chance. There are certainly people who think this sort of thing can be applied to ligand/protein binding and do better than, or meaningfully improve, the QM approach.
posted by mark k at 9:38 AM on November 30, 2020 [15 favorites]


Apologies for the non-working link to the Nature article; here's a corrected one: https://doi.org/10.1038/s41586-019-1799-6
posted by They sucked his brains out! at 9:40 AM on November 30, 2020 [2 favorites]


A few decades ago you could get your PhD by solving one protein structure.

protein folding is not really the limiting step in drug discovery

...this is also extremely true.
posted by aramaic at 9:58 AM on November 30, 2020 [4 favorites]


This is very, very cool.

As mark k notes, static protein folding is not the limiting step, because the difference between "lab curiosity" and "drug lead", let alone an actual drug, is due to the dynamic behaviour of proteins. Still very cool though, and maybe this method could be a promising new avenue for solving the latter? No idea, as I don't know much about protein-drug interaction modelling, but still cool.
posted by atrazine at 10:30 AM on November 30, 2020 [3 favorites]


I'm amused by the idea that the best extant approach to the protein folding problem involves evolving an algorithm that's probably about as analysis-resistant as the protein folding it simulates.

Correct me if I'm wrong, but isn't it the case that nobody, not even the people responsible for winding these things up and letting them rip, really has any idea how any of the Alpha* family actually works? I mean, we can clearly see what they do, but is there anything to be learned by inspecting the innards of a successfully trained Alpha* other than a huge and completely inscrutable table of mysterious numeric weightings?
posted by flabdablet at 10:55 AM on November 30, 2020 [4 favorites]


This is a stunning advance. Go to In the Pipeline for your analysis, as usual.


And for your next Nobel Prize, go the other direction: given a target structure (e.g. a receptor), predict the amino acid sequence that results in a protein that binds to that target. That would practically solve biologic drug design.



Target binding is NOT the rate-limiting step in drug discovery. We're really good at finding binders; the vital work of finding binders that have no off-target effects and the right ADMET, plus years of in-vivo testing, are usually much bigger problems and timesinks. The data sets for those are much noisier and will be much less tractable to "solution"* by deep learning compared to folding, IMHO.


Paraphrasing Derek Lowe, author of In the Pipeline: "if we could discover selective binders in hours rather than weeks, it's still the equivalent of saving 5 minutes off a taxi ride to the airport for a multi-leg transnational flight."

* Deep learning will affect all of those topics and speed them up, but protein folding is a pretty well-graphed and well-defined problem - the orientation of atoms in space - whereas for e.g. toxicity/off-target effects, the literature and the graph are so.much.bigger and noisier....
posted by lalochezia at 11:09 AM on November 30, 2020 [8 favorites]


DeepMind has also been throwing deep neural networks at quantum chemistry (very readable and interesting blog post).

None of this is really "AI"... it's more like, modern AI required getting really good at solving complicated optimization problems, and it turns out you can use these new (or old with new tweaks) techniques on optimization problems from physics and chemistry too.
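To make that concrete, here's a toy sketch (made-up parameters, and certainly not DeepMind's method): the same plain gradient descent used to train networks, pointed at a physics-style energy, in this case a two-atom Lennard-Jones potential:

```python
# Toy sketch: minimize a two-atom Lennard-Jones energy by gradient
# descent, the same machinery used to fit network weights.
def lj_energy(r, eps=1.0, sigma=1.0):
    return 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

def lj_gradient(r, eps=1.0, sigma=1.0):
    # Analytic dV/dr of the potential above.
    return 4 * eps * (-12 * sigma**12 / r**13 + 6 * sigma**6 / r**7)

r, lr = 1.5, 0.01             # initial separation, step size (made up)
for _ in range(2000):
    r -= lr * lj_gradient(r)  # plain gradient descent

print(r)  # converges to ~1.122, i.e. the known minimum 2**(1/6) * sigma
```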
posted by vogon_poet at 11:15 AM on November 30, 2020 [1 favorite]


Correct me if I'm wrong, but isn't it the case that nobody, not even the people responsible for winding these things up and letting them rip, really has any idea how any of the Alpha* family actually works?

I can't speak to this model specifically, but the notion that deep learning models are total black boxes hasn't been true for a few years now. There's still a lot to be done in the field of explainable AI, but there has been progress.
posted by jedicus at 11:19 AM on November 30, 2020 [8 favorites]


Oh! A relevant and really great post for folks interested in more explanation of what's going on here can be found here. I remember reading this back at the end of 2018 and feeling rather excited.
posted by lazaruslong at 11:42 AM on November 30, 2020 [1 favorite]


I'm seeing a lot of comments second-guessing the importance of the results (especially on the orange website). It's worth keeping in mind that these kinds of results are similar to turning a trip to the library into a Google/Wikipedia search: sure, all the same knowledge might be available, but shaving orders of magnitude off the latency can unlock a lot of questions that you might not have even bothered trying to answer before. I'm really excited to see where this leads!

---

My understanding is that the computational accuracies here are about equal to the accuracy of the experimental outcomes. It would be super interesting to know if the 'misses' are uncorrelated... If they're uncorrelated, you get even better results by following up with experiments, and if they're heavily correlated you've got a really interesting next problem to solve.
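One way to frame that check, as a sketch (the per-target errors below are hypothetical, not CASP numbers):

```python
import numpy as np

# Hypothetical per-target errors (angstroms) for the predictor and for
# the experimental method -- made-up values, just to show the check.
pred_err = np.array([0.9, 1.4, 0.7, 2.1, 1.0, 1.8])
expt_err = np.array([1.1, 0.8, 0.9, 1.2, 1.0, 0.7])

r = np.corrcoef(pred_err, expt_err)[0, 1]
print(f"error correlation: {r:.2f}")
# Near 0: misses are independent, so following up with experiments helps;
# near 1: both miss the same targets, which is its own research problem.
```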
posted by kaibutsu at 11:52 AM on November 30, 2020 [4 favorites]


For some of these problems, pragmatically, it doesn't matter so much if a human cannot understand and follow exactly what "moves" were made by a black-box optimiser to arrive at a good solution, provided the solution itself can be externally verified without depending on that black-box, and the solution can be measured to be superior when compared to solutions produced by competing approaches.

You can have a similar philosophy for solving other combinatorial optimisation problems that pop up in real life. For a completely arbitrary example that isn't protein folding, consider mega-bakery supply chain optimisation:

> [Pasco Shikishima Corporation]'s supply chain involves 15 factories in Japan, each one with several production lines, and more than 100 distribution centers. Pasco's catalog contains more than 1,000 products. 900,000 orders have to be executed each day in Pasco's factories. For each order, Pasco has to decide where and when to produce it. Moreover, Pasco has to decide where to source raw materials and which routes to deliver distribution centers. The goal is to minimize production and distribution costs over several days of horizon, while respecting production and distribution capacities.

Does it matter exactly how they figure out which 900,000 orders to execute each day? If a human does it with chalk on a blackboard or if some software black box does it? Not really, provided they have a repeatable process for doing so, since they need to keep generating new plans. Does it matter how good the plan of 900,000 orders is? (e.g. how accurate it is at reflecting the constraints of the real situation and maximising profit or what-not). Very much so.
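In code terms, a minimal sketch of that verify-without-trusting-the-box idea (the factories, orders, and capacities are hypothetical, nothing to do with Pasco's actual system):

```python
# Minimal sketch: verify the plan without trusting the planner.
capacity = {"factory_a": 3, "factory_b": 2}   # orders per day each can take
orders = ["o1", "o2", "o3", "o4"]

# A plan from *any* source: chalk on a blackboard or a neural network.
plan = {"o1": "factory_a", "o2": "factory_a",
        "o3": "factory_b", "o4": "factory_b"}

def plan_is_feasible(plan, orders, capacity):
    """Independent check: every order placed, no factory over capacity."""
    if set(plan) != set(orders):
        return False
    load = {}
    for factory in plan.values():
        load[factory] = load.get(factory, 0) + 1
    return all(n <= capacity.get(f, 0) for f, n in load.items())

print(plan_is_feasible(plan, orders, capacity))  # True; reject if False
```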
posted by are-coral-made at 11:56 AM on November 30, 2020 [1 favorite]


Does it matter exactly how they figure out which 900,000 orders to execute each day? If a human does it with chalk on a blackboard or if some software black box does it? Not really, provided they have a repeatable process for doing so, since they need to keep generating new plans.

IME the problem with AI is that you can't explain why it does what it does without resorting to math talk that pointy-haired bosses like even less than hand-waving about how it's 90% accurate (but 10% of the time makes bizarro-world errors, and don't ask me why). The problem isn't measuring accuracy, it's selling other people on employing AI. As jedicus points out, there are efforts being made in explainable AI, but IMHO they're too nascent. "I built another equally unexplainable AI to help me explain the first one" isn't really a great answer. Certainly the type of people that make non-academic, real-world decisions in many arenas aren't going to accept that.
posted by axiom at 12:26 PM on November 30, 2020 [2 favorites]


the notion that deep learning models are total black boxes hasn't been true for a few years now

That article doesn't really bear that out, it's more like "yo dawg I heard you like opaque black boxes". Apparently they're using neural networks to generate plausible explanations about what other neural networks are doing?
posted by echo target at 12:26 PM on November 30, 2020 [2 favorites]


the notion that deep learning models are total black boxes hasn't been true for a few years now.

Still pretty fucking dark charcoal grey though.
posted by flabdablet at 12:29 PM on November 30, 2020 [7 favorites]


What's the computational complexity class of the protein folding problem? Is it even known? Google says NP-hard, but that could mean NP-complete or much more. Wouldn't it be some kind of quantum complexity class?
posted by polymodus at 12:30 PM on November 30, 2020


To be fair,

using neural networks to generate plausible explanations about what other neural networks are doing

does sound like a pretty plausible explanation for an awful lot of human "reasoning".

That said, I'm not at all sure that I want to be driven along a highway three metres behind another car at 100km/h by something deliberately engineered to implement Artificial Cognitive Distortions.
posted by flabdablet at 12:31 PM on November 30, 2020


And new drugs, and new drugs, and new drugs for problems caused by the problems we have created, and new drugs.
posted by Oyéah at 1:06 PM on November 30, 2020



And new drugs, and new drugs, and new drugs for problems caused by the problems we have created, and new drugs.


to be sure we've created some problems. and the usual disclaimers about capitalism, colonialism and power pertain.

but

to characterize drug discovery as only responding to the modern condition is to dismiss the unnecessary suffering and early death of billions of people over the last few thousand years....

fundamentally, bodies develop disease, and NO value system or culture has addressed disease as well as modern science. sorry. but it's true.

- modern medicine - of which drug discovery is a part - has helped more people than any. social. movement. you could name. for most of those people there was NO other way to help them.

we have words for people who want to stop people from minimizing suffering.
posted by lalochezia at 1:49 PM on November 30, 2020 [35 favorites]


Does it matter how good the plan of 900,000 orders is?

If there's a better plan, yes. As a network gets increasingly complex, a loss function can be rife with local minima (some visually extreme examples), so you are certainly not guaranteed a globally optimal solution.

That's one reason people want to understand how the larger network is built: where it fails can lead to catastrophic outcomes in edge cases a network cannot handle (e.g., self-driving cars, AI medicine or guided cancer diagnoses, etc.).

Delivering baked goods is not a life or death matter, granted, but for most companies, finding efficiencies is all about getting from one local minimum to a deeper and more profitable one, whether it is by using neural networks or some other process/model.
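A toy illustration of that initialization-dependence (my sketch, not one of the linked examples):

```python
# Toy non-convex loss with two minima: where plain gradient descent
# ends up depends entirely on where it starts.
def grad(x):
    # derivative of f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

print(descend(-2.0))  # ~ -1.30, the deeper (global) minimum
print(descend(+2.0))  # ~ +1.13, a shallower local minimum
```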
posted by They sucked his brains out! at 2:41 PM on November 30, 2020


There's an interesting question I got from reading Hacker News (adding a grain of salt) about what this means for academia vs. Google: is there something wrong with how academic research (in protein folding or in general) is set up, such that this actually makes a case for big bad corporations making the breakthroughs now? Will Google get the Nobel next year? Should academics do some soul searching, between chess and now molecular biology? I'd like to see a journalist or expert write about this social aspect of doing science.
posted by polymodus at 4:14 PM on November 30, 2020 [1 favorite]


is there anything to be learned by inspecting the innards of a successfully trained Alpha* other than a huge and completely inscrutable table of mysterious numeric weightings?

Probably not, but maybe it's a sufficiently complex problem that there's no aesthetically pleasing solution. Just like there's no music of the spheres, if you want to predict the planets' motion it's just a bunch of boring numerical integration.
posted by RobotVoodooPower at 6:06 PM on November 30, 2020 [2 favorites]


delivering bread is very much a life or death matter
posted by um at 6:48 PM on November 30, 2020 [2 favorites]


is set up, such that this actually makes a case for big bad corporations making the breakthroughs now?

Historically this was true in many cases, e.g. IBM and transistors, Exxon in soft matter, etc. I would be super interested to see a real history, but the influence of companies on scientific progress is longstanding. It went through a lull because of profit-driven cuts to basic research.
posted by lab.beetle at 7:15 PM on November 30, 2020


The experience of the chess community with DeepMind and AlphaZero was very mixed and may be a cautionary tale for researchers in protein folding. To recap, they claimed that their software was able to beat Stockfish, the most powerful chess program at the time. People were very excited that a significant improvement in chess programs would help bring new insights into the game that top players would be able to bring to their own games.

At the top level, modern classical slow chess (as opposed to the rapid and blitz games that have become popular during covid) has become very driven by computer analysis of opponents' games and of various opening lines in advance of the match. So the initial hope around AlphaZero was that top players would be able to use this tool to get a flood of new ideas and insights into positions.

Yet when the details were released, it was clear to most involved in the Stockfish open source project that Google had crippled Stockfish, configuring it to get an outcome that would make a good press release. Then they never made it available for any kind of real external scrutiny. They hired a couple of well-known chess authors to review some of its games and put out a book, but that seemed like more Google/DeepMind PR.

The Top Chess Engine Championship (TCEC) is the most important open competition for chess programs, and AlphaZero has never entered. However, open source developers have attempted to use the techniques described in the DeepMind research papers to make an open source engine called "Lc0 / Leela".

Lc0 has quickly become a top program; but unlike in Google's paper, it has not shown a clear leap over Stockfish and has generally finished second. Stockfish itself recently shifted to something called NNUE (Efficiently Updatable Neural Network) as a model for a deep learning approach. That seems to have shown much more promise than Lc0 and has resulted in some impressive gains in its performance.
posted by interogative mood at 9:42 PM on November 30, 2020 [5 favorites]


It's true that latency is a big problem and reducing it by a few orders of magnitude is a big deal. There are lots of contract research organizations who will generate wet lab structure data for a few thousand bucks per protein. But if I can do a decent enough job in silico that generating a slightly worse version costs 5 bucks of compute time and takes an hour (or can be done on spare compute cycles), I can predict structures for every protein in the human genome and let it guide researchers doing things like predicting SNP function in completely unknown proteins.
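The arithmetic behind that is stark (assuming the rough prices above and an order-of-magnitude proteome size):

```python
# Rough scale of wet-lab vs in-silico structures across a proteome.
# Assumed figures based on the comment above, not real quotes.
n_proteins = 20_000            # order of magnitude for the human proteome
wet_lab_cost = 3_000           # dollars per structure via a CRO
in_silico_cost = 5             # dollars of compute per prediction

print(f"wet lab:   ${n_proteins * wet_lab_cost:,}")    # $60,000,000
print(f"in silico: ${n_proteins * in_silico_cost:,}")  # $100,000
```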
posted by benzenedream at 11:24 PM on November 30, 2020


There's an interesting question I got from reading Hacker News (adding a grain of salt) about what this means for academia vs. Google: is there something wrong with how academic research (in protein folding or in general) is set up, such that this actually makes a case for big bad corporations making the breakthroughs now? Will Google get the Nobel next year? Should academics do some soul searching, between chess and now molecular biology? I'd like to see a journalist or expert write about this social aspect of doing science.

It's a very superficial analysis to split the people doing this kind of work into "academia" vs "private sector". What's really important is the conditions under which people are working. DeepMind is part of Google, sure, so "private sector", whatever that means to you, but Google itself is an incredibly profitable monopoly able to buy and then fund DeepMind to do this kind of work on the basis that they might make money from it somehow. As a result, it is very well funded, and they are under no pressure to produce or sell a product the way most of the "private sector" is, nor under pressure to publish papers (regardless of actual merit) or write grant proposals the way that people in academia have to.

If anything, this is an example of the effectiveness of the Xerox PARC / Bell Labs approach of just giving a bunch of smart people money to work on things, with only a very vague steer on what that should be, over either the way academia currently works or the way that most of the private sector approaches R&D spending. Google of course is able to produce things where they only capture a small fraction of the public good generated through the incremental knowledge because of their monopoly position.
posted by atrazine at 4:53 AM on December 1, 2020 [4 favorites]


Former computational biologist here who worked on protein sequence alignment: it's impressive work, but not even close to being a solved problem. One of my best friends has spent the last 15 years working on protein folding, so I've picked up a bit from him too.

Someone who once interviewed me for a grant laid out some of it pretty well today.

Basically (combining the opinions of me and the people mentioned):
- the advance in average accuracy is impressive
- some existing programs do better on certain types of protein
- the way proteins fold depends on conditions in the cell, the most basic example being that many proteins need other proteins (chaperone proteins) present to fold correctly, so it's not necessarily deterministic
- some proteins are like dry tagliatelle nests, and some are like nests where bits have been cooked and can move around; in fact, in general their function is to move around - what does it even mean to "solve" their folding?
- Google have computational resources (good) and a PR department (ehhh) that the average research group can only dream about
posted by kersplunk at 6:51 AM on December 1, 2020 [2 favorites]


Google of course is able to produce things where they only capture a small fraction of the public good generated through the incremental knowledge because of their monopoly position.

Not sure if I read this correctly, but if a big bad capitalist corporation is appropriating the public good, i.e. the fruits or giant shoulders of academia, it nevertheless remains a problem of academia vs. Google leading the research breakthroughs in a given field, only now the assessment is that Google's approach is fundamentally parasitic, or at best dependent on academic basic science. And still the problem remains that academics were in the CASP race too, so does that not invite some kind of reflection on academia's own structures, regardless of how Google's R&D functions? (For example, just paying professors and grad students Google quantities of money is the condition, but that cannot happen independently of reorganizing academia and its sociological context.)
posted by polymodus at 1:09 PM on December 1, 2020


Not sure if I read this correctly, but if a big bad capitalist corporation is appropriating the public good, i.e. the fruits or giant shoulders of academia, it nevertheless remains a problem of academia vs. Google leading the research breakthroughs in a given field, only now the assessment is that Google's approach is fundamentally parasitic, or at best dependent on academic basic science. And still the problem remains that academics were in the CASP race too, so does that not invite some kind of reflection on academia's own structures, regardless of how Google's R&D functions? (For example, just paying professors and grad students Google quantities of money is the condition, but that cannot happen independently of reorganizing academia and its sociological context.)

Not quite what I meant. Research has value. Companies try and capture the value of innovation they pay for but the nature of pure research in particular is that it isn't possible to capture it through patents and trade secrets sufficiently and therefore "normal" private sector companies don't do it.

Society benefits from research (even if you just look at the vulgar economic side) and benefits more than it costs to do it. A rational society therefore should spend lots on research. If you leave research entirely to the competitive, normal parts of the private sector, very little would ever get done. That's because private sector companies will only fund research where they can capture enough of the value to cover their costs and profit and most basic research generates most of its value over a long period of time and very broadly which means that the private sector tends to concentrate its R&D in very narrow late stage development spending which has a short pay-back and is easily capturable through IP laws.

Monopolies have historically spent on R&D almost as if they were governments, and DeepMind is analogous to the 20th century Bell Labs: money is being spent to do relatively basic research into deep learning with the sort of vague directional idea that Google might make money from it, but they're not developing products or anything like that. A non-monopoly would be unlikely to act this way because it's not economically rational... unless you're able to control such a massive slice of the economy, or of a sector of it, that you can profit from general growth. Bell, for instance, was confident that improvements in communications technology would come back to them in the form of profits somehow or another because of their ability to extract monopoly rents. Google has invested heavily in their machine learning hardware and cloud business, and a breakthrough in machine learning, even if they own literally none of the IP and/or give it away, will make them a huge amount of money.

The model of Bell Labs was effectively to pay researchers, including early-career researchers, quite well and to let them do essentially what they wanted to do with very light supervision. Academia might want to reflect on how much more productive researchers would be if they were able to purely pursue ideas in, as you say, a very different sociological context. (Individual academics generally agree with that - nobody likes writing grant proposals - but there's a whole complex context around them.)

My original point was that taking this as an example of "the private sector" being good at innovation is misguided since this kind of innovation mostly comes from a very unusual part of the private sector which has little in common economically with most business.
posted by atrazine at 1:34 AM on December 2, 2020 [2 favorites]


Every time I read "Deep Mind", my brain pipes up "Infinite Polygon Engine".
posted by inpHilltr8r at 9:48 AM on December 5, 2020 [1 favorite]




This thread has been archived and is closed to new comments