Breaking: AI Has Not Put Human Mathematicians Out of Work Yet
October 12, 2022 10:50 AM

Kauers and Moosbauer pen a short, understatedly snarky response to the recent, heavily hyped Nature cover article describing the DeepMind team's AlphaTensor project.

Matrix multiplication is one of the most basic and important computations in mathematics. In 1969, Volker Strassen kicked off an entire field of research by showing that you can multiply two 2x2 matrices with 7 scalar multiplications instead of the 8 required by the classical method, which, applied recursively, means that arbitrary nxn matrices can be multiplied together in asymptotically fewer than O(n^3) operations. The current best known complexity is O(n^2.3728596), due to Virginia Vassilevska Williams and Josh Alman, but that algorithm is mostly of theoretical interest and doesn't actually answer the question, "What is the fastest way to multiply two small matrices together?"
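For the curious, here are Strassen's seven products written out as a quick Python sketch (my own illustration of the textbook formulas, not code from any of the papers):

```python
# Strassen's 2x2 scheme: 7 multiplications instead of 8.
# The entries can be numbers or, recursively, matrix blocks.

def strassen_2x2(a11, a12, a21, a22, b11, b12, b21, b22):
    # The seven Strassen products
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only
    c11 = m1 + m4 - m5 + m7
    c12 = m3 + m5
    c21 = m2 + m4
    c22 = m1 - m2 + m3 + m6
    return c11, c12, c21, c22

# Sanity check against the classical 8-multiplication formula
assert strassen_2x2(1, 2, 3, 4, 5, 6, 7, 8) == \
    (1*5 + 2*7, 1*6 + 2*8, 3*5 + 4*7, 3*6 + 4*8)
```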

Hyperbolic statements about the work aside, this is the problem the DeepMind paper aims to solve. Alexandre Sedoglavic at the University of Lille maintains a catalog of the best known methods for multiplying small matrices, and the new DeepMind results improve a number of the small rectangular records (notably 4x5 times 5x5) and the 10x10 and 11x11 square records (buried in the appendices). But the result they highlight most is the improvement for 4x4 and 5x5 multiplication over characteristic 2 / boolean arithmetic. Kauers and Moosbauer throw a bit of cold water on that by publishing this very quick response, which replicates the 4x4 result with elementary methods and improves on the 5x5 result in a similar way. Their decision to call the DeepMind team's result "The FBHHRBNRSSSHK-Algorithm" is a purposeful reminder that a human team discovered it (but also a bit of a joke that the paper had a very large number of co-authors compared to papers in the field of mathematical computing, where 2-5 is more typical).
posted by 3j0hn (14 comments total) 14 users marked this as a favorite
 
I'm something of a deep learning skeptic, at least to the extent that I don't think it can solve all problems. DeepMind has done a lot of great work though, and I skimmed this Nature paper when it came out and came away impressed. This type of pattern searching problem (i.e., algorithms for matrix multiplication) strikes me as something that deep learning could indeed do well.

I'm also not one to think that mathematicians are going to be out of jobs soon... and fast matrix multiplication, though extremely important, is also a very specific problem. Still, it's interesting to hear that mathematicians can flex and turn the tables on big compute, even on well-defined problems like this.

Nice post. Thanks.
posted by Alex404 at 11:31 AM on October 12, 2022 [1 favorite]


I’m worried this could escalate into Street Countdown. Please stay safe, everyone.
posted by Abehammerb Lincoln at 12:41 PM on October 12, 2022 [1 favorite]


Oh wow. Matrix Multiplication is definitely part of my wife's work as a mathematician so I can't wait for her to explain all of this tonight.
posted by Navelgazer at 1:24 PM on October 12, 2022 [3 favorites]


This type of pattern searching problem (i.e., algorithms for matrix multiplication) strikes me as something that deep learning could indeed do well.
I definitely agree with this, and in fact this field has been using structured compute search to discover novel formulas for a while. The paper certainly interested me, but instead of striking me as "Wow, a chess-playing AI architecture can discover new algorithms", it hit me more as "Oh, the chess-playing AI that I previously thought of as magical is in fact just doing a complicated structured search".
posted by 3j0hn at 1:56 PM on October 12, 2022


"The Kauers and Moosbauer paper throw a bit of cold water on that by publishing this very quick response which replicates the 4x4 result with elementary methods, and improves on the 5x5 result in a similar way."

This isn't really cold water at all. It is extremely common in mathematics for new results to be subsequently simplified and extended, once people know where to look... No one else found these methods in the previous fifty years when they theoretically could have, presumably because the search space is too large. This is exactly the situation where reinforcement learning helps; it can look for promising solutions tirelessly.

The other nice part of the result is the practical application. It's typical these days to run tuning tests to pick the best algorithms for specific matrix sizes running on particular hardware. Commodity mathematical optimization would presumably let you notice where most of your time is being spent and then go try to find better techniques automatically, instead of needing to hire a wizard to work out a solution for hardware that you will migrate off of in a year anyway...
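To make that concrete, here's a toy sketch of the kind of tuning loop I mean (names and candidates are hypothetical, my own, not any real autotuning library): time a few candidate kernels on representative inputs for one fixed size and keep the winner.

```python
import time
import numpy as np

def pick_fastest(candidates, make_inputs, trials=20):
    """Time each candidate on fresh inputs and return the name with the
    best median runtime, plus all of the timings."""
    timings = {}
    for name, fn in candidates.items():
        samples = []
        for _ in range(trials):
            a, b = make_inputs()
            start = time.perf_counter()
            fn(a, b)
            samples.append(time.perf_counter() - start)
        timings[name] = sorted(samples)[len(samples) // 2]
    return min(timings, key=timings.get), timings

# Hypothetical candidate kernels for one fixed size on this machine;
# a real tuner would also throw in Strassen-style and blocked variants.
candidates = {
    "numpy_matmul": lambda a, b: a @ b,
    "einsum": lambda a, b: np.einsum("ij,jk->ik", a, b),
}
best, timings = pick_fastest(
    candidates, lambda: (np.random.rand(256, 256), np.random.rand(256, 256)))
print(best, timings)
```

Swap in better kernels as candidates and you get exactly the "notice where your time goes, then improve it" loop, without the wizard.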
posted by kaibutsu at 2:09 PM on October 12, 2022 [5 favorites]


This isn't really cold water at all. It is extremely common in mathematics for new results to be subsequently simplified and extended
It is a bit of cold water because K&M almost certainly already had the 4x4 result in hand as part of the forthcoming paper, and they chose to publicize it in this note to say, hey you can get this result with traditional methods.

The 5x5 result is a simple extension (published 3 days later!) of the FBHHRBNRSSSHK result and is more in that flavor. But the title of the paper is definitely meant as cold water, to point out that AlphaTensor isn't actually finding optimal solutions yet (which DeepMind didn't explicitly claim, but the coverage of the paper has been very EXTRA).
posted by 3j0hn at 8:47 PM on October 12, 2022 [2 favorites]


I don't think they already had a result and got scooped by DeepMind. They say they're extending from the DeepMind result in their (extremely short) preprint:

"This [new 5x5] solution was obtained from the scheme of Fawzi et al. [aka, Deepmind] by applying a sequence of transformations leading to a scheme from which one multiplication could be eliminated. Our new scheme for 4 × 4-matrices was obtained by the same technique, taking the standard multiplication algorithm (with 64 multiplications) as starting point."
posted by kaibutsu at 9:37 PM on October 12, 2022


I am certain K&M had their techniques in hand as part of the in-preparation paper they cite; the timeline is too short, and their 4x4 result didn't require anything from the AlphaTensor paper (except maybe the idea to report results in characteristic 2, something it doesn't look like people working in this area were doing before).
posted by 3j0hn at 10:07 PM on October 12, 2022 [1 favorite]


Speed isn't enough to show they already had a result... Recall how fast the prime gap results came after Zhang's initial paper was released - the first result was only a week after the initial drop. And the K&M paper (short as it is) explicitly says they used the DeepMind technique to find the new 4x4 method. This looks to me like some very knowledgeable people read the DeepMind work last week, spent a day with a whiteboard, realized they could do one better, and posted something to the arXiv as quick as they could to get priority on the result.
posted by kaibutsu at 11:46 PM on October 12, 2022 [1 favorite]


Others have pointed out that even for multiplying small matrices, speed is not all that matters; you also need numerical stability.
Roughly speaking, this concerns the compounding of errors that results from using limited-precision representations of real numbers. It is known that Strassen's algorithm for matrix multiplication is numerically less stable than the naïve algorithm, which has cubic time complexity compared to Strassen's O(n^2.807), and for that reason the naïve algorithm is used in quite a few real-world applications. I suspect that "Strassen-square" is numerically less stable than Strassen, and according to Fawzi et al., most of their automatically discovered algorithms have numerical error bounds considerably higher than "Strassen-square".
Minimizing the number of multiplications is a fun game for theorists, but in practice, there's no point if floating-point error blows up all the accuracy in your result.
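If anyone wants to poke at this themselves, here's the sort of quick-and-dirty experiment I have in mind (my own sketch with made-up helper names, not anything from the papers): run one level of Strassen in float32, the ordinary product in float32, and compare both against a float64 reference.

```python
import numpy as np

def strassen_one_level(a, b):
    """One level of Strassen recursion on an even-sized square matrix;
    the half-sized products are done with the ordinary matmul."""
    n = a.shape[0] // 2
    a11, a12, a21, a22 = a[:n, :n], a[:n, n:], a[n:, :n], a[n:, n:]
    b11, b12, b21, b22 = b[:n, :n], b[:n, n:], b[n:, :n], b[n:, n:]
    m1 = (a11 + a22) @ (b11 + b22)
    m2 = (a21 + a22) @ b11
    m3 = a11 @ (b12 - b22)
    m4 = a22 @ (b21 - b11)
    m5 = (a11 + a12) @ b22
    m6 = (a21 - a11) @ (b11 + b12)
    m7 = (a12 - a22) @ (b21 + b22)
    return np.block([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float32)
b = rng.standard_normal((512, 512)).astype(np.float32)
reference = a.astype(np.float64) @ b.astype(np.float64)  # double-precision baseline
err_naive = np.max(np.abs(a @ b - reference))
err_strassen = np.max(np.abs(strassen_one_level(a, b) - reference))
print("naive:", err_naive, "strassen:", err_strassen)
```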
posted by a car full of lions at 11:54 PM on October 12, 2022 [4 favorites]


These results are for matrices with coefficients in a ring of characteristic 2, i.e., the integers modulo 2, where arithmetic is all exact and finite, so floating point error is not relevant.
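Concretely: mod 2, addition is XOR, subtraction is the same as addition, and multiplication is AND, so a candidate scheme can be checked exactly and exhaustively. A little sketch of my own, verifying that Strassen's 2x2 formulas still work over GF(2):

```python
# Check Strassen's 2x2 scheme over GF(2): addition/subtraction become XOR,
# multiplication becomes AND, and everything is exact.
import itertools

def strassen_2x2_mod2(a, b):
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 ^ a22) & (b11 ^ b22)
    m2 = (a21 ^ a22) & b11
    m3 = a11 & (b12 ^ b22)
    m4 = a22 & (b21 ^ b11)
    m5 = (a11 ^ a12) & b22
    m6 = (a21 ^ a11) & (b11 ^ b12)
    m7 = (a12 ^ a22) & (b21 ^ b22)
    return [[m1 ^ m4 ^ m5 ^ m7, m3 ^ m5],
            [m2 ^ m4, m1 ^ m2 ^ m3 ^ m6]]

def classical_mod2(a, b):
    return [[(a[i][0] & b[0][j]) ^ (a[i][1] & b[1][j]) for j in range(2)]
            for i in range(2)]

# Exhaustive check over all 2^8 pairs of 2x2 matrices with 0/1 entries
for bits in itertools.product([0, 1], repeat=8):
    a = [[bits[0], bits[1]], [bits[2], bits[3]]]
    b = [[bits[4], bits[5]], [bits[6], bits[7]]]
    assert strassen_2x2_mod2(a, b) == classical_mod2(a, b)
```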
posted by mubba at 5:38 AM on October 13, 2022


Oops, never mind, I see the DeepMind paper is not limited to finite fields.
posted by mubba at 6:04 AM on October 13, 2022 [2 favorites]


Yes, I should have clarified that my comment was specifically about taking the wind out of the DeepMind paper hype, not anything to do with this arXiv paper.
posted by a car full of lions at 6:58 AM on October 13, 2022


Thanks for sharing that link, a car full of lions. The DeepMind paper is a bit all over the place in the results it claims. Most of my bubble was talking about this result from a Theory of Computation point of view (novel formulas for small matrix products) rather than from the "automatic algorithm tuning for practical systems" angle. It's interesting to see some cold water coming from that direction as well. The AlphaTensor paper is interesting, being the first application of this kind of deep learning to algorithm design, but there is a lot of hot air surrounding it.
posted by 3j0hn at 7:38 AM on October 13, 2022



