It's Hard
August 20, 2023 1:13 PM   Subscribe

The tension at the heart of the natural proofs barrier is that the task of distinguishing high-complexity functions from low-complexity ones is similar to the task of distinguishing true randomness from the pseudorandomness used to encrypt messages. We’d like to show that high-complexity functions are categorically different from low-complexity functions, to prove P ≠ NP. But we’d also like for pseudorandomness to be indistinguishable from randomness, to be confident in the security of cryptography. Maybe we can’t have it both ways. from Complexity Theory’s 50-Year Journey to the Limits of Knowledge
posted by chavenet (10 comments total) 33 users marked this as a favorite
 
The problem that's been tugging at the back of my brain the last couple years is this:

In the new world, we have a mix of Good-Old-Fashioned Computer Science and machine learning. Neural networks are universal function approximators, and AFAICT, we reeeeeally don't have any kind of good complexity theory for understanding what's easy vs hard for neural networks. (The exception to this is when we know that a problem is NP-complete or worse, and can show that the neural network is a P-time approximator.)

The problem with P/NP is that it's so dependent on the algorithm. One would really love a more information-theoretic understanding of function complexity, and (especially) some way to judge that complexity with access only to observational data about a function's inputs and outputs. Basically, a way to look at a pile of data and get a sense of how /much/ neural network I need to throw at it. AFAICT, we have no idea how to do this.

Something that feels related: P/NP is only part of the story. Another part of the story is that quadratic time algorithms are often way too slow, and we need to replace them with linear time variants. This happens in a few different ways: sometimes you can get linear by increasing memory and parallelizing, and sometimes you can shave off big parts of a problem with statistical methods. AFAIK, we also don't have a great theory of when something is quadratic time vs linear time.
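To make the quadratic-to-linear trade concrete, here's a minimal sketch (function names are mine, not from any library) of duplicate detection done both ways; spending memory on a hash set is exactly the "increase memory" move that drops the pairwise scan from O(n²) to expected O(n):

```python
from itertools import combinations

def has_duplicate_quadratic(xs):
    # Compare every pair of elements: about n^2 / 2 comparisons.
    return any(a == b for a, b in combinations(xs, 2))

def has_duplicate_linear(xs):
    # Trade memory for time: a hash set makes each membership check
    # expected O(1), so the whole scan is expected O(n).
    seen = set()
    for x in xs:
        if x in seen:
            return True
        seen.add(x)
    return False
```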

But I'm also not a complexity theorist, and it's likely there's a big pile of interesting literature I'm missing on these topics.
posted by kaibutsu at 2:46 PM on August 20, 2023 [5 favorites]


Basically, a way to look at a pile of data and get a sense of how /much/ neural network I need to throw at it. AFAICT, we have no idea how to do this.

Indeed we do not. More importantly, if we could solve your "how much neural network do I need" problem, we'd already be done with the learning problem, via a predictor-to-distinguisher reduction! Let's take the problem you suggest and formalize it a little, and then connect it to modern meta-complexity research.

SMALL-CIRCUIT-DETECTOR PROBLEM.
IN: Randomly sampled set of "examples" of function behavior, like images with binary labels (e.g., cat/not cat, this being Metafilter.)

OUT: YES if these labels are consistent with a "reasonable sized" circuit, NO otherwise.

Notice the difference between this problem and MCSP: the data points are random examples instead of a complete description of the function on all inputs. Far more similar to real-world machine learning! In fact, it turns out that if you can solve SMALL-CIRCUIT-DETECTOR efficiently, then you can learn any function that a reasonable-sized circuit can represent, from random examples (Kothari and Livni, 2018).
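To see why a detector hands you a learner, here's a toy sketch in the one setting where everything is easy: the hypothesis class of monotone conjunctions instead of general small circuits (the function name and encoding are mine, purely for illustration). The detector either answers NO, or answers YES with a consistent hypothesis as a witness, and that witness is exactly what a learner needs:

```python
def fits_some_conjunction(examples):
    # Toy, efficient analogue of SMALL-CIRCUIT-DETECTOR for the easy
    # class of monotone conjunctions: compute the most specific
    # conjunction consistent with the positive examples, then check
    # that every negative example violates it.
    n = len(examples[0][0])
    required = set(range(n))
    for bits, label in examples:
        if label == 1:
            required -= {i for i in range(n) if bits[i] == 0}
    for bits, label in examples:
        if label == 0 and all(bits[i] == 1 for i in required):
            return None   # detector answers NO: no conjunction fits
    return required       # detector answers YES, with a witness
```

For general small circuits, no analogue of this procedure is known to run efficiently; that's the open problem.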

A key limitation of Carmosino, Impagliazzo, Kabanets, and Kolokolova 2016 was that, from a natural proof, they could only extract a learning algorithm that makes queries to a teacher. In other words, their algorithms cannot learn from random samples; they produce an image and ask a teacher, 'is this a cat or not?'

In a very exciting recent development, Goldberg and Kabanets 2023 were able to take a problem closely related to MCSP (MKTP, a type of resource-bounded Kolmogorov complexity) and show that, if you have an efficient MKTP algorithm, you can solve SMALL-CIRCUIT-DETECTOR. Thus, they produce algorithms that can learn from random examples. So, if MKTP is easy, then we have an incredibly powerful universal learning algorithm that does not need a teacher.

But, as you say, MKTP and SMALL-CIRCUIT-DETECTOR look incredibly hard! And indeed, we have no idea how to solve them in general! So... are we just sad forever? Or stuck trying to design algorithms for a problem that we have no idea how to get started on?

No. Because, as mentioned in the article, there are deep ties to cryptography. Ren and Santhanam 2021 showed that if MKTP is hard enough, then secure cryptography (one-way functions) actually exists.

This is one objective of meta-complexity: find these problems where, if they are too hard to solve efficiently, we can leverage that hardness to build secure cryptosystems, and if they are easy to solve, we get powerful learning algorithms. We seek to characterize win/win opportunities in the computational landscape.
posted by kitten_hat at 3:54 PM on August 20, 2023 [21 favorites]


Something that feels related: P/NP is only part of the story. Another part of the story is that quadratic time algorithms are often way too slow, and we need to replace them with linear time variants. ... AFAIK, we also don't have a great theory of when something is quadratic time vs linear time.

Indeed we didn't, up until fairly recently. There is now a whole world of "fine-grained complexity" considering exactly these issues of distinguishing between linear, quadratic, and cubic time: the exact exponent in the polynomial of the runtime.

What is surprising and beautiful about this line of work is that the exact runtime of "key" problems in P is inextricably connected to similar "key" problems in NP, like Boolean Satisfiability (SAT). Here's another Quanta article discussing the close relationship between the exact complexity of string edit distance and SAT.
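For reference, the quadratic algorithm in question is the textbook dynamic program for edit distance, sketched below; the Backurs-Indyk result says that beating its O(n²) running time by a polynomial factor would refute the Strong Exponential Time Hypothesis:

```python
def edit_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic program, kept to two rows
    # of memory. Fine-grained complexity says this quadratic cost is
    # likely inherent (assuming SETH).
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete from a
                          curr[j - 1] + 1,     # insert into a
                          prev[j - 1] + cost)  # substitute
        prev = curr
    return prev[n]
```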

Just as polynomial time can be quantitatively refined by looking at the exact exponent, the "brute-force search" time that seems to be required to solve various NP hard problems can be refined by looking at exact constants ε in an exponential growth rate of the form:

2^{(1 - ε) n}

That is, can we "shave off" a little time in the brute-force search for an answer, or not? The Wikipedia article on the Exponential time hypothesis does a pretty decent job of explaining this.
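In code, the brute-force search whose constant ETH and SETH are about looks like this minimal sketch (clauses encoded DIMACS-style as signed integers; the function name is mine). It walks all 2^n assignments, and the question is whether anything can shave that exponent:

```python
from itertools import product

def brute_force_sat(clauses, n):
    # Exhaustive search over all 2^n truth assignments. ETH conjectures
    # that 3-SAT needs 2^{Omega(n)} time; SETH conjectures that for
    # k-SAT, as k grows, the epsilon in 2^{(1 - eps) n} goes to zero.
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return list(bits)  # satisfying assignment found
    return None                # unsatisfiable
```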
posted by kitten_hat at 4:19 PM on August 20, 2023 [7 favorites]


Basically, a way to look at a pile of data and get a sense of how /much/ neural network I need to throw at it. AFAICT, we have no idea how to do this.

As a snotty social science person my usual answer is that the best way to do this is by having a good theory, and that many things we think of as statistical or computational problems are really mostly research design and theory development problems.
posted by GCU Sweet and Full of Grace at 4:30 PM on August 20, 2023 [2 favorites]


>> and get a sense of how /much/ neural network I need to throw at it. AFAICT, we have no idea how to do this.

> As a snotty social science person my usual answer is that the best way to do this is by having a good theory, and that many things we think of as statistical or computational problems are really mostly research design and theory development problems.

to add a small concrete example: suppose we're interested in predicting the movements of planets, the orbits of the spheres. we're adherents of the geocentric theory: earth is the center of the solar system. we can propose a statistical model where the spheres dance according to epicycles about the earth. if we collect enough observational data from our telescopes, and jam enough parameters onto our model, we can tune our epicycle model to accurately predict all future movements of the spheres from our telescopes. success! but theoretically this geocentric explanation is quite a poor one: although we can use it to make accurate predictions for our local situation, we're going to run into all kinds of problems if we try to apply our fitted model to predict what the orbits of the spheres would look like from Venus, or from Proxima Centauri.
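a toy numerical version of the epicycle story (all names mine; Runge's function stands in for the "true law"): fit a polynomial exactly through five observations, and it reproduces the training region perfectly while the extrapolation is wildly wrong:

```python
def lagrange_predict(xs, ys, x):
    # Fit the unique degree-(n-1) polynomial through the n observed
    # points (the analogue of piling on epicycles until everything
    # fits) and evaluate it at x.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = float(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# "True law": Runge's function 1 / (1 + x^2), observed at five points.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 / (1.0 + x * x) for x in xs]
```

the fitted model nails every observation, but at x = 4 it predicts 17.0 where the true law gives about 0.06.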
posted by are-coral-made at 5:05 PM on August 20, 2023 [1 favorite]


>> As a snotty social science person my usual answer is that the best way to do this is by having a good theory, and that many things we think of as statistical or computational problems are really mostly research design and theory development problems.

Arguably this is what many practical successes in machine learning come from: discovering that a particular neural network has a "reasonable shape" for some task (vision, language processing) and then throwing a gigantic pile of data at tuning the connection weights in said network --- parameters of the model.

Now, of course, a theory of heliocentrism and gravitation is fairly compact in terms of number of equations and parameters, so we were able to do pretty decent statistical processing even in the 1780s --- I think Laplace and Legendre used precursors to what we would today call least-squares regression to solve orbital dynamics problems around that time period?

Unfortunately, a rich enough theory of (for example) language processing seems much, much more difficult to describe in a compact way, so as to make it amenable to this kind of parametric modelling. This is why I expect many of the problems currently studied in meta-complexity to have more to do with cryptography than practical machine learning. The algorithms that one could run, if meta-complexity problems are easy, are universal: if any small circuit exists that is a good fit for the data, they will find it. Seems almost too good to be true, doesn't it? We can at least hope to get provably secure cryptography, in that case.

Currently I'm trying to develop a family of meta-complexity problems that are deliberately less universal, and so hopefully better "match up" to practical learning considerations like theory-building.

(Full disclosure, this article is about my research community and I talked to the author about it. I am delighted with how it came out; he did a better job building a compelling and accessible narrative about the field than I ever have. It is easy to get lost in how exciting the mathematics is when you're trying to explain things to newcomers or interested friends.)
posted by kitten_hat at 6:05 PM on August 20, 2023 [16 favorites]


This is such a well written piece. It might raise all sorts of "Well, actually" from actual practitioners, but I never before read a lay explanation of what P = NP means that I've actually understood. So, many thanks for posting this.
posted by DangerIsMyMiddleName at 6:47 AM on August 21, 2023 [1 favorite]


A note about NP-completeness: when we say that solving one NP-complete problem is equivalent to solving all of them, we mean that it is possible to take an instance of an NP-complete problem and turn it into a corresponding instance of a different problem; if we can solve that different problem, we can also mechanically translate the solution back into the terms of the original problem. All of these translations are "cheap enough": they don't get exponentially harder the bigger the problems and solutions are.

To give a silly example that uses actually-easy problems, we can say that "sort a list" and "find the maximum number in a list" are equivalent. I can say that "find the max" is "reducible" to sorting, because if I have the problem of finding the maximum number in a certain list, I can take my list, sort it, and then easily and mechanically translate the solution to the sorting problem into a solution to the find-the-max problem (by looking at the last number in the sorted list).
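As a runnable sketch of that silly reduction (the function name is mine):

```python
def find_max_via_sorting(xs):
    # Translate the find-the-max instance into a sorting instance
    # (trivially: it is the same list), solve the sorting problem,
    # then translate the answer back by taking the last element.
    # Both translation steps are far cheaper than the sort itself.
    return sorted(xs)[-1]
```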

Sometimes, a category of problem (like the "Travelling Salesman" problem, which hopes to find the cheapest route to visit every city, given a set of possible paths between cities with different travel costs) is (probably!) hard in the worst case, BUT can have instances that are easy to solve in practice. For example, one can imagine a line of cities

* -- * -- * -- *

and even if the line of cities is very long, there's really just one route for the salesman to travel.
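A brute-force check of that intuition, as a sketch (names mine): enumerate every visiting order and keep only the ones that travel along existing roads. For four cities in a line there are 4! = 24 orders, but only two valid routes: the left-to-right walk and its reverse, which are really the same path:

```python
from itertools import permutations

def valid_routes(cities, roads):
    # Enumerate all visiting orders; keep those whose consecutive
    # cities are directly connected by a road.
    roads = {frozenset(r) for r in roads}
    return [order for order in permutations(cities)
            if all(frozenset((order[i], order[i + 1])) in roads
                   for i in range(len(order) - 1))]
```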

Since there is variation in how hard instances of NP-complete problems might be in practice, there is a tantalizing possibility: what if you could take a problem instance that is hard in its original form and translate it into an equivalent NP-complete problem of a different type? Might the new problem be easy in practice?

Spookily, the answer is no. Hard-in-practice problem instances in one form stay hard-in-practice even when translated into other forms.
posted by a snickering nuthatch at 8:42 AM on August 21, 2023 [2 favorites]


Arguably this is what many practical successes in machine learning come from:

[this delights me]
posted by GCU Sweet and Full of Grace at 11:47 AM on August 21, 2023 [1 favorite]


Sometime in the all too recent past, I learned what exactly is meant by a "deep" neural net. I was a little shocked to find out that all it meant was a neural net with more than one layer, and that (unless I'm mistaken here) previously all neural nets were only one layer deep. The reason this was surprising to me was that one of the most famous results in circuit complexity theory -- and what is a neural net if not a super-duper fancy circuit -- is that the PARITY function is not computable by a polynomial-size, constant-depth Boolean circuit (AND, OR, NOT). Now, obviously these aren't exactly comparable situations (e.g.: asymptotic complexity stuff vs. practical engineering stuff), but I can't help but get the feeling in the back of my brain that someone somewhere should have recalled this result and perhaps wondered about the computational power of circuit depth.

In that same vein, I wonder if anyone has considered adding some kind of continuous analog of a MOD gate (perhaps via some kind of rotation of the inputs) into neural nets, to try to harness the power of ACC, which was one of the smallest complexity classes not known to be distinct from some huge complexity class (NEXP, as it turned out) until Ryan Williams's blockbuster result of 2011. I suspect the computational effort of such a gate would likely dissuade most investigations, but if I had the engineering inclination, this would probably be one area I'd try to poke into.
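For contrast, here's a sketch (names mine) of how cheaply PARITY falls once depth is allowed to grow: a balanced tree of two-input XOR gates uses n - 1 gates at depth about log2(n), whereas the Furst-Saxe-Sipser / Hastad lower bound says constant-depth, polynomial-size AND/OR/NOT circuits can't compute it at all:

```python
def parity_xor_tree(bits):
    # Evaluate PARITY as a balanced tree of 2-input XOR gates,
    # halving the layer each round: linear size, logarithmic depth.
    layer = list(bits)
    depth = 0
    while len(layer) > 1:
        layer = [layer[i] ^ layer[i + 1] if i + 1 < len(layer)
                 else layer[i]
                 for i in range(0, len(layer), 2)]
        depth += 1
    return layer[0], depth
```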
posted by mhum at 10:56 AM on August 22, 2023 [1 favorite]




This thread has been archived and is closed to new comments