NP Complete
February 19, 2021 8:42 AM Subscribe
Zero Knowledge Proofs - "By their name, they are proofs that reveal absolutely no information to the verifier... at the end of it, you know nothing more than you knew before except that the claim I made was true... It seems counterintuitive to say the least because we somehow associate conviction with information transfer."[1,2,3]
also btw...
also btw...
- Computer Scientists Achieve 'Crown Jewel' of Cryptography - "A cryptographic master tool called indistinguishability obfuscation has for years seemed too good to be true. Three researchers have figured out that it can work."[4,5,6,7,8,9]
- IBM Makes Encryption Paradox Practical - "'Fully homomorphic' cryptography allows... for the processing of encrypted files without ever needing to decrypt them first."
Computerphile published a video about zero knowledge proofs a few years ago.
posted by polytope subirb enby-of-piano-dice at 8:56 AM on February 19, 2021
posted by polytope subirb enby-of-piano-dice at 8:56 AM on February 19, 2021
@He_is_only_the_imposter:
As I understand it, it’s like “informational escrow”. I need you to accept that my information is correct, valid, etc. But if I show you the information, you can steal it. So I put it into “escrow”.
You can look at it inside the “IE” box, but all you will see is the “true” light flashing. The details are shrouded.
You have now verified that my claim about the information is “true”, but the information itself has not been “unlocked” in the process in a way that would allow it to be stolen..
I think that’s right.
posted by Pirate-Bartender-Zombie-Monkey at 9:06 AM on February 19, 2021
As I understand it, it’s like “informational escrow”. I need you to accept that my information is correct, valid, etc. But if I show you the information, you can steal it. So I put it into “escrow”.
You can look at it inside the “IE” box, but all you will see is the “true” light flashing. The details are shrouded.
You have now verified that my claim about the information is “true”, but the information itself has not been “unlocked” in the process in a way that would allow it to be stolen..
I think that’s right.
posted by Pirate-Bartender-Zombie-Monkey at 9:06 AM on February 19, 2021
I like to watch videos like this Numberphile episode as an alternative to hallucinogenics. They reveal the complete and utter boundedness of my understanding of anything/everything and at the same time the complete and utter boundary-lessness of anything/everything. It makes me feel very calm.
posted by Balthamos at 9:34 AM on February 19, 2021 [8 favorites]
posted by Balthamos at 9:34 AM on February 19, 2021 [8 favorites]
This was interesting and I feel like I'm in roughly the same space as Numberphile in that I followed the general outline Avi gave but feel like I need a better understanding of the actual how of the process of translating a given mathematical statement to a map. Would love to see an accessible explanation of that Cook-Levin process.
One thing I wonder about Avi's explanation vs. Numberphile's questions: how much of the disconnect is Avi doing the mathematician thing of saying something is easy and simple (e.g. whatever the Cook-Levin algorithm is I guess) and not also saying "...and also computationally enormous and impractical" in a way that better conveys why it's not a magic machine that collapses the complexity of everything.
Like my gut here is the bit that I didn't understand to be explicitly outlined in this video is that the Cook-Levin algorithm is a simple, and solved, and works, and also is hugely impractical in exactly the same way that computing NP Complete problems is hugely impractical. So it doesn't solve the problem of high-complexity computation time, it just creates an isomorphism between all high-complexity NP Complete problems and the specific elegant NP Complete problem of testing a mapping for 3-colorability? But I'm not confident that I've got that right in any case.
The other thing that I'm unsure about with the Zero Knowledge Proof process is: if the way to test someone's black box proof is to check local subsets of the solution (e.g. checking a random pair of adjacent countries in a map for valid different colors), then you can prove your black box works by always producing a good pair. But to test every single pair would for a large problem be a ton of work, yeah? So do you need to check every single pair to accept the black box proof, or do you simply check a large enough sample to reduce the chances of an invalid mapping to a very very small number and call it a day?
posted by cortex at 9:41 AM on February 19, 2021 [1 favorite]
One thing I wonder about Avi's explanation vs. Numberphile's questions: how much of the disconnect is Avi doing the mathematician thing of saying something is easy and simple (e.g. whatever the Cook-Levin algorithm is I guess) and not also saying "...and also computationally enormous and impractical" in a way that better conveys why it's not a magic machine that collapses the complexity of everything.
Like my gut here is the bit that I didn't understand to be explicitly outlined in this video is that the Cook-Levin algorithm is a simple, and solved, and works, and also is hugely impractical in exactly the same way that computing NP Complete problems is hugely impractical. So it doesn't solve the problem of high-complexity computation time, it just creates an isomorphism between all high-complexity NP Complete problems and the specific elegant NP Complete problem of testing a mapping for 3-colorability? But I'm not confident that I've got that right in any case.
The other thing that I'm unsure about with the Zero Knowledge Proof process is: if the way to test someone's black box proof is to check local subsets of the solution (e.g. checking a random pair of adjacent countries in a map for valid different colors), then you can prove your black box works by always producing a good pair. But to test every single pair would for a large problem be a ton of work, yeah? So do you need to check every single pair to accept the black box proof, or do you simply check a large enough sample to reduce the chances of an invalid mapping to a very very small number and call it a day?
posted by cortex at 9:41 AM on February 19, 2021 [1 favorite]
(I should say that I feel like Avi was addressing the enormous complexity thing when he clarified that Wiles' FLT proof would, if translated into the formal logic that presumably the Cook-Levin algorithm processes, would go from being a whatever-hundred page paper to being millions of pages of logic. I just wish there had been a little more actually direct discussion of that and of the complexity implications during the interview to make that bit clearer.)
posted by cortex at 9:43 AM on February 19, 2021
posted by cortex at 9:43 AM on February 19, 2021
As I understand it, a zero knowledge proof would be like, "I'd like to know if Alice's balance is high enough to pay for this transaction" and a zero knowledge proof is a scheme where Alice can prove it without sharing her balance or directly asking a trusted third party to check the balance for you. So it could be like "I have an algorithm that only returns true and a verification code if I put in a signed bank statement, your requested minimum balance, and your public key. If I tamper with any of these values, your algorithm will tell you that I'm lying."
Obviously, that's not the only case, but that's part of how Monero works so you can validate transactions without actually knowing who has what balance.
A popular example is imagine if you want to prove to a blindfolded person you have a flat coin with a different decal on each side, but they suspect it might be the same design on both sides, and you don't want to say what's on the decals for some weird reason. Have that person keep the coin out of your view, and let them flip it if they choose to. They then show you the coin and ask it's been flipped. Each time, there's a 50% chance you can guess correctly if they're the same, but it's 100% of the time if the designs are different. If you repeat this test a few times, it becomes half as probable each time that you're lying.
posted by ikea_femme at 9:48 AM on February 19, 2021 [9 favorites]
Obviously, that's not the only case, but that's part of how Monero works so you can validate transactions without actually knowing who has what balance.
A popular example is imagine if you want to prove to a blindfolded person you have a flat coin with a different decal on each side, but they suspect it might be the same design on both sides, and you don't want to say what's on the decals for some weird reason. Have that person keep the coin out of your view, and let them flip it if they choose to. They then show you the coin and ask it's been flipped. Each time, there's a 50% chance you can guess correctly if they're the same, but it's 100% of the time if the designs are different. If you repeat this test a few times, it becomes half as probable each time that you're lying.
posted by ikea_femme at 9:48 AM on February 19, 2021 [9 favorites]
FWIW, the "proof" is still somewhat probabilistic. Every time you inspect a small part of the full 3-coloring solution, you will either see two different colors, which gives you a little bit of confidence that the solution is valid, or you see two of the same color, which proves the solution is invalid. You repeat the inspection process until you have the desired level of confidence in the solution, but the adversary gets to shuffle the colors around each time, to prevent you from drawing conclusions about the relationship of nodes to each other in each subsequent round of inspections.
posted by rustcrumb at 9:50 AM on February 19, 2021 [5 favorites]
posted by rustcrumb at 9:50 AM on February 19, 2021 [5 favorites]
Here's a kinda-close example using public key crypto:
You have a public key and private key, and the private key is never revealed. We start a conversation, and I want to verify that you are who you say you are by determining whether you have your super-secret private key. To prove that you have the private key, I can encrypt a random word with the public key and send it across to you. Then all you have to do is decrypt the word with the private key and send it back. Then I still don't know what the private key is, but I do know you have it. (with very high probability: if you send back a random word instead of decrypting, there's a tiiiiny chance you pick the same random word.)
---
The discrete log example on wikipedia is similar to this, but without a public key (which in practice means we need to know that the public key is really yours, relying on some prior knowledge or a trusted third party to vouch that the public key is really yours).
The core tool in the real zero-knowledge proofs is the probabilistic interactive proof step: At each step, we either verify that you're not cheating, or that you can do a certain operation with the info you claim to have. Do it enough times, and the probability that you're telling the truth approaches 1.
posted by kaibutsu at 10:17 AM on February 19, 2021 [3 favorites]
You have a public key and private key, and the private key is never revealed. We start a conversation, and I want to verify that you are who you say you are by determining whether you have your super-secret private key. To prove that you have the private key, I can encrypt a random word with the public key and send it across to you. Then all you have to do is decrypt the word with the private key and send it back. Then I still don't know what the private key is, but I do know you have it. (with very high probability: if you send back a random word instead of decrypting, there's a tiiiiny chance you pick the same random word.)
---
The discrete log example on wikipedia is similar to this, but without a public key (which in practice means we need to know that the public key is really yours, relying on some prior knowledge or a trusted third party to vouch that the public key is really yours).
The core tool in the real zero-knowledge proofs is the probabilistic interactive proof step: At each step, we either verify that you're not cheating, or that you can do a certain operation with the info you claim to have. Do it enough times, and the probability that you're telling the truth approaches 1.
posted by kaibutsu at 10:17 AM on February 19, 2021 [3 favorites]
Have that person keep the coin out of your view, and let them flip it if they choose to. They then show you the coin and ask it's been flipped. Each time, there's a 50% chance you can guess correctly if they're the same, but it's 100% of the time if the designs are different. If you repeat this test a few times, it becomes half as probable each time that you're lying.
I believe the canonical explanation is the "Cave of Ali Baba".
posted by BungaDunga at 11:15 AM on February 19, 2021 [1 favorite]
I believe the canonical explanation is the "Cave of Ali Baba".
posted by BungaDunga at 11:15 AM on February 19, 2021 [1 favorite]
Here's where the video lost me:
How do I, the observer, know that the 3-color map that you've provided me is actually a representation of the problem? What makes it clear to me that this map corresponds to "Fermat's Last Theorem" and not "Is a Hot Dog a Sandwich"?
posted by explosion at 11:17 AM on February 19, 2021
How do I, the observer, know that the 3-color map that you've provided me is actually a representation of the problem? What makes it clear to me that this map corresponds to "Fermat's Last Theorem" and not "Is a Hot Dog a Sandwich"?
posted by explosion at 11:17 AM on February 19, 2021
I like to watch videos like this Numberphile episode as an alternative to hallucinogenics.
¿Por qué no los dos?
posted by Dr. Curare at 11:24 AM on February 19, 2021 [3 favorites]
explosion: I think it's that the 3-color mapping problem is just the example they're using here because it's a nice simple problem that also happens to be NP Complete. They stick with it for consistency's sake in the interviewer, but the whole thing isn't fundamentally about 3-coloring, it's about being able to generally look at a piece of the answer, for ANY question that someone has a black box solution for, and seeing that that answer is consistently correct.
So if they had a black box that could take a statement (e.g. "this food item has the sandwich property") and produce an answer, and you can give them a food item that you personally already know the sandwichness state of, and they consistently give you back an answer that confirms that knowledge for any given food item you hand them, they've produced proof that they can determine sandwichness without giving you any information you didn't already have. You know a hotdog is (or is not) a sandwich, but you don't know what their generalized method of calculating sandwichness is, only that it always works.
The video's focus on 3-colorability is enough to cover every NP Complete problem because every NP Complete problem can be translated to a 3-colorability problem, is the key thing. The actual method for doing that, the Cook-Levin algorithm, is something I still don't really have a good handle on, but the result of it is that if determining sandwichness is NP Complete then determining sandwichness is equivalent to determining 3-colorability.
So to put it another way: you should be able to ask about sandwichness either in the formal language of satisfying sandwichness, whatever that is, or in the formal language of satisfying 3-colorability, whichever is more convenient for both parties. The questions can be reduced to the same problem; either way you know the answer and the owner of the black box is establishing (at least against increasingly vanishingly small odds) that they can determine that answer for arbitrary input.
posted by cortex at 11:36 AM on February 19, 2021
So if they had a black box that could take a statement (e.g. "this food item has the sandwich property") and produce an answer, and you can give them a food item that you personally already know the sandwichness state of, and they consistently give you back an answer that confirms that knowledge for any given food item you hand them, they've produced proof that they can determine sandwichness without giving you any information you didn't already have. You know a hotdog is (or is not) a sandwich, but you don't know what their generalized method of calculating sandwichness is, only that it always works.
The video's focus on 3-colorability is enough to cover every NP Complete problem because every NP Complete problem can be translated to a 3-colorability problem, is the key thing. The actual method for doing that, the Cook-Levin algorithm, is something I still don't really have a good handle on, but the result of it is that if determining sandwichness is NP Complete then determining sandwichness is equivalent to determining 3-colorability.
So to put it another way: you should be able to ask about sandwichness either in the formal language of satisfying sandwichness, whatever that is, or in the formal language of satisfying 3-colorability, whichever is more convenient for both parties. The questions can be reduced to the same problem; either way you know the answer and the owner of the black box is establishing (at least against increasingly vanishingly small odds) that they can determine that answer for arbitrary input.
posted by cortex at 11:36 AM on February 19, 2021
I haven't watched the video yet but IMO one awesome part of zero knowledge is the famous theorem, IP = PSPACE.
Whether P = NP is unknown, and whether NP = PSPACE is also unknown. But the complexity theory of interactive proofs says in a tantalizing way that most truths we care to know about (decision problems) can be known by the act of communication between two beings (computational agents). I find that very profound, a mathematical illustration of the notion of dialectical truth.
posted by polymodus at 11:41 AM on February 19, 2021 [1 favorite]
Whether P = NP is unknown, and whether NP = PSPACE is also unknown. But the complexity theory of interactive proofs says in a tantalizing way that most truths we care to know about (decision problems) can be known by the act of communication between two beings (computational agents). I find that very profound, a mathematical illustration of the notion of dialectical truth.
posted by polymodus at 11:41 AM on February 19, 2021 [1 favorite]
Explosion: The reason I can’t show you some arbitrary map is because we both know Cook-Levin. You can do Cook-Levin for yourself to translate the statement you want a truth-value for into a map. Then, if I ever answer one of the challenges in a way that is invalid for the map, you can just stop the protocol and conclude that I don’t really know an answer to your problem.
You don’t know how to answer the question, but you do know how to phrase the question using Cook-Levin, and could detect if I am answering according to an invalid translation.
Cortex: Cook-Levin style transformations are actually easy, to the point where we can use them to phrase questions about the correctness of real hardware chips and software systems. Of course, answering those questions efficiently on realistic problems is the subject of intense research, but it has broken through into a few applications. I think Microsoft actually ships (see, Industrial Adoption subsections) tools that use Cook-Levin style transformations of device drivers to phrase questions like “is there an input that will make this driver crash the whole computer?”
posted by kitten_hat at 12:24 PM on February 19, 2021 [2 favorites]
You don’t know how to answer the question, but you do know how to phrase the question using Cook-Levin, and could detect if I am answering according to an invalid translation.
Cortex: Cook-Levin style transformations are actually easy, to the point where we can use them to phrase questions about the correctness of real hardware chips and software systems. Of course, answering those questions efficiently on realistic problems is the subject of intense research, but it has broken through into a few applications. I think Microsoft actually ships (see, Industrial Adoption subsections) tools that use Cook-Levin style transformations of device drivers to phrase questions like “is there an input that will make this driver crash the whole computer?”
posted by kitten_hat at 12:24 PM on February 19, 2021 [2 favorites]
It's interesting because as with the Microsoft example, the argument is that it seems there's an observed concordance in that in practical cases we care about, the intractable problems are often actually manageable despite the theory saying in the general case it might not be. Gurevich who is a computer scientist supported by Microsoft has mentioned there are actual projects to check the halting problem for a given piece of C code. His talks are on YouTube, pretty neat and entertaining.
posted by polymodus at 1:06 PM on February 19, 2021
posted by polymodus at 1:06 PM on February 19, 2021
This is pretty fascinating, but something eludes me. I may misunderstand, but isn’t Cook-Levin about propositional logic, i.e. without quantification? I grasp the reduction of SAT to three coloring, and the reduction of any mathematical assertion to first-order logic. I’m just not seeing how first-order logic reduces to SAT. I very vaguely remember a procedure for throwing out universal quantifiers and replacing existential quantifiers with constants; I forget its name, but it seems possibly relevant?
posted by sjswitzer at 1:21 PM on February 19, 2021
posted by sjswitzer at 1:21 PM on February 19, 2021
This seems to be the answer to my question. I’ll see if I can make any sense of it.
posted by sjswitzer at 1:38 PM on February 19, 2021
posted by sjswitzer at 1:38 PM on February 19, 2021
When you have a proof of a statement and we both agree that statement has been transformed into a map (or graph) by the Cook-Levin process, you begin by converting your proof into a 3-coloring of the map. Then we go into a loop, the length of which being determined by my threshold for doubt, wherein:
1. You apply a random permutation of the colors in your 3-coloring. ["if we have one, we have six."]
2. I choose (probably at random) a pair of neighboring countries (vertices) and verify that they are of different colors, and I keep a tally of the color pairs I observe (Red&Blue vs. Yellow&Red, etc.)
After some enormous number of loops, I look at my tally and see that each of the three distinct pairs of colors was observed sufficiently close to 1/3 of the time. If so, I declare that you have a proof of the statement, though I don't know anything about the coloring, since each observation happened after a random shuffling of colors. I also randomly chose two neighboring countries since I want to make sure every possible pair has an equal shot so I don't miss out on finding two neighboring countries with the same color (should that exist in your coloring, thereby showing your proof is not valid).
One quick question:
Are there many graphs that are barely 4-colorable, in the sense that just a few subgraphs require that 4th color? Would I want to pre-process my list of neighboring countries to check so I didn't miss those? Are there simple algorithms for finding such places, or am I left to randomly shuffle all possible pairs of neighboring countries and just pick a large enough N?
posted by klausman at 1:43 PM on February 19, 2021
1. You apply a random permutation of the colors in your 3-coloring. ["if we have one, we have six."]
2. I choose (probably at random) a pair of neighboring countries (vertices) and verify that they are of different colors, and I keep a tally of the color pairs I observe (Red&Blue vs. Yellow&Red, etc.)
After some enormous number of loops, I look at my tally and see that each of the three distinct pairs of colors was observed sufficiently close to 1/3 of the time. If so, I declare that you have a proof of the statement, though I don't know anything about the coloring, since each observation happened after a random shuffling of colors. I also randomly chose two neighboring countries since I want to make sure every possible pair has an equal shot so I don't miss out on finding two neighboring countries with the same color (should that exist in your coloring, thereby showing your proof is not valid).
One quick question:
Are there many graphs that are barely 4-colorable, in the sense that just a few subgraphs require that 4th color? Would I want to pre-process my list of neighboring countries to check so I didn't miss those? Are there simple algorithms for finding such places, or am I left to randomly shuffle all possible pairs of neighboring countries and just pick a large enough N?
posted by klausman at 1:43 PM on February 19, 2021
[I actually don't know if I have that totally right.]
posted by klausman at 1:52 PM on February 19, 2021
posted by klausman at 1:52 PM on February 19, 2021
I’m just not seeing how first-order logic reduces to SAT.
Your question made me think of Tao's remarks on Skolemisation in his blog post on completeness and decidability, would that be related?
Whereas in computation complexity, first-order logic with arbitrary quantifiers is part of the polynomial hierarchy (PH), and it is not known if it can be collapsed (we do know it's in between, as if: NP <= PH <= PSPACE), as I understand my rusty CS theory. "Exist-Forall" is a distinct complexity class from "Exist" (which is SAT), and so forth.
posted by polymodus at 1:57 PM on February 19, 2021 [1 favorite]
Your question made me think of Tao's remarks on Skolemisation in his blog post on completeness and decidability, would that be related?
Whereas in computation complexity, first-order logic with arbitrary quantifiers is part of the polynomial hierarchy (PH), and it is not known if it can be collapsed (we do know it's in between, as if: NP <= PH <= PSPACE), as I understand my rusty CS theory. "Exist-Forall" is a distinct complexity class from "Exist" (which is SAT), and so forth.
posted by polymodus at 1:57 PM on February 19, 2021 [1 favorite]
Thanks! Skolem Normal Form is exactly what I was trying to remember, but I haven’t thought about it since grad school many many years ago. Many.
posted by sjswitzer at 4:46 PM on February 19, 2021 [1 favorite]
posted by sjswitzer at 4:46 PM on February 19, 2021 [1 favorite]
Something I once realized about SAT being NP-complete is that converting from conjunctive normal form to disjunctive normal form must be at least NP-hard because you can just read off all the satisfying solutions from it. Probably “everybody” knows that, but I don’t remember being taught it.
(Edit; I probably meant NP complete there but too tired and too foggy on the details to think clearly about it.)
posted by sjswitzer at 4:57 PM on February 19, 2021 [1 favorite]
(Edit; I probably meant NP complete there but too tired and too foggy on the details to think clearly about it.)
posted by sjswitzer at 4:57 PM on February 19, 2021 [1 favorite]
Sjswitzer: Cook-Levin and the NP-completeness of SAT is indeed about propositional logic. You are right to be suspicious about converting arbitrary first-order sentences into propositional logic. The trick in complexity theory is the size of objects being quantified over. Basically, we’re converting a single first-order formula into a family of propositional formulas, each of whose size depends on the particular problem size we want to solve. Let me try and be precise.
When we talk about NP, we mean problems that are efficiently — deterministic polynomial time — verifiable, given a “witness.” In logical language, this witness is bound with an existential quantifier. For the map-coloring example, we could say, in first-order logic: “There exists a valid coloring of this map,” and the coloring itself is the witness.
Now then, how big can any coloring be? That is, how many bits will it take to describe? If we call the number of nodes in the graph n, just 2n bits: write down which of the three colors per node, in order. Notice that this witness is efficiently describable; not much larger than the graph itself! We can imagine a simple computer program (Turing Machine) that reads a graph and a proposed witness, and checks if this is an actual 3-coloring.
Cook-Levin consists of transforming the Turing Machine (TM) that checks a witness and the input (eg, graph) but not the witness into a propositional formula. The formula is a long list of statements about the local consistency of witness-checking computation on this particular input. Like “on step 27, the Turing Machine moves tape head exactly one cell left or right, and finds the same symbol as was there last time” The variables in this formula reflect how the computation evolves and encode the bits of a witness. The upshot is, the size of that formula depends on the size of the initial graph. It is “reasonably” sized — polynomial — compared to the graph we are asked to color.
If we could solve SAT, we could find an assignment to all these “computation transcript” variables that make the machine say “yes” on this particular input.
posted by kitten_hat at 6:22 PM on February 19, 2021
When we talk about NP, we mean problems that are efficiently — deterministic polynomial time — verifiable, given a “witness.” In logical language, this witness is bound with an existential quantifier. For the map-coloring example, we could say, in first-order logic: “There exists a valid coloring of this map,” and the coloring itself is the witness.
Now then, how big can any coloring be? That is, how many bits will it take to describe? If we call the number of nodes in the graph n, just 2n bits: write down which of the three colors per node, in order. Notice that this witness is efficiently describable; not much larger than the graph itself! We can imagine a simple computer program (Turing Machine) that reads a graph and a proposed witness, and checks if this is an actual 3-coloring.
Cook-Levin consists of transforming the Turing Machine (TM) that checks a witness and the input (eg, graph) but not the witness into a propositional formula. The formula is a long list of statements about the local consistency of witness-checking computation on this particular input. Like “on step 27, the Turing Machine moves tape head exactly one cell left or right, and finds the same symbol as was there last time” The variables in this formula reflect how the computation evolves and encode the bits of a witness. The upshot is, the size of that formula depends on the size of the initial graph. It is “reasonably” sized — polynomial — compared to the graph we are asked to color.
If we could solve SAT, we could find an assignment to all these “computation transcript” variables that make the machine say “yes” on this particular input.
posted by kitten_hat at 6:22 PM on February 19, 2021
polymodus: One of the best parts of complexity theory is trying to grapple with this apparent contradiction between the classical theory (worst-case NP hardness) and empirical reality: reasonable heuristics can solve surprisingly large SAT instances.
This is called “average-case complexity theory.” We say, look, we’re not worried about arbitrarily bad instances of (for example) SAT we’re worried about problems that actually tend to occur in nature — or in Intel’s chips. So let’s restrict attention to sources of problems that are “reasonable” in the following sense: there is an efficient Turing Machine that turns random bits into a SAT instance. Then, we try to understand what it is about different sources of SAT instances that make them easy or hard.
The main problem here is to decide which complexity world we live in. These five worlds reflect a more nuanced picture than the classical theory, which does indeed seem to contradict our practical experience.
Trying to figure this out is hard. I refuse to speculate as to which world we live in.
posted by kitten_hat at 6:37 PM on February 19, 2021 [1 favorite]
This is called “average-case complexity theory.” We say, look, we’re not worried about arbitrarily bad instances of (for example) SAT we’re worried about problems that actually tend to occur in nature — or in Intel’s chips. So let’s restrict attention to sources of problems that are “reasonable” in the following sense: there is an efficient Turing Machine that turns random bits into a SAT instance. Then, we try to understand what it is about different sources of SAT instances that make them easy or hard.
The main problem here is to decide which complexity world we live in. These five worlds reflect a more nuanced picture than the classical theory, which does indeed seem to contradict our practical experience.
Trying to figure this out is hard. I refuse to speculate as to which world we live in.
posted by kitten_hat at 6:37 PM on February 19, 2021 [1 favorite]
FWIW, the "proof" is still somewhat probabilistic. Every time you inspect a small part of the full 3-coloring solution, you will either see two different colors, which gives you a little bit of confidence that the solution is valid, or you see two of the same color, which proves the solution is invalid. You repeat the inspection process until you have the desired level of confidence in the solution, but the adversary gets to shuffle the colors around each time, to prevent you from drawing conclusions about the relationship of nodes to each other in each subsequent round of inspections.
It doesn't need to be probabilistic. Both of you have the same map; what you're trying to establish is whether your adversary has a 3-colouring of it. So you can systematically work through every border in the map, asking your adversary for their randomly permuted colour-pairs for the cells on either side of that border, until you've checked them all.
The part of this I currently having difficulty with is the idea that doing this could actually remain zero-knowledge. Intuition tells me that although any individual colour pair is indistinguishable from random, given a map of any decent size for which an adversary possesses a particular 3-colouring whose colours they're permuting in order to answer your queries, it ought to be possible to reconstruct that 3-colouring from the query results, unscrambling the colour permutations by comparing test results from map cells that get checked multiple times because they have multiple borders in much the same way as far too many "anonymized" data sets have turned out to have fairly easy de-anonymizations by exploiting internal correlations.
So there's probably some really elegant way that stuff cancels out other stuff that I'm not seeing yet.
posted by flabdablet at 8:18 PM on February 19, 2021
It doesn't need to be probabilistic. Both of you have the same map; what you're trying to establish is whether your adversary has a 3-colouring of it. So you can systematically work through every border in the map, asking your adversary for their randomly permuted colour-pairs for the cells on either side of that border, until you've checked them all.
The part of this I currently having difficulty with is the idea that doing this could actually remain zero-knowledge. Intuition tells me that although any individual colour pair is indistinguishable from random, given a map of any decent size for which an adversary possesses a particular 3-colouring whose colours they're permuting in order to answer your queries, it ought to be possible to reconstruct that 3-colouring from the query results, unscrambling the colour permutations by comparing test results from map cells that get checked multiple times because they have multiple borders in much the same way as far too many "anonymized" data sets have turned out to have fairly easy de-anonymizations by exploiting internal correlations.
So there's probably some really elegant way that stuff cancels out other stuff that I'm not seeing yet.
posted by flabdablet at 8:18 PM on February 19, 2021
flabdablet, my reckoning is that the key here is to ask what comparing multiple pairs with the same common country in one half would mean. Because you're not going in with a lack of knowledge about whether a given input map is 3-colorable: you know that, you just want the owner of the black box to prove that they can prove that too, for an arbitrary input.
Because the random permutation of colors on each check, if you have a country A that borders on countries B, C, D, and E, and you check {A,B}, {A,C}, {A,D}, and {A,E}, the colors you'd get back for each pair would have a random color for A and a random different color for B, C, D, and E, yeah? So you could take notes on each of those comparisons and conclude through the accumulation of data that...A is a different color than B or C or D or E. If you already know your map is 3-colorable, you've learned nothing: the inference is that this 3-colorable map is in fact 3-colorable.
posted by cortex at 9:41 PM on February 19, 2021
Because the random permutation of colors on each check, if you have a country A that borders on countries B, C, D, and E, and you check {A,B}, {A,C}, {A,D}, and {A,E}, the colors you'd get back for each pair would have a random color for A and a random different color for B, C, D, and E, yeah? So you could take notes on each of those comparisons and conclude through the accumulation of data that...A is a different color than B or C or D or E. If you already know your map is 3-colorable, you've learned nothing: the inference is that this 3-colorable map is in fact 3-colorable.
posted by cortex at 9:41 PM on February 19, 2021
you're not going in with a lack of knowledge about whether a given input map is 3-colorable: you know that, you just want the owner of the black box to prove that they can prove that too
I might well have misunderstood, but I thought that the point was that you're both working off the same input map, your adversary asserts that they are able to 3-colour it, and that the task is to check the truth of that assertion without learning their actual 3-colouring. I wasn't under the impression that you need to start off in possession of a 3-colouring of your own to make this work.
posted by flabdablet at 11:08 PM on February 19, 2021
I might well have misunderstood, but I thought that the point was that you're both working off the same input map, your adversary asserts that they are able to 3-colour it, and that the task is to check the truth of that assertion without learning their actual 3-colouring. I wasn't under the impression that you need to start off in possession of a 3-colouring of your own to make this work.
posted by flabdablet at 11:08 PM on February 19, 2021
I suspect that the resolution to my perceived dilemma might be along the lines that processing the adversary's responses in order to reconstruct their 3-colouring will in general involve at least as much computation as constructing my own from first principles, in which case talking to them about it doesn't give me anything of value.
posted by flabdablet at 11:24 PM on February 19, 2021
posted by flabdablet at 11:24 PM on February 19, 2021
I might have misunderstood too, so, heh. This is very much an "I think this is the situation" take, not "I think you will find, upon further examination, that I am right about this", as far as that goes. My gut sense so far of the Zero Knowledge aspect of this is that for you to not gain knowledge, you need to know going in what the black box can prove on the way out, so learning the satisfiability of your input map would be additional knowledge that violates that expectation. But I don't know if I've got that right.
posted by cortex at 11:40 PM on February 19, 2021
posted by cortex at 11:40 PM on February 19, 2021
I wonder if you can involve a random permutation of the graph vertices in an interesting way, similar to how the colors are randomly permuted. (Countries = graph vertices, edges = borders between countries.)
A key feature in the crypto interactive proofs is getting /either/ a check of correctness /or/ a check of non-cheating at each step. So Peggy (the prover) puts a random permutation in one envelope, and then has an envelope for the color of each country. Victor (the verifier) can ask for /either/ two random countries colors, /or/ some salient information about the permutation (combined in a clever way with the problem so that the answer isn't completely arbitrary).
posted by kaibutsu at 11:52 PM on February 19, 2021
A key feature in the crypto interactive proofs is getting /either/ a check of correctness /or/ a check of non-cheating at each step. So Peggy (the prover) puts a random permutation in one envelope, and then has an envelope for the color of each country. Victor (the verifier) can ask for /either/ two random countries colors, /or/ some salient information about the permutation (combined in a clever way with the problem so that the answer isn't completely arbitrary).
posted by kaibutsu at 11:52 PM on February 19, 2021
It doesn't need to be probabilistic. [...] So you can systematically work through every border in the map
If the prover knows which edge you are going to ask for, they can simply lie about that edge. The key point is that the prover sends you binding commitments about their permuted assignment-- think of it like each round they send you a bunch of locked boxes, each one containing a color for a vertex. You ask for the keys to open two of the boxes, check that the vertex assignments inside really are different colors, and burn the rest of the boxes. Then next round they send you a new set of boxes. So if the prover knows you're going to ask about vertex 123 and vertex 456 in a particular round, in that round they can put "Blue" into box 123 and "Red" into box 456 and stuff the other boxes with anything at all because they know you aren't going to open them.
But by choosing edges randomly you give the prover nowhere to hide: either they commit to a genuine solution, or there is a chance you will catch them in an inconsistency, and by repeating the process you can amplify that chance of catching them to any arbitrary level you're comfortable with.
posted by Pyry at 3:05 AM on February 20, 2021 [1 favorite]
If the prover knows which edge you are going to ask for, they can simply lie about that edge. The key point is that the prover sends you binding commitments about their permuted assignment-- think of it like each round they send you a bunch of locked boxes, each one containing a color for a vertex. You ask for the keys to open two of the boxes, check that the vertex assignments inside really are different colors, and burn the rest of the boxes. Then next round they send you a new set of boxes. So if the prover knows you're going to ask about vertex 123 and vertex 456 in a particular round, in that round they can put "Blue" into box 123 and "Red" into box 456 and stuff the other boxes with anything at all because they know you aren't going to open them.
But by choosing edges randomly you give the prover nowhere to hide: either they commit to a genuine solution, or there is a chance you will catch them in an inconsistency, and by repeating the process you can amplify that chance of catching them to any arbitrary level you're comfortable with.
posted by Pyry at 3:05 AM on February 20, 2021 [1 favorite]
I suspect that the resolution to my perceived dilemma might be along the lines that processing the adversary's responses in order to reconstruct their 3-colouring will in general involve at least as much computation as constructing my own from first principles
Well, each round you learn that for a particular edge (i, j), color(i) != color(j). Suppose that you play long enough that you manage to ask about every edge, then you will have learned that for all edges (i, j), color(i) != color(j). Solving this set of constraints is exactly the original problem!
posted by Pyry at 3:27 AM on February 20, 2021
Well, each round you learn that for a particular edge (i, j), color(i) != color(j). Suppose that you play long enough that you manage to ask about every edge, then you will have learned that for all edges (i, j), color(i) != color(j). Solving this set of constraints is exactly the original problem!
posted by Pyry at 3:27 AM on February 20, 2021
The thing that's just clicked for me is that there is absolutely no reason at all why the colours that the prover puts inside the envelopes need to be a mere permutation of the colours in the prover's putative 3-colouring, as suggested during the colour-renaming part of the Numberphile exposition. They can be a completely arbitrary coding of those colours into symbols drawn from a very large symbol space.
The vague outline of the crack that I intuited might be possible to run against the described zero-knowledge proof process relied on cross-checking the query results for edge(i, j) with those for edge(i, k), edge(j, k) and so on, using the results accumulated around loops in the graph to discern which of the six possible permutations of three colour names had been globally applied on any given query step.
But if all I'm seeing revealed inside my chosen envelopes at each step are something like GUIDs that code for colours, and the GUID codes chosen for each query response are completely independent of those chosen for previous responses, then there is no way at all to correlate results across query steps and the crack doesn't even get out of the starting gate.
So I'm a happy camper now.
posted by flabdablet at 5:23 AM on February 20, 2021
The vague outline of the crack that I intuited might be possible to run against the described zero-knowledge proof process relied on cross-checking the query results for edge(i, j) with those for edge(i, k), edge(j, k) and so on, using the results accumulated around loops in the graph to discern which of the six possible permutations of three colour names had been globally applied on any given query step.
But if all I'm seeing revealed inside my chosen envelopes at each step are something like GUIDs that code for colours, and the GUID codes chosen for each query response are completely independent of those chosen for previous responses, then there is no way at all to correlate results across query steps and the crack doesn't even get out of the starting gate.
So I'm a happy camper now.
posted by flabdablet at 5:23 AM on February 20, 2021
No, it has to be a permutation because if the prover was allowed to use a >3 symbol space they could fool the verifier with an N-coloring instead of a 3-coloring. If the prover could use an arbitrarily large color set, they could trivially give each vertex a unique color and then every edge check would pass. But even with just permutations, cross checking doesn't give the verifier any information because the prover picks a random permutation each round.
Now, you might say that the prover could perhaps get to use a different set of three colors each round: that it declares "this round I'm using (red, green,blue)" or "this round I'm using (fjCpeYqEN0OvkMMjg7fckw, WE4T7SAYskuocA5drXnWwg, mMtqikjvFEudYkYZFckoqw)", but this is just permutations by a different name since each round we can always rename the first color 'red', the second 'green', and the third 'blue' or (0,1,2) or whatever canonical labeling we want.
posted by Pyry at 6:33 AM on February 20, 2021 [2 favorites]
Now, you might say that the prover could perhaps get to use a different set of three colors each round: that it declares "this round I'm using (red, green,blue)" or "this round I'm using (fjCpeYqEN0OvkMMjg7fckw, WE4T7SAYskuocA5drXnWwg, mMtqikjvFEudYkYZFckoqw)", but this is just permutations by a different name since each round we can always rename the first color 'red', the second 'green', and the third 'blue' or (0,1,2) or whatever canonical labeling we want.
posted by Pyry at 6:33 AM on February 20, 2021 [2 favorites]
If the prover could use an arbitrarily large color set, they could trivially give each vertex a unique color and then every edge check would pass.
Oops.
even with just permutations, cross checking doesn't give the verifier any information because the prover picks a random permutation each round.
I am going to have to sit down with a lot of brown paper before intuition stops nagging me about this.
Thanks for the reality checks.
posted by flabdablet at 7:03 AM on February 20, 2021
Oops.
even with just permutations, cross checking doesn't give the verifier any information because the prover picks a random permutation each round.
I am going to have to sit down with a lot of brown paper before intuition stops nagging me about this.
Thanks for the reality checks.
posted by flabdablet at 7:03 AM on February 20, 2021
Every time you inspect a small part of the full 3-coloring solution, you will either see two different colors, which gives you a little bit of confidence that the solution is valid, or you see two of the same color, which proves the solution is invalid. You repeat the inspection process until you have the desired level of confidence in the solution, but the adversary gets to shuffle the colors around each time, to prevent you from drawing conclusions about the relationship of nodes to each other in each subsequent round of inspections.
Sheesh. Mathematicians. The guy in the video goes through and explains everything, but then leaves the explicit description of the dramatic reveal as an exercise for the viewer to imagine. He gets right up to it, but he doesn't actually say what you just said. He doesn't explicitly paint the picture of "Okay, here's what happens as we verify it," the way you just did.
posted by straight at 11:16 AM on February 20, 2021 [1 favorite]
Sheesh. Mathematicians. The guy in the video goes through and explains everything, but then leaves the explicit description of the dramatic reveal as an exercise for the viewer to imagine. He gets right up to it, but he doesn't actually say what you just said. He doesn't explicitly paint the picture of "Okay, here's what happens as we verify it," the way you just did.
posted by straight at 11:16 AM on February 20, 2021 [1 favorite]
It's probably helpful to try a minimal example.
Take a four-cycle; write a coloring as XYZW where each letter is the color of a vertex, and adjacency 'wraps.' Then without loss of generality, the first two vertices are RG, and there are three legal completions: RGRG, RGRB, and RGBG. Then suppose the Prover has one of these three colorings, and the verifier wants to figure out ANYTHING about which coloring the Prover is using.
The verifier then wants to figure out is whether vertices 1 and 3 have the same color. But since the colors are permuted every time, it's impossible to figure that out looking at one edge at a time.
It's interesting to consider what happens here if the Verifier asks for the non-edge (1, 3). If the Prover gives the actual colors, they've revealed info about the coloring (eg, if the colors match, it isn't RGBR). So the Prover has to respond 'hey that's not a real edge!' or the scheme collapses. Seems like the envelope opening contains exactly one bit of information: "Is this edge legal or not?" and if it /doesn't/ tell you that, you're probably getting some other bit of information about the graph coloring.
posted by kaibutsu at 11:25 AM on February 20, 2021 [1 favorite]
Take a four-cycle; write a coloring as XYZW where each letter is the color of a vertex, and adjacency 'wraps.' Then without loss of generality, the first two vertices are RG, and there are three legal completions: RGRG, RGRB, and RGBG. Then suppose the Prover has one of these three colorings, and the verifier wants to figure out ANYTHING about which coloring the Prover is using.
The verifier then wants to figure out is whether vertices 1 and 3 have the same color. But since the colors are permuted every time, it's impossible to figure that out looking at one edge at a time.
It's interesting to consider what happens here if the Verifier asks for the non-edge (1, 3). If the Prover gives the actual colors, they've revealed info about the coloring (eg, if the colors match, it isn't RGBR). So the Prover has to respond 'hey that's not a real edge!' or the scheme collapses. Seems like the envelope opening contains exactly one bit of information: "Is this edge legal or not?" and if it /doesn't/ tell you that, you're probably getting some other bit of information about the graph coloring.
posted by kaibutsu at 11:25 AM on February 20, 2021 [1 favorite]
Both the holder of the secret proof and the interrogator know the graph and labeling of its nodes/edges. After all, the thing being proven is whether a given graph can be 3-colored.
posted by sjswitzer at 3:07 PM on February 22, 2021
posted by sjswitzer at 3:07 PM on February 22, 2021
« Older Super Monkey Ball: It's about the monkey inside | Our king is in another castle! Newer »
This thread has been archived and is closed to new comments
posted by He Is Only The Imposter at 8:49 AM on February 19, 2021 [2 favorites]