The Quest for Randomness
April 24, 2014 12:22 AM

Can you ever be reasonably sure that something is random, in the same sense you can be reasonably sure something is not random (for example, because it consists of endless nines)? Even if a sequence looked random, how could you ever rule out the possibility that it had a hidden deterministic pattern? And what exactly do we mean by “random,” anyway?
posted by empath (48 comments total) 37 users marked this as a favorite
 


Didn't know the 762nd decimal place of π, the Feynman point, had a name until today.
posted by fredludd at 1:58 AM on April 24, 2014 [3 favorites]


I sometimes joke that the strongest argument for taking τ (=2π) as the fundamental circle constant is that the Feynman point for τ extends to seven nines.
posted by Proofs and Refutations at 2:07 AM on April 24, 2014 [4 favorites]


In communications theory, a random sequence is defined as one that contains no information.

Apart from the entry of meteors into the Earth's atmosphere, random events are very hard to find in nature.
posted by three blind mice at 2:50 AM on April 24, 2014


Apart from the entry of meteors into the Earth's atmosphere, random events are very hard to find in nature.

Do you mean the exact timing of meteors, the location of entry, or the number per minute/hour/day/year? There are predictable annual cycles of meteor showers that return at known times year after year, so you can estimate the likely ratio of meteors on day x versus day y; meteors also have a predictable statistical distribution of entry points (more at the equator), a diurnal pattern, and predictable locations of origin in the sky at different times. This does not seem a great example of randomness.
posted by Jimbob at 2:56 AM on April 24, 2014 [3 favorites]


The comments about the article on Scott Aaronson's blog are also worth reading.

As someone who's very much into quantum stuff, I'm looking forward to Part 2!
posted by maskd at 2:57 AM on April 24, 2014 [1 favorite]


three blind mice: "Apart from the entry of meteors into the Earth's atmosphere, random events are very hard to find in nature."

Not true at all: radioactive decay is very common indeed.
posted by Proofs and Refutations at 3:06 AM on April 24, 2014 [7 favorites]


"random events are very hard to find in nature"

There are chips and other computer stuff (to use the technical term) that generate random numbers based on heat, noise, and other environmental events. Speaking of computers, the software side is usually lousy for randomness.
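
To illustrate the software problem (a quick Python sketch, nothing rigorous): seed a software generator twice and you get the identical "random" stream, while the OS pool, which mixes in hardware noise where available, won't repeat.

```python
import os
import random

# The same seed always reproduces the same "random" stream:
a = random.Random(42)
b = random.Random(42)
print([a.randrange(256) for _ in range(8)])
print([b.randrange(256) for _ in range(8)])   # identical to the line above

# os.urandom draws on the OS entropy pool, which mixes in hardware
# noise sources where available; two calls won't repeat.
print(os.urandom(8).hex())
print(os.urandom(8).hex())
```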
posted by KMB at 3:10 AM on April 24, 2014 [2 favorites]


Could Fourier analysis of a data set show whether it was random or not? One of the problems of building a cryptographic one time pad generator using avalanche breakdown junction noise as the primary source is that there are various external factors which can introduce periodic energy. Anecdotally, there's a story about a bright-eyed young engineer presenting his key generator to his greybeard boss for approval - said manager hooked up the box to the test equipment, started it running and then just put his mobile phone on top of the case. Bzzzt! Thanks for playing! Please try again.
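
Roughly the kind of check I mean (a toy numpy sketch, with a deliberately injected 50 Hz hum standing in for that mobile phone):

```python
import numpy as np

rng = np.random.default_rng()
n, fs = 8192, 1024.0                     # samples, sample rate in Hz
t = np.arange(n) / fs

noise = rng.standard_normal(n)           # stands in for avalanche noise
hum = 0.2 * np.sin(2 * np.pi * 50 * t)   # injected periodic interference

spectrum = np.abs(np.fft.rfft(noise + hum))
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Broadband noise gives a flat-ish spectrum; a sharp spike betrays
# a periodic component the key generator shouldn't contain.
peak = freqs[1:][np.argmax(spectrum[1:])]   # skip the DC bin
print(f"strongest component near {peak:.1f} Hz")
```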

I ask because I have a long-time project idea for a modern version of SIGSALY that, in its current version, is basically a Raspberry Pi in a box with a modem chip, two USB ports, a headset, audio in/out and a button. You put two blank large USB keys into the ports and press the button: the box then fills them both with identical randomness. Give one key to a friend with a similar box, and when you want to talk secretly over the phone or the radio, press the button again. The two boxes negotiate a starting point within the key bitstream and set about encoding and decoding the audio, overwriting the key as they go. It solves a useful subset of the problems of secret audio, can (and should) be an entirely open design, and is cheap and robust against remote attacks and interception of the encrypted stream (guarding against key theft and local audio-based bugging is left as an exercise for the reader). But it does need a decent source of true randomness in the box...
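
For concreteness, the cipher core I have in mind is nothing more than XOR against the shared pad, consuming and destroying bits as it goes - a hypothetical sketch (file handling simplified, names made up):

```python
def crypt(data: bytes, pad: bytes) -> bytes:
    """XOR is its own inverse, so this both encrypts and decrypts."""
    assert len(pad) >= len(data), "pad exhausted - stop transmitting"
    return bytes(d ^ p for d, p in zip(data, pad))

def consume_pad(path: str, offset: int, n: int) -> bytes:
    """Read n pad bytes at the negotiated offset, then destroy them.
    (Genuinely erasing flash is harder than this, as noted above.)"""
    with open(path, "r+b") as f:
        f.seek(offset)
        chunk = f.read(n)
        f.seek(offset)
        f.write(b"\x00" * n)    # never let these bits be reused
    return chunk
```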
posted by Devonian at 3:38 AM on April 24, 2014 [5 favorites]


Blue snores fought ragamuffin cantos.
posted by fairmettle at 4:00 AM on April 24, 2014


Devonian, what you have is basically a one-time pad. They don't actually solve any problems that modern cryptographic systems can't handle, and they have problems of their own, particularly key exchange. This is a solved problem using modern cryptographic systems, but one-time pads require the physical exchange of data (in your case, USB keys) and are therefore subject to interception.
posted by Joe in Australia at 4:42 AM on April 24, 2014 [3 favorites]


  But it does need a decent source of true randomness in the box...

Just as well the Raspberry Pi has a built-in hardware random number generator, then.
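
With the right kernel module loaded it shows up as an ordinary character device, so reading it is trivial (assuming /dev/hwrng, the usual Linux name for it):

```python
# Assumes the hardware RNG kernel module is loaded, exposing the
# generator at /dev/hwrng (the standard Linux hw_random device).
with open("/dev/hwrng", "rb") as hwrng:
    print(hwrng.read(16).hex())
```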
posted by scruss at 5:06 AM on April 24, 2014


Blue snores fought ragamuffin cantos.

Although those are not the exact words I would have used, it was my first thought; so, conceptually, this comment was predictable.

Truly random is hard.
posted by sammyo at 5:13 AM on April 24, 2014


This is a solved problem using modern cryptographic systems

Unauthenticated Diffie-Hellman is subject to interception too: two people can communicate using it without ever noticing that there's in fact a man in the middle decrypting their messages.
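
A toy sketch of the attack, with comically small numbers (real DH uses ~2048-bit primes, and none of this is production code):

```python
import secrets

p = 2**127 - 1    # a Mersenne prime; toy-sized, illustration only
g = 3

a = secrets.randbelow(p - 2) + 1   # Alice's secret exponent
b = secrets.randbelow(p - 2) + 1   # Bob's secret exponent
m = secrets.randbelow(p - 2) + 1   # Mallory's secret exponent

A, B, M = pow(g, a, p), pow(g, b, p), pow(g, m, p)

# Mallory intercepts A and B in transit and passes M along instead.
# Alice now "shares a key with Bob" that is really shared with
# Mallory, and likewise Bob; neither side can tell the difference.
alice_key = pow(M, a, p)
bob_key   = pow(M, b, p)
mallory_a = pow(A, m, p)   # Mallory's key for the Alice leg
mallory_b = pow(B, m, p)   # Mallory's key for the Bob leg

assert alice_key == mallory_a and bob_key == mallory_b
print("Mallory decrypts, reads, and re-encrypts in both directions")
```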

But you're right, we've come a long way since the one time pad. Authentication is still a painful thing to do though -- partly why we can't have a secure Internet just yet.
posted by maskd at 5:15 AM on April 24, 2014


Although not the exact words I would have used it was my first thought, thus conceptually this comment was predictable.

Truly random is hard.


Yes, true randomness is more than just unpredictability: it can't be causally related to anything in the past. Even if your number passes all the statistical randomness tests, you can't guarantee that it wasn't caused by something.

Some people think that you can get truly random numbers using quantum mechanics. I would guess that Scott is going to talk about that in Part 2.
posted by maskd at 5:25 AM on April 24, 2014 [2 favorites]


I don't think that kind of metaphysical randomness (i.e., an uncaused effect, essentially) is actually possible. Though the idea of it goes back to Lucretius.
posted by empath at 5:49 AM on April 24, 2014 [1 favorite]


dfuTGVpvqPAfteSgJuYkkRPLjdHThpjvHIqlBnottmsIzLQGsTjgftoyjvfpimaqOCFd0VUbmcoKIIlxaarVhv0HPvubADmLIBYkWbEXKSIJ0RbZ0RIKRkvbDEFNR0bnuAqmhWFUBtQHijWkAOgOMC
posted by Omission at 6:06 AM on April 24, 2014 [1 favorite]


I know about Diffie-Hellman, and the issues with OTP and key security. The Raspberry Pi project idea is as much about creating a system that's simple enough to be widely understood and has as few complexities in implementation as possible, as making something that is the 'best' solution for the problem of secure speech communication. Most crypto problems in the real world come from implementation issues or overall system design infelicities, rather than the underlying encryption methods.

I didn't know about the Pi's on-chip RNG, but I don't think it's open source. The nice thing about rolling one's own hardware RNG -- assuming you can make one that works and is testable -- is that it removes reliance on a particular architecture and implementation, albeit by substituting problems of its own. You may not want to use a Pi, after all, or a new version may not be backwards compatible, or the NSA may have secretly snuck into the fab plant with a really, really small scalpel and hacked a back door into whatever chip you've ended up with. Of course, they haven't and I'm sure the Pi's RNG is perfectly fine, but this way they couldn't even if they wanted to.
posted by Devonian at 6:16 AM on April 24, 2014 [1 favorite]


Give one key to a friend with a similar box, and when you want to talk secretly over the phone or the radio, press the button again. The two boxes negotiate a starting point within the key bitstream and set about encoding and decoding the audio, overwriting the key as they go. It solves a useful subset of the problems of secret audio, can (and should) be an entirely open design, and is cheap and robust against remote attacks and interception of the encrypted stream (guarding against key theft and local audio-based bugging is left as an exercise for the reader). But it does need a decent source of true randomness in the box...

You also can't reuse any of the randomness, so you can only encrypt exactly a USB key's worth of data with it.
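
Reuse isn't just inelegant, it's fatal: XOR two ciphertexts made with the same pad and the pad cancels out, leaving the XOR of the two plaintexts. A sketch:

```python
import os

p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT NOON"
pad = os.urandom(max(len(p1), len(p2)))

c1 = bytes(x ^ k for x, k in zip(p1, pad))
c2 = bytes(x ^ k for x, k in zip(p2, pad))   # pad reused: mistake!

# The eavesdropper XORs the two ciphertexts: the pad cancels out,
# leaving plaintext1 XOR plaintext2, which crib-dragging and
# frequency analysis can often unpick.
leak = bytes(x ^ y for x, y in zip(c1, c2))
assert leak == bytes(x ^ y for x, y in zip(p1, p2))
```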

To be honest, an OTP, while better in a theoretical sense, isn't any better in a practical sense than properly implemented standard encryption.

And for an OTP, pseudorandom is probably good enough.
posted by empath at 6:23 AM on April 24, 2014


Could Fourier analysis of a data set show whether it was random or not?

I would think it would reveal patterns in it, but could not prove absolutely that those patterns didn't occur randomly.

http://d24w6bsrhbeh9d.cloudfront.net/photo/aOqVEKN_700b.jpg

(And yeah, that event might take longer than the age of the universe to happen...)


This has no doubt been proven, disproven, or argued about endlessly by theorists: but I think randomness is relative. I could take meaningful data and present it to you claiming it's random. As long as you don't know where I got the data, and can't pattern-match it to discover it's a Windows .DLL, the ASCII codes for all the last names in a Manhattan phone book, every third digit of pi, or the prices of every item in a particular grocery store 185 days ago in order of shelf placement, it contains no information. And even if you do match it, it's not absolute proof that it's not a coincidental match...
posted by Foosnark at 6:34 AM on April 24, 2014


And for a OTP, pseudorandom is probably good enough.

Gah, no. The more pseudorandom bits the attacker sees, the better their chances of predicting the rest of the sequence, especially if they know what generator you're using. This is exactly the property that the NSA exploited with their Dual EC generator.
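
Dual EC's backdoor is subtler than this, but for a toy picture of "seeing bits lets you predict the rest", here's a plain linear congruential generator using the classic ANSI C rand() constants; one captured output hands the attacker everything:

```python
# A linear congruential generator with the classic ANSI C rand()
# constants. One captured raw output determines every later output.
M, A, C = 2**31, 1103515245, 12345

def lcg(state):
    while True:
        state = (A * state + C) % M
        yield state

victim = lcg(123456789)        # seed unknown to the attacker
observed = next(victim)        # attacker captures one full output...

attacker = lcg(observed)       # ...and just runs the same recurrence
assert [next(victim) for _ in range(5)] == [next(attacker) for _ in range(5)]
print("attacker now predicts the victim's stream perfectly")
```

Real implementations only expose some of the bits, which slows this down but doesn't fundamentally stop it; a CSPRNG is designed so that no feasible amount of output helps.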
posted by RobotVoodooPower at 6:59 AM on April 24, 2014 [2 favorites]


Devonian: I ask because I have a long-time project idea [...] a Raspberry Pi in a box with a modem chip, two USB ports, a headset, audio in/out and a button. You put two blank large USB keys into the ports and press the button: the box then fills them both with identical randomness.

I've been working on almost precisely this project for the past couple of months. My version is a bit smaller and simpler: it's a board that breaks in half. Each half has a USB interface and a 2GB NAND flash. I use a noisy ring oscillator fed into a noisy A/D to generate the bits. I just assembled my v3 prototypes; it'll probably be launched in a couple of weeks or so.

github repo, v2 prototypes.
posted by phooky at 7:31 AM on April 24, 2014 [10 favorites]



You also can't reuse any of the randomness, so you can only encrypt exactly a USB key's worth of data with it.

Well, yes, which is why I said use large USB keys and delete the key as you consume it (although securely deleting stuff from flash has its own issues). Assuming you're using 4800 bps full-duplex, you burn 1200 bytes of pad per second, so a 32GB key gives you about 27 million seconds - call it ten months of continuous comms, near enough a year. Takes a while to fill, though.

Think of it as an exercise in building a good-enough system with limited resources, one that will resist automated attack (the surface between the device and the shared network is tiny - a slow modem that sends data and handshakes - and even I could write software that would manage that entirely safely) and has a good chance of being sufficiently resistant to any surreptitious attack that the bad guys would have to get their jollies some other way.

Testing the RNG for randomness is the thing I'm least comfortable doing, and it's something that's fascinated me for some time. Hence my Fourier question... I suppose frequency analysis is a normal part of cryptography, but I haven't seen this particular approach.
posted by Devonian at 7:50 AM on April 24, 2014


Could Fourier analysis of a data set show whether it was random or not?

There is a pretty large literature on mathematical tests for randomness, using various statistical techniques and transformations, including frequency analysis. One typically uses not a Fourier transform, but the related Hadamard transform - if you are sufficiently interested, I'm sure you can find the relevant papers.
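
The simplest entry in those test batteries is the frequency (monobit) test from the NIST suite - count the ones and check whether the excess is consistent with a fair coin. A sketch:

```python
import math, os

def monobit_p_value(bits):
    """Frequency (monobit) test from NIST SP 800-22.
    A small p-value (below 0.01, say) means 'reject randomness'."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)     # +1/-1 random walk
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))

bits = [(byte >> i) & 1 for byte in os.urandom(4096) for i in range(8)]
print(monobit_p_value(bits))            # usually comfortably above 0.01
print(monobit_p_value([1] * 32768))     # all ones: p is essentially zero
```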
posted by crazy_yeti at 7:57 AM on April 24, 2014 [2 favorites]


As long as you don't know where I got the data, and can't pattern-match it to discover it's a Windows .DLL, the ASCII codes for all the last names in a Manhattan phone book, every third digit of pi, or the prices of every item in a particular grocery store 185 days ago in order of shelf placement, it contains no information

But, one could measure the (information-theoretic) entropy of each of these bit-streams, and they would vary widely. The DLL and phone-book information, encoded or not, is going to have a lot of internal structure and that is detectable without understanding or decoding it. The 'every third digit of pi' would look closest to random, but those other examples you cited are highly structured - even the grocery store prices (items with similar prices will be closer on the shelf, there is an overall distribution to the prices which is far from random, and there are also tendencies like prices ending in .99, and the leading digits would probably exhibit a distribution according to Benford's Law).
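
Even a crude first-order byte-entropy estimate separates the gross cases (a sketch - it ignores inter-byte structure, so it only ever overestimates the true entropy rate):

```python
import math, os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """First-order Shannon entropy in bits per byte (8.0 is the max).
    It ignores correlations between bytes, so it can only overestimate
    the true entropy rate - but it already separates the gross cases."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

print(byte_entropy(os.urandom(65536)))                  # close to 8.0
print(byte_entropy(b"SMITH JONES WILLIAMS " * 3000))    # well under 8
print(byte_entropy(b"\x00" * 65536))                    # exactly 0.0
```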
posted by crazy_yeti at 8:05 AM on April 24, 2014 [1 favorite]


But, one could measure the (information-theoretic) entropy of each of these bit-streams, and they would vary widely

But what about the "infinite number of monkeys" problem (or, if you prefer, the Borges library problem). Doesn't every infinite random sequence also contain every finite structured sequence? That is, a truly random number generator must, eventually, spit out the sequence of "the ASCII codes for all the last names in a Manhattan phone book" as well as spitting out "every third digit of pi" or the complete text of Hamlet in binary form, no? So no test run on any particular sequence of numbers produced by a given random number generator can definitively disprove its randomness, can it?
posted by yoink at 8:17 AM on April 24, 2014 [1 favorite]


That is, a truly random number generator must, eventually ...

It's all in that word "eventually". An infinitely long, truly random sequence contains the text of Hamlet with probability 1. But we can calculate how long, on average, we'd have to wait until we saw it, and that expected time is extremely long (far longer than the age of the known universe). So if we turn the (alleged) random generator on, and the VERY FIRST THING that comes out is Hamlet, we can say that this is *extremely* unlikely, and that it's much more likely we are looking at a book of Shakespeare rather than the output of a randomizer. And this "likelihood" can be made mathematically precise.
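
To put a number on "extremely long" (back-of-envelope, assuming a uniform stream over 27 symbols - letters plus space - and calling Hamlet roughly 130,000 characters):

```python
import math

alphabet = 27          # letters plus space: a toy model of the stream
hamlet_len = 130_000   # rough character count of the play

# The expected wait before a specific string of length n first appears
# in a uniform random stream is on the order of alphabet**n characters.
digits = hamlet_len * math.log10(alphabet)
print(f"expected wait ~ 10^{digits:.0f} characters")   # about 10^186000
```

For comparison, the age of the universe is around 4 × 10^17 seconds, so "longer than the age of the known universe" is a spectacular understatement.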
posted by crazy_yeti at 9:06 AM on April 24, 2014 [1 favorite]


Phooky - cool! Have you got a market in mind?

crazy_yeti - ah, thanks. I've looked a bit at the literature over the years (I first got interested in what random actually was when I worked at a place that had tame mathematicians and spooky engineers, thirty years ago...) but haven't found that.

As for the difference between a signal and noise, you can in general tell a signal because it has structure. You might not be able to tell what the function of that structure is, or you might see the structure but not know what its contents are, but a signal is structure+data. If you try hard enough, you can disguise the structure pretty well (spread-spectrum radio can live beneath the noise floor, for example), but if you're not bothered, then anyone with a knowledge of information theory who intercepts it can tell you've sent a signal. It's what SETI's looking for, after all.

An infinite series doesn't have to contain all series, and neither does an infinite series of infinite series. The sequence 2,4,6,8... is infinite, but will never contain the sequence 3,5,7, and a random series can go on forever and never say "BILL SHAKESPEARE SUCKS HAHAHA". And anyway, to the best of our knowledge, infinity in real life is bounded pretty hard by the temporary nature of the universe, and we're quite good at probability calculations. In general, from what I understand, physical RNGs don't just take an entropy source and clean it up; they explicitly watch for runs of state that are random but don't look it (and are common enough to matter) and don't let them out.
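
The simplest such clean-up step I know of is von Neumann's corrector - take the raw bits in pairs, emit 0 for 01 and 1 for 10, and discard 00 and 11. A sketch (this is the textbook trick, not a claim about any particular device):

```python
import random

def von_neumann(bits):
    """Textbook debiaser: map the pair 01 -> 0, 10 -> 1, drop 00 and 11.
    If the input bits are independent, the output is unbiased no matter
    how lopsided the source - at the cost of throwing most bits away."""
    it = iter(bits)
    for x, y in zip(it, it):
        if x != y:
            yield x

biased = [int(random.random() < 0.75) for _ in range(100_000)]
clean = list(von_neumann(biased))
print(sum(biased) / len(biased))   # ~0.75
print(sum(clean) / len(clean))     # ~0.50
```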
posted by Devonian at 9:12 AM on April 24, 2014


And this "likelihood" can be made mathematically precise.

I'm not convinced that it can. The chance of Hamlet coming out is precisely the same as that of any other string of equal length.
posted by empath at 9:17 AM on April 24, 2014


An infinite series doesn't have to contain all possible series, neither does an infinite series of infinite series. The sequence 2,4,6,8... is infinite, but will never contain the sequence 3,5,7, and a random series can go on forever and never say "BILL SHAKESPEARE SUCKS HAHAHA".

A truly random infinite series does, in fact, have to contain every finite series (with probability 1). An infinite random series that never produced the sequence "BILL SHAKESPEARE SUCKS HAHAHA" would, quite clearly, not be random. Bringing non-random infinite series into this is entirely unhelpful.
posted by yoink at 9:18 AM on April 24, 2014 [1 favorite]


An infinite series doesn't have to contain all series

Right, obviously - the simplest example to consider is the series 1,1,1,1,1, ...., which is considered to contain zero 'information' because each new term doesn't tell you anything you don't already know! But a *random* sequence should be maximally surprising at each step. Another way to think about this is compression - a file containing 1,1,1,1 would compress by a large factor, while a file containing pure white noise should not be compressible at all. In fact, in purely information-theoretic terms, the white-noise sequence has the *highest* amount of information - although of course not information that's useful to me and you - nobody gets "informed". The day-to-day definition of information is a bit different from the information-theoretic concept.
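
The compression intuition is easy to check directly (zlib standing in for "any decent compressor"):

```python
import os, zlib

ones = bytes([1]) * 65536
noise = os.urandom(65536)

print(len(zlib.compress(ones, 9)))    # a few hundred bytes at most
print(len(zlib.compress(noise, 9)))   # ~65553 - it actually grows slightly
```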
posted by crazy_yeti at 9:22 AM on April 24, 2014


The chance of hamlet coming out is precisely the same as any other string of equal length.

You're right, of course, but that doesn't mean that we can't measure the likelihood for Hamlet, or any other predetermined string. I could write down a sequence of one million random bits, and the likelihood of that sequence is the same as that of any other pre-specified sequence. But we use Hamlet in the example because it's classic and recognizable. If you turned on the generator and it produced the series of bits that I had written down, that would be just as amazing as Hamlet (if the Hamlet sequence were the same length), but nobody would recognize it - nobody would say: "Hey, that's the same sequence of bits that crazy_yeti wrote down ...." But in terms of bits, the information content and the likelihood are the same for both examples.
posted by crazy_yeti at 9:26 AM on April 24, 2014 [2 favorites]


In other words, it's more likely that "some jumbled-up random-looking sequence" comes out than Hamlet, because there are billions of those jumbled-up random-looking sequences and only one Hamlet. But if I restrict to one specific JURLS, it's a different story.
posted by crazy_yeti at 9:28 AM on April 24, 2014


empath: "I don't think that kind of metaphysical randomness (ie: an uncaused effect, essentially) is actually possible."

Quantum mechanics doesn't really work without it. That's pretty strong evidence for its existence.
posted by Proofs and Refutations at 9:32 AM on April 24, 2014 [1 favorite]


Quantum mechanics doesn't really work without it. That's pretty strong evidence for its existence.

Unpredictability isn't quite the same thing as an uncaused effect. The wave function evolves deterministically. Theoretically, information is never lost and you can reverse it.
posted by empath at 9:37 AM on April 24, 2014


Yoink: sorry, yes. I knew I shouldn't have got involved in infinity. Randomness is quite hard enough by itself...

Crazy_yeti: One thing, though - if I'm sending someone a key for an OTP, then the white-noise file is actually maximally useful for the task. Framing... And was 'a bit different' a deliberate pun, or merely fortuitous? (A very clever friend who'd aced engineering at Cambridge once told me that he could usually pick his way through the bones of most academic papers on most subjects. Except information theory "where not only can I not understand the abstracts, I'm usually stumped by the titles...")
posted by Devonian at 9:39 AM on April 24, 2014


Want a random number? Here:

7

Another one:

6


feel free to use this random number generator, but please post attribution to this comment.
posted by signal at 9:40 AM on April 24, 2014


QM (and, in my opinion, free will) seems to be all about "uncaused effects". Absolutely nothing happens to "cause" radioactive decay, and, at least according to a lot of thinkers, the same is true of the Big Bang ... seems like these things "just happen", and the quest for causes is perhaps comforting but ultimately futile.
posted by crazy_yeti at 9:42 AM on April 24, 2014


Absolutely nothing happens to "cause" radioactive decay

Not true. You can't predict when it will happen, but beta decay, for example, is caused by the weak interaction. You can track all the particle interactions that lead to it.

QM is weird but it's not chaos. There are rules.
posted by empath at 9:47 AM on April 24, 2014


QM is weird but it's not 'Nam. There are rules, Donny.
posted by yoink at 9:54 AM on April 24, 2014 [1 favorite]


QM is weird but it's not chaos. There are rules.

Sure there are rules, but the nature of these rules is that they only give you probabilities (more accurately, probability *amplitudes*). They will never, ever tell you *when* the beta decay is going to happen, or *why* it happened at that moment and not another.
posted by crazy_yeti at 9:55 AM on April 24, 2014


While we're off on this tangent, I'd like to mention a very interesting book I recently read: The Origin of the Idea of Chance in Children by Piaget (the famous developmental psychologist) and Inhelder. Over a period of nearly 20 years, they engaged thousands of children of different ages in a dialogue centered on a simple experimental apparatus: a tilting table holding 6 red and 6 white marbles. The marbles start in an ordered state, and each time the table is tilted they get more scrambled up. Children were asked whether the balls would ever return to their starting state, and the way the answers vary with the children's age is fascinating (younger children don't think the initial ordered state will ever return, but around the age of 7 they begin to understand that, eventually, it will). Amazing how much insight these workers got out of such a simple setup. The concept of randomness is very subtle and develops slowly (later than the concepts of space and number), and I dare say even many adults don't really understand it!

One particularly charming response came from a 5-year-old girl (I'm paraphrasing since I don't have the book in front of me): when asked where the balls are going to go when the table is tipped, she says that "we don't know, but the balls know, because they have to do it". ("Out of the mouths of babes!")
posted by crazy_yeti at 10:03 AM on April 24, 2014 [3 favorites]


It's kind of baffling to me that American Scientist would write a whole article about randomness and scrupulously avoid the word "entropy". I mean the article basically describes entropy in the usual way, and props for going into Kolmogorov complexity. But it's weird not to talk about the more fundamental concept first.

I once had the privilege of sitting with Gregory Chaitin for a few hours. At the time he was working on his constructive approach to complexity and his work on Ω, the probability that a random program will halt. I still have some 1995-era Lisp code of his that builds up a computational approach to measuring randomness. I wonder if he ever developed that into a curriculum anyone else follows?
posted by Nelson at 10:04 AM on April 24, 2014 [2 favorites]


American Scientist is the new Scientific American.

(I mean that in a complimentary way--American Scientist is about the closest thing we have to the old, awesome Scientific American of the 70s and 80s.)
posted by jjwiseman at 11:20 AM on April 24, 2014 [3 favorites]


So you mean that American Scientist is the old Scientific American.

Or, rather, American Scientist is the new "old Scientific American".

This is getting confusing. But yeah, I agree.
posted by benito.strauss at 11:58 AM on April 24, 2014 [1 favorite]


It's kind of baffling to me that American Scientist would write a whole article about randomness and scrupulously avoid the word "entropy". I mean the article basically describes entropy in the usual way, and props for going into Kolmogorov complexity. But it's weird not to talk about the more fundamental concept first.

Kolmogorov complexity really is the more fundamental concept -- entropy tells you how random a string is only relative to a probability distribution you have to fix in advance, whereas Kolmogorov complexity is a universal measure of the randomness of an individual string. Shame that it's uncomputable.
posted by maskd at 12:05 PM on April 24, 2014 [2 favorites]


I don't think that kind of metaphysical randomness (ie: an uncaused effect, essentially) is actually possible.

I think causality is something we made up and just started reasoning with and never really got around to defining properly.

Consider observing a set of traffic lights. You'll find that a green light is always followed by a yellow; a yellow by a red; and a red by a green. That's pretty much the Law of Traffic Lights. So, is it legitimate to infer that red lights cause green lights? If not, why not?
posted by flabdablet at 4:23 PM on April 25, 2014 [1 favorite]


The more pseudorandom bits the attacker sees, the more likely they can predict the sequence, especially if they know what generator you're using. This is the property that the NSA exploited with their Dual EC generator.

The entire point of a cryptographically secure pseudorandom number generator is to make that very prediction computationally infeasible without knowledge of the generator's internal state. The fact that the Dual EC PRNG has been shown to fail that test just makes it a poor CSPRNG; it doesn't mean you can't build a really good one.
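
A toy illustration of the principle - emphatically not a production design; real CSPRNGs (HMAC-DRBG, ChaCha20-based generators) add reseeding, backtracking resistance, and so on:

```python
import hashlib, os

class ToyHashDrbg:
    """Counter-mode SHA-256 sketch. With a secret 256-bit seed,
    predicting the next output means guessing the hash input,
    which is believed computationally infeasible. A teaching toy,
    not a real CSPRNG: no reseeding, no backtracking resistance."""
    def __init__(self, seed: bytes):
        self.key = hashlib.sha256(seed).digest()
        self.counter = 0

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += hashlib.sha256(
                self.key + self.counter.to_bytes(8, "big")).digest()
            self.counter += 1
        return out[:n]

drbg = ToyHashDrbg(os.urandom(32))
print(drbg.read(16).hex())
```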
posted by flabdablet at 4:37 PM on April 25, 2014



