How We Know
February 26, 2011 11:22 AM

How We Know. An essay about information theory in the New York Review of Books by Freeman Dyson, building off a review of James Gleick's The Information.

Over the course of the article, Dyson discusses Gleick's examples of Kele and drum language, the original Napoleonic telegraph, and Wikipedia (and, by proxy, other hive-mind websites as well...) vis-a-vis Shannon's Law:
The consequences of the information flood are not all bad. One of the creative enterprises made possible by the flood is Wikipedia, started ten years ago by Jimmy Wales. Among my friends and acquaintances, everybody distrusts Wikipedia and everybody uses it. Distrust and productive use are not incompatible. Wikipedia is the ultimate open source repository of information. Everyone is free to read it and everyone is free to write it. It contains articles in 262 languages written by several million authors. The information that it contains is totally unreliable and surprisingly accurate. It is often unreliable because many of the authors are ignorant or careless. It is often accurate because the articles are edited and corrected by readers who are better informed than the authors.

Jimmy Wales hoped when he started Wikipedia that the combination of enthusiastic volunteer writers with open source information technology would cause a revolution in human access to knowledge. The rate of growth of Wikipedia exceeded his wildest dreams. Within ten years it has become the biggest storehouse of information on the planet and the noisiest battleground of conflicting opinions. It illustrates Shannon’s law of reliable communication. Shannon’s law says that accurate transmission of information is possible in a communication system with a high level of noise. Even in the noisiest system, errors can be reliably corrected and accurate information transmitted, provided that the transmission is sufficiently redundant. That is, in a nutshell, how Wikipedia works.
Gleick previously (he's also Richard Feynman's biographer)
Dyson previously, 2, 3, 4
posted by The Michael The (42 comments total) 63 users marked this as a favorite
 
Dyson + Gleick = Braingasm. Thanks for this.
posted by Splunge at 11:45 AM on February 26, 2011 [1 favorite]


"Jimmy Wales hoped when he started Wikipedia that the combination of enthusiastic volunteer writers with open source information technology would cause a revolution in human access to knowledge."

In the interest of historical accuracy...

A. Jimmy Wales did not "start Wikipedia." Jimmy Wales started Nupedia with several partners. Jimmy Wales and Larry Sanger started Wikipedia after Nupedia had failed.

B. Jimmy Wales did not hope "that the combination of enthusiastic volunteer writers with open source information technology would cause a revolution in human access to knowledge." Neither did Larry Sanger, who is really responsible for moving Nupedia to the wiki format. Actually, they were both desperate because Nupedia, which was built on a more or less standard editorial model, was dying. The adoption of the wiki was a last-ditch effort to get the thing going. Neither Wales nor Sanger thought it would work.

C. Jimmy Wales did not hope to cause a "revolution in human access to knowledge." When he co-founded Nupedia, he said he wanted to create a free on-line encyclopedia (written, it should be said, by experts under the guidance of an editor, that being Larry Sanger). Nupedia was to be a for-profit venture, though certainly a public-spirited one. Wales wanted to turn a profit (and, just to be clear, that's not a bad thing). Nothing changed when Wales and Sanger launched Wikipedia. It, too, was to be an encyclopedia largely written by experts. And it, too, was to be a for-profit business. Even after months of operation, Wales was still weighing whether to sell ads on the site or otherwise monetize it.

I could cite a source, but only at the risk of self-linking. Suffice it to say you can find the whole story in an archived article in The Atlantic. (Mods, feel free to remove that last bit if you feel it violates policy.)
posted by MarshallPoe at 12:35 PM on February 26, 2011 [8 favorites]


I could cite a source, but only at the risk of self-linking. Suffice it to say you can find the whole story in an archived article in The Atlantic. (Mods, feel free to remove that last bit if you feel it violates policy.)

Self-linking is okay in comments if the information is relevant to the discussion.
posted by JHarris at 12:39 PM on February 26, 2011


(And you write for the Atlantic? Awesome!)
posted by JHarris at 12:39 PM on February 26, 2011


I've never actually sat down and read Shannon's original paper on the subject, but apparently Bell Labs has been kind enough to actually put the thing online, accessible from here, or you can just go directly to the pdf. Seems quite readable, if you already know the basics of probability and statistics (Markoff processes, normal distributions, etc.)
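
One of the fun bits to play with is the paper's "series of approximations to English," which Shannon builds out of exactly those Markoff processes. Here's a rough toy version in Python that I knocked together (my own sketch, not anything from the paper itself): it babbles by sampling from a first-order character chain built from whatever text you feed it.

    import random
    from collections import defaultdict

    def markov_babble(text, length=80):
        # For each character, remember which characters follow it in the source text.
        followers = defaultdict(list)
        for a, b in zip(text, text[1:]):
            followers[a].append(b)
        out = [random.choice(text)]
        for _ in range(length - 1):
            nxt = followers.get(out[-1])
            out.append(random.choice(nxt) if nxt else random.choice(text))
        return "".join(out)

    sample = "the quick brown fox jumps over the lazy dog " * 5
    print(markov_babble(sample))

Feed it a bigger chunk of real English and the output starts to resemble the pronounceable gibberish Shannon shows for his digram approximation.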
posted by TypographicalError at 12:41 PM on February 26, 2011


That was a very enjoyable article to read. I love Dyson's style here; simple almost grade school level statements put together step-by-step like a mathematical proof, leading to an informative or surprising conclusion. Maybe over-reaching but it reminds me of Einstein's Gedanken experiments; beautiful in their simplicity.

The bit about information increasing (or at least concentrating) over time in the universe is new to me. How is this explained in terms of entropy in a closed system?
posted by Mei's lost sandal at 12:49 PM on February 26, 2011 [1 favorite]


He writes pretty good for a guy who makes vacuum cleaners.
posted by doctor_negative at 12:49 PM on February 26, 2011


I highly doubt he writes for The Atlantic.
posted by yerfatma at 1:01 PM on February 26, 2011


Is this your Atlantic article?
The Hive -- "Can thousands of Wikipedians be wrong? How an attempt to build an online encyclopedia touched off history’s biggest experiment in collaborative knowledge."
posted by ericb at 1:01 PM on February 26, 2011


Jinx, yerfatma, you owe me a Coke!
posted by ericb at 1:02 PM on February 26, 2011


I highly doubt he writes for The Atlantic.

You're joking right? Marshall Poe is indeed the article's author.
posted by ericb at 1:03 PM on February 26, 2011


Mei's lost sandal: "How is this explained in terms of entropy in a closed system?"

Higher entropy = less organized = reduced redundancy = more information.

The problem is none of the information will be at all useful. But there will be, according to the technical definition of information, more and more and more of it.
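
A quick back-of-the-envelope way to see the "more disorder = more information" equation, in Python (my own toy sketch, nothing from the article): compute the per-character Shannon entropy of a few strings and note that the noisiest-looking one needs the most bits per character to describe.

    import math
    from collections import Counter

    def entropy_per_char(s):
        # Shannon entropy of the character distribution, in bits per character.
        counts = Counter(s)
        n = len(s)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    print(entropy_per_char("aaaaaaaaaaaaaaaa"))   # 0.0 bits/char: perfectly ordered
    print(entropy_per_char("abababababababab"))   # 1.0 bit/char: still very redundant
    print(entropy_per_char("q7f!kz0m3xw9rp2v"))   # 4.0 bits/char: closer to pure noise

None of which makes the third string useful, of course; it just takes more to describe.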
posted by idiopath at 1:12 PM on February 26, 2011


Dyson: Information and order can continue to grow for billions of years in the future...

This statement seems at odds with your description of the relationship between entropy and information, idiopath. I'm confused about this.
posted by Mei's lost sandal at 1:20 PM on February 26, 2011


Perhaps a more intuitive approach to that question:

In the technical sense, "information" is loosely a synonym for "things we don't know yet". The less organized a system is (that is, the higher its entropy), the more work it takes to know all of it, because there are fewer generalizations that can be made. That does not guarantee that knowing anything about that system is useful.
posted by idiopath at 1:22 PM on February 26, 2011


Mei's lost sandal: "This statement seems at odds with your description of the relationship between entropy and information"

Yes, that statement seems pretty odd to me. Information is the inverse of compressibility. You cannot increase order and simultaneously decrease compressibility. The more order there is in a message, the lower the density of information (because it has higher redundancy, that is to say lower information per unit of message, that is, more order).

Maybe he is talking about longer messages? Maybe he is wrong? I am as confused as you are at this point.
posted by idiopath at 1:28 PM on February 26, 2011


FYI Purdue University has a new Science of Information center funded by the National Science Foundation. A big challenge here is how useful Shannon's original definitions of information are, for example for domains such as biology or for things like Twitter and Facebook.

Mei's lost sandal's question about entropy and information sort of alludes to this issue, which is the difference between very low-level, domain-agnostic notions of information (information as a way of reducing uncertainty) and our everyday uses of it (a given day's weather takes the same number of bits to store, but knowing today's and tomorrow's weather is more useful to me than knowing yesterday's).

Here's another description (you'll need access to Communications of the ACM to see the full article):
Szpankowski argues that information goes beyond those constraints. Another way to define information is that which increases understanding and that can be measured by whether it helps a recipient to accomplish a goal. At that point, semantic, temporal, and spatial factors come into play. If a person waiting for a train receives a message at 2 P.M. saying the train leaves at 1 P.M., the message contains essentially no information. In mobile networks, the value of information can change over the time it takes to transmit it because the person receiving it has moved; instructions to make a left turn are pointless, for example, if the driver has already passed the intersection. And there's no good way to measure how information evolves on the Web. "We cannot even understand how much information is transmitted on the Internet because we don't understand the temporal aspect of information," Szpankowski says.
posted by jasonhong at 1:29 PM on February 26, 2011 [2 favorites]


This was a good article, and I really liked the bit about the drum language, but the final paragraph is pretty nonsensical. When Shannon said that meaning is irrelevant, he was talking about semantic content, not anything to do with the kind of meaning that is of importance to "the human condition." Dyson and Gleick should know better than to conflate the technical meaning of 'meaning' with the more loosey-goosey sense of the word.
posted by painquale at 1:41 PM on February 26, 2011


I totally get that abstract definition of information. What isn't making sense is that Dyson seems to be saying that the entropy of the universe is now thought to be decreasing over time (no Heat Death), which seems very radical.
posted by Mei's lost sandal at 1:43 PM on February 26, 2011


Also, I think either Dyson or Gleick is overstating the extent of the information explosion. There has always been a ton of natural information in the world... even before humans existed, there was information. A tree carries information about the fact that there are conditions suitable for life, for example, and a river carries information about there being a downward slope. What is amazing is the explosion of representations and semantic content. The fact that there is now so much "meaning" in the world--the type of meaning that is irrelevant to information theory--is the really exciting development.
posted by painquale at 1:48 PM on February 26, 2011 [1 favorite]


The bit about information increasing (or at least concentrating) over time in the universe is new to me. How is this explained in terms of entropy in a closed system?

I kinda figured it was a "local vs. global" question, like with evolution & sun vs. 2nd law of thermodynamics.
posted by ZenMasterThis at 2:03 PM on February 26, 2011


I enjoyed reading the article, but at the end it just seemed like someone smart riffing about things that interest him. There's no real point to it, or any decent argument. Any particular bit of it is indistinguishable from information, but taken as a whole it's just chaotic noise....

Wait a minute ....
posted by Joe in Australia at 2:35 PM on February 26, 2011 [1 favorite]


You cannot increase order and simultaneously decrease compressibility.

I think this is more or less exactly wrong. Consider the message "ababababababab". How much can it be compressed? Well, that depends on how you understand that message: as a sequence of repeating letters, as a sequence of repeating arbitrary shapes, as one holistic arbitrary shape, or in some entirely different way. A sea slug won't be able to compress that information, or even understand it as information.

I think the entire attempt to separate information from meaning - the objective from the subjective - will ultimately end in failure. There is no such thing as information separate from the ability to meaningfully interpret and use that information, which requires some sort of strange-loop-causal-singularity magic woo-woo. That's the question Dyson maddeningly avoids in this article: how did all this information create itself out of nothing? The history of everything seemingly started with a universe governed by a number of arbitrary physical rules (which arose from nothing, or are perhaps themselves bootstrapped patterns running on a lower-level substrate).

How did a series of physical laws bootstrap themselves into order, life and intelligence - the universe and planet we see today?
posted by crayz at 2:48 PM on February 26, 2011


The fact that the sigils a and b exist is a rule of the system that deciphers the sequence. Gzip does not know much else about what "a" or "b" are, but it can compress that sequence well. Capital-I Information is the inverse of compressibility. Meaning is important, but not to information theory.
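
If you want to see that concretely, here's a throwaway Python sketch (zlib standing in for gzip; my own toy, purely illustrative): the highly ordered string collapses to almost nothing, while random bytes barely compress at all.

    import os
    import zlib

    ordered = b"ab" * 5000        # highly redundant: low information density
    noisy = os.urandom(10000)     # random bytes: maximal information density

    for name, data in [("ordered", ordered), ("noisy", noisy)]:
        compressed = zlib.compress(data, 9)
        print(name, len(data), "->", len(compressed), "bytes")

Typically the ordered string shrinks from 10,000 bytes to a few dozen, and the random one stays around 10,000 (often a touch bigger), which is the "information as the inverse of compressibility" point in one screenful.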
posted by idiopath at 3:07 PM on February 26, 2011


Also: end in failure? The very possibility of our current interaction rests on technology guided by the premises of information theory.
posted by idiopath at 3:10 PM on February 26, 2011


Cite for information as the inverse of compressibility.

Specifically this sentence:
This use of the term "information" might be a bit misleading, as it depends upon the concept of compressibility. Informally, from the point of view of algorithmic information theory, the information content of a string is equivalent to the length of the shortest possible self-contained representation of that string.

In other words, the less compressible a string is, the more information it contains relative to its length.
posted by idiopath at 3:17 PM on February 26, 2011


I enjoyed reading the article, but at the end it just seemed like someone smart riffing about things that interest him. There's no real point to it, or any decent argument.

The context is the crypto background to the theory, and the companion idea that science is an ongoing exploration of mysteries, presumably whatever works best to understand something. As to its overall import, there's a subtext in the idea that the drums can be enjoyed in total ignorance of their telegraphic message, but such ignorance is the danger when you're among those who share the secret, precisely because they know that you don't.
posted by Brian B. at 3:22 PM on February 26, 2011


You're joking right?

Dude, c'mon: it's got his username right in the article.

posted by yerfatma at 3:30 PM on February 26, 2011


Crayz, they discuss that very 'abababab' example on the wiki page for Kolmogorov complexity. You're right that complexity can only be specified relative to a particular language or interpreter, and this is often ignored. (The same is true of computability, and people tend to forget that. Any string is compressible and any function is computable relative to some language.) I don't think this will have the strong metaphysical implications you want to draw, though. Any theory we have is going to have to be stated in some language or other, and it is perfectly objective that, relative to the computational language in which we think, 'abababab' is compressible. That string would still be compressible relative to that abstract language even if there were no humans, just as it was before there was life. So, according to the language of our best scientific theory, it's perfectly objective that the string is compressible. What makes that language so special? It's not really that special beyond the fact that it's the one that we happen to use and think in. But given that it does happen to be the one that we state our theory in, we can say that it's perfectly objective that the string is compressible.
posted by painquale at 4:39 PM on February 26, 2011


Perhaps a more intuitive approach to that question:

In the technical sense, "information" is loosely a synonym for "things we don't know yet".


Um, no, not unless by "intuitive" you meant "incorrect."

Information is data encoded in a carrier medium, without regard to the encoding, the medium, the sender, or the receiver.

You know, just the synopsis in the OP fills me with ennui, I'm not even going to read the article. If there's anything I'm more tired of than Gleick, Wales or Dyson, it's ridiculous attempts to use Shannon and Information Theory to frame Wales' hand-waving into mathematical laws.
posted by charlie don't surf at 6:04 PM on February 26, 2011 [1 favorite]


"Even in the noisiest system, errors can be reliably corrected and accurate information transmitted, provided that the transmission is sufficiently redundant. That is, in a nutshell, how Wikipedia works." (emphasis mine)

But Wikipedia is insufficiently redundant. To be useful, redundancy has to occur somewhere known - usually in one or other of the domains, e.g. frequency, time, or space - and that's mostly determined by the method of transmission. For example, OFDM, COFDM, and others incorporate a lot of their redundancy in the frequency domain; if one frequency is corrupted, the receiver knows where to find the information on another.
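
(For the toy version of what "sufficiently redundant" buys you, here's a trivial Python sketch of a 3x repetition code with majority-vote decoding. It's my own illustration, nothing that OFDM or Wikipedia actually does, just the textbook baby case of Shannon's point:

    import random

    def encode(bits):
        return [b for b in bits for _ in range(3)]   # send every bit three times

    def corrupt(bits, p=0.1):
        return [b ^ 1 if random.random() < p else b for b in bits]   # flip ~10% of bits

    def decode(bits):
        return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]

    message = [random.randint(0, 1) for _ in range(1000)]
    decoded = decode(corrupt(encode(message)))
    print(sum(m != d for m, d in zip(message, decoded)), "residual errors out of", len(message))

With a 10% flip rate, majority voting leaves only around 2-3% of the decoded bits wrong, and adding more redundancy drives that lower. The catch, as below, is knowing where and when the redundancy will arrive.)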

OK, so that's technical, but how does that apply to something like Wikipedia? At any given point in time a piece of information in Wikipedia may be wrong (corrupted) - but how do you know?

Well, Wikipedia's redundancy is largely in the temporal domain. A central tenet of Wikipedia is that someone - hopefully, someone knowledgeable - sees an error and corrects it; errors get corrected over time. So, great, you know the redundancy will (hopefully) be available in time - but when? There's no fixed schedule - it may be corrected tomorrow, maybe next week, maybe in a year's time, or maybe not until after you're dead and buried. Unless you continuously monitor for it and compare to the originally sampled information, chances are you'll miss it and be left with corrupted information.

Real encyclopaedias deal with that by (a) having a fixed process to ensure the information prior to publishing is as correct as possible, (b) periodically reviewing the published information and having a defined process to vet corrections, and (c) re-publishing to a pre-advised schedule (albeit non-fixed - you can't say 'redundancy is coming on 21/12/2012', but you can say 'redundancy is coming with the next edition, expected in late 2012'). Wikipedia doesn't do (a) or (c) at all, and only does (b) in the most haphazard of ways.

(To be fair, Wikipedia also incorporates redundancy in the spatial domain - external references are spatially-disparate sources in defined locations where the information can be independently confirmed (checksummed). But then, so do traditional encyclopaedias…)

So, in technical terms, Wikipedia is an extremely poor example of Shannon's Theorem - the initial data cannot be guaranteed to be correct; there is no consistently defined method for verification of transmitted data; and redundancy is ad-hoc and non-deterministic.

Which is not to particularly criticise Wikipedia on these points - they know this, and that's why they continuously say that Wikipedia is not a reference. My only criticism of them is that they do claim Wikipedia is an encyclopaedia, when it doesn't meet the basic information-transmission standards of a traditional encyclopaedia.
posted by Pinback at 6:17 PM on February 26, 2011 [1 favorite]


Wikipedia has redundancy in the versionate dimension (which is really just an instance of temporal, I think). That is, there's redundancy between versions of a page. It's true that most users don't know how to access the other versions & fewer still will actually bother, but it is there if you look for it.
posted by scalefree at 7:02 PM on February 26, 2011


As do encyclopaedias ;-) - yes, versionate is just temporal, and admittedly Wikipedia's is easier to access. Good point though; I was thinking more along the lines of temporal redundancy as typically implemented in error correction, where the checksum / correction bits come after the data (although, outside of computational ease, there's no real reason they can't come before).

However, I still think that falls under 'and only does (b) in the most haphazard of ways', although it's less haphazard than I had considered when I wrote it.
posted by Pinback at 7:47 PM on February 26, 2011


But Wikipedia is insufficiently redundant.

Human knowledge is insufficiently redundant. Can you read Etruscan? Know how to make Damascus Steel? Hell, I recall reading recently that the DOE is frantic because nobody knows how to make an essential chemical for nuclear weapons. It hasn't been made in decades, the stockpile ran out, and all the guys who knew how to make it are dead.
posted by charlie don't surf at 9:54 PM on February 26, 2011


charlie don't surf: "Um, no, not unless by "intuitive" you meant "incorrect."

Information is data encoded in a carrier medium"

Any data can fruitfully be treated as information that has been encoded. As a remarkable example, with information theory and DSP you can effectively explain mathematical regularities like Benford's Law. The laws of information theory apply there, even though Benford's Law is a natural property of the data, not a message encoded by a human.
posted by idiopath at 10:07 PM on February 26, 2011 [1 favorite]


OK, I obviously meant "any data can fruitfully be treated as encoded, and therefore as potential information." And I am realizing you may be responding to my admitted sloppiness with terms here; it was a while ago that I read "The Mathematical Theory of Communication," and though I still use many things I learned there, I don't often find myself having to articulate any of it precisely.
posted by idiopath at 10:13 PM on February 26, 2011


This article put me in the very uncomfortable position of deciding that either Freeman Dyson is an idiot, that the editors of the NY Review of Books are idiots and somehow managed to make him sound like one, or that I've had too much to drink and thus am completely hallucinating him trying to make a really stupid point about steaks, the expanding universe, the Second Law of Thermodynamics, and information theory.*

I'm not sure which conclusion would be more painful.

* WTF: "The belief in a heat death was based on an idea that I call the cooking rule. The cooking rule says that a piece of steak gets warmer when we put it on a hot grill. [...] We now know that the cooking rule is not true for objects of astronomical size, for which gravitation is the dominant form of energy. The sun is a familiar example. As the sun loses energy by radiation, it becomes hotter and not cooler." I'm pretty sure this is crazy, and because of this and not the gin. But hey, he's Freeman Dyson and I'm not.
posted by Kadin2048 at 11:23 AM on February 27, 2011


The bit about the drums and the telegraph is interesting and worth reading. The rest of the piece is Mickey D science writing.
posted by storybored at 12:16 PM on February 27, 2011


My favorite part:
"Science is the sum total of a great multitude of mysteries. It is an unending argument between a great multitude of voices. It resembles Wikipedia much more than it resembles the Encyclopaedia Britannica."
Somewhere Paul Feyerabend is going "Ja!" Because of him, I tend to trust people with crutches more than others.
posted by Twang at 2:01 PM on February 27, 2011


If a person waiting for a train receives a message at 2 P.M. saying the train leaves at 1 P.M., the message contains essentially no information

Except of course if that person is planning on taking the same train tomorrow.
posted by The Emperor of Ice Cream at 5:53 PM on February 27, 2011


"Science is the sum total of a great multitude of mysteries. It is an unending argument between a great multitude of voices. It resembles Wikipedia much more than it resembles the Encyclopaedia Britannica."

So why are Dyson's misleading and wrong pronouncements about climate change so popular amongst denialists, when there is a large and well-supported body of knowledge to the contrary? Is this a problem that can be studied using information theory?
posted by sneebler at 6:27 PM on February 28, 2011 [1 favorite]


While I'm not quite sure how that's connected to the quote... sorry... you're right to point out his climate remarks. So far as I know, it's the biggest derail Dyson's ever had, and that's his lookout. His efforts to support that position in a video interview were pathetic - understandable considering his age.

But, then, the man's spent a lifetime thinking outside the box, and suffered a lot fewer disasters than most in that line of work. The efforts to design nuclear-powered spaceships were nearly as dangerously wrong-headed, but that's easier to say with decades of 20/20 hindsight.

The quote itself, though, is a really useful insight into the nature of science (particularly its history) and deserves attention. Science as a human activity is far from the rigorously objective exercise it's often painted as... while the results tend toward the unassailable, the process is rather ordinarily messy. (See Feyerabend)
posted by Twang at 2:45 PM on March 1, 2011


Kadin2048: "* WTF: "The belief in a heat death was based on an idea that I call the cooking rule. The cooking rule says that a piece of steak gets warmer when we put it on a hot grill. [...] We now know that the cooking rule is not true for objects of astronomical size, for which gravitation is the dominant form of energy. The sun is a familiar example. As the sun loses energy by radiation, it becomes hotter and not cooler." I'm pretty sure this is crazy, and because of this and not the gin. But hey, he's Freeman Dyson and I'm not."

Yeah, I'm not following that at all. Isn't "the heat death of the universe" the ultimate result of increasing entropy?
posted by Chrysostom at 8:40 AM on March 2, 2011




This thread has been archived and is closed to new comments