# Never tell me the odds.

December 5, 2010 11:15 AM

Measure-theoretic probability: Why it should be learnt and how to get started. The clickable chart of distribution relationships. Just two of the interesting and informative probability resources I've learned about, along with countless other tidbits of information, from statistician John D. Cook's blog and his probability fact-of-the-day Twitter feed ProbFact. John also has daily tip and fact Twitter feeds for Windows keyboard shortcuts, regular expressions, TeX and LaTeX, algebra and number theory, topology and geometry, real and complex analysis, and beginning tomorrow, computer science and statistics.

Measure theory was one of the more painful stat graduate classes. Makes stochastic processes easier, but my god the hurting.

posted by a robot made out of meat at 11:33 AM on December 5, 2010

Holy shit, that's awesome. This is like what my blog would be if I was a thousand times smarter, worked a thousand times harder, and had a blog.

posted by kmz at 11:35 AM on December 5, 2010 [4 favorites]

I thought the main article was a really clear introduction. I actually want to read some of that stuff again...and pay attention to the tips given at the end (create your own links between measure-theoretic and classical probability, keep in mind that the basic ideas are straightforward).

I was not impressed by the probability fact-of-the-day, however...mostly just boxed-in statements from a textbook.

posted by klausman at 11:44 AM on December 5, 2010

The overwhelming majority of the LaTeX tips should be known to any LaTeX user. Put math in math mode? Seriously? Others, I wouldn't expect everyone to know, like how to type ø, not because they're good tips, but because they're only useful to relatively few people, who probably already know. I can't actually parse his "Beamer exists" post. I guess he was using PowerPoint for presentations two years after I first used Beamer, having managed to ignore things like the slides document class and Prosper, which I'm pretty sure predates Beamer.

posted by hoyland at 1:08 PM on December 5, 2010

*When learning measure-theoretic probability: Keep in mind that the basic ideas are straightforward; don’t let the technical detail obscure the basic ideas.*

Believe me - I'm trying to do that. But I can't seem to glean the "basic ideas" out of the detail. Anyone willing to connect the dots in a "basic" way?

posted by buzzv at 2:15 PM on December 5, 2010

*Anyone willing to connect the dots in a "basic" way?*

The standard text does a pretty good job.

The basic idea is that once you have infinitely many possible outcomes - for example, picking a random number between 0 and 1 - naively assigning a probability to each individual outcome doesn't work: if you assign zero to every outcome, then apparently no result is possible; if you assign any non-zero probability, you can immediately find a set of outcomes whose total probability exceeds 1.
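
Orang Kota's second point is easy to check with a few lines of arithmetic: however small a fixed non-zero probability p is, some finite set of equally likely outcomes already has total probability above 1. A toy sketch (the helper name is mine, and exact rationals are used to dodge floating-point noise):

```python
from fractions import Fraction

def outcomes_needed(p: Fraction) -> int:
    """Smallest count n of equally likely outcomes with n * p > 1."""
    return int(1 / p) + 1

p = Fraction(1, 1_000_000)   # a tiny but non-zero probability per outcome
n = outcomes_needed(p)
assert n * p > 1             # this finite event already has probability > 1
```

Since the sample space is infinite, such a set of n outcomes always exists, which is the contradiction.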

So instead we have to assign probabilities to events (defined as sets of outcomes) rather than just to atomic outcomes, and we give up the idea that combination of probabilities works by addition when we combine uncountably many outcomes. For technical reasons it isn't possible to consistently assign probabilities to every conceivable set of outcomes, so we restrict the events under consideration to a well-behaved class (known as a sigma algebra).
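
On a finite sample space the "well-behaved class" can be built explicitly. A toy sketch (names mine; on a finite space the sigma-algebra closure conditions reduce to closure under complement and finite union):

```python
from itertools import combinations

def generate_algebra(omega: frozenset, events) -> set:
    """Smallest collection of subsets of omega that contains `events`
    and is closed under complement and (finite) union."""
    algebra = {frozenset(), omega} | {frozenset(e) for e in events}
    changed = True
    while changed:
        changed = False
        for a in list(algebra):
            c = omega - a                      # close under complement
            if c not in algebra:
                algebra.add(c); changed = True
        for a, b in combinations(list(algebra), 2):
            u = a | b                          # close under union
            if u not in algebra:
                algebra.add(u); changed = True
    return algebra

omega = frozenset(range(4))
alg = generate_algebra(omega, [{0}])  # events generated by the outcome 0
# closure checks: complements and unions never leave the collection
assert all(omega - a in alg for a in alg)
assert all(a | b in alg for a in alg for b in alg)
```

Here `alg` comes out as the four sets {}, {0}, {1,2,3}, and omega: exactly the events to which a probability of "did we draw 0 or not" can consistently be assigned.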

posted by Orang Kota at 3:24 PM on December 5, 2010 [4 favorites]

*Anyone willing to connect the dots in a "basic" way?*

Let's imagine a different scenario first, and forget that it has anything to do with probability.

Suppose we have two rocks, each a same-sized cube of uniform density, but with different weights. To get the weight of either rock, you multiply its density by its volume. So since the rocks have the same volume, the heavier rock has the higher density.

Now let's imagine a rock with variable density, and suppose we have an accurate density function describing it - that is, at every point in the rock, a function which tells us the infinitesimal weight around that point. With this rock, we can get its total weight, or the weight of any subregion, by integrating over the region of interest - that is, summing all the infinitesimal weights.

Finally, weight is just the property you get by integrating the density alone, without any other function involved. If we wanted to consider a property besides weight - something bound up with the density, like maybe heat or something - then when we have another function describing the distribution of heat over the volume, we can integrate the heat function *while weighting it by the density.* More density around some particular point means more heat bound up in that region. (Note: I am not a physicist. Hopefully this example made sense.)

So in general, we have some description of the density pattern of some thing, and we want to integrate a function over some region of that thing, weighted by that density.

Probability is nothing more than the study of weighted functions. Normally when you integrate a function f(x) over some region, you assume every value of x is equally important. But in probability theory, some regions of x are more probable than others, and their probabilities are determined by the probability density function (pdf). We can calculate the probability of a region by integrating the pdf over it with no extra function (just the basic weight from the rock example). On the other hand, if we want to calculate, say, the average output of a function given the probability of its inputs, then we integrate the given function weighted by the pdf. If the function has large values in highly probable areas, its expected value will be high. If it only has low values in high-probability areas, its expected value will be lower. If we want to know the expected value given that x is drawn from only a subregion of the entire space, hey, we can handle that too.
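
This weighted-integration picture can be sketched numerically. In the rough illustration below (helper names mine, crude midpoint Riemann sums rather than a real quadrature routine), the probability of a region is the integral of the pdf alone, and the expected value is the integral of the function weighted by the pdf:

```python
def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# a pdf on [0, 1]: the "density" from the rock picture
pdf = lambda x: 2 * x                 # more weight near 1

# the whole region must carry total weight (probability) 1
total = integrate(pdf, 0.0, 1.0)

# probability of the sub-region [0.5, 1]: integrate the pdf alone
p_upper = integrate(pdf, 0.5, 1.0)

# expected value of f(x) = x: integrate f weighted by the pdf
mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)

print(round(total, 4), round(p_upper, 4), round(mean, 4))  # ≈ 1.0 0.75 0.6667
```

Note that `total` coming out as 1 is exactly Alex404's normalization remark below: the one constraint on a probability measure is that the whole region weighs 1.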

There's just one problem. Some regions that we might integrate over are extremely pathological and cannot be assigned a consistent weight. Measure theory is, at bottom, the most sophisticated theory of non-pathological set sizes. It separates those regions that fit into a nice algebraic structure - e.g., if we have the weights of two disjoint nice regions, we can simply add them and get the weight of another nice region - from those which don't. And the nice ones can then support all sorts of other pleasant mathematics.

The technical details of this are hairy enough, and the way I've described it is a bit backwards (density functions are not really the primitive objects I've made them out to be), but I believe this should give you the gist. Probability theory is nothing more than the study of weighted functions, and measure theory lays down the cleanest and most general framework for what kinds of weightings and what kinds of functions we can consistently talk about.

posted by Alex404 at 4:22 PM on December 5, 2010

(Oh, and: the one limit on measures in probability theory is that the weight of the whole region - i.e., the total probability - is 1.)

posted by Alex404 at 4:29 PM on December 5, 2010

Jacking up naive probability and sliding a measure-theoretic foundation under it does seem to offer great gains in power, generality, connection to other branches of math and various modern mathematical conveniences. However, the process also seems to at least put some cracks in the old edifice that weren't so obvious before, if in fact they were there at all.

Manton does bring up this problem in these paragraphs and the ones preceding:

*Superficially, this is ok. However, it does not work for two reasons.*

*1. How can we define the value of P(A) for an arbitrary subset A of \Omega when for some sets, it is not even possible to write down a description of them? (That is, there are some subsets of the interval [2,3] which we cannot even write down, so how can we even write down a definition of P which tells us what value it takes on such indescribable sets?)*

*2. It can be proved that there exist “bad” sets for which it is impossible to assign a probability to them in any consistent way.*

*It is very tempting to elaborate on the second point above. However, my experience is that doing so distracts too much attention from the original aim of understanding measure-theoretic probability.* **It is therefore better to think that even if we could assign a probability to every possible subset, we do not want to because it would cause unnecessary trouble and complication; surely, provided we have enough interesting subsets to work with, that is enough?**

*Therefore, ultimately we define P as a function from F to [0,1] where F is a set of subsets of \Omega which we think of as (some of) the “nice” subsets of \Omega, that is, subsets of \Omega to which we can and want to assign probabilities of occurrence. Roughly speaking, F should be just large enough to be useful, and no larger.*

only to (very feebly!) dismiss them without addressing them at all -- as I have emphasized.

I object to this because, as I recall (shakily), the collection of indescribable sets alluded to in 1. is huge, with a cardinality at least equal to that of the continuum, and the sets alluded to in 2. are in fact the non-measurable sets, and those carry with them an immense tail of paradox (Tarski's, among others).

Along with the natural numbers, I think probability *loses* credibility by being formulated in set-theoretic terms.

In the case of the natural numbers this is an easy argument to make, because the limitative theorems of Gödel which are so fundamental to set theory use the properties of the natural numbers fairly extensively. Therefore the natural numbers are *prior* to set theory.

In the case of probability, I think set theory fails because probability is a feature of reality, like cause and effect, for example, and only secondarily a construction in math.

Great post, though; I'll be following this blog.

posted by jamjam at 4:45 PM on December 5, 2010

*probability is a feature of reality*

This is -- to say the least -- a controversial claim.

Say what you will about the measure-theoretic notion of probability, but it says it's going to do something and then it does it. It does not, as far as I know, produce any paradoxes. (The existence of Banach-Tarski sets is no more paradoxical than the existence of a square root of negative one.)

You are right, of course, that the measure-theoretic formulation shines light on the "cracks" in the naive interpretation of probability. So much the worse for the naive interpretation.

posted by escabeche at 5:14 PM on December 5, 2010 [1 favorite]

*probability is a feature of reality*

**This is -- to say the least -- a controversial claim.**

Certainly true, though I do take some comfort from the evident difficulty of putting together a quantum mechanics in which "God does not play dice with the world."

*Say what you will about the measure-theoretic notion of probability, but it says it's going to do something and then it does it. It does not, as far as I know, produce any paradoxes. (The existence of Banach-Tarski sets is no more paradoxical than the existence of a square root of negative one.)*

I brought up Tarski's paradox because I've often seen arguments in which probabilities are identified with lengths, areas, and volumes -- an approach ratified by the standard measure-theoretic definitions of those quantities, if you accept a measure-theoretic basis for probability. But when it comes to volumes, that can lead to problems: if part of your sample space is identified with the unit sphere, the events of interest with subsets of that sphere, and the probabilities of those events in turn with the volumes of those subsets, then Tarski's tells you you can decompose your sphere into five not-so-easy pieces and reassemble those into a sphere of twice the volume.

And that would directly contradict the property that the probability of an event is always the sum of the probabilities of its constituent sub-events (not to mention that this sum can never exceed 1), because that sum could be shrunk and expanded at will with a Tarski decomposition.

posted by jamjam at 7:18 PM on December 5, 2010

Ah, measure-theoretic probability. The probabilist's way of solving the problem of there being too few nontrivial problems to solve.

posted by Earl the Polliwog at 10:25 PM on December 5, 2010

Somewhat interestingly, the collections of (Lebesgue)-measurable subsets of the real numbers and that of the non-measurable subsets both have the same cardinality, namely 2^c, where c is the cardinality of the continuum.

You can see this by concentrating on the interval [0,2]. Suppose V is a non-measurable subset of [1,2] -- any set of finite measure contains a non-measurable subset-- and let CS be the standard Cantor set, which is a subset of [0,1].

Now CS is a set of outer measure zero (with uncountably many elements), so it's automatically measurable, as are all of its subsets. There are 2^c such subsets. Taking the union of any such subset of CS with V gives us a different non-measurable subset of [0,2]. [[Note that the union of a measurable set and a disjoint non-measurable set has to be non-measurable: this follows from the fact that A, B measurable => their difference A\B is also measurable.]]

So both collections are "as big as they can be" (as the set of all subsets of R also has cardinality 2^c). There's still an oddly seductive ill-formed question along the lines of "is a random subset of R likely to be measurable or not?", but I don't think there is any sane way to make sense of that...

posted by pjm at 5:32 AM on December 6, 2010 [1 favorite]

It's worth noting that even when dealing with finite probability, measure theory has a natural role to play. Algebras of events reflect "information". For example, given a random variable, the pullback algebra gives you a notion of distinguishable elements in the state space, and measurability with respect to this algebra means that you are not magically looking at data outside of that offered by the random variable. As a stochastic process evolves in time, you get refinements of this algebra -- this is the filtration associated with it -- and this combinatorial thing is a qualitative measure of the "information flow" associated with the process. For all the wackiness that crops up when you introduce limits into the theory (sigma-algebras, countable additivity, etc.), there are still a fair number of measure-theoretic concerns that are not mere technical details -- they are in fact core to the problem at hand.
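
On a finite state space this "information" reading can be made concrete. In the hypothetical sketch below (names mine), the atoms of the pullback algebra of a random variable are just its level sets, and observing a second variable refines the partition, as in a filtration:

```python
from collections import defaultdict

def pullback_atoms(states, rv):
    """Atoms of the algebra generated by rv: its level sets rv^{-1}(v)."""
    levels = defaultdict(set)
    for s in states:
        levels[rv(s)].add(s)
    return {frozenset(v) for v in levels.values()}

def refine(p1, p2):
    """Common refinement of two partitions: what you know after seeing both."""
    return {a & b for a in p1 for b in p2 if a & b}

states = range(8)                   # outcomes of three coin flips, coded 0..7
first_flip  = lambda s: s >> 2 & 1  # information available at time 1
second_flip = lambda s: s >> 1 & 1  # additional information at time 2

t1 = pullback_atoms(states, first_flip)               # 2 atoms of size 4
t2 = refine(t1, pullback_atoms(states, second_flip))  # 4 atoms of size 2
assert len(t1) == 2 and len(t2) == 4  # the filtration gets finer over time
```

An event is "measurable at time 1" exactly when it is a union of atoms of `t1`: you cannot bet on the second flip using only the first.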

posted by aconcagua at 5:59 PM on December 6, 2010

jamjam: Any set theory satisfying the axiom of choice contains non-measurable sets, even V=L. Afaik, there is only one axiom contradicting choice that any applied mathematician might ever even consider using, which is the axiom of determinacy (AD).

A few consequences of AD: all subsets of the real numbers are measurable and have the property of Baire, and every uncountable set of reals has the cardinality of the continuum. AD and AD-like axioms basically say the uncountable world behaves very much like the countable world. Indeed, they are usually equiconsistent with choice plus some large cardinals, which incidentally makes it impossible to prove their consistency relative to ZF. Just fyi, large cardinals are sets so much larger than any set below them that they exhibit some subtle property one usually only expects of the integers relative to the finite sets.

To me, the deepest set theoretic question with significant applications to "real" mathematics appears to be exploring how much determinacy is consistent with choice, and desirable for mathematics, or equivalently deciding what large cardinals we wish to believe in. And all this is intimately related to why set theorists feel the continuum hypothesis is false.

posted by jeffburdges at 3:52 AM on December 8, 2010 [1 favorite]


This thread has been archived and is closed to new comments

posted by stratastar at 11:27 AM on December 5, 2010