Win Steve Landsburg's Money
January 1, 2011 2:00 PM

Google is known to ask the following question in job interviews: In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country? Think you know the answer? If so, Steve Landsburg may be willing to bet you up to $5000.

Google's official answer is 50% (scroll to "In a country in which people only want boy..."). But the highest rated answer on MathOverflow, from Douglas Zare, asserts that the correct answer is, on average, lower: approximately (1/2) - 1/(4k) for a population of k families.
When physicist Lubos Motl asserted that 50% was indeed the correct answer, economist Steve Landsburg publicly offered to bet Motl $15,000 that computer simulations would show that the lower answer was indeed correct (specifically, across 3000 simulations of a ten family country), and opened the bet up to any other interested takers as well.
posted by gsteff (277 comments total) 52 users marked this as a favorite
 
Landsburg's latest salvo.
posted by gsteff at 2:03 PM on January 1, 2011


I find this pretty interesting. It took a bit to get my head around, but it does show something interesting about how we think about problems. I vote not just for the math weenies.
posted by meinvt at 2:21 PM on January 1, 2011


Exactly? Impossible to say, pending the next census.

Approximately? 50%, obviously.

Oh, the exact question? What is the proportion of boys to girls in the country? 1/1, but y'know, these things can never be exact.
posted by idiomatika at 2:21 PM on January 1, 2011 [1 favorite]


Yeah, there are tons of things on the front page that may be too "art weenie" or "politics weenie" or "history weenie" or "this is just plain too obscure" for some. That's the fun of Mefi, right?

I for one find stuff like this fun. I don't have time to read this all yet, but my intuitive answer was definitely 50%.

This reminds me of the Monty Hall problem.
posted by Defenestrator at 2:22 PM on January 1, 2011 [2 favorites]


Poor Landsburg is both wrong and very arrogant.
posted by esprit de l'escalier at 2:33 PM on January 1, 2011 [3 favorites]


The correct answer is, "Whatever Google says it is, because it's cool to work at Google; they offer free massages."
posted by Cool Papa Bell at 2:35 PM on January 1, 2011 [14 favorites]


Google is known to ask the following question in job interviews ...

Ah, memories of:
Apple is known to ask the following question in job interviews ...

Microsoft is known to ask the following question in job interviews ...

I.B.M. is known to ask the following question in job interviews ...

McKinsey & Company is known to ask the following question in job interviews ...
posted by ericb at 2:36 PM on January 1, 2011 [13 favorites]


Why do people read "What is the proportion of boys to girls in the country?" and interpret it as "What is the expected percentage of girls in any random family?". Those are different questions.

Yeah, I didn't understand that, either. I figured it was over my head. Also, wouldn't there be more girls than boys? I feel dumb.
posted by (Arsenio) Hall and (Warren) Oates at 2:36 PM on January 1, 2011


Lubos's response to Landsburg's latest salvo includes this:
I didn’t use this simple math problem to dismiss candidates for jobs in my company even though I fully understand why Google did. It’s a very good task to find and throw away the people who will get distracted from common sense and from simple, fundamental math arguments by noise and who will immediately start to think about complicated yet irrelevant technicalities – which is exactly what you did which is why you couldn’t work at Google but you instead work in the Academia that often supports this contrived way of thinking that is detached from the reality and everything important in it.

I find this humorous. Despite their IQ tests, Google's employees' IQs drop to mediocre after they are hired, at least with respect to their employer's projects. Any imbecile outside Google could have told you that putting everyone's Gmail contacts into Buzz would be a very bad idea. But could any genius within Google say this? They'd probably lose their jobs for going against the company. So they become idiots for job security. I've seen several things come out of Google like this that represent a definite lack-of-thinking.
posted by eye of newt at 2:36 PM on January 1, 2011 [11 favorites]


Here's my implementation of the simulation, according to Landsburg's rules (written for clarity, not efficiency). It's giving me just about 47.5% for 3000 simulations.
posted by gsteff at 2:37 PM on January 1, 2011 [17 favorites]
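
For readers who want to reproduce numbers like these without following the link, here is a minimal sketch of this kind of simulation. It is not gsteff's code. It assumes every birth is an independent 50/50 coin flip and that each family keeps going until its first boy with no cap on family size; the function names and the fixed seed are illustrative choices only.

```python
import random

def girl_fraction(num_families, rng):
    """Simulate one country and return the fraction of children who are girls."""
    girls = 0
    for _ in range(num_families):
        # Each family keeps having children until the first boy arrives.
        while rng.random() < 0.5:  # a girl, with probability 1/2
            girls += 1
    boys = num_families  # every completed family has exactly one boy
    return girls / (girls + boys)

def average_girl_fraction(num_families, runs=3000, seed=0):
    """Average the per-country girl fraction over many simulated countries."""
    rng = random.Random(seed)
    total = sum(girl_fraction(num_families, rng) for _ in range(runs))
    return total / runs

if __name__ == "__main__":
    for k in (4, 10, 100):
        print(k, round(average_girl_fraction(k), 4))
```

Note that this averages the fraction computed separately for each simulated country. Pooling all the children from all runs into one big count would be a different calculation.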


There's not enough data in the question to answer it properly. You'd also need to know reproductive age ranges, cultural reproduction policy, and broad spectrum mortality information.
posted by felix at 2:38 PM on January 1, 2011 [12 favorites]


This reminds me of the Monty Hall problem.

That's the probability puzzle that both Einstein and Erdos got wrong, isn't it?

I'll make a conjecture before I read it. Maybe there's a time interval involved, concerning how long it takes to conceive and gestate. For a given block of time the odds are 50/50, and are unknown during the conception phase. When a boy is born, the next conception phase is always canceled, denying girls the statistical equality they would otherwise have got in that cycle, advantage boys.
posted by StickyCarpet at 2:40 PM on January 1, 2011


I find this humorous. Despite their IQ tests, Google's employees' IQs drop to mediocre after they are hired, at least with respect to their employer's projects. Any imbecile outside Google could have told you that putting everyone's Gmail contacts into Buzz would be a very bad idea. But could any genius within Google say this? They'd probably lose their jobs for going against the company. So they become idiots for job security. I've seen several things come out of Google like this that represent a definite lack-of-thinking.

C+
posted by eugenen at 2:40 PM on January 1, 2011 [3 favorites]


I guessed 50%, where is my job offer google. I know you are reading this so get off your ass already.
posted by Ad hominem at 2:41 PM on January 1, 2011 [9 favorites]


I cannot even begin to follow these arguments. I really wish I could but that story about 4 families doesn't make sense. You have a bunch of families in a country that all stop having children only after the first boy is born. So each family has a 50% chance of 0% girls, 25% chance of 50% girls, 12.5% chance of 67% girls, etc etc. And you have, realistically, tens of thousands of these families. So . . . why are we talking about countries of 4 families, since we have computers that can do much more complex calculations?
posted by jeather at 2:43 PM on January 1, 2011 [4 favorites]



I thought that all the pedants that get sucked into these kinds of questions would already know that working from a 1:1 ratio of male to female births is hopelessly naive.

"The natural sex ratio at birth is estimated to be close to 1.1 males/female"

The actual sex ratio you see amongst new-borns in a given country is a fascinating place to theorise. Indicator of little understood biological mechanisms? Widespread sex-selective abortions? Lizard rulers manipulating the population?

But getting back to the actual question... the theory may be beyond my feeble mind (probability is notoriously counter-intuitive) but it would seem trivial to computer model over a large population and a large number of simulations.... it'll line up with one of the "obvious" answers I'd imagine.
posted by samworm at 2:44 PM on January 1, 2011 [1 favorite]


can i get an interview at google ? cause i got it right even though im rubbish at maths.
posted by sgt.serenity at 2:46 PM on January 1, 2011 [2 favorites]


I was just stuck on how a guy who just died could be offering money.
posted by stevil at 2:51 PM on January 1, 2011 [4 favorites]


they offer free massages
Actually, they offer subsidized massages. The food, however, is free.
posted by CheeseDigestsAll at 2:52 PM on January 1, 2011 [1 favorite]


There's not enough data in the question to answer it properly. You'd also need to know reproductive age ranges, cultural reproduction policy, and broad spectrum mortality information.

Well, we can make some assumptions: assume boy girl ratio is the same in all age groups, assume that the cultural reproduction policy is what is given by the question, and assume that mortality affects boys and girls equally.

And we can also assume that we start with 1/1 boys and girls, and consider what change will happen in one generation.

Half of the families have boys, half have girls.

The half that had girls, have another kid. Half of these are boys, and half are girls.

Then that one quarter that had girls have another child, and half of these children will be boys and half girls.

It's half and half the whole way down! So the ratio should remain constant, at about 1/1. Really, the question presents information on when the families stop having children - the chance of any child being born a boy or a girl is not changed by that. The average family size is affected by that, and the expected boy to girl ratio in any given family is also affected... but the overall ratio of boys to girls in the country is not affected.
posted by molecicco at 2:53 PM on January 1, 2011 [11 favorites]


I get the 50/50 thing but I don't get what the Landsburg guy is saying.
posted by (Arsenio) Hall and (Warren) Oates at 2:55 PM on January 1, 2011 [1 favorite]


Quick correction to my simulation code: I was simulating 10 families, but Landsburg's challenge and predicted results are for four families. When I rerun it for four families (change the "10" on line 30 to a "4"), I get numbers just below 44%, exactly what he predicts.
posted by gsteff at 2:58 PM on January 1, 2011 [2 favorites]


My intuitive answer was 50% too, but I'm not sure I could explain it in a satisfactorily math-y way before reading the links. I also just knew, biologically, sex ratios are pretty much always roughly even. Sadly, what causes a real difference is death, not premature selection. Female infanticide and war, usually.

What I got from Landsburg was more or less niggling technicality. So his point is that there will be more families with a single male child, and some of the other families with a firstborn girl will likely have more than one girl, and so be bigger? Duh?

I...don't think I get it.
posted by Nixy at 3:00 PM on January 1, 2011


In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?

What do you mean "In a country," like it's some notional place that doesn't exist? Doesn't pretty much every family do this? If not, how come the phrase, "Are you gonna keep trying for a boy?" even exists? Or is that just the Catholics I grew up with?
posted by toodleydoodley at 3:01 PM on January 1, 2011 [1 favorite]


Landsburg is wrong, even if it's 50%-1/(4k), because for fucks sake, every country on earth has at least a million people in it.
posted by empath at 3:02 PM on January 1, 2011 [2 favorites]


Yeah- the trouble with this problem is that, with a gut-check, it feels like there should be a lot more girls than boys. What changes that, however, is that you've got all those families with one boy and no girls:

Slow and steady ties the race.

Modelling this with a little second-semester calculus, you get a ratio of infinite series:

Allow the index of summation to equal the total number of children in the family. Then, create two series to model the relative number of girls and boys in the population:

Population of girls = Σ(n=1, ->∞)[100*.5^n][n-1] = 0/2 + 100/4 + 200/8 + 300/16 + 400/32 and so forth. This sum, for n (family size) sufficiently large, equals 100.

Population of boys = Σ(n=1, ->∞)[100*.5^n][1], which is a simple geometric series, sum = 100. (50 + 25 + 12.5 + 6.25 + ...)

100/100 is, of course, equal to 50% boys, 50% girls.

The funny part about the whole thing is, where you would expect a surplus of girls to iron itself out as the population trends towards infinity (gut check), what actually happens is that a surplus of boys is ironed out as family size approaches infinity.

tl;dr, the problem's "official" answer is actually, (unusually, for some of these puzzles) right on.

Do I get Landsburg's money now?
posted by fifthrider at 3:02 PM on January 1, 2011 [14 favorites]
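
The two series in the comment above can be checked with exact arithmetic. This is a small sketch, assuming the same setup of 100 families with 100/2^n of them having n children; the function name and the cutoff parameter are made up for illustration.

```python
from fractions import Fraction

def partial_sums(max_family_size, num_families=100):
    """Sum the girls and boys series up to families of size max_family_size."""
    girls = Fraction(0)
    boys = Fraction(0)
    for n in range(1, max_family_size + 1):
        families_of_size_n = Fraction(num_families, 2 ** n)
        girls += families_of_size_n * (n - 1)  # n - 1 girls, then the boy
        boys += families_of_size_n             # exactly one boy per family
    return girls, boys

if __name__ == "__main__":
    for cutoff in (4, 10, 60):
        g, b = partial_sums(cutoff)
        print(cutoff, float(g), float(b))
```

At any finite cutoff the girls total trails the boys total, and both approach 100 as the cutoff grows, which is the "surplus of boys is ironed out" behavior described in the comment.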


I did the math and found that as the number of families in the sample increases, the percentage approaches 50%. However, for lower numbers of families, I got a lower percentage. I'm not really sure what there is to argue about.
posted by mai at 3:02 PM on January 1, 2011 [1 favorite]


If "ask this brainteaser" were an effective screening tactic in job interviews, more companies would use them.
posted by l33tpolicywonk at 3:03 PM on January 1, 2011 [1 favorite]


Or what fifthrider said in more detail.
posted by mai at 3:03 PM on January 1, 2011


So the greater number of families there are, the closer the ratio is to a straight 1:1 - but if we have only four families in, let's say, The People's Republic of Only Eight People, then the ratio is significantly skewed? Am I getting it correct now?
posted by (Arsenio) Hall and (Warren) Oates at 3:03 PM on January 1, 2011 [2 favorites]


(even in the case of tiny Tuvalu with 10,000 people living there, the answer, even using his calculation is 49.9996%)
posted by empath at 3:04 PM on January 1, 2011


SOMEONE CALL JAMIE HYNEMAN AND ADAM SAVAGE, STAT.
posted by chasing at 3:06 PM on January 1, 2011 [10 favorites]


If they have a girl, they have another child

Well, with the information given, there is at least one family with an infinite amount of girls then, isn't there?

[/hatesDoingMotherfuckingBrainTeasersJustToGetAnHTMLJob]
posted by drjimmy11 at 3:07 PM on January 1, 2011 [6 favorites]


SOMEONE CALL JAMIE HYNEMAN AND ADAM SAVAGE, STAT.

What the hell are they going to do? They'll need Kari Byron for this one.
posted by Cool Papa Bell at 3:09 PM on January 1, 2011 [23 favorites]


You could use mice.
posted by empath at 3:09 PM on January 1, 2011 [1 favorite]


If you think Landsburg is wrong then you should probably go to vegas and make big, big money using the martingale system.
posted by I_pity_the_fool at 3:10 PM on January 1, 2011 [6 favorites]


My offhand guess, not being a math/stats person, was I would be willing to bet the official answer is wrong. The only way the statistical answer would hold true is if all people have the same likelihood of having boys or girls. If some people are biologically more likely to have girls than the average, then those people will tend to have larger families and therefore more girls. If this proclivity was genetic, the effect would tend to increase as generations passed. You'd also have to look at the death rate, given that the proportion asked for isn't the ratio of births but rather the ratio of existing boys and girls. Oh, and immigration... and how are we defining "boy" and "girl" anyway?

This is why I sucked at economics especially in college. (Well, I mean, I got my A's, but my profs hated me.) Look, I don't care what you can mathematically prove is the case in your grossly oversimplified version of how you think the world works, if that doesn't actually apply in the real world then I question what you're actually contributing.
posted by gracedissolved at 3:10 PM on January 1, 2011 [7 favorites]


So this is just one guy who can't do a simple puzzle because he doesn't know how puzzles are phrased? If the size of a population isn't specified, it's arbitrarily large. If a time span isn't given, it's arbitrarily long. If a binary random choice is offered, each choice is equally likely.

In any real country where people want boys, they kill some of the girl children. So we're not talking about the real world. The only point of the puzzle is to see who will realize that it's the same as asking: if you have an infinite number of people each flipping fair coins until they get heads, what proportion of heads to tails will you get? And that the answer is 1:1, because it doesn't matter who flips the coin or when they stop.
posted by nicwolff at 3:11 PM on January 1, 2011 [16 favorites]


(What I'm saying is, obviously at some point people can only have so many children in their life. But questions in an interview are going to have a pretty basic answer that allegedly tests your "thinking skills" or some shit, they're not going to involve a bunch of complicated math, and thus you can know the answer they want is "50%" without any thought whatsoever, and so like all interview questions it tests nothing but your ability to tell them what they want to hear.)
posted by drjimmy11 at 3:11 PM on January 1, 2011 [3 favorites]


What is wrong with this idea? Given 50/50 birth rate:

50% families with one boy
25% families one girl, one boy
12.5% families two girls, one boy
6.25% families three girls, one boy
(etc etc)
posted by Meatbomb at 3:12 PM on January 1, 2011 [1 favorite]


I'm with 23skidoo. Is Landsburg just changing the question being asked? If you follow the premise of the question, it seems like the proportion of boys to girls nationwide would be 1:1, but the average within-family proportion would have a higher number of boys compared to girls.

That's ignoring all of the ways the premise can be changed - a probability of boys versus girls being born not being 50/50, longevity, the fact that realistically you would not have a family that has 500 girls and 1 boy, and so on.

Can someone explain to me, then, whether Landsburg and Motl are actually arguing mathematics versus merely the semantics of the question? Or alternately does it hinge on Motl assuming an infinite number of families and Landsburg assuming a finite number?
posted by Chanther at 3:13 PM on January 1, 2011


I don't know if I'm doing something wrong, but I ran a simulation and at 3000 families I got 49.9% boys the first time, 49.8% the next.

I also discovered that rand() in PHP is really, really broken. Reminder: use mt_rand().

Here's my code:

http://pastebin.com/7rzV0rXv
posted by justkevin at 3:16 PM on January 1, 2011 [1 favorite]


And then there's the people who give birth to twin, triplet, etc boys on the first try.
posted by drjimmy11 at 3:16 PM on January 1, 2011 [4 favorites]


Meatbomb, that's correct.
posted by atrazine at 3:17 PM on January 1, 2011


My friends and I wrote a paper about our own argument about something quite similar. Here it is.
posted by rbs at 3:18 PM on January 1, 2011


50% families with one boy
25% families one girl, one boy
12.5% families two girls, one boy
6.25% families three girls, one boy


And a 0.00019073486328125% chance of 18 girls and one boy. Which is why the ratio depends on the number of families in the country.
posted by I_pity_the_fool at 3:18 PM on January 1, 2011 [3 favorites]


An actual useful job application question, based on input from that shining window on the human condition Passive Aggressive Notes, would be:

"This is the break room fridge. Hey look. There's a sandwich in there. Should you eat it?"
posted by Babblesort at 3:18 PM on January 1, 2011 [13 favorites]


Yeah, any way I can find to look at it, it comes out to 50-50, given a large enough sample. I freely admit that I could be missing something; but Landsburg's point about the average expected percentage of girls in each family makes no sense to me: in his four-family example, the proportion of boys to girls is 1:1, right? Is this just a quibble over the wording of the question, or does he have a serious point?
posted by steambadger at 3:20 PM on January 1, 2011


justkevin, try running it with just four families, 3000 times. What does that give you?
posted by molecicco at 3:21 PM on January 1, 2011


I don't know if I'm doing something wrong, but I ran a simulation and at 3000 families I got 49.9% boys the first time, 49.8% the next.

You're allowing each family to have children until the family is complete, which, although it wasn't how the problem was originally stated (originally it had a maximum number of years), Landsburg later said he'd be ok with that kind of simulation too. But the main reason you're getting different numbers is that you're using 3000 as the number of families, when Landsburg requested four families, with the whole simulation run 3000 times.
posted by gsteff at 3:21 PM on January 1, 2011


Landsburg is wrong because he's working off four families. The question specified "a country", to the best of my knowledge that implies significantly more than four families. Orders of magnitude more.

And that's going to change things. I'm sure, if you take four families, simulate that a few hundred times and average the results you'll get Landsburg's numbers. But so what? That isn't even close to addressing the question Google asks.

As gsteff observed, even bumping the number of simulated families to 10 brings the answer closer to 1/1. Bump the number of simulated families to 100, or 1000 and the answer will become even closer to 1/1.

Ergo Landsburg is full of it, and Google's answer of "50%" is correct.
posted by sotonohito at 3:21 PM on January 1, 2011 [2 favorites]


justkevin,
My PHP isn't super hot, but I think you're terminating when there is a girl, rather than a boy, on line 20.
posted by atrazine at 3:22 PM on January 1, 2011


He was awesome on Barney Miller.

.
posted by davelog at 3:22 PM on January 1, 2011 [4 favorites]


I just wrote my own simulation, and got the same results as gsteff (and Landsburg) for the People's Republic of Only Eight Adults.

For populations over a few hundred, for instance for the residents of the Democratic Kingdom Of My Apartment Building, the effect is very small.

It's really an infinite versus finite population assumption, and as nicwolff says, if a population isn't specified, it's arbitrarily large. Wombs are assumed to be arbitrarily fertile, and so on.

It's an interesting edge case which is only really useful in answers that don't assume the standard infinite population of problems where the limit is not specified, and in answers that don't assume the population is implicitly large, given the term "country" is used and countries tend to have millions of people; one generally doesn't talk about "the population" when they mean the residents of a single fourplex.
posted by Homeboy Trouble at 3:23 PM on January 1, 2011 [2 favorites]


The correct answer depends on how much you can sell the girl children to Americans for.
posted by gjc at 3:24 PM on January 1, 2011 [1 favorite]


I just reread Landsburg's post past the description of the problem. It turns out my mistake was not realizing that most countries have an average of just four families.
posted by justkevin at 3:24 PM on January 1, 2011 [10 favorites]


Quick correction to my simulation code: I was simulating 10 families, but Landsburg's challenge and predicted results are for four families. When I rerun it for four families (change the "10" on line 30 to a "4"), I get numbers just below 44%, exactly what he predicts.

That's because your code runs the families to completion (30 years is basically completion.) If you were creating families as you went, for example, after 20 years the kids pair up and form new families, then you would find an answer much closer to 50%, but I guess after thinking about it a bit more Landsburg might be right (though still arrogant) that the answer is lower than 50%.
posted by esprit de l'escalier at 3:26 PM on January 1, 2011


Here's a quick rundown of the principle in play here:

Imagine we roll a die and ask what the average roll is. It's fairly well-known that the answer is:

1/6 * 1 + 1/6 * 2 + ... + 1/6 * 6 = 3.5

Now, what if one were to roll a die and then take the reciprocal of the result? In that case, what would the average value be? Well:

1/6 * 1/1 + 1/6 * 1/2 + ... + 1/6 * 1/6 ~= .408

Now, the really surprising fact is that 1/.408 ~= 2.45. That is to say, the reciprocal of the average is not the average of the reciprocal.

How is that applicable to the current situation? Well, the Google solution (and Motl's argument) is that we want to know the average value of X, the number of girls that will show up in k families, and their contention (correctly) is that that number will be k, so there will be on average the same number of boys and girls (and here is their mistake) thus the proportion on average is 50%.

The correct solution that Zare and Landsburg offer is that we in fact want to know the average value of X/(X+k) which is markedly different, since we're concerned with putting that X in the denominator of the fraction. As soon as you start taking reciprocals of random variables, funny things start happening.
posted by TypographicalError at 3:27 PM on January 1, 2011 [7 favorites]
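
The distinction above can be worked out numerically. This is a sketch under the puzzle's idealized assumptions (fair coin, no limit on family size): X, the total number of girls across k families, counts tails before the k-th head, so P(X = x) = C(x + k - 1, x) / 2^(x + k). The function names and the truncation point are illustrative choices, not anything from Zare's answer.

```python
from math import comb, log

def die_averages():
    """Return (average roll, average of 1/roll) for a fair six-sided die."""
    faces = range(1, 7)
    return sum(faces) / 6, sum(1 / f for f in faces) / 6

def expected_girl_fraction(k, extra_terms=200):
    """Approximate E[X / (X + k)] by summing the distribution of X directly.

    The sum is truncated far into the tail, where the remaining
    probability mass is negligible.
    """
    total = 0.0
    for x in range(20 * k + extra_terms):
        prob = comb(x + k - 1, x) / 2 ** (x + k)
        total += prob * x / (x + k)
    return total

if __name__ == "__main__":
    print(die_averages())
    for k in (1, 4, 10, 100):
        # Compare with the approximation 1/2 - 1/(4k) quoted in the post.
        print(k, expected_girl_fraction(k), 0.5 - 1 / (4 * k))
```

The average of X is still k in every case, so E[X] / (E[X] + k) is exactly 1/2; only the average of the ratio falls below it. For a single family the sum works out to 1 - ln 2.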


Is there any way to involve a treadmill in this problem?
posted by five fresh fish at 3:28 PM on January 1, 2011 [5 favorites]


atrazine:
justkevin,
My PHP isn't super hot, but I think you're terminating when there is a girl, rather than a boy, on line 20.


You're right. I swapped boys/girls; the result is still the same, basically 50/50 for any reasonably sized country.
posted by justkevin at 3:28 PM on January 1, 2011


You could use mice.

You're no fun at all.
posted by Cool Papa Bell at 3:28 PM on January 1, 2011


I met Doug Zare in grad school (he was a year ahead of me). He was rarely wrong about anything.
posted by Horselover Fat at 3:29 PM on January 1, 2011


I have always gotten hopelessly confused at any discussion of ratios or proportions, but this question strikes me as really weird and not really answerable with a specific figure at all: It's not "what are the odds or chances of a girl being born" which of course would always be 50%. It's "What is the proportion of boys to girls in the country?" or in the alternate phrasing, "What fraction of the population is female?" It seems like those words qualifying the question make it a lot less simple than just tossing off "50%" as the answer.

So Lubos' assertion that "The chance that a birth produces a girl remains 50% regardless of the laws," doesn't seem to answer the question. And the other article's assertion that "Imagine you have 10 couples who have 10 babies. 5 will be girls." Says who? What if all ten couples had a girl and tried again? And again? (The question is also not, "What is the average percentage of girls," or anything similar.) I fiddled around with it in Excel, because I have nothing better to do, and came up with the following "example data":

All ten couples have a girl = 10 girls.
All ten couples try again and have a girl again = 10 more girls.
All ten couples try again and two have boys = 8 more girls, 2 boys.
The eight remaining couples try again and two have boys = 6 more girls, 2 boys.
The six remaining couples try again and two have boys = 4 more girls, 2 boys.
The four remaining couples try again and all have girls = 4 more girls.
The four remaining couples try again and two have boys = 2 more girls, 2 boys.
The two remaining couples try again and one has a boy = 1 more girl, 1 boy.
The one remaining couple tries again and has a girl = 1 more girl.
The last couple tries again and finally has a boy = 1 boy.
Total babies: 46 girls, 10 boys. Way more than 50%.

Apart from the fact that that's an awful lot of babies to ask those women to keep bearing, am I missing something here? It seems like there's no possible way to answer the question as asked without qualifying it somehow. It's like trying to predict the lottery numbers, when you really can't because it's not like any particular number gets removed from the pool once it's chosen. Just because a coin toss comes up heads ten times in a row, there's no mathematical law that means it has to come up tails next time -- right?

In fact, suppose we re-jiggered the question to be about a coin toss rather than babies, and all the baggage that everybody brings to baby questions. Would the answer be the same?
posted by Gator at 3:29 PM on January 1, 2011


Huh, I could've sworn none of those other comments were there when I previewed.
posted by Gator at 3:31 PM on January 1, 2011


This answer calculates the wrong thing. He calculates the expected fraction of girls in a family, not the expected fraction in a population.
posted by atrazine at 3:32 PM on January 1, 2011


Keep in mind that Motl is a quantum physicist, and when you talk about probabilities and expectation values, etc, in quantum mechanics, you are talking about the expected results of a theoretical infinite number of observations, which kind of makes the 3,000 experiment/4 family test a bit silly and arbitrary.

The expected ratio of females hits a limit of 1/2 as the number of families approaches infinity.
posted by empath at 3:39 PM on January 1, 2011


The expected ratio of females hits a limit of 1/2 as the number of families approaches infinity.

That's very true, although it requires the sort of asymptotic analysis that Zare does with digammas and such. It's actually a nontrivial problem to solve correctly, which is the whole point of this discussion.
posted by TypographicalError at 3:41 PM on January 1, 2011 [1 favorite]


Apart from the fact that that's an awful lot of babies to ask those women to keep bearing, am I missing something here?

You start with "all ten couples have a girl," and "What if all ten couples had a girl and tried again?"

Well, if you start there, yes, you'll have a freaky result. But the odds of 10 randomly selected couples having 10 girls is pretty thin. Flip 10 coins and get 10 heads in a row? One in a thousand. One in 1024, to be exact.
posted by Cool Papa Bell at 3:41 PM on January 1, 2011


Douglas Zare's answer (the top rated one) makes a lot of sense though. For those of you who don't understand what a digamma function is, here's why the percentage of girls is lower for a smaller number of families.

The heaviest contributors to the weighted average (expectation function) for boys are very likely outcomes with a low number of boys (just the one).
For girls, there is a lot of expectation smeared out along the tail of the distribution, so there are outcomes (theoretically in the context of the puzzle, obviously not in the real world) where a couple has 10,000 or 10,001 girls and one boy, even in the puzzle context those are very rare outcomes.

Now if you have a finite number of families, you will undercount the unlikely but "weighty" outcomes where there are a huge number of girls.
Of course in a real country, if you set termination conditions where a couple will have max 20 children (still ridic. high) then the number of girls will be higher because there will be families with 20 girls and no boys.

Finally, during a real Google interview, if you bring up subtle arguments about real life live birth ratios, maximum family sizes etc it will reflect well on you. These questions are intended to be solved by people during the interview to show their thought processes, not to get totally correct answers.
posted by atrazine at 3:49 PM on January 1, 2011 [7 favorites]


But the odds of 10 randomly selected couples having 10 girls is pretty thin.

Yeah, but that's kind of my point. The question isn't about odds, at least as worded. It's a pretty specific number question. I don't think it's any more freaky for a completely random question to start with all ten being girls than it is to just conveniently assume each generation will be exactly 50/50 girls/boys (thereby making it a self-fulfilling equation).

If the question as phrased above was about coin flips instead of babies, it would not be "What are the odds of getting tails after X flips," it would be more like "What is the proportion of heads to tails?" Or, "What fraction/percentage of the coin flips will be tails?" Without qualifying the question further, I don't see how it's answerable.
posted by Gator at 3:49 PM on January 1, 2011


Poor Landsburg is both wrong and very arrogant.

I assume all of you are going to win $15,000 then? Or is it just posturing?
posted by Justinian at 3:51 PM on January 1, 2011 [2 favorites]


I assume all of you are going to win $15,000 then?

No, just four people, for some arbitrary reason.
posted by (Arsenio) Hall and (Warren) Oates at 3:53 PM on January 1, 2011 [16 favorites]


Yeah, the answer is 50%-\epsilon, where \epsilon is a function of the number of families in the country which goes to zero as the number of families increases. The economist is getting hung up on the presence of the \epsilon, and choosing only four families in order to play up the epsilon.
posted by kaibutsu at 3:53 PM on January 1, 2011 [4 favorites]


Uh, $5000. Whatever.
posted by Justinian at 3:54 PM on January 1, 2011


My intuitive answer was that the number approaches 50% as the number of families approaches infinity. That's not at all the same thing as the number being 50%, which makes some people's certainty kind of weird.

Definitely like the Monty Hall problem!
posted by Justinian at 3:56 PM on January 1, 2011


If you think Landsburg is wrong then you should probably go to vegas and make big, big money using the martingale system.

Correct me if I'm wrong (and I may be): using a martingale system on a game with 1:1 odds (such as a coin toss) gives you an expectation of 0; that is, you won't lose any money, but you won't win any, either (or rather, your chances of winning are balanced by your chances of catastrophic loss). Is this analogous to a fifty-fifty split between boys and girls?
posted by steambadger at 3:58 PM on January 1, 2011


Actually, the correct answer is using a single question like this is a woefully deficient way to screen applicants, and I wouldn't doubt it was there, if there really is an official institutional use for it, more as a way to cause applicants to blame themselves for not getting the job, as opposed to Google.
posted by JHarris at 3:58 PM on January 1, 2011


If you read deep into the comments, Landsburg claims to be the person who introduced the Monty Hall problem to mathematicians. That's incredible if true. The Monty Hall problem is not obvious and not intuitive without Bayes' rule, but how could anyone who knows Bayes' rule not see it? Landsburg's credit is what exactly?
posted by scunning at 4:00 PM on January 1, 2011


Ergo Landsburg is full of it, and Google's answer of "50%" is correct.

More boys than girls are born. Google are wrong for the real world.

It’s a very good task to find and throw away the people who will get distracted from common sense and from simple, fundamental math arguments by noise and who will immediately start to think about complicated yet irrelevant technicalities

You mean, like getting the numbers right? Oooookay. I mean, I guess it depends which part of Google you're hiring for, but I'd hope that the people trying to model real-world behaviour cared more about real-world information and facts than "common sense".
posted by rodgerd at 4:00 PM on January 1, 2011


I assume all of you are going to win $15,000 then? Or is it just posturing?

He's right about his theoretical country of four couples, I don't see any need for computer simulations to prove that either. He may be an ass (a characteristic shared by Luboš Motl), but he's right.
posted by atrazine at 4:03 PM on January 1, 2011 [1 favorite]


Actually, the correct answer is using a single question like this

They probably use a lot of questions like this.
posted by kenko at 4:05 PM on January 1, 2011


Ruby says Landsburg is wrong. http://codepad.org/nEiGEUlU

Or, he is basing his answer off of different assumptions. Telling a computer how to simulate it removes the vagaries of English. He could probably clear this up with 10 lines of code.

Same thing with the Monty Hall problem. You quickly realize when you simulate it that you have to bake in the idea that the show host knows what is behind the doors.
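
A minimal sketch of that point (not stp123's code): rather than running random trials, it enumerates every equally likely case exactly. The function name `switch_win_probability` and the `host_knows` flag are made up for illustration, and the "ignorant host" variant assumes that games where the host accidentally reveals the car are simply thrown out.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def switch_win_probability(host_knows):
    """Exact chance that switching wins, given that a goat was revealed."""
    win = Fraction(0)    # probability mass where switching wins
    valid = Fraction(0)  # probability mass where the host revealed a goat
    for car, pick in product(DOORS, DOORS):
        weight = Fraction(1, 9)  # car and first pick are uniform and independent
        others = [d for d in DOORS if d != pick]
        # A host who knows never opens the car's door; an ignorant host
        # opens either remaining door at random.
        options = [d for d in others if d != car] if host_knows else others
        for opened in options:
            w = weight / len(options)
            if opened == car:
                continue  # car revealed by accident: game discarded
            valid += w
            switched_to = next(d for d in DOORS if d not in (pick, opened))
            if switched_to == car:
                win += w
    return win / valid

print("host knows:     ", switch_win_probability(True))
print("host is guessing:", switch_win_probability(False))
```

The only difference between the two cases is the line that builds `options`, which is the assumption you have to bake in.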
posted by stp123 at 4:05 PM on January 1, 2011


37.6%
posted by clavdivs at 4:06 PM on January 1, 2011


suppose we re-jiggered the question to be about a coin toss rather than babies

Gender distribution in a population is, more or less, a coin toss. The outcomes you put forward from your Excel spreadsheet are very, very, very unlikely to occur in a real, randomly selected population.
posted by obiwanwasabi at 4:08 PM on January 1, 2011


I'm pretty sure Landsburg's assumption of 4 families wasn't meant to pick a degenerate case of the problem as a trick. It's just that he wanted to illustrate that the answer is somewhat less than 50%, as opposed to 50.0% exactly. Sure, he had to make the initial conditions silly to make the difference big enough to see, and in practice the difference would be negligible.

I'm not sure I understand any of the explanations for why it's not exactly 50%. I kind of like atrazine's, but I'm not convinced. If there was going to be a family with 20 girls and no boys, it should just as likely show up right away as later, right? Why would the ratio start out excessively boy-heavy and tend to even out? It should hover around 50%, drifting slowly toward boys with the more occasional big step change back toward girls?

I'm going to have to code a simulator for it myself - not that I don't trust anyone else's code, just that that's part of figuring stuff like this out for me.
posted by ctmf at 4:08 PM on January 1, 2011


And rodgerd is right. In the question as actually asked, the proportion of boys to girls normally born is not an irrelevant technicality, and neither is the fact that in the real world couples don't produce hundreds of daughters before giving up on the dream of a single son.

There may well be a reason for ignoring facts like that and going with a more tractable abstraction, but whether that's a reasonable course of action depends on the nature of the real problem you're trying to answer and how it will interact with other things you're trying to do. It won't always be responsible. In the interview setting, of course, there isn't anything further you're actually trying to do and there is no context for the question, which suggests that they should ask the question they actually want you to answer straight out, rather than forcing you to divine it.
posted by kenko at 4:09 PM on January 1, 2011


I mean, seriously, the words "odds," "average," and "probability" don't appear anywhere in the actual question, hence my own questions here. I freely admit that math was never my best subject, but I did grow up in a house full of people who loved brain teasers (and who, more than anything else, LOVED to crow derisively when I got the answer wrong), so I've learned to pay very close attention to the EXACT WORDING of these kinds of questions, because invariably you're thrown off by the assumptions you bring to them. Like in this case, seemingly, "Well, let's calculate the odds..."

Gender distribution in a population is, more or less, a coin toss. The outcomes you put forward from your Excel spreadsheet are very, very, very unlikely to occur in a real, randomly selected population.

Again, my point. The odds of boy/girl or heads/tails are the same each individual time, but that's not what this question is asking. (Besides which, and this is just as irrelevant, I've known plenty of people in real life who've had a bunch of girls and no boys, or just one boy.)
posted by Gator at 4:11 PM on January 1, 2011


Ruby says Landsburg is wrong

Indeed, he probably didn't think the answer to the problem was:
Line 6: warning: parenthesize argument(s) for future version
Line 3:in `times': no block given (LocalJumpError)
from t.rb:3
posted by kenko at 4:11 PM on January 1, 2011 [3 favorites]


ctmf, the 20 girls no boy case was an example of an additional constraint, in the context of the puzzle every family has a boy as their youngest child.

If you are coding it yourself try and plot the % of girls as number of families rises.
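
A rough sketch of that exercise, computed exactly instead of simulated. It assumes independent 50/50 births and that every family keeps going until its first boy, so a country of k families always has k boys and a negative-binomial number of girls. The function name `expected_girl_fraction` and the cutoff `tol` are my own choices, not anything from the linked posts.

```python
def expected_girl_fraction(k, tol=1e-15):
    """Expected value of girls / (girls + boys) for a country of k families.

    Boys always total k. P(girls = g) = C(g+k-1, g) / 2**(g+k), built up
    with the recurrence P(g+1) = P(g) * (g+k) / (2*(g+1)).
    """
    pmf = 0.5 ** k   # P(girls = 0): every family has a boy first
    g = 0
    total = 0.0
    while True:
        total += pmf * g / (g + k)
        pmf *= 0.5 * (g + k) / (g + 1)
        g += 1
        # Stop once we are well past the mean and the tail is negligible.
        if g > 2 * k and pmf < tol:
            return total

for k in (1, 2, 4, 10, 100):
    print(k, round(expected_girl_fraction(k), 4))
```

Feeding the printed pairs into any plotting tool gives the curve: it starts well under one half for a single family and creeps up toward it as k grows.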
posted by atrazine at 4:13 PM on January 1, 2011


And when I suggested re-jiggering the question to be about a coin toss, I meant along these lines:
In a place full of people flipping coins, everyone only wants tails. Every person continues to flip a coin until they get a result of tails. If they get heads, they flip again. If they get tails, they stop. What is the proportion of tails to heads in the results?
Or, using the other wording:
There’s a certain place where everybody flipping a coin wants tails. Therefore each person keeps flipping until they get tails; then they stop. What fraction of the flips are heads?
If you take the "Well, nobody would ever have ten girls in a row" aspect out of it, but leave all the rest of the wording exactly the same, do you approach the question differently?
posted by Gator at 4:18 PM on January 1, 2011


If you take the "Well, nobody would ever have ten girls in a row" aspect out of it, but leave all the rest of the wording exactly the same, do you approach the question differently?

I would.
posted by kenko at 4:21 PM on January 1, 2011


kenko, I disagree that asking straight out is what they want. If you ask them in response, "after how many girls does the average family give up trying?" they're going to tell you, "assume they keep trying forever, until they have a boy." No practical effect on the problem over what you probably would have assumed anyway without asking. You're still going to get brownie points for thinking about the problem more so than someone who just grabs a calculator and starts blundering through without questioning his assumptions.

I think you'd adequately answer the question by saying "well, you would think it's 50% wouldn't you? Let's mock it up real quick and see if it does." If you could actually do that in a couple of minutes, of course. Then when it didn't, be able to start speculating on why not, like people in the thread are doing.

Certain problems are like that. They probably don't want you over-analyzing "what is the angle between the hands of a clock at 3:15" by asking stupid shit like "are any of the hands bent?"
posted by ctmf at 4:23 PM on January 1, 2011 [2 favorites]


I met Doug Zare when he was doing a summer program while he was still in undergrad. There were water balloon fights involved. (We were trying to do 3D tilings using water balloons and a chest freezer. We forgot to account for the expansion of water upon freezing.)
posted by sciencegeek at 4:24 PM on January 1, 2011


From what I've seen of Google interviews, one would only get half points for getting the 'correct' answer.

One would get 3/4 points for noting how the results would be different based on the number of families in the simulation.

Full points for considering (if only to ignore it later) longevity, max number of children a couple can bear in a lifetime, max family size, abortion, child murder and all kinds of real world constraints.

One would get a recommendation and maybe an offer if one came up with a way to optimize the simulation for whatever initial parameters, running on different hardware architectures, especially Google-like datacenters.

I think Motl would get a better interview, just because he is considering a sustainable country where older women from this generation can find same aged men from the previous generation (in families with girls and boys, the girls are always older than the boys) to be counted against (and maybe marry and have more kids), while Landsburg is considering a single run of the experiment, where the grown up children do not have children of their own until every family has had their boy and stopped making babies.

I showed this to a statistician friend who works at google. He got all Bayesian and Widths of Distributions and Series and Digammas on my ass and I got completely lost. There may be more to this question than we think here.
posted by Dr. Curare at 4:27 PM on January 1, 2011 [5 favorites]


Statistics. How does it f*ckin' work?
posted by skippyhacker at 4:35 PM on January 1, 2011 [2 favorites]


kenko: you didn't actually run it, did you? Runs fine. That is just the codepad website giving warnings.
posted by stp123 at 4:36 PM on January 1, 2011


Landsburg's argument, in short:

If you look at just one family instead of a whole nation, the expected ratio of girls to population is less than 1/2. The math on this is easy to follow, and he's right.
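
Spelled out, as a sketch that assumes every birth is an independent 50/50 and writes n for the family's total number of children (my notation, not Landsburg's): the family ends up with n children, n - 1 girls and one boy, with probability 2^{-n}, so

```latex
\[
E\!\left[\frac{\text{girls}}{\text{children}}\right]
  = \sum_{n=1}^{\infty} 2^{-n}\,\frac{n-1}{n}
  = \sum_{n=1}^{\infty} 2^{-n} \;-\; \sum_{n=1}^{\infty} \frac{2^{-n}}{n}
  = 1 - \ln 2 \approx 0.307 ,
\]
```

using the standard series \sum_{n \ge 1} x^n / n = -\ln(1 - x) at x = 1/2.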

This is of course not the same as the original question, and Landsburg knows it -- but it's just the first step in his argument. He's pointing it out to make it clear that the number of families affects the answer. You can do the (slightly more complicated) math for a combination of two families, and it still comes out less than 1/2. The same applies for four, or ten, or any finite number. The expected ratio does in fact converge to 1/2 as the number of families approaches infinity. But Landsburg does not believe in infinitely-large nations.

Here he differs from the usual approach to this sort of statistical problem, which is to assume that the population is effectively infinite -- that is, large enough that any difference from the answer you get from an infinite population is swamped by the standard error. And that's a good assumption, if you're dealing with populations the size of actual nations: if we ran the simulations he proposes with nations of a million families, there would be no significant difference between the results and 1/2. This is why he wants to do the simulations with nations of only four families.
posted by baf at 4:37 PM on January 1, 2011 [3 favorites]


steambadger: If you think Landsburg is wrong then you should probably go to vegas and make big, big money using the martingale system.

Correct me if I'm wrong (and I may be): using a martingale system on a game with 1:1 odds (such as a coin toss) gives you an expectation of 0; that is, you won't lose any money, but you won't win any, either (or rather, your chances of winning are balanced by your chances of catastrophic loss). Is this analogous to a fifty-fifty split between boys and girls?
Not just analogous, but basically that gets to the heart of the problem in both cases.

In the martingale system, it's foolproof on paper: you will eventually win back your original stake assuming infinite wealth to bet with and no house limits. In reality, you don't have infinite money and time, and the house does have betting limits.
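
To put numbers on that, here is a small sketch under assumptions of my own: a fair coin, a base stake of 1 unit that doubles after every loss, and a bankroll (or house limit) that allows at most `max_bets` bets in a row. The function names are made up for illustration.

```python
from fractions import Fraction

def martingale_outcomes(max_bets):
    """Return (probability of going bust, units lost when that happens)."""
    p_bust = Fraction(1, 2) ** max_bets   # every one of the bets loses
    loss_if_bust = 2 ** max_bets - 1      # 1 + 2 + 4 + ... + 2**(max_bets - 1)
    return p_bust, loss_if_bust

def expected_profit(max_bets):
    """Exact expected profit of one run: win 1 unit unless you go bust."""
    p_bust, loss_if_bust = martingale_outcomes(max_bets)
    return (1 - p_bust) * 1 - p_bust * loss_if_bust

for m in (1, 5, 10, 20):
    p_bust, loss = martingale_outcomes(m)
    print(m, p_bust, loss, expected_profit(m))
```

However deep the bankroll, the rare wipeout exactly cancels the frequent one-unit wins, so the expectation stays at zero.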

Similarly, if the birth rate is exactly 50/50 for any given birth, there are effectively countless families to start with at a perfect 1:1 ratio of men and women, and wombs are infinitely reproductive and mothers live forever, and the sampling occurs at the end of time when all iterations are completed.... the ratio of boys to girls in n families is n:n, the same way 1/2 + 1/4 ... = 1. However, this largely assumes there are no complications like infant mortality, adoption, unexpected births/abortions, genetic screening, twins, genetic proclivities leaning towards female births (a trait that gracedissolved noted above as eventually altering the society by making it heavily female). In reality, all these factors and many more come into play, so your simulation will "go bankrupt" or hit "house betting limits" long before it reaches 50%.

The formula for how far it falls short, when the only limitation is a less-than-infinite number of families, is what Landsburg is defining. In some simulations with 4 families you'll get all 4 families having multiple girls before their first boy; in others all 4 families will have boys on the first try. The distribution of outcomes for 4 families across s computer simulations would presumably distribute along the same curve as per-family ratios, where some trials would be all families with a single male child, and some small percentage of trials being 4 families with 10 daughters each and one son; I'm assuming without proof that the average ratio of males to females across all trials is some number less than 50%, as Landsburg claims. The MetaFilter commenters pointing out those other factors I've mentioned above are adding additional wrinkles that further pull the question from "mathematical abstraction" to "census".
atrazine: Finally, during a real Google interview, if you bring up subtle arguments about real life live birth ratios, maximum family sizes etc it will reflect well on you. These questions are intended to be solved by people during the interview to show their thought processes, not to get totally correct answers
Having gotten these kinds of questions in the past (Microsoft doesn't do them anymore I hear, but way back when I interviewed with them in the late 90's they definitely asked those questions), the relevant part is not only getting the question right but offering up the thought processes that got you to your answer- right or wrong. If asked well and answered well it shows that you:
  • Are able to tackle a problem without being intimidated, breaking it into smaller parts or rewording it to better understand it
  • Can proffer real-world factors, but at the same time recognize when reducing it to a purer mathematical abstraction helps solve the problem to a "good enough" state for deciding on a course of action in terms of programming or problem solving
  • Are able to think and talk through a problem, communicating your current thoughts to people, and show that you can adjust your thinking or revisit assumptions (and can articulate the assumptions you are making that people have pointed out)
Having been a veteran- but not a star or ever on a top team (I think our highest placement was around 7th in the TimeCorps event) of the apparently discontinued Microsoft Puzzlehunt, I can say that the kind of people who could jump into such a problem with relish, and be excited about solving it, fertile with ideas, tangents, assumptions, clarifications, factors that all might come into play, and ultimately focused enough to drive to a conclusion without being endlessly distracted by (otherwise interesting) minutiae... these are the kind of people you'd want to be working in your technology company.

I assume Ballmer hated the puzzle interview questions, because that company has long been overrun by middle-management suckups with no technical chops who probably loathed those interview questions as "too hard"- not realizing the whole point was to show someone who just loved solving problems.
posted by hincandenza at 4:39 PM on January 1, 2011 [6 favorites]


Changing the simulation to include a gradually increasing number of families and iterating 3000 times does give the 44%. But as others have pointed out, that seems to answer a different question than the one posed by Google. On preview, baf sums it up well.
posted by stp123 at 4:43 PM on January 1, 2011


I mean, ultimately, unless births are being terminated prematurely based on sex, there will be just as many girls born as boys, except in the degenerate case of a few families who happened to have boys first and stopped having children. The degenerate case is in no way representative of a 'country'.
posted by empath at 4:48 PM on January 1, 2011


Phrasing this as a purely statistical question might be better, then you won't have people asking, e.g., "did you factor for Madonna and Angelina Jolie adopting and removing those girls from the country's population? Why or why not?"
posted by boo_radley at 4:48 PM on January 1, 2011


Guys, I have to point out that fifthrider and Lubos Motl have answered this question definitively. The answer is 1:1 and you don't need computer simulations, and you don't need to look at the tendency over N possible families or anything like that. The math for this is not elementary but it's not super advanced, either. We use Taylor series convergence to calculate precise values for average number of boys per family and average number of girls per family.

Here's how it works.

The average number of boys, let's say, can be calculated with an infinite series, where each term represents the probability of a given scenario multiplied by the number of boys in that scenario (always one). For instance:

Scenario1: one boy is born (P = 1/2, boys = 1, so weight = 1/2)
Scenario2: one girl then one boy is born (P = 1/2 x 1/2, boys = 1, so weight = 1/4)
Scenario3: two girls then one boy is born (P = 1/8, you get the idea)
etc.

Adding up this infinite series of terms gives us the average number of boys per family. The formula for each term is simply (2 ^ -N) [that's two to the power of negative N, in case the notation seems weird] This can be computed quite easily using Taylor series convergence...it equals one.

How about the average number of girls per family? We make another series.

Scenario1: one boy is born, weight = 0
Scenario2: one girl then one boy, weight = 1/4
Scenario3: two girls and one boy, weight = 1/8 x 2 = 1/4
Scenario4: three girls and one boy, weight = 1/16 x 3 = 3/16
Scenario5: four girls and one boy, weight = 1/32 x 4 = 4/32 = 1/8

The formula for each term is (2 ^ -N) x (N - 1). I have to brush up on my Taylor series a little bit, but this is probably a pretty simple exercise for anyone who learned this stuff without forgetting 95% of it (that's me). Eyeballing it, I'm pretty sure this series converges to one as well, because (2 ^ -N) has a much higher, um, "magnitude," than (N - 1), so I believe that only that portion of the term is significant as one extends the series to infinity. We're reaching the limits of what I can remember from twenty years ago.
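
For what it's worth, both sums can be checked with the geometric series and its derivative rather than by eyeballing magnitudes. A short sketch, keeping the same N (total children in the family) and the same 50/50 assumption:

```latex
\[
\sum_{N=1}^{\infty} 2^{-N} = 1,
\qquad
\sum_{N=1}^{\infty} N x^{N} = \frac{x}{(1-x)^{2}}
\;\Longrightarrow\;
\sum_{N=1}^{\infty} N\,2^{-N} = 2,
\]
\[
\sum_{N=1}^{\infty} (N-1)\,2^{-N}
  = \sum_{N=1}^{\infty} N\,2^{-N} - \sum_{N=1}^{\infty} 2^{-N}
  = 2 - 1 = 1 .
\]
```

So the expected number of girls per family does come out to one, the same as the expected number of boys.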

So what does this mean? It means that the average number of boys and girls per family is the same, and that number is one. I'm sure someone out there knows this stuff far better than I, and can either correct my errors or affirm my assertion. The point is that I'm very confident that this can be mathematically determined without any room for uncertainty, and I think I may even have it precisely, though I'm much less sure of that.

There is one unspoken assumption here: that the probability of having a boy versus having a girl is always a random 50/50 proposition. OK, now it's spoken.
posted by Edgewise at 4:49 PM on January 1, 2011 [1 favorite]


Or another way to sum it up: the Economist warps things to fit his worldview, the Physicist gets it mostly right but glosses over some unimportant details, and the Mathematician makes us realize none of us really understand half of it.
posted by stp123 at 4:49 PM on January 1, 2011 [8 favorites]


kenko: you didn't actually run it, did you? Runs fine. That is just the codepad website giving warnings.

OK, so how do you run it? I take it that there's something more to it than just making sure that the "Run code" checkbox is checked and hitting the "submit" button? Do we need to install Ruby locally or something?

Also, I don't know a lot about Ruby, but when I see the word "error" in the output, I think it's more than just a warning.
posted by baf at 4:54 PM on January 1, 2011


Of course, I'll point out that yes, this scenario also has the unrealistic condition that family size is not limited by any practical means, which is obviously not the case in the real world. The ratio of boys is higher if you were to impose any arbitrary limit of N, the number of children that a couple could have.
posted by Edgewise at 4:54 PM on January 1, 2011


(keep in mind that while edgewise's explanation seems to require families with arbitrarily high numbers of girls to reach 50%, in the real world, those families with a very large number of girls will simply stop having children without having a single son, which I'm fairly sure makes the ratio work out the same)
posted by empath at 4:55 PM on January 1, 2011


So what does this mean? It means that the average number of boys and girls per family is the same, and that number is one. I'm sure someone out there knows this stuff far better than I, and can either correct my errors or affirm my assertion. The point is that I'm very confident that this can be mathematically determined without any room for uncertainty, and I think I may even have it precisely, though I'm much less sure of that.

No offense, but this is all very well known and all the principals have done this math. The issue is that Motl is answering the wrong question, as he (and you) are misunderstanding the fact that simply because the number of girls equals the number of boys in expectation does not imply that the proportion is 50/50 in expectation.
posted by TypographicalError at 5:00 PM on January 1, 2011


Empath, you are correct, but I came back and qualified myself in the nick of time, if you glance above.
posted by Edgewise at 5:00 PM on January 1, 2011


DTMFA
posted by Bokononist at 5:04 PM on January 1, 2011


Edgewise: Looks to me like you're showing that E(#boys)=E(#girls), which is obviously true. However, it doesn't follow from there that E(girls/pop)=E(girls/(girls+boys))=.5. As someone pointed out up thread, weird things happen to expectations when you divide by random variables.

Accepting for the moment that E(girls/pop)!=.5, what about E(girls/boys)? I'm too lazy to either work it out or write a sim right now.
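
A sketch of one way to work it out, assuming the puzzle's rules hold strictly (k families, each stopping at its first boy, independent 50/50 births). Writing G for the total number of girls, the number of boys is always exactly k, so dividing by boys is dividing by a constant:

```latex
\[
E\!\left[\frac{G}{\text{boys}}\right] = \frac{E[G]}{k} = \frac{k}{k} = 1,
\qquad\text{while}\qquad
E\!\left[\frac{G}{G+k}\right] < \frac{E[G]}{E[G]+k} = \frac{1}{2},
\]
```

where the inequality is Jensen's, since G/(G+k) = 1 - k/(G+k) is strictly concave in G and G is not constant.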
posted by PMdixon at 5:05 PM on January 1, 2011


> OK, so how do you run it? I take it that there's something more to it than just making sure that the "Run code" checkbox is checked and hitting the "sumbit" button? Do we need to install Ruby locally or something?

Yes, you need ruby locally. If you're on a Mac, it ships with it. Just create any file e.g. foo.rb, cut'n'paste the code in, launch a terminal window, and type in 'ruby foo.rb' in the directory you saved the file in.

It looks like codepad is using an out of date ruby interpreter.
posted by stp123 at 5:06 PM on January 1, 2011


Google: "What is the proportion of boys to girls in the imaginary country?"
Me: "Can I assume that there's a 50% probability that any child is a boy?"
Google: "Of course!"
Me: "...and the other half are girls? No other genders, or freak litters of rabbits or anything?"
Google: "Certainly. No tricks here!"
Me: "Oh noes this is difficult I will need a computer!"
posted by nowonmai at 5:11 PM on January 1, 2011 [1 favorite]


I'd just give the google interviewer the start of the answering sentence and then wait and see what completion shows up in the drop down list of suggestions.
posted by srboisvert at 5:12 PM on January 1, 2011 [11 favorites]


The issue is that Motl is answering the wrong question, as he (and you) are misunderstanding the fact that simply because the number of girls equals the number of boys in expectation does not imply that the proportion is 50/50 in expectation.

To be fair, this is actually pretty subtle if you're not used to probability.
posted by atrazine at 5:13 PM on January 1, 2011


There's a joke in here somewhere about Google being blocked in China.
posted by box at 5:15 PM on January 1, 2011 [4 favorites]


I'm trying, I really am, but I still don't see how the question as asked has anything to do with averages or probabilities.
posted by Gator at 5:19 PM on January 1, 2011


Typical MeFi overthinking. 50% of pregnancies result in a girl. The number of pregnancies is immaterial.
posted by w0mbat at 5:21 PM on January 1, 2011


w0mbat: " 50% of pregnancies result in a girl."

If I knew the unicode, I would seriousface you. I don't think this has ever been the case -- stats classes use as a FUN FACT CALLOUT BOX the actual ratio, which slightly favors boys (105:100, if I recall). In China, determinant gonadtyping (gonado-something? tropism?) boosts that ratio to 115:100.
posted by boo_radley at 5:27 PM on January 1, 2011


I wrote out a naive simulator in Haskell.

My first version (not posted) merely simulated the outcomes in a country with an arbitrarily large number of families. Simulating up to 5,000,000 families, I kept getting averages flitting around 50:50.

The posted version is parameterized by both the number of simulations and the number of families. For a low number of families, I consistently get a smaller fraction of females. As the number of families goes up, the proportion approaches 50:50.

(The code is pretty raw and obfuscated and w/o comments. There is basically no error-checking. I rely upon the assumptions that birth rates are 1:1 m:f and that the GHC System.Random module is a fairly decent PRNG.)
posted by adoarns at 5:33 PM on January 1, 2011 [1 favorite]


I suck at math but Landsburg is saying that running a simulation with a small set of families you do not get 50%. Mathoverflow is saying he is right, but it approaches 50% as the sample size approaches infinity. So as asked, 50% is correct but you get Landsburg's answer if you run a simulation with a small sample size.

I guessed 50% just by thinking it will even out eventually, for every family with 7 girls there will be seven families with 1 boy.
posted by Ad hominem at 5:33 PM on January 1, 2011


Here's a flexible Python script to play around with this kind of simulation. It assumes families run "to completion" and keeps going until they do, and lets you set both the number of families in the population, and the number of times to run the simulation. Have fun.
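
For anyone who can't get at the link, a minimal sketch of a simulation along the lines described. This is not ZsigE's script: the function names, the fixed seed, and the choice to report two different averages are my own assumptions, and births are taken to be independent 50/50.

```python
import random

def simulate_country(num_families, rng):
    """Run every family to completion; return (boys, girls)."""
    boys = girls = 0
    for _ in range(num_families):
        while rng.random() < 0.5:  # a girl: this family tries again
            girls += 1
        boys += 1                  # first boy: this family stops
    return boys, girls

def run(num_families, num_runs, seed=0):
    """Return (mean of per-country girl fractions, pooled girl fraction)."""
    rng = random.Random(seed)
    sum_of_fractions = 0.0
    all_boys = all_girls = 0
    for _ in range(num_runs):
        boys, girls = simulate_country(num_families, rng)
        sum_of_fractions += girls / (boys + girls)
        all_boys += boys
        all_girls += girls
    return sum_of_fractions / num_runs, all_girls / (all_boys + all_girls)

for families in (1, 4, 10, 100):
    per_country, pooled = run(families, num_runs=20000)
    print(families, round(per_country, 3), round(pooled, 3))
```

The first number averages each country's own fraction of girls, the second lumps every child from every run into one big population, and the gap between them for small countries is what the argument in this thread is about.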
posted by ZsigE at 5:35 PM on January 1, 2011 [2 favorites]


This is a puzzle of logic and math. Questions of "at what point do families give up" or "but maybe this couple is more disposed towards producing girls" are missing the point. As for Landsburg's artificially small population count, that's what limits are for. That's also why the initial problem doesn't phrase it by specific population count, because that's also completely missing the point.

The average number of boys in a family will be 1. The average number of girls, then, needs to be 1 for the 50/50 chance to bear out. Half of families have no girls, so the average among the remaining families after their first child - call this step 2 - has to be two. Half of the step 2 families have one girl, so the average among the remaining has to be two. Half of step 3 has two girls, so the other half has to average four. And so on. At this arbitrary low limit of four families, you don't have enough families to adequately populate the statistical trees, so you get a number lower than the actual probability. This is why there exists a concept of a statistically significant sample size, which as an economist you'd think Landsburg would know about.

The blather in his post about the expected fraction of girls in a family is just nonsensical, though, particularly in the original post. He doesn't seem to understand the difference between a median and an average. And his defense is just dishonest and silly.
posted by kafziel at 5:37 PM on January 1, 2011 [3 favorites]


b1tr0t. Consider the distribution on the plane with axes for number of boys and number of girls. You are calculating the "maximum likelihood" or mode of the distribution, and you're right that it's on the line y=x. However, that doesn't mean that E(x/(x+y)) = 0.5.
posted by esprit de l'escalier at 5:38 PM on January 1, 2011


I guessed 50% just by thinking it will even out eventually, for every family with 7 girls there will be seven families with 1 boy.

Six, actually, since the family with 7 girls also has one boy. The issue is that when you take those seven families and pick two at random, you are guaranteed to find two boys, but have only a 40% chance of getting any girls at all. If you then average enough of those random samples, you wind up with an average of 2.142 kids per family, instead of the actual population-wide average of 2.857 - and a conclusion that there's a 78% chance of any given birth being a boy, because you have 30 samples saying there's a 100% chance and 12 samples saying there's a 25% chance - because your samples are too small to be representative of the population.
posted by kafziel at 5:47 PM on January 1, 2011


Yes, you need ruby locally. If you're on a Mac, it ships with it... It looks like codepad is using an out of date ruby interpreter.

Note that Tiger and Leopard shipped with Ruby 1.8.6, the version that throws the LocalJumpError here; so, if you haven't upgraded to Snow Leopard (or updated Ruby) you're going to get the error locally, too.
posted by steambadger at 5:51 PM on January 1, 2011


So as asked, 50% is correct but you get Landsburg's answer if you run a simulation with a small sample size.

The problem is not sample size, or statistical noise. You can run many, many runs of this model on countries with a small number of families and converge very precisely on a number that is not 0.5.
posted by gsteff at 5:57 PM on January 1, 2011



Poor Landsburg is both wrong and very arrogant.

I assume all of you are going to win $15,000 then? Or is it just posturing?


No, I was wrong. I admitted it further down thread.
posted by esprit de l'escalier at 6:00 PM on January 1, 2011 [1 favorite]


The issue is that Motl is answering the wrong question, as he (and you) are missing the fact that the number of girls equaling the number of boys in expectation does not imply that the proportion is 50/50 in expectation.

Landsburg and Motl are answering different questions, each one is answering his own question correctly, and each one would lose a bet whose terms were set by the other. We all agree on that, right?

Which question, if either, is the "wrong" one is not a mathematical dispute and can't be settled by computation or simulation.

The question as phrased asks "what is the proportion of boys to girls in the country?" Landsburg reads that as "what is the expected value of the proportion of boys to girls?" Motl (I infer, I didn't read his post) reads it as "what is the proportion of the expected population of boys to the expected population of girls?" I see no principled reason to prefer one to the other.
posted by escabeche at 6:02 PM on January 1, 2011 [5 favorites]


I worked for Google for a while and I interviewed a bunch of software engineer candidates, but I never asked one of these tired fucking brainteaser questions. (How many golf balls will fit in a school bus? I heard that one in college, a long time ago.) I asked questions that would show the candidate understood the basics about algorithms and data structures, and when I got tired of that, I asked them to write code to solve a concrete problem. Pretty much all the other engineers would ask similarly pragmatic questions. (I know this because accompanying each candidate there was a sheet of paper. Each engineer would write down the questions they had asked the candidate, so that nobody would accidentally ask a repeat question or even a question on a general topic that previous interviewers had already covered.) The one thing I'll say for this question is that it has answers that can be justified with fairly reasonable arguments, so it gives an intelligent, well-spoken candidate the chance to shine. But that same candidate had better be able to code, or they're not getting hired.

Most of the questions in the "15 interview questions" link are for Product Manager positions, which don't necessarily require coding skills, although the culture is such that a PM won't get respect unless they have technical sk1llz. I just googled for "google software engineer interview questions" and this was the top result, and the questions you see there are pretty consistent with my experience as a software engineer there.
posted by A dead Quaker at 6:10 PM on January 1, 2011 [5 favorites]


Maybe to emphasize the verbal slipperiness here: I think everyone would agree that, given some finite number of families, it is indisputable that "on average, the proportion of boys is slightly less than 1/2; furthermore, the number of boys is on average equal to the number of girls."

Given this, it's hard for me to see how one can make a strong claim that either Landsburg or Motl is "wrong."
posted by escabeche at 6:12 PM on January 1, 2011


Assuming the odds of it being a boy or girl for any given pregnancy within a population is always 50%, then the answer couldn't more obviously be 1/1, or 50/50, because that's just axiomatic.

If, on the other hand, there's some biological factor present that changes the odds of conceiving a child of a particular gender after the birth of a previous child (for instance, if boys are more likely to be born to families that already have girls, or vice versa), then that changes things.

It all depends on whether we're taking this proposition as one pertaining strictly to reality (in which case, the answer may not even remain consistent over time, being contingent on changing biological and environmental factors), or as one pertaining to the basic analysis of probability.

In the former case, the answer depends on the data, which likely means the precise answer changes over time. In the latter case, the answer is just 50/50.
posted by saulgoodman at 6:20 PM on January 1, 2011 [1 favorite]


But now I'll show my own hand: I think Landsburg's and Motl's interpretations of the question are both perfectly justified, as I've said. But aesthetically I shade to Motl's side, for the following reason.

1. Asking whether one random variable (number of boys) has a greater or smaller expected value than another (number of girls) seems to me a very natural question. Asking whether the ratio of two random variables has an expected value greater than or less than 1/2 feels very weird. If a Google interviewer asked me this question and phrased it in a way that made it clear that Landsburg's version was the one intended, I think I'd have to ask, "Really? That's what you want me to compute? I mean, I want this job, so, OK, I'll do it. But.... weird."

2. The answer to Motl's version of the question doesn't depend on the number of families, while the answer to Landsburg's version does. So the failure to specify population size in the question might be taken as evidence in favor of Motl's interpretation.

3. More generally still, I'm with the commenters who are inclined to treat the population as infinite from the get-go; this is unrealistic, but so is a hell of a lot else about this question. If somebody asks me "What is the probability that two random integers are relatively prime?" I am not going to say, "It depends -- what is the size and location of the range from which these integers are drawn?" I am going to say "Six over effing pi squared" because I know that's what the question intends.
posted by escabeche at 6:21 PM on January 1, 2011 [3 favorites]


My answer is "One moment, let me Google the solution."

Tell HR I take a large jumpsuit. Darker colors work best and I'd like a flame design along the legs.
posted by Spatch at 6:25 PM on January 1, 2011 [2 favorites]


Sigh. When will the internet figure out that Steven Landsburg is a very clever variety of troll? And one shouldn't feed trolls...

Anyway, he's right, but only according to his technicality.

To get a 50% sex ratio we have to make a number of assumptions, including:

1. Couples never give up trying to have a son, even after an arbitrary number of children.

2. The odds of any given child being a son is precisely 50%, there's no sex-specific abortion, etc.

3. An arbitrarily large population - in other words, we want the limit as the number of families approaches infinity.

None of these assumptions are realistic, but the problem strongly implies all of them - otherwise there's insufficient information to come up with a precise number.

Now Landsburg chooses to ignore just the population assumption, but keeps all the others. Then, when he gets a slightly different result he claims everyone else is "wrong". This is self-aggrandizing pedantry.
posted by Wemmick at 6:34 PM on January 1, 2011 [2 favorites]


Typical MeFi overthinking. 50% of pregnancies result in a girl. The number of pregnancies is immaterial.

This is only true if the sex of the baby has no bearing on whether you bring an additional pregnancy to term. But that's not the case here.
posted by Justinian at 6:34 PM on January 1, 2011


Assuming the odds of it being a boy or girl for any given pregnancy within a population is always 50%, then the answer couldn't more obviously be 1/1, or 50/50, because that's just axiomatic.

Would you also say it is axiomatic that, given a prize behind one of two doors, the chance of it being behind either door is 50%? Or do other factors matter besides the number of doors?
posted by Justinian at 6:37 PM on January 1, 2011 [1 favorite]


If I were Google by the way, I would totally hire this thread.
kafziel: This is a puzzle of logic and math. Questions of "at what point do families give up" or "but maybe this couple is more disposed towards producing girls" are missing the point. As for Landsburg's artificially small population count, that's what limits are for. That's also why the initial problem doesn't phrase it by specific population count, because that's also completely missing the point.
I believe you're totally wrong here. Mathematically, Landsburg's on very solid ground, from what I can tell (and this seems to be well-corroborated by theory and by computer simulation); the answer is computable from k families, and he shows that an exact answer is actually dependent on the size of k- it's just that past a ludicrously small number of families in a tiny island nation, the answer rounds to 50% within the first several digits, and any computer or real-world simulation will come awfully close to the mathematically pure "infinite child family" abstraction.

The question was "what is the ratio" and "here is a formula to produce those expected values for a given family count of k" is a more valid answer than "For all countries of any size, the answer is always 1:1". The former is mathematically true, the latter is mathematically true only for an infinitely large country or a limited number of significant figures.

It's not just a question of logic and math, though: if it were, the question would simply be phrased as "What is the sum of the series 1/2 + 1/4 + 1/8 ... 1/n?". You can call it beanplating, but the human facets of the question- and the real world limitations- are fair game when you take it from a pure mathematical problem to a brainteaser describing a plausible scenario (society putting value only on sons and not daughters). The questions about "giving up" aren't missing the point, they are the point; those are not only valid real world questions but good questions for the interview process to demonstrate that you can think beyond the limited constraints of the presented problem. Google et al don't just want to hire mechanical turks to crunch through existing fabricated problems, they want people to think of the problems- and solutions- the interviewers won't, which is what those "missing the point" questions demonstrate. Some of those interview questions have also been ones like "How many piano tuners are there in the US" where you could guesstimate the answer as well as show the variables (population, number of pianos owned, average and suggested frequency of tuning, etc) that would affect your answer. No one who asked that question knew the right answer, they just wanted to see how you'd handle a question without a clear answer, and how you think on your feet about new problems you wouldn't likely have encountered before and how you'd structure your approach to solving the problems.


This thread is going on a long time for what is really a simple answer: in a purely mathematical sense Landsburg's answer is a more generalized form for the pure mathematical problem, whereas the Google answer of exactly 1:1 is a solution to that formula for a specific value of k = infinity. The debate, including with you, seems to be "how big are we assuming k is?". You assume it's large enough that his four-family problem won't arise because it would be statistically invalid to even estimate, and Google is assuming in classic brainteaser fashion that the family count is infinite.

In a non-mathematical sense, the only correct answer is "We cannot possibly answer this with precision and accuracy given the stated information, but a rough estimate for a reasonably sized country negating infringing real world factors such as [some factors alluded to above] is 1:1".
posted by hincandenza at 6:40 PM on January 1, 2011 [4 favorites]
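
For readers who want to see the dependence on k concretely, here is a minimal Python sketch, assuming every birth is an independent fair coin flip and every family keeps going until its first boy. Under those assumptions the total number of girls G across k families is negative binomial, so the expected fraction of girls, E[G/(G+k)], can be summed directly. The function name `expected_girl_fraction` and the truncation point `max_girls` are choices made for this illustration, not anything taken from Landsburg's or Zare's posts.

```python
from math import comb


def expected_girl_fraction(k, max_girls=None):
    """Approximate E[G / (G + k)] for a country of k families.

    G is the total number of girls; with fair, independent births
    P(G = g) = C(g + k - 1, k - 1) / 2**(g + k).
    The infinite sum is truncated at max_girls (an assumed cutoff,
    far out in the tail where the remaining probability is negligible).
    """
    if max_girls is None:
        max_girls = 20 * k + 200
    total = 0.0
    for g in range(max_girls + 1):
        p = comb(g + k - 1, k - 1) / 2 ** (g + k)
        total += p * g / (g + k)
    return total


for k in (1, 4, 10, 100):
    approx = 0.5 - 1.0 / (4 * k)  # Zare's large-k approximation
    print(k, expected_girl_fraction(k), approx)
```

The printed values can be set beside Zare's approximation 1/2 - 1/(4k) from the original post; the gap between the two shrinks as k grows.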


Axiomatic? That's ridiculous, especially given the amount of discussion in this thread. No need to be condescending.
posted by archagon at 6:47 PM on January 1, 2011 [2 favorites]


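#!/usr/bin/python
### converges to 1 as the number of families goes up
import random
families = 1000000
m=1.0
f=1.0
for i in range(families):
    s=0
    while not s:
        if random.random()<0.5:
            # boy: count him and stop this family
            m+=1
            s=1
        else:
            # girl: count her and try again
            f+=1
print(m/f)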

posted by signal at 6:48 PM on January 1, 2011


Reminds me of the old joke.
A physicist, mathematician, and an engineer are watching a basketball game in a gym at a high school. During a break, the physicist gives the other two the following problem:
"Line up all the girls at one side of the gym, and all the boys at the other side. Then every minute the boys move toward the girls for 1/2 of the distance between them. How long before they reach the girls?"
The mathematician gets very excited:
"This is a trick question. They never reach the girls!"
The engineer smirks.
"They get close enough for all practical applications."

Landsburg is the mathematician. Motl is the engineer.
posted by eye of newt at 6:48 PM on January 1, 2011 [8 favorites]


The epsilon factor is on a treadmill, see, and that's why it can't take off.
posted by five fresh fish at 6:49 PM on January 1, 2011


Woops, pastebin is my friend.
posted by signal at 6:49 PM on January 1, 2011


Myself, earlier: Accepting for the moment that E(girls/pop)!=.5, what about E(girls/boys)? I'm too lazy to either work it out or write a sim right now.

I am dumb. Obviously E(girls/boys) = E(girls)/#boys, because #boys is a constant = # of families. And we all agree that E(girls) = # of families. So E(girls/boys)=1, even though E(girls/pop)!=.5. Interesting.
posted by PMdixon at 7:11 PM on January 1, 2011
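
The two facts in this comment fit together once the ratios are written out. A short derivation, assuming k families, fair independent births, and every family stopping at its first boy (so the number of boys B is exactly k, and the number of girls G has E[G] = k):

```latex
\begin{align*}
\mathbb{E}\!\left[\frac{G}{B}\right] &= \frac{\mathbb{E}[G]}{k} = \frac{k}{k} = 1, \\
\mathbb{E}\!\left[\frac{G}{G+B}\right] &= \mathbb{E}\!\left[\frac{G}{G+k}\right]
  < \frac{\mathbb{E}[G]}{\mathbb{E}[G]+k} = \frac{1}{2}.
\end{align*}
```

The strict inequality is Jensen's inequality: g/(g+k) is strictly concave in g for k > 0, and G is not constant.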


Axiomatic? That's ridiculous, especially given the amount of discussion in this thread. No need to be condescending.

No, I only meant it's axiomatic because of the specific way I formulated the problem, not axiomatic in any deeper sense. No condescension intended.
posted by saulgoodman at 7:11 PM on January 1, 2011 [1 favorite]


I just wrote my own simulation, and got the same results as gsteff (and Landsburg) for the People's Republic of Only Eight Adults.

This is the thing that puzzles me. Why is it so skewed for small populations? An individual population, sure, but for a large set of similarly seeded populations? I'm missing something. (That was probably explained above, I know. Bear with me.)
posted by ChurchHatesTucker at 7:26 PM on January 1, 2011


What is the proportion of boys to girls in the country?

I guess the answer depends on if it's Tuesday.
posted by mazola at 7:27 PM on January 1, 2011


FWIW, this is exactly why ecologists and the like have the concept of Ne i.e. effective population. The question as originally posed can be considered a reduced (i.e. special) case of that, where Y chromosome frequency is substituted for allele frequency, assumed to be 0.5 (i.e. 50:50) (the original version I saw years ago specified an equal chance of boy or girl), the population is not given (but, thanks to specifying 'country', can safely be assumed to be large - which ecologists will treat as 'infinite' ;-), and variances are assumed to be zero or near-zero.

The thing is, ecologists care about this because they know non-infinite populations fuck up one of those underlying assumptions, but they also know that particular issue can be quite reasonably ignored for anything except small populations (while happily debating what constitutes 'small' in any given case ;-). Landsburg plays with this by specifying an initial population of 4 pairs (intimating 50:50 sex ratio), or reserving the right to change his prediction if another population size is chosen. Under those conditions, he's right - but he's making an assumption outside the wording of the original question.

In the second post, he's making the point that it will asymptotically approach, but never be, 50:50 - he says "I say the answer depends on the number of families in the country, but in no case is it 50%". Again, he's right - but, in that case, his example is also wrong, and for the exact same reason. It's worth noting that, for his preferred N of "four couples" (i.e. N=8, assuming 50:50 sex ratio), he gives nothing more precise than "the answer will be just a hair under 44%". He never comes out and says it, but his argument boils down to 'Motl's precise figure is wrong, but my imprecise figure is correct'.

And, overall, that's true - but he's also being extremely disingenuous by treating Motl's prediction of 50% as exact, adding extra qualifiers into the problem that Motl had no reason to consider, being deliberately vague with his own prediction, and confounding the issue by 'generously' offering to split the difference at 46.5%…

(For a small population like 4 pairs, under the rules of the question with the assumption of 50:50 sex ratio and minimal variance, a smart ecologist would predict "3 males, 3 females, give or take one each way", and arrive at 50±10% ;-)

(On preview: ChurchHatesTucker asks "Why is it so skewed for small populations?". Answer: It's not skewed because of the small population, it's skewed because of the rules of the question - parents who have boys have to stop, while parents who have girls get another go and still only have a 50:50 chance of producing a boy.

That skew is minimised, but never completely disappears, as the parental population is increased.)
posted by Pinback at 7:40 PM on January 1, 2011 [3 favorites]


I_pity_the_fool writes "If you think Landsburg is wrong then you should probably go to vegas and make big, big money using the martingale system."

Martingale system doesn't work if only because one doesn't have infinite wealth. And if you did you'd be foiled by table limits.

"Of course, I'll point out that yes, this scenario also has the unrealistic condition that family size is not limited by any practical means, which is obviously not the case in the real world. The ratio of boys is higher if you were to impose any arbitrary limit of N, the number of children that a couple could have."

If N is say, 22; what is the ratio? Because while not common now families like the Duggars and Bates weren't uncommon 100-200 years ago. One of my grandfathers was the 16th of 22 children.
posted by Mitheral at 7:44 PM on January 1, 2011
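
One way to approach the N = 22 question is to compute the expected number of boys and girls per family exactly. This is a minimal sketch, assuming fair independent births and that a family stops at its first boy or at `cap` children, whichever comes first; the function name `expected_children_with_cap` is made up for this illustration. Note that it answers the large-population version of the question (ratio of expected counts), not Landsburg's expected per-country proportion.

```python
from fractions import Fraction


def expected_children_with_cap(cap):
    """Return (expected boys, expected girls) per family.

    A family has g girls followed by a boy with probability 1/2**(g + 1)
    for g = 0 .. cap - 1, or cap girls and no boy with probability 1/2**cap.
    """
    half = Fraction(1, 2)
    exp_boys = Fraction(0)
    exp_girls = Fraction(0)
    for g in range(cap):
        p = half ** (g + 1)  # g girls, then a boy
        exp_boys += p
        exp_girls += p * g
    exp_girls += half ** cap * cap  # the family hits the cap with no boy
    return exp_boys, exp_girls


boys, girls = expected_children_with_cap(22)
print(boys == girls, float(boys))
```

In this sense the cap leaves the split even: each family expects 1 - 1/2^N boys and the same number of girls, because every individual birth is still a fair coin flip regardless of when the family stops.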


How is this different than the following experiment?

Take a piece of paper, draw two columns on it -- Heads and Tails (heads are boys, tails are girls).

Start flipping a fair coin.

After each coin flip, mark the next row either Heads or Tails, corresponding to the results.

Every time you flip heads (a boy), draw a line underneath that row, and consider all the rows between that line and the previous line as a single family.

At the end of any arbitrary number of flips, add up the heads and tails.

What is the ratio that you'd expect to see?
posted by empath at 7:53 PM on January 1, 2011
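
The paper-and-coin experiment above can be sketched in a few lines of Python. This is only an illustration under the stated rules (fair coin, a line drawn after every heads); `flip_ledger` and the seed value are hypothetical names and choices. One detail it makes visible: stopping after a fixed number of flips can leave an unfinished family at the bottom of the page, whereas Landsburg's setup stops after a fixed number of completed families.

```python
import random


def flip_ledger(num_flips, rng):
    """Flip a fair coin num_flips times and group the flips into families.

    Returns (heads, tails, families, unfinished), where families is a list
    of strings such as "TTH" (one per completed family) and unfinished is
    the trailing run of tails, if any, that has not yet ended in heads.
    """
    heads = tails = 0
    families = []
    current = ""
    for _ in range(num_flips):
        if rng.random() < 0.5:
            heads += 1
            current += "H"
            families.append(current)  # heads closes the family
            current = ""
        else:
            tails += 1
            current += "T"
    return heads, tails, families, current


rng = random.Random(2011)  # fixed seed so the run is repeatable
heads, tails, families, unfinished = flip_ledger(100_000, rng)
print("share of heads:", heads / (heads + tails))
print("completed families:", len(families), "unfinished:", repr(unfinished))
```

Pooled over the whole page, the tally is just a long run of fair flips, which is why the share of heads sits near one half.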


It seems to me that in most of these word-games, there is no math problem. There is a semantics problem. Using terms like 'family', 'girl', 'boy', 'country', etc., only serve to confuse the reader and bring all sorts of unnecessary complications related to human emotion, culture and society. If this were really a question about families and children, the mathematical simulations would make no sense. So statements along the lines of "in the real world" are irrelevant. Landsburg is playing a game with (a trick on?) readers, dressing up a pure math problem in confusing English-language words.

If we were told a specific number of families and asked to get the result, we can do it, no problem, no questions asked. If the number of families is assumed to be infinite, likewise. Like others have pointed out above, Landsburg and Motl are both right, it is just that they are answering different questions. Therefore it seems clear to me that any confusion arises not from the math itself, but from the phrasing of the problem.
posted by jet_manifesto at 7:54 PM on January 1, 2011 [2 favorites]


Man lots of folks in this thread are gonna be rich! Share the wealth!
posted by lazaruslong at 8:02 PM on January 1, 2011


Landsburg is right. However, as the number of families in the population increases, the proportion of girls approaches 0.5. Most people who imagine the original problem are imagining a large population, so they are reasonable in saying "about 50%".

Here's my R code for the simulation. This version simulates 1,000 populations of 100 families each. It's quite close to 0.5. However, change the number of families to 4, and it's closer to 0.44, as Landsburg suggests.

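npops <- 1000;   # number of simulated populations
nfams <- 100;    # families per population (variable name assumed)
popratios <- numeric(npops);
for (j in 1:npops) {
nboys <- 0;
ngirls <- 0;
for (i in 1:nfams) {
exitfam <- 0;
while (exitfam == 0) {
junk <- rnorm(1);   # symmetric draw around 0 (assumed; the original draw is not shown)
if (junk>0) {
nboys<-nboys+1;
exitfam<- 1;
}
if (junk<0) {
ngirls<-ngirls+1;
}
}
}
popratios[j]<- ngirls/(nboys+ngirls);
}

mean(popratios)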
posted by mikeand1 at 8:08 PM on January 1, 2011 [1 favorite]


Dammit -- Metafilter does NOT like R code!
posted by mikeand1 at 8:10 PM on January 1, 2011


. . . But since Lubos seems unable to follow the mathematics . . .

Perhaps Luboš Motl is wrong, but this sneer accomplishes nothing but the diminution of Steve Landsburg.
posted by Neiltupper at 8:19 PM on January 1, 2011


Here's my ruby code! Like everyone else's, it converges very close to 50%.

def children
  rand(2) == 0 ? "M" : "F" + children
end

results = Hash.new(0)

10000000.times do
  children.each_char do |c|
    results[c] += 1
  end
end

p results
posted by anomie at 8:29 PM on January 1, 2011 [1 favorite]


eye of newt got it right. Also, hincandenza has the right take on it from the math POV.

Reminds me of one of those other famous interview questions: "Why is a manhole cover round?"

Motl answers the question by saying "So they can't be rotated to fall into the hole when you are pulling the lid up or putting it back in. Or something."

Landsburg is the guy who argues that it's not round at all, spheres are round, and a manhole cover is actually a cylinder, and hooboy, did you see how stupid Motl fell for that? That dumb Motl. He is so dumb.
posted by Xoebe at 8:32 PM on January 1, 2011 [3 favorites]


Here's my simulation in R for a population of only 4 families, showing Landsburg is correct:

http://i834.photobucket.com/albums/zz266/mikenmar1/Rcode1.png

Here it is for a population of 100 families; the ratio gets closer to 0.5:

http://i834.photobucket.com/albums/zz266/mikenmar1/Rcode2.png
posted by mikeand1 at 8:40 PM on January 1, 2011


Dammit, again. Here it is with the links active:

Here's my simulation in R for a population of only 4 families, showing Landsburg is correct:

http://i834.photobucket.com/albums/zz266/mikenmar1/Rcode1.png

Here it is for a population of 100 families; the ratio gets closer to 0.5:

http://i834.photobucket.com/albums/zz266/mikenmar1/Rcode2.png
posted by mikeand1 at 8:42 PM on January 1, 2011 [1 favorite]


This is the thing that puzzles me. Why is it so skewed for small populations? An individual population, sure, but for a large set of similarly seeded populations? I'm missing something. (That was probably explained above, I know. Bear with me.)

A population of 4 families will have 4 boys. It needs 4 girls to get the correct 50%. There are an infinite number of possibilities for how those families will turn out, but there are a finite number of possibilities that generate three or fewer. By my math, those finite number of possibilities account for 49.21875% of the total possibilities. Those where there are exactly 4 girls account for 27.34375%. Those where there are five or more girls are thus the remaining 23.4375%, but to get a set of four families with five girls is a 1/512 chance - among 3000 tests, it would have come up five or six times. Six girls is 1/1024 - less than three occurrences in his test. The numbers just aren't going to come up, and it's going to skew towards fewer girls.

Of course, now I'm running tests with millions of iterations, not 3000, and my numbers are coming much closer to the predicted "4 boys, average of 4 girls". So part of this might be he's got a shitty RNG.
posted by kafziel at 8:42 PM on January 1, 2011 [1 favorite]
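
Figures like these can be checked exactly rather than by simulation. A minimal sketch, assuming fair independent births and four families that each stop at their first boy: the total number of girls then follows a negative binomial distribution, and `girls_pmf` (a name chosen for this illustration) returns each probability as an exact fraction to compare with the percentages quoted above.

```python
from fractions import Fraction
from math import comb


def girls_pmf(k, g):
    """Exact P(total girls = g) across k families that each stop at a boy.

    There are C(g + k - 1, k - 1) ways to distribute g girls among the
    k families, and each complete birth sequence of g + k children has
    probability 1 / 2**(g + k).
    """
    return Fraction(comb(g + k - 1, k - 1), 2 ** (g + k))


k = 4
for g in range(7):
    p = girls_pmf(k, g)
    print(g, p, float(p))

three_or_fewer = sum(girls_pmf(k, g) for g in range(4))
print("three or fewer girls:", three_or_fewer)
```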


For reference, my Python code.
posted by kafziel at 8:46 PM on January 1, 2011


I only have my iPad; anyone interested in graphing the results with 4 families through several million families to see the asymptotic approach to 50%?
posted by Ad hominem at 8:49 PM on January 1, 2011


What if you wanted to keep some extra manhole covers down in the tunnel?
posted by ctmf at 8:52 PM on January 1, 2011 [2 favorites]


I'm pretty sure I solved the same problem, or at least sort of solved it, a week ago, when I was thinking about the beginning of Rosencrantz & Guildenstern are Dead, when Rosencrantz keeps flipping a coin and getting heads. I was trying to figure out what the average number of heads you'd get if you kept flipping as long as you kept getting heads, and stop when you get tails. Did it in my head for a while and then wrote a ruby program to figure it out...

I saw it converging on an average of 1, as you did a larger and larger number of trial flip-sequences.

I think that means I'm on Google's side of this.
posted by edheil at 8:54 PM on January 1, 2011


What if you wanted to keep some extra manhole covers down in the tunnel?

Drop them down the storm drains and pull them back up later?
posted by Ad hominem at 8:57 PM on January 1, 2011


Let Z be the space including x, y, z as well as one additional dimension ℵ, chosen such that the manhole cover has 2r less than that of the manhole.
Let R⊥ℵ be a 90° rotation about the ℵ axis, and permit this operation only to city public works employees.
Then apply the R⊥ℵ operation to any manhole covers needing to be stored, to transport them across the 3-space manhole. Before attempting to use any such transformed manhole covers to close a manhole, be sure that a public works employee applies the inverse transform R⊥ℵ-1.
posted by nervousfritz at 9:13 PM on January 1, 2011 [1 favorite]


"I only have my iPad; anyone interested in graphing the results with 4 families through several million families to see the asymptotic approach to 50%?"

Here's the plot for family size ranging from 1 to 100.
posted by mikeand1 at 9:25 PM on January 1, 2011 [1 favorite]


Very cool, thank you!
posted by Ad hominem at 9:35 PM on January 1, 2011


So, some people are getting Landsburg's incorrect number in their own models, and some people are getting the correct 50%. I have to wonder if there's just some fundamental difference in how we're modeling it.
posted by kafziel at 9:35 PM on January 1, 2011


Presumably, some people are solving the problem whose correct answer is Landsburg's answer, and some of you are solving the problem whose correct answer is 50%.
posted by escabeche at 9:40 PM on January 1, 2011 [6 favorites]


Clarifying, above: I just wanted to point out that even population size isn't the only complication, if we're taking this as an on-the-level question about reality (as opposed to a more theoretical one about the math): if the odds of conceiving girls or boys can change over time, too, then the "as of when?" question also matters.

I've heard, for example, of species of animals that are more or less likely to have off-spring of a particular sex depending on the relative abundance or scarcity of food. Isn't it reasonable to assume there are (or at some point in time could be), similar kinds of effects in human populations? If so, do I take my mathematical snapshot of the country during its 30-year famine or not? Or is the time frame assumed to be infinite, too?

The problem with simulating this stuff is that the simulations will only ever be as complete as our understanding of the problem, and real-world problems are extremely contingent on context. In real life, flipping a boolean on and off simply isn't the same thing as giving birth. So any especially precise or absolute answer is always going to have to frame the problem in an extremely well qualified way.
posted by saulgoodman at 9:41 PM on January 1, 2011


"For reference, my Python code."

I see your problem. You need to reset m and f to be 0 inside the loop on iterations, as well as calculating mt and ft inside the loop. Otherwise, it's no different than calculating the ratio for a population having a number of families equal to families*iterations.
posted by mikeand1 at 9:50 PM on January 1, 2011 [1 favorite]


kafziel, try this:

Before you enter the first loop, make a vector having iterations elements. After the second loop (on the number of families), calculate f/(m+f) and assign it to the counter-th element of the vector.

Then at the end of the program, take the average of the elements of the vector. You'll see that family size definitely makes a difference, and you should get Landsburg's number.
posted by mikeand1 at 10:01 PM on January 1, 2011
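
The difference between the two ways of aggregating described above can be sketched in Python. This is an illustration only, not kafziel's or mikeand1's actual code: the function names, the seed, and the iteration count are all assumptions. It computes both the average of per-country ratios (the vector approach) and the single pooled ratio over every child ever simulated.

```python
import random


def simulate_country(num_families, rng):
    """Return (boys, girls) for one country; each family stops at a boy."""
    boys = girls = 0
    for _ in range(num_families):
        while rng.random() < 0.5:  # girl: this family tries again
            girls += 1
        boys += 1                  # boy: this family stops
    return boys, girls


def compare_estimators(num_families, iterations, seed=0):
    """Return (mean of per-country girl fractions, pooled girl fraction)."""
    rng = random.Random(seed)
    ratios = []
    total_boys = total_girls = 0
    for _ in range(iterations):
        boys, girls = simulate_country(num_families, rng)
        ratios.append(girls / (boys + girls))  # one entry per country
        total_boys += boys
        total_girls += girls
    mean_of_ratios = sum(ratios) / len(ratios)
    pooled_ratio = total_girls / (total_boys + total_girls)
    return mean_of_ratios, pooled_ratio


mean_of_ratios, pooled = compare_estimators(num_families=4, iterations=200_000)
print("mean of per-country ratios:", mean_of_ratios)
print("pooled ratio over all children:", pooled)
```

Pooling all the children is equivalent to simulating one very large country, which is why that figure sits near one half while the per-country average for four families does not.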


It’s a very good task to find and throw away the people who will get distracted from common sense and from simple, fundamental math arguments by noise and who will immediately start to think about complicated yet irrelevant technicalities – which is exactly what you did which is why you couldn’t work at Google but you instead work in the Academia that often supports this contrived way of thinking that is detached from the reality and everything important in it.

Wow, way to tout the business world methodology, to heap praise and worship on answers being shouted from the bandwagon with the brightest tinsel and loudest horns, while insulting anybody who takes an opposing position. Lookout! Someone is being distracted by irrelevant technicalities! Let's dismiss their argument with poorly applied generalizations which allow us to be blissfully unaware of possibly fatal flaws in our own positions!

I'm not saying I know who is correct here, (with so little information I doubt it is possible to be exact), but this is very annoying.
posted by SomeOneElse at 10:05 PM on January 1, 2011 [3 favorites]


Landsburg cleverly stipulates 4 families in his bet; that number was not in the original question. But he also states that the number will never be 50%.

Like I said I am no mathematician but the Mathoverflow answer predicts Landsburg's results for small populations and states that the answer will approach 50% as the population approaches infinity. Does the graph ever intersect .5? I suspect Landsburg is correct and the equation is asymptotic and the answer will be infinitely close to 50%.

Of course none of this is really in the spirit of the Google question, since it is a thought experiment.
posted by Ad hominem at 10:10 PM on January 1, 2011


Here it is for a population of 100 families; the ratio gets closer to 0.5:

Note that this is what "approaches .5" means. It also means that it may never actually reach .5 -- possibly at ∞, possibly not.

In either case, Landsburg is *exactly* correct. The answer is never 50%, unless your country has infinite population, because .4999...999 ≠ .5.*

So, if you answered "About 50%", I'd mark it as correct. If you answered exactly 50%, I'd mark it as exactly wrong. The real lesson here is don't assert exactitude without proof and need.

But we're not actually answering the stated question!

Now take into account that not all the families are complete. That is, there is a significant fraction of families at any given moment without a son. Not many countries just stop having kids, after all, so a country with this particular social oddness is going to consist of two sets of families: the set of families with sons, who have stopped having kids, and the set of families who have not yet had a son.** The problem, as worked, assumes that all families are complete: you have families with B, GB, GGB, etc. However, if you were taking a snapshot in time, there would be a number of G, GG, GGG, etc. families who were still working on having a son. The problem, as stated, assumes that you are taking a snapshot in time -- that is, right now.

So the real answer is "there will be more girls than boys, until everyone stops having kids, but only ends after they have a son, then the answer will be about 50%. Calculating the exact ratio involves so many assumptions that any asserted answer is probably indefensible. Of course, then the entire country dies out in about a century, because nobody's having kids."

So, the Real Real Lesson from Google asking this is "People are lousy at writing specifications...and reading them"

To which, I can only add, "duh."




* Do understand the difference between .4999...999 and .499999....

** Oh, and the set of Non Families, defined as couples without children***....and the set of Non Coupled individuals. Let's not forget them.

*** And this already included infertile couples****

**** And gay couples are, as a couple, infertile, so they're already covered as well.*****

***** See what I mean about assumptions here? We haven't even gotten into adoption. Would anyone adopt a daughter?
posted by eriko at 10:16 PM on January 1, 2011 [1 favorite]


Deep in the thread Landsburg seems to have been convinced why he's wrong. The discrepancy hinges on the "terminal half-boy" -- that the simulation always ends in a boy. For very small samples -- Landsburg's four-family nation for instance -- the effect of the terminal half-boy is fairly large.

But real countries don't have this problem; they don't suddenly "end," they keep going on and on with grandchildren and great-grandchildren and so on. And without the distortion introduced by arbitrarily ending with a terminal half-boy, the ratio stays 50-50...
posted by gerryblog at 10:17 PM on January 1, 2011 [2 favorites]


For those (like mikeand1) who've done Monte Carlo simulations: what's the variance look like @ N=4? I'm curious, but don't have any tools to hand. (I also should be able to figure it out - but hey, it's a sunny Sunday afternoon ;-).

empath : "How is this different than the following experiment?"

It's not, it's exactly the same. You're just running each individual family, rather than each individual 'round of births', consecutively rather than concurrently. In the end, once you aggregate your data (i.e. N = number of families, ratio = total number of boy-childs : total number of girl-childs), things work out exactly the same either way.

"What is the ratio that you'd expect to see?"

That depends on how many times you stop, draw a line, and start again with a new "family". If you do that 4 times, you'd expect to see (cribbing mikeand1's results) 56 boys:44 girls, give or take the variance I ask for above. The more families you throw in, the closer that will approach 50:50. At an infinite number of families (and ignoring the apparently mathematically-ludicrous but biologically-important case of it being impossible to have < 1 boy or girl once you get any closer than 49:51 ;-), you'll be as close as damnit to 50:50.

(On preview: see mikeand1's response here. Wiggles in the graph are due to random variations from running it 3000 times for each parental population.)

If (outside the effect of finite resolution in your model or system) you actually reach 50:50, then congratulations - you've also reached infinity. Enjoy the walk back ;-)

saulgoodman: Welcome to the world of ecologists ;-) Compared to the environmentally- and biologically-caused fluctuations you mention, estimating and modelling non-deterministic randomness (i.e. stochasticity) is a doddle. You just guess, pray, and include it in your calculation of variance ;-)

Now, let me tell you about spherical cows…
posted by Pinback at 10:17 PM on January 1, 2011


Reading on I see that Landsburg doesn't actually concede that he's wrong, so much as understand better what's driving his (bad) result.
posted by gerryblog at 10:20 PM on January 1, 2011


Or here's Motl explaining the same thing:

However, the picture of “k” families that Steve promotes has another lethal bug that leads to a different number smaller than the right result which is 50%.

The bug is that at the end of his experiments, his “nation” is composed of families that have already had their sons – and that haven’t had any children for a long time. And the reproduction in each family is controlled by the stopping rule so that girls never come after boys. Consequently, sons’ average age is always lower than daughters’ average age at the end of the experiment.

But this is only possible because his “experiment” is not a sustainable society. It is a one-attempt episode of a group of people who go extinct. In a sustainable society with any rules, the average age of boys is equal to the average age of girls, whether or not their numbers coincide. This statement is directly contradicted by the finite-time experiments.

If one has a sustainable society, there will also be boys who are older than girls – because new families may come into being that produce younger girls than the boys who already lived when the girls were born. A sustainable society always has the same average age of girls and boys, and when you use this fact in a correct calculation, you will find out that the percentage of the boys and girls are exactly equal as well, namely 50% each.

The problem is not that the number of families in Steve’s model is finite: every nation has a finite number of families, after all. The problem is that he doesn’t allow new families to be created so he introduces a bias to the age of the couples who can have children – a bias that can’t arise in a real country. This bias is a source of the incorrect asymmetry between boys and girls. The whole asymmetry only arises because the society stops reproducing after a very short time.

posted by gerryblog at 10:23 PM on January 1, 2011


So he is convinced that for infinite populations the answer is 50% and the equation is not asymptotic? Very cool.
posted by Ad hominem at 10:30 PM on January 1, 2011


Here's a similar problem with the confusing "when do we stop having children" algorithm stripped out.

Suppose families have three children: a boy, then a girl, then a third child of random (50/50) sex. What is the expected ratio of girls to boys in such a family?

The intuitive answer is E[G]/E[B], which is ((1+2)/2)/((2+1)/2), which is 1.

The literal answer is E[G/B], which is ((2/1)+(1/2))/2, which is 1.25. This answer is correct by the definition of expectation, but it feels nonsensical, obviously. The expected ratio of girls to boys and the expected ratio of boys to girls are both greater than one! If I'd just specified ordinary families without throwing in a fixed boy+girl pair at the start, the expected ratios would both be infinite!
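The two calculations above can be checked mechanically. This is just a sketch of the three-child example as described (boy, girl, then a 50/50 third child); the variable names are made up for illustration.

```python
from fractions import Fraction

# The two equally likely families, as (boys, girls):
# third child a boy -> (2, 1); third child a girl -> (1, 2).
families = [(2, 1), (1, 2)]
p = Fraction(1, 2)

expected_girls = sum(p * g for b, g in families)
expected_boys = sum(p * b for b, g in families)

ratio_of_expectations = expected_girls / expected_boys               # E[G] / E[B]
expectation_of_ratio = sum(p * Fraction(g, b) for b, g in families)  # E[G/B]
expectation_of_inverse = sum(p * Fraction(b, g) for b, g in families)  # E[B/G]

print(ratio_of_expectations, expectation_of_ratio, expectation_of_inverse)
```

Both E[G/B] and E[B/G] come out above one, which is the non-commuting of expectation and ratio that the comment is pointing at.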

Asking for a ratio of girls to total population gives a problem that's a bit harder to turn nonintuitive (and hence you need to add in the weird Google postulates, plus a finite population assumption, plus an additional assumption that Landsburg doesn't seem to realize he's making, to make it crazy), but the paradox still seems to wrap around the fact that our brains expect statistical expectation and simple nonlinear functions to commute, but they usually don't.
posted by roystgnr at 10:36 PM on January 1, 2011 [4 favorites]


kafziel, try this:

Before you enter the first loop, make a vector having iterations elements. After the second loop (on the number of families), calculate f/(m+f) and assign it to the counter-th element of the vector.

Then at the end of the program, take the average of the elements of the vector. You'll see that family size definitely makes a difference, and you should get Landsburg's number.


Indeed I do. So, back to what I said earlier: he is setting arbitrary conditions under which the experiment breaks down, due to tiny sample size. There's a reason you don't do studies on the patterns of disease spread with only five subjects.
posted by kafziel at 10:37 PM on January 1, 2011


"The whole asymmetry only arises because the society stops reproducing after a very short time."

Yeah, but that's exactly what the definition of the problem specifies. It says nothing about creating sustainable populations. Motl's just trying to cover up for the fact that he's wrong.
posted by mikeand1 at 10:38 PM on January 1, 2011 [2 favorites]


"Indeed I do."

I don't think you do.

Unless I'm misunderstanding your code (I don't know Python, and there's no "end" in the loop syntax, so I'm guessing about where the loops end), you never actually calculate the f/(m+f) ratio for a population of four families.

It looks to me like you just keep adding up the m's and f's and calculate the ratio at the end.
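To make the distinction concrete, here is a rough Python sketch of both ways of tallying. This is not kafziel's actual program; the function names, the seed, and the iteration count are assumptions for illustration. One number is the average of the per-country f/(m+f) values, the other is what you get by adding up all the m's and f's first.

```python
import random

def simulate_country(n_families, rng):
    """Return (boys, girls) once every family has had its first boy."""
    boys = girls = 0
    for _ in range(n_families):
        while rng.random() < 0.5:   # girl: the family tries again
            girls += 1
        boys += 1                   # boy: the family stops
    return boys, girls

def compare(n_families=4, iterations=20000, seed=1):
    rng = random.Random(seed)
    ratios = []                     # one f/(m+f) per simulated country
    total_m = total_f = 0           # running totals across all countries
    for _ in range(iterations):
        m, f = simulate_country(n_families, rng)
        ratios.append(f / (m + f))
        total_m += m
        total_f += f
    mean_of_ratios = sum(ratios) / len(ratios)
    pooled_ratio = total_f / (total_m + total_f)
    return mean_of_ratios, pooled_ratio

mean_of_ratios, pooled_ratio = compare()
print(mean_of_ratios, pooled_ratio)
```

Pooling everything is effectively one very large country, so it should land near 0.5; averaging the four-family ratios is the quantity Landsburg's bet is about.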
posted by mikeand1 at 10:41 PM on January 1, 2011


"The whole asymmetry only arises because the society stops reproducing after a very short time."

Yeah, but that's exactly what the definition of the problem specifies.


No it doesn't. There's nothing in the problem that suggests a single-generation "nation" seeded from some finite number of families with no grandchildren. Landsburg introduced that arbitrary assumption to make the problem easier to simulate, and now he's clinging to the inaccurate model to cover up for the fact that *he's* wrong.

Nations don't terminate after one generation. Kids grow up and have kids too. The coin never stops flipping.
posted by gerryblog at 10:43 PM on January 1, 2011


I don't get it. This simulation yields 50%. I don't understand how it possibly is different than the originally posed problem. https://gist.github.com/762357
posted by brendano at 10:53 PM on January 1, 2011


"Nations don't terminate after one generation."

That's just another way of saying that the number of families in a population is never as small as four. In which case the answer is that the ratio approaches 0.5, but it never reaches it.

How big do you want the country to be, if you want this to be more real-worldish? A billion families? Guess what, the answer is still not 0.5. Close to it, but not 0.5.

Besides, it's a hypothetical, not a real world of real nations. Landsburg is simply (and correctly) pointing out that the answer depends on the number of families in your hypothetical nation.
posted by mikeand1 at 10:54 PM on January 1, 2011


"I don't get it. This simulation yields 50%."

You're using a population of 100,000 families, in which case it should be close to 0.5. Try using Nfam = 4. What do you get? Repeat your program over and over, and take the average of your results. (Or stick it in another big loop, and calculate the average automatically.)
posted by mikeand1 at 10:58 PM on January 1, 2011 [1 favorite]


Oops, posted too soon. Yes.
posted by brendano at 11:03 PM on January 1, 2011


That's just another way of saying that the number of families in a population is never as small as four.

No, it's not; the problem with Landsburg's answer isn't that the size of the single-generation population size isn't infinite. (An infinite sample size would also yield 50%, for what it's worth, but that isn't the problem being asked either.)

The problem is that a discrete, single-generation model always terminates in a half-boy when the last family "completes" its reproduction. But such termination is a *new assumption* that Landsburg himself has introduced into the problem in pursuit of trying to simulate it. If you don't assume a single-generation nation, then there is never a terminal half-boy, and there is no expected difference from .5.

Landsburg's answer is only right if you assume an arbitrary synchronic limit not specified in the original problem.
posted by gerryblog at 11:21 PM on January 1, 2011


Heh, I figured it was 1:1 but suspected that Landsburg would disagree, and when I clicked through I was delighted to see that he did. When I saw that he was pulling apart the notion of the expected ratio of boys to girls and the notion of the ratio of expected boys to expected girls, I cackled. That's wonderful. That is exactly the kind of subtlety that I expect from him. I am going to try to remember and internalize the differences between these two notions. It seems to me that most people who disagree with him just aren't noticing the difference between these two (although I haven't looked too closely at the comments threads).

I've enjoyed Landsburg's past books immensely. He's a very clever man, although he tends to be pretty flippant. He mostly cares about pure math, so it's hard to tell when he's serious about some crazy proposed application of the math and when he's just taking the piss. (I don't think he really thought that the government should offer a bounty on used condoms, even though he's dead serious about the economic model that led him to propose it.) He's really good at discovering and pointing out common fallacies in cost-benefit analyses and probabilistic reasoning. Some of his reinterpretations of economic games are fiendishly clever.
posted by painquale at 11:27 PM on January 1, 2011 [1 favorite]


"If you don't assume a single-generation nation..."

What difference does it make to introduce multiple generations? As long as subsequent generations follow the same stopping rule, and as long as the number of families is finite, there is always some asymmetry. All you're doing is spreading the families out over time.
posted by mikeand1 at 11:27 PM on January 1, 2011


I mean, I guess in a way that's another way of saying that the number of families in a population is never as small as four, in the sense that a sequence that doesn't terminate can be said to be "infinite."

But the point is that the number of families in a population isn't something you can just draw a line under and say "okay, that's everyone." Landsburg is assuming a discrete population in which everyone starts reproducing at the same moment and all reproduction terminates after one generation; that's a complete misstatement of what we mean when we say "In a country in which people only want boys every family continues to have children until they have a boy."
posted by gerryblog at 11:33 PM on January 1, 2011


"But the point is that the number of families in a population isn't something you can just draw a line under and say "okay, that's everyone.""

Why not? Otherwise, all you're doing is stretching the number of families out to infinity, in which case I agree, it's 0.5 (although I'd point out that the idea of an infinite number of families is far less real-worldish than a finite number, if we're going to consider real worlds).


"Landsburg is assuming a discrete population in which everyone starts reproducing at the same moment and all reproduction terminates after one generation"


No he isn't. There's nothing in his formulation that says anything about when they reproduce, or from what generation they come. The only assumption is that there is a finite number of families, all of whom follow the stopping rule. And if you stick to those criteria, then yes, sooner or later the population stops reproducing. So what?
posted by mikeand1 at 11:43 PM on January 1, 2011


What difference does it make to introduce multiple generations? As long as subsequent generations follow the same stopping rule, and as long as the number of families is finite, there is always some asymmetry.

Again, the asymmetry is only introduced because you artificially designate a "last" baby born in the country (the terminal half-boy), terminating the sequence arbitrarily. The terminal half-boy is the source of the difference from .5.

If there is no last boy the ratio will stay .5.

We have no reason in the problem as stated to assume a society that goes extinct after a single generation. As long as people keep reproducing and there is no terminal half-boy, the proportion of boys to girls in the country will stay 1 to 1.
posted by gerryblog at 11:48 PM on January 1, 2011


"Again, the asymmetry is only introduced because you artificially designate a "last" baby born in the country"

Why is that artificial?

"If there is no last boy the ratio will stay .5."


And if there is no last boy, then the population will continue to reproduce infinitely.

Look, I agree that the hypo is under-specified. But why is finite reproduction somehow more "artificial" than infinite reproduction?
posted by mikeand1 at 11:53 PM on January 1, 2011


There's nothing in his formulation that says anything about when they reproduce, or from what generation they come.

He writes:
We’ll randomly choose five graduate students in computer science from among the top ten American university departments of computer science and have them write simulations for a country starting with, say, four couples, each having one child per year and stopping when they have a boy. We’ll let this run for a simulated 30 years and then compute the fraction of girls in the population.
He specifies in the comments that there are indeed only four couples having children.
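For reference, a minimal sketch of the simulation as quoted: four couples, one child per couple per year, each couple stopping at its first boy, run for 30 simulated years. Two things here are assumptions rather than anything Landsburg specified: "the population" is taken to mean the children only, and the result is averaged over 20000 runs with an arbitrary seed.

```python
import random

def girl_fraction_after(years, couples, rng):
    """Fraction of girls among all children born after `years` years."""
    still_trying = couples               # couples without a son yet
    boys = girls = 0
    for _ in range(years):
        births_this_year = still_trying  # one child per couple still trying
        for _ in range(births_this_year):
            if rng.random() < 0.5:
                boys += 1
                still_trying -= 1        # this couple stops
            else:
                girls += 1
    return girls / (boys + girls)

rng = random.Random(2011)
runs = [girl_fraction_after(30, 4, rng) for _ in range(20000)]
average_girl_fraction = sum(runs) / len(runs)
print(average_girl_fraction)
```

With a 30-year window, essentially every run ends with all four couples finished, so this is the "completed families" case.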

So yes, that assumes a single-generation nation that terminates reproduction altogether after the last of the four couples has their first boy. But the question as written only uses the much less specific notion of a "country" in which "every family continues to have children until they have a boy" -- a concept that would include, for instance, generations of children and great-grandchildren, as well as a family trying for nine decades to get a boy before it finally does, as well as the surplus boy from generation two marrying a nice young woman from generation four and having x children of his own, and so forth.

In a non-terminating sequence in which generations are not discrete and the population as a whole never stops reproducing, you can't artificially designate some "last child" -- and if you can't designate a last child, you don't have any discrepancy from .5.

Landsburg has incorrectly introduced a new assumption into the problem, and then declared the original question wrong as a result. The original question doesn't assume the temporal or generational limits he does.
posted by gerryblog at 12:04 AM on January 2, 2011 [1 favorite]


You're not even wrong, gerryblog.

The purest statement makes no assumptions, save two strongly implied: the number of families is infinite/number of births per family is infinite, and that the gender of a given child is exactly 50/50. Under that model it's 50% when all families finish reproducing.

Landsburg adds one and only one wrinkle: the number of families is not infinite. This yields his modified formula which generates a ratio asymptotically less than 50%.

The other 200 posts in this thread are valid real-world modifications or limiters, including your own. Adding in an arbitrary sampling point unveils that the iterations take an infinite number of gestations to reach 50%; adding in generations prevents a stopping point from being clearly defined.

Generally, the simplest rule is the only meaningful one: the proportion of genders is exactly equal to the probability of each gender at birth, since they are treated as distinct events. Without constraints, this will hold true. With constraints this will vary. The Google statement assumes no limiters but the stopping rule per family, which this proves moot in an infinite population. Landsburg presumes a finite number of families but otherwise single generation; Motl adds in a number of real world presumptions. Metafilter added many more. Every constraint alters our actual (well, expected/simulated) proportions of genders in their own way, preventing us from hitting 50% exactly.

This could not be clearer to me, and I'm baffled at this thread.
posted by hincandenza at 12:08 AM on January 2, 2011 [2 favorites]


But why is finite reproduction somehow more "artificial" than infinite reproduction?

Because it introduces starting assumptions and temporal limitations not specified by the problem. We're obviously not supposed to start from Adam and Eve and ascertain what the population of the country is in 1536. Likewise, Landsburg's answer starts at f=4 and t=0 and tells us the population at t=1. But that's not what the question asks; that's his own problem that he invented himself.

Because the question doesn't assume any particular beginning or particular ending to the sequence of births in the country, it can only be asking for a general result. And that answer can only be the one we get from the nonterminating sequence: half boys, half girls, on average over time.
posted by gerryblog at 12:12 AM on January 2, 2011


"For those (like mikeand1) who've done Monte Carlo simulations: what's the variance look like @ N=4?"

Here's the distribution for N=4 at a family size of four.

Note that it's skewed.
posted by mikeand1 at 12:15 AM on January 2, 2011


"So yes, that assumes a single-generation nation that terminates reproduction altogether after the last of the four couples has their first boy."


Sorry, you're right about that -- I was thinking of the original problem (as formulated by Google), not his formulation of it for simulation purposes. There's nothing about generations or time in the original formulation, nor in the expectation-based calculation that he makes.

My point is that it isn't the generations or time that's the issue, it's the finite population versus the infinite population that matters. As to which is more natural or artificial, I guess we'll just have to disagree about that.

But Landsburg has greater insight into the problem precisely because he figures out that it makes a difference if we assume a finite population.

If I were still a professor teaching probability theory, I would give an extra point to him for noticing, as compared to a student who automatically assumed an infinite population and answered "0.5".

However, I'd then subtract a point for his arrogance!
posted by mikeand1 at 12:26 AM on January 2, 2011


"Because it introduces starting assumptions and temporal limitations not specified by the problem."

The problem specifies neither a finite nor an infinite population. Assuming either one is introducing an unspecified assumption, it seems to me. The best answer realizes it makes a difference, and gives both answers (which Landsburg essentially did).
posted by mikeand1 at 12:31 AM on January 2, 2011 [1 favorite]


hincandenza: Landsburg adds one and only one wrinkle: the number of families is not infinite.

Well, he also adds (at least) (1) that families always consist of two people from the same generation and (2) that people can only belong to one reproducing family in their lifetimes. From these assumptions taken together he can then designate the "last baby" born to a particular generation.

But none of those assumptions originate within the question. And outside those assumptions you can't designate some arbitrary last baby, which means you have no deviation from .5.
posted by gerryblog at 12:32 AM on January 2, 2011


"Well, he also adds (at least) (1) that families always consist of two people from the same generation and (2) that people can only belong to one reproducing family in their lifetimes."


He's just doing this to make it easier to think about. The generational part -- or who can belong to what family -- makes no difference.

Make your simulation as generational as you want. Let your families get divorced and remarried if you want. Heck, let them gay marry and reproduce by artificial insemination. Let their children and grandchildren and great-grandchildren do all that too.

None of that matters. As long as every family follows the stopping rule, and the number of families is finite, the expected ratio will be less than 0.5.
posted by mikeand1 at 12:36 AM on January 2, 2011


He's just doing this to make it easier to think about.

Well, we've been round and round on this and we still don't agree. I know the intent is to simplify the problem for easier simulation, but in doing so (I contend, and I'm not alone!) he introduces the artificial "last baby" that creates the entire discrepancy he subsequently uncovers. You haven't shown me how a discrepancy from .5 can arise when we don't arbitrarily designate a last baby.

In any event, it's been nice arguing with you; I've really got to go to bed.
posted by gerryblog at 12:52 AM on January 2, 2011 [1 favorite]


Plane takes off?
posted by whyareyouatriangle at 12:55 AM on January 2, 2011


The last baby isn't "artificial" at all if the population is assumed to be finite. There must be a last baby in a finite population.


"You haven't shown me how a discrepancy from .5 can arise when we don't arbitrarily designate a last baby."


And you haven't shown me how you can get to 0.5 unless you assume the population reproduces infinitely!

Have a good evening.
posted by mikeand1 at 12:59 AM on January 2, 2011


Even if the number of families is infinite, as long as the kids get married, both answers are going to be wrong.

Given these rules, the population growth is going to approach an exponential rate, so, even assuming immortality and infinitely fertile people, you are always going to have the same proportion of families in the "young enough not to have had a boy yet" range. So there will be more girls than boys in a steady-state solution. And given the question as asked, trying to find a steady state solution that a large enough country might approach, given long enough is the only interpretation that even tries to make sense. Anything else is adding constraints to the problem.

Heck, if you don't add in a delay before the children become reproducing adults, I suspect there is a fairly exact solution, independent of fertility rate.
posted by Zalzidrax at 1:00 AM on January 2, 2011


"Heck, if you don't add in a delay before the children become reproducing adults, I suspect there is a fairly exact solution, independent of fertility rate."

Yes: if everybody and their children reproduce without any delay, the population immediately goes to infinity. :^)
posted by mikeand1 at 1:13 AM on January 2, 2011


Hmmm...

The expected value for number of children per couple is two, I believe. (1/2 chance of 1 child + 1/4 chance of two children + ... gives an expected value of:
1*1/2 + 2*1/4 + 3*1/8 + 4*1/16 + ...
which tends quickly to 2.)
This is less than the 2.1 child replacement rate, which would mean that the population of this country would, in time, tend to zero.
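A quick sketch of that sum, assuming a fair coin for each birth so that a couple has exactly n children with probability (1/2)^n. The cutoff values are arbitrary and only there to show how fast the partial sums settle.

```python
def expected_children(max_births=60):
    """Partial sum of n * (1/2)**n for n = 1 .. max_births."""
    return sum(n * 0.5 ** n for n in range(1, max_births + 1))

# Partial sums after 5, 10, 20 and 60 terms.
for cutoff in (5, 10, 20, 60):
    print(cutoff, expected_children(cutoff))
```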
posted by kaibutsu at 3:22 AM on January 2, 2011 [2 favorites]


Landsburg is being kind of a dick about this. Expectation is a weird mathematical concept that I'm not sure applies here. It certainly isn't mentioned in the original question. Then, if I read it right, he adds a ratio of mathematical expectations versus expectation of ratio red-herring to the mix.
posted by gjc at 3:29 AM on January 2, 2011


I would not be impressed by any interviewee who answered a question like this by gut feel. A thoughtful person can see that it needs a moment's thought. I would expect an interviewee to be able to identify the reason for their answer.

Here's how it seems to me (assuming an initial breeding population of P and a 50% chance of boy or girl in any birth):

1. The first generation has P/2 boys and P/2 girls.
2. The second generation has P/4 boys and P/4 girls.
3. The third generation has P/8 boys and P/8 girls.
4. The fourth generation has P/16 boys and P/16 girls.

So it doesn't matter how many generations we have, after any number of generations, N, the number of boys and girls will be the same: the sum for N of P/(2 to the power of N)
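The round-by-round tally above can be sketched like this. It tracks expected counts only (so it is the ratio-of-expectations view discussed elsewhere in the thread, not the expected ratio), and the function name and the choice of P = 1024 are illustrative assumptions.

```python
def expected_births_by_round(p_families, rounds):
    """Expected (round, boys, girls) for each round of births."""
    rows = []
    for n in range(1, rounds + 1):
        # Families still without a boy before round n.
        still_trying = p_families / 2 ** (n - 1)
        boys = still_trying / 2     # half of this round's births are boys
        girls = still_trying / 2    # the other half are girls
        rows.append((n, boys, girls))
    return rows

rows = expected_births_by_round(1024, 10)
total_boys = sum(b for _, b, _ in rows)
total_girls = sum(g for _, _, g in rows)
print(total_boys, total_girls)
```

The expected totals match after any number of rounds, which is the 50% answer in the ratio-of-expectations sense.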

I cannot imagine Google intended its interviewees to factor in things like the slight difference in birth rates between girls and boys. That is not what questions like this are about and in fact I would count it as a distinct negative mark against any interviewee who failed to see that, because it would suggest to me that they were the sort of person who does not see the wood for the trees.
posted by Decani at 4:30 AM on January 2, 2011


It's an interesting question, but I'm pretty sure that questions like this are asked not to see if someone gets the right answer, but to see how they approach the problem. You ask someone a problem that requires thought, and you get to see how they tackle it while under pressure.
posted by molecicco at 4:33 AM on January 2, 2011


Even before I looked it up on wikipedia I knew humans have more male children AT BIRTH than females.
So don't all these proofs that presume a 50/50 chance like a coin toss fail by default?
http://en.wikipedia.org/wiki/Human_sex_ratio
posted by zog at 5:04 AM on January 2, 2011 [1 favorite]


What if every couple wanted to have a girl, so if they had a boy first, they would keep trying until they had a girl?
posted by snofoam at 6:02 AM on January 2, 2011


The last baby isn't "artificial" at all if the population is assumed to be finite. There must be a last baby in a finite population.

That's not true, is it? I mean, not in the sense that we're using "last baby". Our last baby terminates the simulation, and is always a boy. But, assuming a country like this could actually exist, the sequence would be terminated when people got tired of the custom, or when an asteroid struck the country, or by the heat death of the universe; and the last baby wouldn't necessarily be a boy.
posted by steambadger at 7:03 AM on January 2, 2011 [1 favorite]


I would not be impressed by any interviewee who answered a question like this by gut feel.

I'm happy that a few thoughtful people in this thread took the time to write out simulation code — it not only gives good approximations to the correct answer, but it clearly spells out the assumptions underlying the question. So, cheers to those folks.
posted by Blazecock Pileon at 7:08 AM on January 2, 2011


Having slept for several wonderful hours, I hesitate to return to this thread, but here I am anyway.

There must be a last baby in a finite population.

Depends on what you mean. If by "last" you mean "most recent," sure, there must be -- but the gender of the most recent baby isn't determined by the stopping rule but by random chance. There's only a "last" baby in the sense of a "terminal" baby when we assume (as Landsburg does for the purposes of his model) a population that "completes" its reproduction and then reproduces no more forever. That's why I'm arguing his attempted simplification is actually introducing the very deviation he finds.

Imagine the following variations on the problem:
In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. One day a comet hits and kills everyone in the world. What was the proportion of boys to girls in the country on the last day?

In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. One day the entire population signs a suicide pact agreeing that no one born after a certain date will reproduce. What is the proportion of boys to girls in the country on the day the last baby is born?

Landsburg's model only models the second situation: a situation in which a last baby whose gender identity was determined by the stopping rule was born. That's a completely artificial situation that doesn't map onto the question even when we remain steadfastly agnostic about things like population size, finitude, and time.

In the first case the nonterminating sequence of coin flips was still ongoing when people unexpectedly stopped having babies, and so the exact proportion will be unknowable -- making the only possible answer "1:1 on average over time."
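One way to sketch the comet scenario in code, under assumptions that are not in the question: there are always four couples trying, a couple that has a boy is immediately replaced by a new couple, and the comet arrives after a fixed five years regardless of how the births have gone. All names and numbers are illustrative.

```python
import random

def girl_fraction_at_cutoff(years, couples, rng):
    """Fraction of girls among all births when the cutoff ignores outcomes."""
    boys = girls = 0
    for _ in range(years):
        for _ in range(couples):    # always `couples` births per year
            if rng.random() < 0.5:
                boys += 1           # couple stops; a new couple takes the slot
            else:
                girls += 1          # same couple tries again next year
    return girls / (boys + girls)

rng = random.Random(2011)
runs = [girl_fraction_at_cutoff(5, 4, rng) for _ in range(20000)]
average_at_cutoff = sum(runs) / len(runs)
print(average_at_cutoff)
```

Because the number of births before the cutoff does not depend on any coin flip, the average should sit at .5 up to sampling noise, in contrast to the completed-families model.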

At least that's how I see it.

I think another way to get at the artificial added constraint implicit in Landsburg's model is to look at what Zalzidrax was saying, that in an arbitrarily cut-off sequence (terminated randomly, prior to completion by the single-generation stopping rule) you would expect to have extra half-girls. So the finite vs. infinite thing we got bogged down in last night is a little bit of a red herring, as Landsburg's slight advantage to males doesn't generate the right answer for a randomly terminating finite population either.

The problem isn't that Landsburg assumes a finite population; it's that he assumes a "last baby" whose identity is determined not by chance but by a rule he himself had introduced into the problem. That's not found in the problem, only in his model of it.
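
A rough way to check that distinction, as a sketch rather than anyone's actual model: cut the country off after a fixed number of births instead of waiting for every family to get its boy. The function names, the 20-birth cutoff, and the assumption that new families keep forming are all mine, not part of the puzzle.

```python
import random

def girl_fraction_fixed_births(total_births, rng):
    """Fraction of girls among the first `total_births` babies.

    Assumption (not in the puzzle): new families keep forming, so a
    family that gets its boy is simply replaced by the next family.
    The stopping rule then only decides *who* has the next baby.
    """
    girls = 0
    for _ in range(total_births):
        if rng.random() < 0.5:   # girl: the same family tries again
            girls += 1
        # boy: this family stops and the next family starts
    return girls / total_births

def average_fraction(total_births, trials, seed=0):
    """Average the girl fraction over many independently cut-off countries."""
    rng = random.Random(seed)
    total = sum(girl_fraction_fixed_births(total_births, rng)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    print(average_fraction(total_births=20, trials=50_000))
```

With the cutoff fixed in advance, the girl count is just a binomial, so the average should sit at 0.5 up to sampling noise. The shortfall only shows up when the cutoff is "every family has had its boy."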
posted by gerryblog at 7:18 AM on January 2, 2011 [2 favorites]


On preview: what steambadger said.
posted by gerryblog at 7:19 AM on January 2, 2011


Google should be asking interviewees to come up with some code that abstracts the correct answer from these metafilter comments...
posted by 445supermag at 7:59 AM on January 2, 2011


This was already touched on above.

Answer: Very close to 100%. Agricultural ultrasounds are around $3K. Do sexing of the fetus and abort with a hormone shot, rinse and repeat. Should be a few dollars per go-around for a population of a medium-sized village. Why it would not be 100% is error in the hormone shot (i.e., not aborting) and getting the sexing wrong.

This all depends on what "only want" means and the financial costs people will go to for "only want". I doubt the morality of abortion enters into "only want", but who knows.

Any answer that ignores sex selective abortion is not realistic because it is occurring every minute of the day in certain countries.
posted by sety at 8:05 AM on January 2, 2011 [1 favorite]


All the graphed simulations I've looked at show that the ratio is so close to 1:1 that the difference between the real ratio and 1:1 is insignificant. And it only takes a population over 100 or so to get there. Is there any country in the world with less than 100 families?
posted by r_nebblesworthII at 9:12 AM on January 2, 2011


Gerryblog, you are saying if we implement a random stopping point we will get 1:1? So can we now get a value of .5 via simulation instead of the vanishingly close values we were getting just by adding more families?

The way I was thinking about it in my head was an infinite number of families reproducing at once, not successive generations.
posted by Ad hominem at 9:29 AM on January 2, 2011


The expected value for number of children per couple is two, I believe . . . This is less than the 2.1 child replacement rate, which would mean that the population of this country would, in time, tend to zero.

Ding-ding-ding-ding-ding-ding-ding!

Here's the correct answer, and an interesting one to boot.

"What!?" you're saying, in an outraged voice. "That doesn't answer the question posed, which was 'What is the proportion of boys to girls in the country?'"

Actually it does answer, and in a pretty concrete way.

Here is a perfectly legitimate way to interpret the question. Start with these assumptions:

1. A country with large population (millions to billions, say)
2. All families in the country start this practice now and the practice continues for an indefinite period (not necessarily infinite in the mathematical sense but certainly dozens, hundreds, or even thousands of generations--however long is necessary to arrive at a clear answer to our question)
3. The 'stop at first boy baby' rule is followed rigorously

Just for simplicity, let's assume that the odds of any single birth being a boy vs a girl are exactly 50/50.

Finally, what is the actual question we are trying to answer?

Let's assume it is this: Does the proportion of male to female births, measured, let's say, in a table summarizing them by year, approach a steady state over time, and if so, what is the proportion?

One of the assumptions we all accept as reasonable, is that when you say 'country' you're talking about a pretty big population. So in looking at the stats for any particular year, we're talking about 'very large number of boy births' / 'very large number of girl births' = some percentage.

An unstated assumption that most of us accept without thinking too hard about it, is that this number, especially if large in the beginning, will only get larger in the future.

This is what gives us our 'hunting license' to go about the problem by summing infinite series, simulating the problem by stochastic methods (ie, do a very large number of trials & take the average), talking about 'expected value' or other ideas from probability theory (all of which rely in one way or another on the 'law of large numbers'), etc.

But what if this unstated assumption is wrong?

What if, as time goes forward, the number of births per year declines inexorably to zero?

Because the one thing we know about this problem, definitely, is that deaths are going to outnumber births, every year, by about 5%.

That means that in a surprisingly short period of time--something less than 30,000 years, whether you're talking the population of Samoa, the U.S., China, or the entire earth--the proportion we're looking at is not going to be 'very large number' / 'very large number' but 'small number' / 'small number' and finally 0/0.

At that point we can have long and outraged discussions on MeFi about whether Motl's or Landsburg's way of thinking about the problem is more correct when the actual total population is 15 and declining.

A little later on, we can have a 900 post MetaTalk thread about the proper way to express the ratio 0/0 as a percentage.
posted by flug at 9:31 AM on January 2, 2011 [2 favorites]


Of course, ultimately the important skill in the candidate isn't that he can gut out the 50% number, or even that he can on further thought or experiment, see that it's wrong. It's whether he hand-waves it away as insignificant and gives it no more thought or if he finds that discrepancy interesting and wonders if there's a way to exploit that somehow. I'd think you might want the forest-for-the-trees focus on the practical answer for a manager, but the investigate-the-anomaly instinct for an engineer.
posted by ctmf at 9:41 AM on January 2, 2011


Well in the real world we would just hardcode .5 in with a comment // HACK fix this later.
posted by Ad hominem at 9:59 AM on January 2, 2011 [3 favorites]


"But, assuming a country like this could actually exist, the sequence would be terminated when people got tired of the custom, or when an asteroid struck the country, or by the heat death of the universe; and the last baby wouldn't necessarily be a boy."

In that case, the families have not all followed the stopping rule; that violates the conditions of the hypo.
posted by mikeand1 at 10:10 AM on January 2, 2011


"One day a comet hits and kills everyone in the world...."

"One day the entire population signs a suicide pact agreeing that no one born after a certain date will reproduce..."


Then they have no longer followed the stopping rule. Again, that violates the primary condition of the hypo.
posted by mikeand1 at 10:13 AM on January 2, 2011


"This is less than the 2.1 child replacement rate, which would mean that the population of this country would, in time, tend to zero."

The only reason you need a 2.1 total fertility rate to continue the population is that some women die before having any children. The hypo says nothing about when people die, or whether they die at all for that matter.
posted by mikeand1 at 10:16 AM on January 2, 2011


"Landsburg is being kind of a dick about this. Expectation is a weird mathematical concept that I'm not sure applies here. It certainly isn't mentioned in the original question."

There's no other way to answer the question sensibly. If you disagree, how would you answer it?

Note that if you say, "simulate it a large number of times, and the law of large numbers will give you the right average", then you're saying the same thing. And I would suggest that this is also the most sensible answer.
posted by mikeand1 at 10:20 AM on January 2, 2011


Note that if you say, "simulate it a large number of times, and the law of large numbers will give you the right average", then you're saying the same thing. And I would suggest that this is also the most sensible answer.

But then you have to simulate it for a population in which children can also grow up and reproduce themselves, just like real people do, or else you've concocted an answer that only works for a magical country where no one ever has any grandchildren. For a place where (1) everyone always fulfills the stopping rule and (2) children grow up and have kids of their own, the population goes to infinity and so the right answer is 1/2.

Which is why what you write here:
Let your families get divorced and remarried if you want. Heck, let them gay marry and reproduce by artificial insemination. Let their children and grandchildren and great-grandchildren do all that too.

None of that matters. As long as every family follows the stopping rule, and the number of families is finite, the expected ratio will be less than 0.5.
is wrong. If there's children and grandchildren and so on down the line, and every family follows the stopping rule, then the number of families *can't* be finite. Which means the answer is 1/2.

But this is where I came in, so I think I must be going.
posted by gerryblog at 10:50 AM on January 2, 2011


"If there's children and grandchildren and so on down the line..."

But I didn't say "and so on down the line." You added that part!
posted by mikeand1 at 11:04 AM on January 2, 2011


Can we talk about Zeno's paradox instead? Because I think there are some people in this thread that would argue that motion is impossible based on their understanding of limits.
posted by empath at 11:10 AM on January 2, 2011


Or whether 1.99999...... = 2 or not.
posted by empath at 11:10 AM on January 2, 2011


There's no other way to answer the question sensibly.

There are dozens of ways to answer the question sensibly. The original question says nothing about finite or infinite populations, nothing about the size of the starting population, and nothing about what, if any, real-world conditions should be taken into account (and note that restricting the population to a finite number is, itself, a real-world condition).

Recognizing that the answer is dependent on the unstated assumptions in the question is what's made this such a great thread -- far more interesting than the initial (rather simple) brainteaser, or even than Landsburg's nifty reframing.
posted by steambadger at 11:12 AM on January 2, 2011


Answer: The plane flies. The treadmill has no effect on airspeed.
posted by KirkJobSluder at 11:20 AM on January 2, 2011


"There are dozens of ways to answer the question sensibly."

OK, aside from using expected value, how would you calculate the answer? Specify as many of those conditions as you want -- finite/infinite, whatever.
posted by mikeand1 at 11:26 AM on January 2, 2011


Answer: The plane flies. The treadmill has no effect on airspeed.

Only if we assume there is zero friction between the wheels and their axles, allowing the wheels to spin completely freely at 2X speed.
posted by kafziel at 11:32 AM on January 2, 2011


OK, aside from using expected value, how would you calculate the answer?

That depends on why the question was being asked. Note that I didn't claim to have a better means of calculating the answer: my quibble was with your use of the word "sensibly". Apparently, Google was using the question to screen out people who couldn't count on their fingers, since that's all that's required to come up with the answer they were looking for; so, in that context, I'd consider "fifty percent" to be a sensible answer. Another sensible answer there would be "I don't know, but here's how I'd design a simulation to find out." Since my experience with expectation begins and ends at the poker table, if I were asked the question on a graduate-level probability exam, my only sensible answer would be "Holy shit! What am I doing in a graduate-level probability exam? Feet, do your duty!" In another context -- say, here on the Blue -- I'd consider many of the answers above to be sensible.
posted by steambadger at 11:53 AM on January 2, 2011


There's also the folk tales (e.g. stress interviews) from places like Harvard Medical School in which an interviewer has nailed shut a window from the exterior and then asked nervous applicants to open the window as soon as they entered the room.
posted by ericb at 11:56 AM on January 2, 2011


Incidentally, I forgot to mention in my previous comment that I ran my original Haskell simulator (i.e., arbitrary numbers of families) while keeping track of the family with the longest streak of girls. In one simulation run of 5000000 families, the largest number of consecutive female births (with an assumed 50% probability of a female birth) was 22.
posted by adoarns at 12:15 PM on January 2, 2011


"Apparently, Google was using the question to screen out people who couldn't count on their fingers, since that's all that's required to come up with the answer they were looking for; so, in that context, I'd consider "fifty percent" to be a sensible answer."

When you answer 50%, you are using expected value, you just don't know it. (In your mind, you're probably trying to invoke the law of large numbers when you give this answer, but note that that won't get you *exactly* 50%.)

In other words, if you didn't invoke expected value, you'd say something like: "There's randomness involved, so there's not going to be a single point answer; rather, there's a distribution of possible outcomes." Which doesn't really give a satisfying answer to the question.

Note that here I reproduced this distribution for a population of four families and an N of 4. As you can see, it doesn't make sense to give a single point answer, because the actual results are all over the place.

The same thing is true regardless of the number of families, even an infinite number of families -- there's really a distribution of possible outcomes, not a single outcome.

So when you give a single answer, like 50%, you are intuitively, without knowing it, summarizing that distribution with the expected value. (Which is very close to saying that's the average of the distribution when you create it with a large number of simulations.)

Bottom line, there's no other way to give a single numerical answer.
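
For anyone who wants to rebuild that four-family distribution, here is a minimal sketch. It assumes fair, independent births, in which case the total number of girls follows a negative binomial distribution; the function name `girl_count_pmf` is just a label I picked for illustration.

```python
from fractions import Fraction
from math import comb

def girl_count_pmf(families, girls):
    """P(exactly `girls` girls are born before every family has its boy).

    Assumes fair, independent births, so the girl count is negative
    binomial: choose where the girls fall among the births that come
    before the final boy.
    """
    return Fraction(comb(girls + families - 1, girls), 2 ** (girls + families))

if __name__ == "__main__":
    k = 4
    for g in range(8):
        p = girl_count_pmf(k, g)
        print(f"{g} girls: fraction {g}/{g + k} = {g / (g + k):.3f}, probability {p}")
```

The point of printing both columns is that each row is a different possible "proportion," and no single row carries most of the probability.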
posted by mikeand1 at 12:50 PM on January 2, 2011 [2 favorites]


Yes. There is. If you divide the expected number of boys by the expected population, you get 50%. Exactly 50%. This is not what Landsburg is doing. But it is what other people are doing in response to the Google question, which doesn't specify where or even whether expectation should be applied. Still, I see no reason to call Landsburg's response "incorrect" -- his interpretation of the Google question is also reasonable.
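
A sketch of the two calculations side by side, assuming fair independent births and a country small enough that the floats don't underflow. The function names are mine, and the sums are truncated at a cutoff I chose to make the tail negligible.

```python
def girl_pmf(families, max_girls):
    """Yield (g, P(G = g)) for g = 0..max_girls, assuming fair births.

    Uses the negative binomial recurrence
    P(g + 1) = P(g) * (g + families) / (g + 1) / 2, with P(0) = 2**-families.
    """
    p = 0.5 ** families
    for g in range(max_girls + 1):
        yield g, p
        p *= (g + families) / (g + 1) / 2

def ratio_of_expectations(families):
    """E[girls] / E[girls + boys]: take expectations first, then divide."""
    cutoff = 50 * families + 200   # assumed cutoff; the tail past it is tiny
    expected_girls = sum(g * p for g, p in girl_pmf(families, cutoff))
    return expected_girls / (expected_girls + families)

def expectation_of_ratio(families):
    """E[girls / (girls + boys)]: divide first, then take the expectation."""
    cutoff = 50 * families + 200
    return sum(g / (g + families) * p for g, p in girl_pmf(families, cutoff))

if __name__ == "__main__":
    for k in (1, 4, 10, 100):
        print(k, ratio_of_expectations(k), expectation_of_ratio(k))
```

The first function should come out at 0.5 for any number of families, while the second falls short of 0.5 by roughly 1/(4k) once k is not tiny. Both use expected value; they just apply it at different steps.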
posted by escabeche at 1:12 PM on January 2, 2011


Any imbecile outside Google could have told you that putting everyone's Gmail contacts into Buzz would be a very bad idea. But could any genius within Google say this? They'd probably lose their jobs for going against the company. So they become idiots for job security.

Clearly you've never worked at Google. If they fired people for loudly voicing opinions that go against the OC/management, half the company wouldn't be there. I'm not saying top management always _listens_, but there is quite lively and impassioned debate on internal mailing lists about these kinds of things.

But of course, for a given product / feature, it's not like they will/won't do something just because other Googlers think it's a good/bad idea.

(Also, as has probably been said, we don't really use these kinds of questions, at least not for anything important)
posted by wildcrdj at 1:47 PM on January 2, 2011


Having exchanged periodic emails with Landsburg all day long over this, I'm just about ready to switch teams on the basis of his aggressively legalistic reading of the question as he poses it. His case is more or less predicated entirely on the idea that the phrase "fraction of girls" mandates an expected ratio calculation as opposed to an expected value; perhaps I've just been worn down, but I guess I can see his point.

"Proportion of boys to girls in the country" -- the OP's version, which I was working with for most of the day -- still suggests expected value to me though.

I think I maybe just hate math now.
posted by gerryblog at 1:48 PM on January 2, 2011


Well, I should say we don't / aren't encouraged to use them for engineers. PMs and such might have more use for general thinking type questions, I don't know.
posted by wildcrdj at 1:50 PM on January 2, 2011


Again, remember that Landsburg is not claiming his answer is the right one to Google's question, just that his answer is the right one to his question! And of course he is right about this.

I think he also feels that his interpretation of Google's question is the one they had in mind, but there I think he's on shakier ground, given that the answer Google wanted was apparently the one that corresponds to the other interpretation. And in any event, "what was in Google interviewer's mind" is not a question we can answer by computation.
posted by escabeche at 1:52 PM on January 2, 2011


"Yes. There is. If you divide the expected number of boys by the expected population, you get 50%. Exactly 50%. This is not what Landsburg is doing."


You're using expected value. I'm saying if you want to give a single point answer, you have to use the concept of expected value.

You're responding to a response I made to another poster who claimed that Landsburg was being a dick by resorting to the expected value concept.
posted by mikeand1 at 2:23 PM on January 2, 2011


Whoops - this is the poster I was responding to.
posted by mikeand1 at 2:27 PM on January 2, 2011


This argument is stupid and pointless unless we're talking about a "country" the size of Vatican City, in which case hooooboy are there some other factors that need consideration.
posted by Dr.Enormous at 3:07 PM on January 2, 2011


You're using expected value. I'm saying if you want to give a single point answer, you have to use the concept of expected value.

Sure! I'm just saying that "using expected value" doesn't commit you to Landsburg's version of the question (and thus not to his answer.)
posted by escabeche at 3:10 PM on January 2, 2011


Even before I looked it up on wikipedia I knew humans have more male children AT BIRTH than females.
So don't all these proofs that presume a 50/50 chance like a coin toss fail by default?
http://en.wikipedia.org/wiki/Human_sex_ratio


Yes, which makes all the math nerd rage here pretty hilarious. The "Well it comes really close to 50% but..." position being argued back and forth here from a position of pure mathematics completely overlooks the real-world biological fact that human sex ratios are not naturally 50-50. For every 100 girls born, anywhere from 103 to 107 boys are born (it's higher in countries where girls are selectively aborted, but the ratio is still naturally skewed toward boys). That's the very first thing I would have mentioned if this question had been posed to me in an interview, which probably would have gotten my ass kicked out the door for not using my common sense and realizing this was a simple math question, dammit!
posted by Marla Singer at 3:15 PM on January 2, 2011


That's the very first thing I would have mentioned if this question had been posed to me in an interview, which probably would have gotten my ass kicked out the door for not using my common sense and realizing this was a simple math question, dammit!

Why is this assumption made? Does Google strike you as a company that values simplistic "common sense" solutions over realistic, novel analysis of real world data?
posted by dirigibleman at 3:32 PM on January 2, 2011


I sent this as a MeMail, but decided to post it here because I like the example.

We could simply ask "What's the area of a triangle?" The rightest answer is "ab/2", which is what Landsburg- well technically I guess Douglas Zare- has offered. The answer "6" only works if we have a 3/4/5 triangle on a plane.

Likewise the answer is only "exactly 50%" with an infinite population, or "basically 50%" for some value of k that rounds to 50% at or beyond meaningful significant figures. It so happens that most country populations (~15 million, with 2 orders of magnitude wiggle from the smallest to the largest) will have varying numbers of 9's after the decimal, all of which are a "good enough for government work" value of "basically 50%". So realistically the "It's basically 50%" is like saying "The area of a triangle is 6", because just about any realistic country is a 3/4/5 triangle. :) Landsburg's just pointing out "Hey, if you ever HAD a 5/12/13 country... this formula is more generalized".

He could have generalized further with a 1/n + ... to allow for a number of families k and a non-50% probability of male/female in a given birth expressed into n (where currently he and we *assume* 50% chance of each gender and thus use 1/2 + ...). That resultant formula would then work for all "perfect, finite" countries using no assumptions outside of the original question. Plug in families k and probability of male female birth of say 1.07:1 in favor of males as n and you'll yield a result that is effectively precise and accurate.

All the comments here read to me mostly as "Well, of course we can *assume* a 3/4/5 triangle" or "If I obscure part of the triangle I don't actually measure its whole area, do I?" or other variations on that theme. Still others are postulating our triangle is on the surface of a saddle shape or a sphere. :)

In an interview, those observations and refinements would be beneficial to make along with stating "But in the purest example, it's approaching but never quite 1:1 expected ratio using this formula" to demonstrate meaningful correctness as well as beanplate-worthy thinking.
posted by hincandenza at 3:53 PM on January 2, 2011 [2 favorites]


Why is this assumption made? Does Google strike you as a company that values simplistic "common sense" solutions over realistic, novel analysis of real world data?

Well, you have to get past HR to actually work there, right? And according to their little quiz of inane questions (which includes the ever-popular "Why are manhole covers round?") I would give the wrong answer to the boy-girl problem, so yes I would expect to be excluded. My previous experience with HR people, particularly the kind that have a cheat sheet of inane questions, leads me to expect such. There's also this little nugget from Motl (previously quoted in this thread) which indicates Google would not be the only company showing me the door:
Please, if you think that 50% is a wrong answer, contact Google that had 50%-50% as the official right answer to this puzzle during their hiring process. Complain over there. I didn’t use this simple math problem to dismiss candidates for jobs in my company even though I fully understand why Google did. It’s a very good task to find and throw away the people who will get distracted from common sense and from simple, fundamental math arguments by noise and who will immediately start to think about complicated yet irrelevant technicalities – which is exactly what you did which is why you couldn’t work at Google but you instead work in the Academia that often supports this contrived way of thinking that is detached from the reality and everything important in it.
Emphasis mine.
posted by Marla Singer at 4:18 PM on January 2, 2011


mikeand1: When you answer 50%, you are using expected value, you just don't know it.

Thanks, Mike. That makes perfect sense.

Dr. Enormous: This argument is stupid and pointless unless we're talking about a "country" the size of Vatican City...

I vehemently deny that this argument is stupid and pointless. It has kept me amused for two days, now.
posted by steambadger at 4:19 PM on January 2, 2011


I vehemently deny that this argument is stupid and pointless. It has kept me amused for two days, now.

Agreed, I have been hung over for 2 days. This is great.

Mark me down for team "approaches .5 " just from my gut and Mike's graph of the Monte Carlo simulation I don't see the mathoverflow equation intersecting .5. I also accept that .5 is close enough for Google.
posted by Ad hominem at 4:32 PM on January 2, 2011


We just some country boys
Country walk
Country talk

Don't bring it round here less you know for sho whats jumpin off.
posted by pressF1 at 4:59 PM on January 2, 2011


If an infinite series approaches a limit, it equals the limit. And yes, 1.99999.... = 2
posted by empath at 5:08 PM on January 2, 2011


Like I said, I am not a mathematician. I accept that 1.999... = 2, but does .4999... = .5?
posted by Ad hominem at 5:37 PM on January 2, 2011


x=.49999999....

10*x =10*.49999999....

10x = 4.999999....

10x-x = 4.99999999...-.49999999999....

9x = 4.5

x = 4.5/9

x = .5

QED
posted by empath at 6:00 PM on January 2, 2011


Great thanks, I am convinced.
posted by Ad hominem at 6:16 PM on January 2, 2011


All kinds of series come to mind about first tries, second tries, etc - but in the end, this doesn't matter.

Assuming the possibility of having a boy or girl is 50/50, and assuming an arbitrarily large enough sample set..... it's 50/50.

The mistake is to start looking at what proportion of families have how many girls/boys - but the question isn't about that. Only births matter - not which family they are in.

Every birth has an equal chance of girl/boy - so there you go. 1:1
posted by TravellingDen at 7:21 PM on January 2, 2011


"finite" introduces an assumption that requires one to set a variable. That immediately—and unnecessarily—complicates the question.

If you want the plane to take off, you need to avoid overthinking your plate of beans.
posted by five fresh fish at 7:31 PM on January 2, 2011


I think he also feels that his interpretation of Google's question is the one they had in mind, but there I think he's on shakier ground, given that the answer Google wanted was apparently the one that corresponds to the other interpretation. And in any event, "what was in Google interviewer's mind" is not a question we can answer by computation.

I strongly suspect that Google is less interested in a correct answer to the question than in how the interviewee goes about trying to model or solve the problem. My response would likely be along the lines of, "interesting question, but what does the census say because these models rarely survive real-world human behavior?"
posted by KirkJobSluder at 6:27 AM on January 3, 2011


I doubt that's the answer that interests them. I suspect they want to hear "approaching 50% as time and population increase" and to see some effort in developing a useful model as proof.
posted by five fresh fish at 7:54 AM on January 3, 2011


Er... which is what the first part of your post says. Why blow the answer by dismissing it, then?
posted by five fresh fish at 7:57 AM on January 3, 2011


Er... which is what the first part of your post says. Why blow the answer by dismissing it, then?

Because the problem-solving method central to the education and experience demonstrated on my resume is empirical research. "What is the proportion of boys to girls in the country," is an empirical question. If they want it solved as a math problem, they should have invited a mathematician.

If looking at it as an empirical question costs me the job, then I'm not the right person for the job and the job isn't the right position for me. That is what the interview process is for after all.
posted by KirkJobSluder at 10:15 AM on January 3, 2011


"What is the proportion of boys to girls in the country," is an empirical question. If they want it solved as a math problem, they should have invited a mathematician.

The real question is to figure out what they're looking for. Are they expecting a mathematical answer, as if it were a 'word problem' in sixth grade? Or are they expecting someone to think 'outside the box' and ask additional questions/make different assumptions?

Given Google's rep, I'd actually expect the latter. But then, given the situations when I've actually been given questions like this, I'd expect the former.

Sadly, the 'correct' answer is the former. Which would winnow out people who (a) are giving serious thought to the situation being addressed, and (b) people who are savvy enough to think they're being tested on non-obvious solutions.

Which leaves Google with what?
posted by ChurchHatesTucker at 10:56 AM on January 3, 2011


I took a shot at simulating this in Perl because I didn't grok the mathematical arguments.

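use strict;
use warnings;

my $boys  = 0;
my $girls = 0;

# Simulate 5000000 families; each keeps having children until its first boy.
for (my $i = 0; $i < 5000000; $i++)
{
    # int(rand(2)) is 0 (boy) or 1 (girl), each with probability 1/2.
    my $test = int(rand(2));

    # Every girl means the family tries again.
    while ($test == 1)
    {
        $girls++;
        $test = int(rand(2));
    }

    # The loop ends on the first boy, and the family stops.
    $boys++;
}
print "Boys = $boys, Girls = $girls\n";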

Result:
Trial 1: Boys = 5000000, Girls = 5002195
Trial 2: Boys = 5000000, Girls = 5005480
Trial 3: Boys = 5000000, Girls = 4997420

This agrees with Google but disagrees with the people who are actually doing math. I think the point they're making is that expected p may not be int(rand(2)) but in context of the problem, there are only 2 cases - male/female... What would the math people change about the above formulation of the problem?
posted by Veritron at 2:44 AM on January 4, 2011


Veritron, not to choose sides, but you're simulating 5 million families in one generation. Change your 5000000 loop termination criteria to 4, and see what happens. Run THAT 5 million times and average all the results. The math people are saying that comes out different.
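
Something like the following is what I mean, as a rough Python sketch rather than a drop-in change to the Perl above. It assumes fair births; the function names and the trial counts are placeholders I made up.

```python
import random

def girl_fraction(families, rng):
    """Run one country to completion: every family stops at its first boy."""
    girls = 0
    for _ in range(families):
        while rng.random() < 0.5:   # girl born, the family tries again
            girls += 1
    boys = families                 # exactly one boy per family
    return girls / (girls + boys)

def mean_girl_fraction(families, trials, seed=0):
    """Average the per-country fraction over many independent countries."""
    rng = random.Random(seed)
    return sum(girl_fraction(families, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print(mean_girl_fraction(families=4, trials=100_000))    # tiny country
    print(mean_girl_fraction(families=1000, trials=200))     # big country
```

The key difference from one giant run is that the fraction is computed per country and then averaged. For four families the average should land around 0.44 rather than 0.5, and it creeps back toward 0.5 as the family count grows.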
posted by ctmf at 6:41 AM on January 4, 2011


His latest blog post. He's now changed the wording of the problem for the 2nd time.

Original:

In a country in which people only want boys every family continues to have children until they have a boy. If they have a girl, they have another child. If they have a boy, they stop. What is the proportion of boys to girls in the country?

He changed it to the following since the word proportion is used incorrectly (see his comments on his blog):

There's a certain country where everybody wants to have a son. Therefore each couple keeps having children until they have a boy; then they stop. What fraction of the population is female?

And today:

In a certain country, each family continues having children until it has a boy, then stops. In expectation, what fraction of the population is female?

For a problem that seems to hinge at least in part on ambiguity of wording, it seems dodgy to iteratively tighten it up to fit your interpretation.
posted by stp123 at 10:46 AM on January 4, 2011


Ok, so I got a chance to mock it up in (clumsy amateur) C, and I'm getting around 50% near enough even starting with 5 families. What bad assumption did I put in there? I stopped counting the initial parents, to see if that would "help" but it didn't.

Pastebin to my attempt
posted by ctmf at 10:18 PM on January 5, 2011


Interestingly, when I put the commented out code in there to use successive generations, the population gets smaller and smaller and eventually dies off no matter how many families I start with. I guess it would have to, since the maximum number of new families is the number of families in the previous generation when each family can only have 1 boy.

So I guess the question is moot. The answer is, eventually undefined. They go extinct.
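
Here is a stripped-down Python sketch of that generational version, separate from the C in the pastebin. The assumptions are mine: fair births, and each new family needs one boy and one girl from the same generation. The function names are made up for illustration.

```python
import random

def next_generation(families, rng):
    """Return (boys, girls, new_families) for one generation.

    Assumptions (mine, not the puzzle's): fair births, and each new
    family pairs one boy with one girl from this generation.
    """
    girls = 0
    for _ in range(families):
        while rng.random() < 0.5:   # girl born, the family tries again
            girls += 1
    boys = families                 # every family stops at its first boy
    return boys, girls, min(boys, girls)

def family_counts(start_families, max_generations, seed=0):
    """Number of families in each generation until extinction or the cap."""
    rng = random.Random(seed)
    counts = [start_families]
    while counts[-1] > 0 and len(counts) <= max_generations:
        _, _, new_families = next_generation(counts[-1], rng)
        counts.append(new_families)
    return counts

if __name__ == "__main__":
    print(family_counts(start_families=1000, max_generations=50))
```

Because the boy count always equals the number of families, `min(boys, girls)` can never exceed it, so the list can only hold steady or shrink. How fast it shrinks depends on the pairing assumption, so I wouldn't read much into the exact numbers.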
posted by ctmf at 10:36 PM on January 5, 2011


In case anyone's still reading down here, I just blogged about this.
posted by escabeche at 4:35 PM on January 10, 2011 [2 favorites]


escabeche: you nailed it. From your post -- "Because averaging ratios with widely ranging denominators is kind of a weird thing to do." The pedantic point he makes about ratios and expectations gives a surprising result because it is nonsensical and not what people are really interested in when they are posing this type of question (unless it is being asked in a stats class).
posted by stp123 at 10:05 PM on January 10, 2011


>>Typical MeFi overthinking. 50% of pregnancies result in a girl. The number of pregnancies is immaterial.

>>This is only true if the sex of the baby has no bearing on whether you bring an additional pregnancy to term. But that's not the case here.
>>posted by Justinian at 6:34 PM on January 1 [+] [!]

No, Justinian, that is not true. Irrespective of motive and past history, 50% of pregnancies result in a girl, so 50% of the total kids born will be girls. That really is the entire answer to the question.
posted by w0mbat at 2:35 AM on January 28, 2011

