# we have complex, messy models, yet reality is startlingly neat and simple

May 21, 2009 3:35 PM

"Zipf's Law (pdf) states that if you tabulate the biggest cities in a given country and rank them according to their populations, the largest city is always about twice as big as the second largest, and three times as big as the third largest, and so on. In other words, the population of a city is, to a good approximation, inversely proportional to its rank."

Also, as cities grow, they benefit from economies of scale (pdf). For example, the number of gas stations a city needs grows in proportion to the 0.77 power of population. Other measures of infrastructure also decrease, per person, as population increases. Interestingly, a very similar power law is found in nature, namely in the metabolic needs of mammals (pdf), which "grow in proportion to its body weight raised to the 0.74 power." (via)
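
The economies-of-scale claim is a power law of the form Y = c·N^b with an exponent b below 1. Below is a minimal sketch of what an exponent of 0.77 (gas stations) or 0.74 (mammal metabolism) implies when size doubles. The function name `scale_factor` is only for illustration, and the constant c cancels out, so it is not needed.

```python
def scale_factor(growth: float, exponent: float) -> float:
    """Factor by which Y = c * N**exponent changes when N is multiplied by `growth`.

    The constant c cancels out, so it does not appear here.
    """
    return growth ** exponent


if __name__ == "__main__":
    # Exponents quoted in the post: 0.77 for gas stations, 0.74 for mammal metabolism.
    for label, exponent in [("gas stations", 0.77), ("metabolism", 0.74)]:
        total = scale_factor(2.0, exponent)  # change in the total when size doubles
        per_unit = total / 2.0               # change per person (or per unit of body weight)
        print(f"{label}: total x{total:.2f}, per unit x{per_unit:.2f}")
```

With b = 0.77, doubling a city's population multiplies the gas stations it needs by about 1.71, so each resident needs roughly 15% fewer; b = 0.74 gives a similar saving per unit of body weight.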

AceRock: "*For example, the number of gas stations a city needs grows in proportion to the 0.77 power of population.*"

And thus dies another stupid "see how you think" question. ~~RIP~~ BIH

posted by Plutor at 3:47 PM on May 21, 2009

In another phorum I post on, someone has made graphs of various stats like users' total posts, posting rates, threads started, etc.

We've found that users' total post count, posting rate and total threads started all seem to follow a power law. We haven't got far enough to do any serious calculations on it, but when you sort the bars by height the graphs do form amazingly smooth log curves.

Has anyone ever tried doing something similar with metafilter posts?

posted by metaBugs at 3:48 PM on May 21, 2009
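
One way to go from "the sorted bars form a smooth log curve" to a number is to sort the counts, take logs of both rank and count, and fit a straight line: on log-log axes a power law is a straight line whose slope is the exponent. Below is a minimal sketch under that assumption; the post counts are hypothetical, not real data, and a good straight-line fit on its own is weak evidence for a power law.

```python
import math


def loglog_slope(counts: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(count) = a + b * log(rank), ranking counts from largest down.

    Returns (b, r_squared). Zero counts are dropped because log(0) is undefined;
    at least two distinct positive counts are needed.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-user post counts, not real data.
    posts = [5120, 2480, 1730, 1300, 990, 860, 700, 650, 560, 510, 0]
    slope, r2 = loglog_slope(posts)
    print(f"slope {slope:.2f}, R^2 {r2:.3f}")
```

A slope near -1 would be Zipf-like; other slopes still describe a power law, just with a different exponent.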

What is that paper talking about? In particular, why do they talk about estimation methods? Isn't this something where you just take the data and check?

posted by smackfu at 4:04 PM on May 21, 2009

Actually, a quick look at the first link does rather debunk both "Zipf's Law" and the title of this post. From the conclusions:

*Using either method, we reject Zipf’s Law much more often than we would expect based on random chance. Using OLS, we reject the Zipf’s Law prediction that the Pareto exponent is equal to 1, for the majority of countries: 53 of the 73 countries in our sample. This result agrees with the classic study by Rosen and Resnick (1980), who reject Zipf’s Law for 36 of the 44 countries in their sample. We get the opposite result using the Hill estimator, where we reject Zipf’s Law for a minority of countries (29 out of 73). Therefore, the results we obtain depend on the estimation method used, and in turn, the preferred estimation method would depend on our sample size and on our theoretical priors – whether or not we believe that Zipf’s Law holds.*

Looks to me like nothing more than numerology. But then, I'm from a country whose two largest cities (Madrid and Barcelona) are of very similar sizes...

posted by Skeptic at 4:19 PM on May 21, 2009 [1 favorite]
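
smackfu's question about estimation methods comes down to this: a "Zipf exponent" has to be estimated from a finite list of cities, and different estimators weigh the data differently. Below is a minimal sketch of two common ones, an OLS regression of log rank on log size and the Hill estimator on the k largest values, applied to synthetic Pareto data with exponent 1. The exact variants used in the linked paper may differ; the function names, sample size and seed are hypothetical choices for illustration.

```python
import math
import random


def ols_exponent(sizes: list[float]) -> float:
    """Pareto exponent from regressing log(rank) on log(size); Zipf's law predicts 1."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(s) for s in ordered]
    ys = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


def hill_exponent(sizes: list[float], k: int) -> float:
    """Hill estimator of the Pareto exponent from the k largest observations.

    Uses the (k+1)-th largest value as the threshold, so len(sizes) must exceed k.
    """
    ordered = sorted(sizes, reverse=True)
    threshold = ordered[k]
    mean_log_excess = sum(math.log(s / threshold) for s in ordered[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "city sizes" drawn from an exact Pareto law with exponent 1 (pure Zipf).
    cities = [rng.paretovariate(1.0) for _ in range(2000)]
    print(f"OLS:  {ols_exponent(cities):.3f}")
    print(f"Hill: {hill_exponent(cities, k=1999):.3f}")
```

Both should land close to 1 here because the synthetic sample is large and exactly Pareto. On a few dozen real cities the two can disagree noticeably, which is consistent with the paper's point that the verdict depends on the estimator and the sample size.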

It's not that simple. Lots of distributions can look sort of like power laws if you squint. I'm a little uncomfortable with formal tests of the sort "is something distributed *X*," which theoretically have good asymptotic properties but suck when facing real data, since you're making inferences basically only over information in the tail of the distribution.

Some preferential attachment mechanisms in networks will produce power law distributions, so there's a cottage industry devoted to testing for them. Cosma Shalizi has blogged about this a few times, and is well worth reading if you're interested.

posted by shadow vector at 4:21 PM on May 21, 2009 [2 favorites]
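
To see the "looks like a power law if you squint" problem concretely, here is a minimal sketch that draws a log-normal sample (which is not a power law), keeps its 200 largest values, and fits a straight line on log-log rank-size axes. The sample size, σ = 2 and the seed are arbitrary choices for illustration.

```python
import math
import random


def tail_loglog_fit(values: list[float], top: int) -> tuple[float, float]:
    """Fit log(size) = a + b * log(rank) to the `top` largest values; return (b, R^2).

    Assumes len(values) >= top and that the top values are positive.
    """
    ordered = sorted(values, reverse=True)[:top]
    xs = [math.log(rank) for rank in range(1, top + 1)]
    ys = [math.log(v) for v in ordered]
    mx, my = sum(xs) / top, sum(ys) / top
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Log-normal data: NOT a power law, yet its upper tail is nearly straight on log-log axes.
    sample = [rng.lognormvariate(0.0, 2.0) for _ in range(20000)]
    slope, r2 = tail_loglog_fit(sample, top=200)
    print(f"slope {slope:.2f}, R^2 {r2:.3f}")
```

The fit typically comes out close to a straight line with a high R², so a good straight-line fit cannot by itself tell the two distributions apart; any real test has to squeeze its answer out of the sparse tail.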

In other news related to power laws (Zipf's law is an instance of the more general concept of a power law): just because you've got a straight line on a log-log graph doesn't mean the internet is scale free.

(on preview: shadow vector's link is relevant)

There is an interesting history to Zipf's law. It was first observed for English: the frequency at which words were spoken/written followed the same distribution as the cities in the post above. When the same pattern was observed in other languages, people believed it to be a sign of God's handiwork. However, IIRC, Zipfian distributions were later shown to be a naturally occurring distribution that appears whenever something results from the product of a series of random variables. (Whereas the normal distribution is the result of the summed effect of a series of random variables.) But I should probably do some research before posting incorrect things on the Internets about maths...

posted by about_time at 4:23 PM on May 21, 2009
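
The multiplicative intuition can be checked numerically. The log of a product is a sum of logs, so by the central limit theorem a product of many independent positive factors tends toward a log-normal distribution: heavily right-skewed and able to mimic a power law over part of its range, though strictly speaking it is not one. Below is a minimal sketch, assuming 50 growth factors drawn uniformly from (0.5, 1.5); those numbers, the seed and the function names are illustrative choices.

```python
import math
import random


def skewness(xs: list[float]) -> float:
    """Sample skewness: third central moment divided by the 1.5 power of the variance."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    third = sum((x - m) ** 3 for x in xs) / n
    return third / var ** 1.5


def multiplicative_outcomes(trials: int, factors: int, rng: random.Random) -> list[float]:
    """Each outcome is the product of `factors` independent random factors in [0.5, 1.5]."""
    outcomes = []
    for _ in range(trials):
        value = 1.0
        for _ in range(factors):
            value *= rng.uniform(0.5, 1.5)
        outcomes.append(value)
    return outcomes


if __name__ == "__main__":
    rng = random.Random(1)
    products = multiplicative_outcomes(5000, 50, rng)
    logs = [math.log(p) for p in products]
    # Products: strongly right-skewed. Their logs (a sum of 50 terms): nearly symmetric.
    print(f"skewness of products:      {skewness(products):.1f}")
    print(f"skewness of log(products): {skewness(logs):.2f}")
```

The products come out strongly right-skewed while their logarithms are close to symmetric, which is the log-normal signature.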

Here's something about Zipf's law and languages that's better than what I posted above.

posted by about_time at 4:29 PM on May 21, 2009

*Also, as cities grow, they benefit from economies of scale...*

That explains why it's so much cheaper to live in NYC than Boise.

posted by ZenMasterThis at 4:48 PM on May 21, 2009

Cornell Math Prof Steve Strogatz just blogged about this over at nytimes.com yesterday.

posted by zachlipton at 4:51 PM on May 21, 2009

*It was first observed for English*

heh, I saw that here too :P which got me thinking about Benford's law [1,2] and agenticity!

posted by kliuless at 4:57 PM on May 21, 2009

In Hungary, Budapest is eight or nine times more populous than the second-biggest city (Debrecen), and the next three cities are only slightly smaller than Debrecen.

In Romania, Bucharest has about 2,000,000 people. The next seven biggest cities all have populations of close to 300,000 people (each about a seventh of Bucharest's population).

After this, I didn't bother checking any more.

posted by Dee Xtrovert at 5:09 PM on May 21, 2009 [1 favorite]

In Australia, the two largest cities are Sydney (with 3.6 million) and Melbourne (with 3.3 million). There goes another theory.

posted by acb at 5:12 PM on May 21, 2009

The connection of organism size to city size is really surprising! Cities are essentially two-dimensional when considering resource distribution, while organisms are three-dimensional; usually jumping up a dimension changes the exponent one considers.

posted by kaibutsu at 5:18 PM on May 21, 2009

I wrote about the New York Times article on my weblog: it is easy enough to get the data from Population.net on world cities with at least a million people, copy and paste it into Excel, draw an x/y plot, and ask Excel to generate a power law regression line.

You get population = 110.98 r^-0.744, where population is the (predicted) population in millions and r is the city's rank. For example, it estimates the population of San Francisco (the 41st largest city) to be 7 million; its actual population was 7.35 million, only off by 350,000 when estimated on rank alone!

posted by wfitzgerald at 5:30 PM on May 21, 2009 [2 favorites]
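
Plugging a rank into the fitted curve is a one-liner. Below is a minimal sketch using the coefficients quoted above (110.98 and -0.744, population in millions). As far as I know, Excel's power trendline is a least-squares line fitted to the logarithms of both variables, so those coefficients describe a straight line on log-log axes. Note that the fitted exponent of -0.744 is noticeably shallower than the -1 that strict Zipf predicts.

```python
def predicted_population(rank: int, a: float = 110.98, b: float = -0.744) -> float:
    """Population in millions from the fitted power law population = a * rank**b."""
    return a * rank ** b


if __name__ == "__main__":
    predicted = predicted_population(41)  # San Francisco, ranked 41st in that data set
    actual = 7.35                         # millions, as quoted in the comment
    print(f"predicted {predicted:.2f}M, actual {actual}M, error {actual - predicted:.2f}M")
    # prints: predicted 7.00M, actual 7.35M, error 0.35M
```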

*In Hungary, Budapest is eight or nine times more populous than the second-biggest city (Debrecen), and the next three cities are only slightly smaller than Debrecen.*

Perhaps the power law holds if you count the entirety of the former Habsburg empire as one "country".

posted by acb at 5:55 PM on May 21, 2009

Auckland, New Zealand: ~1.2 million. Wellington & Christchurch: ~400,000 ea.

posted by rodgerd at 5:59 PM on May 21, 2009
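
The counterexamples raised in the comments are easy to line up against the rule in the post, which predicts that the largest city is about twice the second and three times the third. Below is a small sketch using only the rough figures quoted in this thread (in millions; Romania's second entry stands for the roughly 300,000-person cities described for Romania). These are commenters' approximations, not census data, and `zipf_check` is a hypothetical helper.

```python
# Rough figures in millions, as quoted by commenters in this thread (largest city first).
THREAD_FIGURES = {
    "Australia": [3.6, 3.3],         # Sydney, Melbourne
    "New Zealand": [1.2, 0.4, 0.4],  # Auckland, Wellington, Christchurch
    "Romania": [2.0, 0.3],           # Bucharest, one of the ~300,000-person cities
}


def zipf_check(sizes: list[float]) -> list[tuple[int, float, float]]:
    """Return (rank, observed, predicted) for ranks 2.., where predicted = largest / rank."""
    largest = sizes[0]
    return [(rank, size, largest / rank) for rank, size in enumerate(sizes[1:], start=2)]


if __name__ == "__main__":
    for country, sizes in THREAD_FIGURES.items():
        print(f"{country}: largest/second = {sizes[0] / sizes[1]:.2f} (Zipf predicts 2.00)")
        for rank, observed, predicted in zipf_check(sizes):
            print(f"  rank {rank}: observed {observed}, predicted {predicted:.2f}")
```

On these numbers the largest/second ratio comes out around 1.09 for Australia, 3.00 for New Zealand and 6.67 for Romania, against Zipf's 2; New Zealand's third city does match the predicted one third of Auckland.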

*"Perhaps the power law holds if you count the entirety of the former Habsburg empire as one 'country'."*

That's one of the big problems with this: the lines drawn are pretty arbitrary. A good example of this is the difference between the composition of Toronto and Vancouver.

posted by Mitheral at 6:31 PM on May 21, 2009

According to Decker, the distribution of city sizes is not a power law but a log-normal distribution. This is shown using satellite data of the world at night to identify urban areas. A note from the paper on this methodology:

*"Nighttime light cluster area (in km²) correlates very well with population: for the United States, the area of DMSP-OLS light clusters predicts population with an r² between 0.63 and 0.93 depending on how the data are transformed"*

This illustrates that Zipf's law does not hold when you consider small 'cities'.

posted by a womble is an active kind of sloth at 6:52 PM on May 21, 2009

*There is an interesting history to Zipf's law. It was first observed for English: the frequency at which words were spoken/written followed the same distribution as the cities in the post above. When the same pattern was observed in other languages, people believed it to be a sign of God's handiwork. However, IIRC, Zipfian distributions were later shown to be a naturally occurring distribution that appears whenever something results from the product of a series of random variables. (Whereas the normal distribution is the result of the summed effect of a series of random variables.) But I should probably do some research before posting incorrect things on the Internets about maths...*

I was actually just reading about this the other day. There's a great book on the mathematics of language called Foundations of Statistical Natural Language Processing. It sounds dry, but what I've read so far was actually pretty interesting.

Zipf had some crazy theory about how the distribution was caused by people wanting to use minimal effort, or something like that. But it turned out that if you created a language by randomly picking letters, you would get a Zipf's law distribution, because shorter words would show up more often.

It seems incredible to think that the distribution of words like 'the' and so on would be random, and in fact Zipf's law broke down for the *most* common two or three words, IIRC.

posted by delmoi at 7:13 PM on May 21, 2009
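
The random-letters experiment described above is easy to reproduce. Below is a minimal sketch, assuming an alphabet of five letters plus a space, each chosen uniformly at random; the alphabet, text length, seed and function name are arbitrary illustrative choices.

```python
import random
from collections import Counter


def random_text_word_counts(n_chars: int, letters: str, seed: int = 0) -> list[tuple[str, int]]:
    """Type n_chars characters uniformly at random from `letters` plus a space, then
    count the resulting 'words' (runs of letters), most frequent first."""
    rng = random.Random(seed)
    alphabet = letters + " "
    text = "".join(rng.choice(alphabet) for _ in range(n_chars))
    return Counter(text.split()).most_common()


if __name__ == "__main__":
    ranked = random_text_word_counts(200_000, "abcde")
    for rank in (1, 2, 5, 6, 10, 50, 100):
        word, count = ranked[rank - 1]
        print(f"rank {rank:>3}: {word!r:>8} x {count}")
```

With five equally likely letters, every one-letter word is six times as frequent as any particular two-letter word, and so on, so the rank-frequency curve falls off as a staircase whose envelope follows a power law on log-log axes.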

*"Perhaps the power law holds if you count the entirety of the former Habsburg empire as one 'country'."*

Neat. I was anxious for the day gerrymandering would finally break out of the shackles of politics and move into other fields.

posted by qvantamon at 7:15 PM on May 21, 2009 [3 favorites]

Maybe they're messing around with the definition of a city and a metropolitan area. This theory definitely needs a population density and sprawl size variable. Probably another one to correct for country/administrative area (sorry, my hardware is borking up my keyboard) to "city" ratio.

posted by porpoise at 10:25 PM on May 21, 2009

Anomaly deluxe: my current place of residence, Kuwait City, is home to something like 94% of the country's residents.

posted by ambient2 at 10:59 PM on May 21, 2009

I have read this nine times now and it doesn't appear to make sense:

*if you tabulate the biggest cities in a given country and rank them according to their populations, the largest city is always about twice as big as the second largest, and three times as big as the third largest, and so on. In other words, the population of a city is, to a good approximation, inversely proportional to its rank.*

It appears to say that things are proportional to themselves.

Maybe I'm not understanding what's meant by "biggest," but I interpret it as: if you list the cities by their population, you find a correlation between that and ... their population?

posted by AmbroseChapel at 11:47 PM on May 21, 2009

AmbroseChapel, the interesting bit is "twice as big." It's not merely the fact of correlation, it's *how*: the rank (the third largest city ...) is tied to relative size (... is a third as large as the largest).

posted by wemayfreeze at 12:16 AM on May 22, 2009
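
wemayfreeze's reading of the rule can be written as P(r) ≈ P(1)/r, where P(r) is the population of the city ranked r. Below is a minimal sketch that applies it, assuming the strict form with exponent 1; the `exponent` parameter allows the more general power law P(1)/r^q. The function name `zipf_sizes` and the 6 million figure are hypothetical.

```python
def zipf_sizes(largest: float, n: int, exponent: float = 1.0) -> list[float]:
    """Predicted sizes of the cities ranked 1..n under P(r) = P(1) / r**exponent.

    exponent=1.0 is strict Zipf; other values give a general power law.
    """
    return [largest / rank ** exponent for rank in range(1, n + 1)]


if __name__ == "__main__":
    # Hypothetical largest city of 6 million people.
    for rank, size in enumerate(zipf_sizes(6_000_000, 4), start=1):
        print(f"rank {rank}: {size:,.0f}")
```

For a largest city of 6 million this gives 3 million, 2 million and 1.5 million for ranks 2 to 4: the rank alone fixes each city's size relative to the largest.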

All this seems to somehow smack of phi.

As in, we make it so/because it looks right to us.

*not doing math to back it up*

posted by hypersloth at 12:37 AM on May 22, 2009

*"I have read this nine times now and it doesn't appear to make sense:"*

It definitely does make sense. Tenth time lucky.

posted by infobomb at 5:14 AM on May 22, 2009

posted by AceRock at 3:36 PM on May 21, 2009