Zipf's Law
February 28, 2016 5:44 PM   Subscribe

Vsauce's take on Zipf's Law (SLVS). With some useful links in the video description.

Some previous Zipf stuff: 1 2
posted by carter (9 comments total) 6 users marked this as a favorite
I tend to believe that Zipfian and other heavy-tailed distributions tend to be a good first approximation to most systems with correlational structure. Which, to a solid first approximation, are the only systems you give a shit about. Certainly, they work as a good first approximation for number of uses of variable names in programs, length of program files, time between commits in my git repos, amount of time between cleanings of my room, amount of time I procrastinate on tasks on my todo list, etc etc.

I wrote a long thing why that would be (the nature of causation) which is mostly bullshit and would be a self-link if I linked it so I won't link. It was pretty fascinating to me for a while.

Regression on those quantities is hard. You need to pull in massive parametric machinery or break out some heavy-duty stuff (CR Shalizi on the related task of dynamical attractor reconstruction). I don't know why nobody has pulled out an RNN for predicting that sort of thing; maybe I will.
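A minimal sketch of the fitting problem on simulated data (the Hill estimator is standard; the rest of the setup here is illustrative, not anyone's actual analysis): compare the maximum-likelihood estimate of a Pareto exponent against the naive least-squares slope on a log-log plot.

```python
# Sketch: why "draw a line on a log-log graph" is a dubious way to fit
# a power law. Simulate Pareto data, then compare the MLE (the Hill
# estimator) with a least-squares slope fit to the empirical CCDF.
import numpy as np

rng = np.random.default_rng(0)
alpha_true, x_min, n = 2.5, 1.0, 5000

# Draw Pareto(alpha) samples via inverse-CDF: x = x_min * u**(-1/alpha)
x = x_min * rng.uniform(size=n) ** (-1.0 / alpha_true)

# Maximum-likelihood estimate of the exponent (Hill estimator)
alpha_mle = n / np.sum(np.log(x / x_min))

# Naive alternative: least-squares slope of the empirical CCDF on
# log-log axes -- the "line on a GnuPlot graph" approach.
xs = np.sort(x)
ccdf = 1.0 - np.arange(n) / n          # P(X > x) at each sorted point
slope, _ = np.polyfit(np.log(xs), np.log(ccdf), 1)
alpha_ls = -slope

print(f"true alpha     = {alpha_true}")
print(f"MLE estimate   = {alpha_mle:.3f}")
print(f"log-log LS fit = {alpha_ls:.3f}")
```

Both land near the truth on clean simulated data; the least-squares version falls apart on real data because the points in the tail are few, noisy, and correlated, and the regression machinery gives meaningless error bars there.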
posted by hleehowon at 7:09 PM on February 28, 2016 [3 favorites]

The amount of shit that goes wrong on an individual task in a task list, the amount of time projects take (that's quite empirical, a big ol' literature on it), all sorts of crap about the weather (obviously, we're talking about chaotic dynamical systems here), we had a thing previously talking about the amount of sex you have, which was indeed a shit fit.

Number of usages of abstract base classes, number of usages of classes, number of usages of methods, all sorts of stack depths, Wiki contribution, contribution on any sort of social network, every damn thing about social networks, actually, most measures of any kind of strength of relationships, a solid contingent of economic variables, including but not limited to firm size, money distribution both in income and wealth, whether in Swaziland or Denmark, movement size of any five markets you'd care to name, trading volume, international trade.

I recall the Panopticlick people had a pretty strange scaling relationship with their de-anonymization trick (well, I graphed it like they didn't) where grand numbers of people couldn't be de-anonymized because they were using the newest most generic iPhone with Safari and no web fonts. I was at a small social network that replicated this fact.

Population densities in ecology. Bak made a model of punctuated equilibrium that was supposed to be self-organized critical. Lots of things claimed about food webs, although people really should not be making claims on this topic with n=172 only.

Fits are... remarkably poor in many of them. Many times, it's just really the observation that all these things tend to span a lot of scales.

Sources available on request.
posted by hleehowon at 7:24 PM on February 28, 2016 [1 favorite]

Since Shalizi has been mentioned, we must link to his wonderfully titled article "So You Think You Have a Power Law — Well Isn't That Special?"
posted by benito.strauss at 7:25 PM on February 28, 2016 [6 favorites]

Oh, the most disturbing one is in passwords. Security doesn't exist! Hooray!
posted by hleehowon at 7:43 PM on February 28, 2016 [2 favorites]

I wrote a long thing why that would be (the nature of causation) which is mostly bullshit and would be a self-link if I linked it so I won't link.
Self-linking is okay in comments!

posted by BungaDunga at 9:29 PM on February 28, 2016

Yes, it would be interesting to add this, hleehowon!
posted by carter at 5:14 AM on February 29, 2016

Uhh, alright then. Causal networks are weird.
posted by hleehowon at 6:09 AM on February 29, 2016

we must link to his wonderfully titled article "So You Think You Have a Power Law — Well Isn't That Special?"

"I trust that I will no longer have to referee papers where people use GnuPlot to draw lines on log-log graphs, as though that meant something..."
posted by clawsoon at 1:19 PM on February 29, 2016

Peter Neumann has some thoughts which I can't say I completely understand yet:
Belevitch considered a wide class of more or less well-behaved statistical distributions (normal or whatever), performed a functional rearrangement that represents the frequency as a function of rank-ordered decreasing frequency, and then did a Taylor expansion of the resulting formula. Belevitch's lovely result is that "Zipf's Law" follows directly as the first-order truncation of the Taylor series. Furthermore, "Mandelbrot's Law" (which seems even more curious and mysterious to most people) follows immediately as the second-order truncation. ("Pareto's Law" lies in between Zipf and Mandelbrot, with a different slope of the 45-degree curve.) There is nothing magical or mystical about it!
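A hedged sketch of the shape of that argument (my notation, not Belevitch's, and only the skeleton of the claim):

```latex
% Sketch only. Write the log of the rank-frequency function as a
% Taylor series in \log r and truncate:
\log f(r) \;=\; a_0 \;-\; a_1 \log r \;-\; a_2 (\log r)^2 \;-\; \cdots

% First-order truncation: a pure power law in rank, i.e. Zipf's law
f(r) \;\propto\; r^{-a_1}

% The Zipf--Mandelbrot form adds a rank offset q and exponent s;
% with q = 0 and s = 1 it reduces to classical Zipf:
f(r) \;=\; \frac{C}{(r + q)^{s}}
```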


Jim Horning once asked me about a possible connection with the 80-20 rule. My response was this: ...

* In 36,299 occurrences of English words (Miller et al.), the most frequent 18% of the words account for over 80% of the word occurrences. That's close to the so-called 80-20 rule.

* In over 11 million occurrences of German words (Kaeding -- fascinating book, incidentally), the most frequent 0.6% of the words account for over 75% of the word occurrences, which is in some sense roughly 20 times more skewed than the so-called 80-20 rule. Perhaps the wider skewing is due to the fact that conjugated forms and declined forms (such as the most frequent der, die, das, etc.) are counted as different words, which, linguistically of course, they are.

Both of these language statistical studies closely follow Zipf-Mandelbrot all the way down to the tails. But the parameters are slightly different. Thus, the supposed 80-20 split does not in any way follow directly from Z-M. It could be 80-20, or 99-1, or worse!
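A quick illustration of that last point, with made-up Zipf-Mandelbrot parameters (nothing here is Miller's or Kaeding's actual fit): the fraction of top-ranked words needed to cover 80% of all occurrences swings widely as the parameters move.

```python
# Sketch: the "80-20" split is not implied by Zipf-Mandelbrot; it
# depends entirely on the fitted parameters. Vocabulary sizes and
# (s, q) values below are illustrative.
import numpy as np

def top_share_for_mass(s, q, vocab, target=0.80):
    """Smallest fraction of top-ranked words that covers `target`
    of all occurrences, under f(r) = C / (r + q)**s."""
    r = np.arange(1, vocab + 1)
    f = 1.0 / (r + q) ** s
    cum = np.cumsum(f) / f.sum()
    k = int(np.searchsorted(cum, target)) + 1   # ranks needed
    return k / vocab

for s, q, vocab in [(1.0, 0.0, 10_000), (1.2, 2.0, 100_000)]:
    share = top_share_for_mass(s, q, vocab)
    print(f"s={s}, q={q}, V={vocab}: top {share:.2%} of words -> 80% of mass")
```

With s = 1 (classical Zipf) over a 10,000-word vocabulary the split is nowhere near 80-20, and pushing the exponent up concentrates the mass further into the head.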
posted by clawsoon at 8:13 AM on March 1, 2016


This thread has been archived and is closed to new comments