

Imagine someone of the type we call neurotic
May 31, 2012 7:02 AM   Subscribe

A not-well discussed property of data: it is toxic in large quantities—even in moderate quantities.

An excerpt from Nassim Taleb's (previously, previouslier, previousliest) forthcoming book Antifragile: Things That Gain From Disorder looks at how too much information can be harmful.
posted by onwords (89 comments total) 28 users marked this as a favorite

 
Johnny Mnemonic: Yeah, the Black Shakes. What causes it?
Spider: What causes it?
[points to various pieces of equipment throughout the room]
Spider: This causes it! This causes it! This causes it! Information overload! All the electronics around you poisoning the airwaves. Technological fucking civilization. But we still have all this shit, because we can't live without it. Let me do my work.
posted by FatherDagon at 7:09 AM on May 31, 2012 [12 favorites]


It actually sounds intuitive, which is why I'm very suspicious of the claims Taleb makes here. Where's the research and data?
posted by Foci for Analysis at 7:10 AM on May 31, 2012 [3 favorites]


I have an unreasonable love for Taleb, despite the fact that The Black Swan was easily twice as long as it needed to be.

The Information Diet by Clay Johnson was a recent book with some of the same basic ideas as this piece. It has its own interesting website.
posted by Sticherbeast at 7:14 AM on May 31, 2012 [2 favorites]


Where's the research and data?

This sounds to me like the laughable whimpering of a pimply neurotic, not a Clear-Thinking, Steely Man Of Action Who Ices Dudes and Doesn't Care About Vacuum Cleaner Facts
posted by theodolite at 7:15 AM on May 31, 2012 [18 favorites]


Signal to noise ratio is a convenient analogy for simple analog thinking, but it would seem to be a wholly inappropriate analogy for the more complicated way the world works.

The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part called the signal); hence the higher the noise to signal ratio. And there is a confusion, that is not psychological at all, but inherent in the data itself.

I don't know what he means by "inherent in the data itself", but "noise" is random and uncorrelated. Analog systems are noise limited; digital systems are interference limited, and interference, whilst it contains noise, also contains a great deal of correlated information that a clever system can filter away.

Indeed the "signal" in a CDMA system can be well-buried under the noise, but still easily recoverable by using the appropriate filter.

Taleb's thesis that too much data is the problem cannot be correct. It is too little filtering.
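A toy sketch of that recovery, with a made-up chip length and noise level rather than anything from a real CDMA spec: a single data bit is spread across thousands of ±1 chips, buried under noise several times stronger than the per-chip signal, and then pulled back out by correlating against the known chip sequence.

```python
import random

random.seed(42)

# Toy direct-sequence spread-spectrum demo: one data bit is spread
# across many +/-1 "chips", buried in strong noise, then recovered
# by correlating against the known chip sequence.
N_CHIPS = 4000
chips = [random.choice((-1, 1)) for _ in range(N_CHIPS)]

def transmit(bit, noise_std=5.0):
    """Spread a single bit (+1 or -1) across the chips and add Gaussian noise."""
    return [bit * c + random.gauss(0.0, noise_std) for c in chips]

def despread(received):
    """Correlate with the chip sequence; the sign of the sum recovers the bit."""
    corr = sum(r * c for r, c in zip(received, chips)) / N_CHIPS
    return 1 if corr > 0 else -1

# Per chip, the signal (amplitude 1) is far below the noise (std 5),
# but averaging over 4000 chips shrinks the noise by ~sqrt(4000).
sent = [1, -1, -1, 1, 1, -1]
recovered = [despread(transmit(b)) for b in sent]
print(recovered)
```

The filter only works because the chip sequence is known in advance, which is exactly the "too little filtering, not too much data" point: the receiver knows what to look for.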
posted by three blind mice at 7:23 AM on May 31, 2012 [13 favorites]


as you consume more data, and the ratio of noise to signal increases, the less you know what’s going on

The man with one clock knows the time. The man with two clocks is never sure.

However, I like to think that the man with five clocks can get a rough estimate while the man with twenty clocks is overwhelmed and/or indecisive.
posted by DU at 7:24 AM on May 31, 2012 [10 favorites]


Taleb's thesis that too much data is the problem cannot be correct. It is too little filtering.

A fair distinction, but not a huge one. You're just presenting a clearer version of the same idea. It's safe to assume that Taleb is not suggesting that you limit your data intake through some completely random process, but rather through a filtration process that separates the noise from the signal.
posted by Sticherbeast at 7:28 AM on May 31, 2012 [1 favorite]


Can't stop the signal.
posted by schmod at 7:30 AM on May 31, 2012


Is this going to be another book that uses a seemingly technical, but ultimately grossly inappropriate analogy to make predictive or even normative statements about a contentious tendency the author identifies in some massive abstraction like "modernity" or "science"?
posted by Nomyte at 7:35 AM on May 31, 2012 [20 favorites]


How about this idea:

If we think of the data processing methodology we use (I don't want to just limit this to the word "filtering") as a sort of metabolism, and the data as the input ("nutrients") then it is more accurate to say that too much data can be toxic to the wrong sort of methodology, just as certain nutrients can be toxic to certain metabolisms, but not others. But, we can clearly also say that small amounts of the "wrong kind" of data can also be toxic--for example, half-digested Austrian school economics :)

A "good" data metabolism will successfully employ various strategies (filtering, dimension reduction, dynamical modeling, ...) to extract useful information from the data while not saturating the system. A "bad" metabolism will result in a pathology because, say, the computational power of the system is exceeded, or false positives become dominant, or... ?

If you think about it, this is what our perceptual systems do all of the time.
posted by mondo dentro at 7:40 AM on May 31, 2012 [4 favorites]


Nomyte, I think so, because I cannot see where he's said anything that isn't thunderingly obvious from a "common sense" [pardon my language] point of view. Nor where he's provided evidence.

Well, now I don't have to read his stuff, based on the impression this has made on me. I'm filtering him out.
posted by tel3path at 7:40 AM on May 31, 2012


For a sample of a composed, calm and pondered voice, listen to interviews of “Sammy the Bull” Salvatore Gravano, who was involved in the murder of nineteen people (all competing mobsters). He speaks with minimal effort. In the rare situations when he is angry, unlike with the neurotic fellow, everyone knows it and takes it seriously.

So this is the sort of person I'm supposed to aspire to be ... ?
posted by localroger at 7:44 AM on May 31, 2012 [1 favorite]


it is optimal to opt for less options, the opinionated options trader opined, believing the optician's phoropters optional.
posted by TwelveTwo at 7:45 AM on May 31, 2012 [1 favorite]


Here's the thing. I think there is a good point in there: not about the quantity of data overall, but about the fact that a lot of people don't understand the techniques you're supposed to use to get the information you want out. A lot of people seem to think you can just run some algorithm against the data and get something useful out of it without thinking too much.

Look at the Netflix prize for example. It took many teams years and years to develop an algorithm that would actually work and get a 10% higher score than the one Netflix had developed originally. While the "signal" they were looking for was there, the reality is that it took them, basically, forever to find it.

Or remember Mark Penn from the 2008 presidential primary? Hillary's chief pollster basically ran the show because he had "the numbers". The thing is, her campaign was a disaster. Penn literally didn't seem to understand that delegates were proportional, and that winning big states by small margins didn't count for more than winning small states by large margins. It seems like, had her campaign seriously gone after the small caucuses and whatnot after Iowa and NH early on, she might have won. By the time her campaign got into gear it was already too late.
I don't know what he means by "inherent in the data itself" but "noise" is random and uncorrelated. Analog systems are noise limited, but digital systems are interference limited and interference whilst it contains noise also contains a great deal of correlated information that a clever system can filter away.
Information entropy is actually a property of a data stream, although 'signal to noise' isn't really the appropriate term for it. A perfectly compressed block of information will have maximal entropy of one 'bit' per bit.

On the other hand, if you're trying to discover a pattern, the higher the entropy, the harder the pattern is to find. Noise is a different problem: information about something you don't care about. It actually does increase the 'entropy' of the data that you're looking at.
Taleb's thesis that too much data is the problem cannot be correct. It is too little filtering.
Well, the problem is there are limits to how much filtering you can actually do. With CDMA you have a system that was designed to be recoverable under a well characterized 'interference' pattern (other cell phones). Since you already 'know' what to expect from things other than your signal, it's easy to filter out.
posted by delmoi at 8:00 AM on May 31, 2012 [6 favorites]


That is a good point, but it's not in there.

Anybody can say "data!!!1!!! ZOMG too much data!" but to say what delmoi said, you have to know what you're talking about.
posted by tel3path at 8:19 AM on May 31, 2012


Is this going to be another book that uses a seemingly technical, but ultimately grossly inappropriate analogy to make predictive or even normative statements about a contentious tendency the author identifies in some massive abstraction like "modernity" or "science"?

Yes, it is. The first recommendation on the back cover will be by Malcolm Gladwell. The Charlie Rose interview is already scheduled. The extract will appear in The New Yorker the week after that. Time to sell some books.
posted by benito.strauss at 8:22 AM on May 31, 2012 [3 favorites]


For a man who values the signal to noise ratio so highly, it is ironic that he has written three whole books making the same basic point.
posted by MuffinMan at 8:26 AM on May 31, 2012 [6 favorites]


The trick here is properly tuning your feedback loops. If you're careful, the more input the better... if not, you'll go off on spurious quests.

When you don't have enough information, you tend not to be aware of this problem. It's not new, though.
posted by MikeWarot at 8:51 AM on May 31, 2012


The man with no clocks doesn't have to worry what time it is.
posted by The Whelk at 8:52 AM on May 31, 2012


The man with no clocks is king.
posted by Nomyte at 9:02 AM on May 31, 2012


I like to think that the man with five clocks can get a rough estimate while the man with twenty clocks is overwhelmed and/or indecisive.

Only if he has really lousy clocks.
posted by straight at 9:12 AM on May 31, 2012 [1 favorite]


Does anyone really know what time it is?
posted by philip-random at 9:17 AM on May 31, 2012


>it is ironic that he has written three whole books making the same basic point.

Not at all; "What I tell you three times is true" is a key line from The Hunting of the Snark, and indeed relates to all snark in general.
posted by scruss at 9:23 AM on May 31, 2012 [2 favorites]



The trick here is properly tuning your feedback loops. If you're careful, the more input the better... if not, you'll go off on spurious quests.

When you don't have enough information, you tend not to be aware of this problem. It's not new, though.


The problem is that the people generating the data will be constantly trying to get around your filters.

Regardless if they are advertisers or just somebody who thinks they have something cool.
posted by KaizenSoze at 9:24 AM on May 31, 2012 [1 favorite]


I like to think that the man with five clocks can get a rough estimate while the man with twenty clocks is overwhelmed and/or indecisive.

If you've got one clock, you've only got one time to go on. Time A.

If you've got two clocks, you've now got three options. It's either Time A, Time B ... or a combination of the two.

If you've got three clocks, you're suddenly up to seven options:
Time A
Time B
Time C
Some combination of A + B.
Some combination of A + C.
Some combination of B + C.
Some combination of all three.

Five clocks is getting damned complicated. Twenty, I suspect you're well into the billions. But I'm just an artist. Let someone else do that math.
posted by philip-random at 9:24 AM on May 31, 2012



For a man who values the signal to noise ratio so highly, it is ironic that he has written three whole books making the same basic point.


Saying this as a Taleb fan, he has been refining his message. He has been constantly asking for feedback on his Facebook page, trying to make his message as clear as possible in this book.

Also, I enjoy the new analogies that he comes up with each new book.

Don't ever watch his interviews, he's horrible on camera.
posted by KaizenSoze at 9:29 AM on May 31, 2012


was easily twice as long as it needed to be

There's definitely an irony in Taleb telling people about signal-to-noise ratio.

The frustrating thing is he does have a handful of really good insights among all his verbiage and look-at-me anecdotes.
posted by philipy at 9:32 AM on May 31, 2012 [1 favorite]


If you've got one clock, you've only got one time to go on. Time A.

If you've got two clocks, you've now got three options. It's either Time A, Time B ... or a combination of the two.

if you've three clocks, you're suddenly up to seven options:
[...]Five clocks is getting damned complicated. Twenty, I suspect you're well into the billions.


Ah, but that assumes the analysis you use on 3 clocks is the same as the one you use on 20. Suppose that, in the 5-clock case, 3 of your clocks show roughly 6 PM, and the other 2 say it's about 10 AM. You can conclude that it's slightly more likely that it's evening than morning, but it's a shaky conclusion. But if you have 20 clocks, and 18 of them are within +/- an hour of 6 pm, you can be reasonably confident throwing out the 10 AM clocks (assuming all clocks are equally likely to be valid, etc.) Not completely sure, maybe, but you have much better ground for assuming it's about 6 pm than you did when you had 5 clocks.

More clocks doesn't only mean more data points; it also means a better framework for how these data points relate to one another.

(Unless all of them are showing completely different, evenly spaced times. Then you're just fucked.)
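A rough sketch of that cluster argument (the readings and the 1.5-hour cutoff below are invented for illustration): take the median, throw out anything far from it, and average the rest. With 5 clocks the majority is shaky; with 20 it's unmistakable.

```python
import statistics

def consensus_time(readings_hours, window=1.5):
    """Estimate the time from noisy clocks: take the median, then average
    only the readings within `window` hours of it, discarding outliers."""
    med = statistics.median(readings_hours)
    kept = [r for r in readings_hours if abs(r - med) <= window]
    return statistics.mean(kept), len(kept)

# 5 clocks: 3 near 18:00 (6 PM), 2 near 10:00 -- a shaky majority.
five = [18.1, 17.9, 18.2, 10.0, 10.3]

# 20 clocks: 18 clustered around 18:00, 2 stragglers at 10:00.
twenty = [18.0 + 0.05 * i for i in range(-9, 9)] + [10.0, 10.3]

est5, kept5 = consensus_time(five)
est20, kept20 = consensus_time(twenty)
print(round(est5, 2), "from", kept5, "clocks kept")
print(round(est20, 2), "from", kept20, "clocks kept")
```

Both cases land near 6 PM, but the 20-clock estimate rests on 18 agreeing readings instead of 3, which is the "better framework" point: more clocks don't just add data, they make the outliers identifiable.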
posted by kagredon at 9:46 AM on May 31, 2012 [1 favorite]


Don't ever watch his interviews, he's horrible on camera.

Going by this excerpt, he can't write either.
 
posted by Herodios at 9:50 AM on May 31, 2012


Speaking as a man with (at least) ten clocks, I am not at all sure how far this clock analogy goes.

Some of the clocks are more authoritative than others - these are the ones on devices which are online and get synchronised via NTP on a regular basis. My other clocks get synchronised to those manually to within a reasonable margin of error (<1 minute or so) on a relatively regular basis. There's a new (to me) one - the Most Hilariously Hideous Clock In The World, which I recently resurrected and which has some bizarre ancient electronic mechanism involving a 'speed/slow time' knob, and which is still either gaining or losing in the region of four or five minutes a day. I know not to trust it too much. Similarly, I know that the manually synchronised ones cannot be trusted as much as the NTP-enabled ones, and even they need checking every so often.

It's an incredibly narrow domain and pretty simple to establish how far the information from each particular clock can be trusted at any given time, especially after having lived with them a while.

This is not at all like the domain of 'all information sources', where it is really hard to check either the accuracy or the relevance to a reasonable degree of precision on an ongoing basis. The former is about trust and the latter is a moving target based on where you're at. Filters are hard.
posted by motty at 9:52 AM on May 31, 2012 [2 favorites]


> Taleb's thesis that too much data is the problem cannot be correct. It is too little filtering.

Filters can just as well extract "signals" that actually aren't there, and some filters can't be turned off--e.g. seeing faces in the clouds. Look at just one piece of toast and then eat it...probably no alerts. Look at hundreds or thousands of pieces, though, and it becomes more and more likely that you'll go "Hey look! Jesus!"
posted by jfuller at 9:59 AM on May 31, 2012


Is it just me, or does anyone else get really annoyed when people criticize writers for 'not knowing how to write'?
posted by lodurr at 10:02 AM on May 31, 2012


The man with no clocks meebles around in a constant prepanic state, asking everybody every five minutes if they know what time it is, and shrieking "I'm late! I'm late!"

(Actually, everybody I've known who didn't have a watch has been in the habit of asking me what time it is every 2-3 minutes. It is annoying. TIME YOU GOT A WATCH.)
posted by tel3path at 10:04 AM on May 31, 2012 [2 favorites]


I love how this thread is basically segregated into two main sub-threads: people arguing about whether the book is worth reading, and people having discussions about information theory without paying much attention to the book anymore. One of those two I can learn something from.
posted by lodurr at 10:05 AM on May 31, 2012 [1 favorite]


tel3path: Weirdly, I generally know to within about 10 minutes (often 5) what time it is. I don't know why that is exactly, and for several years now (even as I got better at it) I've found it a little unsettling.
posted by lodurr at 10:07 AM on May 31, 2012


I guess I have an idea of what he means by 'noise' and 'real information', but why not cut to the chase analytically without making so much 'noise' then?
posted by quoquo at 10:19 AM on May 31, 2012 [1 favorite]


I have numerous clocks, one of which is currently stopped.

What this has to do with the subject of the post, I don't know.
posted by philipy at 10:20 AM on May 31, 2012


Look at just one piece of toast and then eat it...probably no alerts. Look at hundreds or thousands of pieces, though, and it becomes more and more likely that you'll go "Hey look! Jesus!"

Helps if you put a filter into the toaster to let through IR radiation in the shape of Jesus.
posted by rough ashlar at 10:22 AM on May 31, 2012


Chris, dude writes like someone longing to have lived a century earlier.
posted by MartinWisse at 10:45 AM on May 31, 2012


but why not cut to the chase analytically without making so much 'noise' then?

hard to get a publisher excited about a twelve page book?
posted by philip-random at 10:47 AM on May 31, 2012


I'm surprised yall made it through the whole piece, I'm still pondering over what the dude he was talking about at the start even looks like!

"His necks moves around when he tries to express himself."

What, all of them? Whoa
posted by barnacles at 10:53 AM on May 31, 2012


The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part called the signal); hence the higher the noise to signal ratio.

This is the core of the argument. And it's wrong.
posted by Mental Wimp at 11:27 AM on May 31, 2012 [3 favorites]


Look at hundreds or thousands of pieces, though, and it becomes more and more likely that you'll go "Hey look! Jesus!"

But think of all the delicious, nutritious toast. Yum!
posted by Mental Wimp at 11:38 AM on May 31, 2012


Weirdly, I generally know to within about 10 minutes (often 5) what time it is. I don't know why that is exactly, and for several years now (even as I got better at it) I've found it a little unsettling.

The Sun, and an internal sense of how long things take.

I bet you're much more likely to be wrong if you've been doing something you haven't done before and are inside.

Artificial lighting would be a good example of too much "noise" over-riding the important signals (where the sun is, how much light there is) that help to establish an internal sense of time. On the other hand, being locked in a perfectly dark closet would do the same thing.

Also, it seems like "as you consume more data, and the ratio of noise to signal increases" is only true if you're indiscriminately consuming data with the hope of finding a specific signal. Which is a bit like saying "if you eat more, the dirt-to-hotdogs ratio increases," based on the assumption that everyone's just randomly eating anything they can chew.
posted by Gygesringtone at 11:46 AM on May 31, 2012 [1 favorite]


This is a terribly conservative, anti-intellectual concept. It assumes there is some sort of "Natural Man" that technology has despoiled. Go back to Walden Pond, please. You can drive right up to the Thoreau theme park, plenty of free parking.

I remember Nick Carr wrote about how reading online in multiple windows and multitasking ruined peoples' ability to sit down and read books. So I decided to renew my library card and started checking out books. I checked out 3 books and asked the librarian how many books I could check out simultaneously. She said, "There's no limit. But if you go over, oh, like 50, it could be a problem."

So after about 6 weeks, sitting on my desk were 15 books, each with a bookmark about 1/4 of the way in. Some had been renewed once, some twice. The only book I ever finished, I had maxed out the renewals, so I had to return it and check it out again. I had to do that twice. And it wasn't really that good a book.

I think my problem is not that my attention span has been altered, it's that I don't have good reading glasses, but I do have good computer glasses and a 30 inch screen. A friend gave me a book for my birthday, my first thought was to cut it out of the binding and run it through my scanner to make it into a PDF.

So I called an end to the experiment. Well, actually, the library called an end to it, when I had over $10 of overdue book fines, and I was too broke to pay it. My card was suspended until I paid. I returned all the books, and I felt like I was seeing a dialog box that I see fairly often on my computer, when I close my browser: "You have 347 tabs open in 28 windows. Do you really want to quit?"
posted by charlie don't surf at 12:45 PM on May 31, 2012 [3 favorites]


philip-random, you'll probably never see this, but I can't resist:

Five clocks is getting damned complicated. Twenty, I suspect you're well into the billions. But I'm just an artist. Let someone else do that math.

The math is actually very simple when you're asking for all possible combinations. Each clock is either included in your selection, or it isn't - one of two choices. So for 20 clocks, there are exactly 2^(20) combinations (2 multiplied by itself 20 times, or 1,048,576).

That's a bit over a million combinations: computer nerds will recognize that as the number of bytes that should be in a Megabyte, but for historically stupid reasons that's actually the number of bytes in a Mebibyte (MiB).
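A quick sanity check of that count (illustrative only): every subset of 20 clocks either includes a given clock or doesn't, and the same total falls out of summing the binomial coefficients.

```python
import math

n_clocks = 20

# Each clock is in or out of a selection: the count doubles per clock.
by_powers = 2 ** n_clocks

# Equivalently: the number of ways to choose k clocks, summed over all k.
by_binomials = sum(math.comb(n_clocks, k) for k in range(n_clocks + 1))

print(by_powers, by_binomials)  # 1048576 1048576

# As delmoi notes downthread, the empty selection tells you no time at all,
# so the number of usable selections is 2^20 - 1 = 1,048,575.
```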
posted by RedOrGreen at 1:27 PM on May 31, 2012 [1 favorite]


Five clocks is getting damned complicated. Twenty, I suspect you're well into the billions. But I'm just an artist. Let someone else do that math.
It depends on the precision. With infinite precision, there are an infinite number of possibilities with just two clocks.

If it's quantized, though, you can just pick any point between the minimum and maximum, regardless of what the combinations are. So if you have five clocks, and the minimum time is 1:30, and the maximum is 1:50, there are only 20 possible values you can pick (if you're just counting minutes). Then you have 20*60 = 1,200 possible seconds that you can choose from. It doesn't matter if you have 2 clocks or 2 million.

That's ignoring the probability of the different estimates, though. The mean time of all the clocks would be the most likely, and you could estimate a distribution curve based on how 'bunched up' the clocks were. The more clocks you have, the better you can estimate the distribution.
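A sketch of that distribution point (the "true time" and per-clock error below are invented for illustration): averaging more clocks shrinks the uncertainty of the estimate roughly like 1/sqrt(n).

```python
import random
import statistics

random.seed(0)

TRUE_TIME = 18.0   # the actual time, in hours
CLOCK_STD = 0.25   # each clock is off by ~15 minutes, at random

def estimate(n_clocks):
    """Average n noisy clock readings; also report the standard error of the mean."""
    readings = [random.gauss(TRUE_TIME, CLOCK_STD) for _ in range(n_clocks)]
    mean = statistics.fmean(readings)
    sem = statistics.stdev(readings) / n_clocks ** 0.5
    return mean, sem

# The spread of plausible times tightens as clocks are added.
for n in (2, 5, 20, 100):
    mean, sem = estimate(n)
    print(f"{n:>3} clocks: estimate {mean:.3f} +/- {sem:.3f}")
```

The individual clocks are just as bad in every row; what improves with n is the estimate of the distribution they're drawn from, which is the point being made.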
The math is actually very simple when you're asking for all possible combinations. Each clock is either included in your selection, or it isn't - one of two choices. So for 20 clocks, there are exactly 2^(20) combinations (2 multiplied by itself 20 times, or 1,048,576).
No, because "some combination of A and B" isn't actually something we care about if we are trying to figure out what time it is. In fact, it doesn't even make sense: how can the time be a "combination" of some clocks but not others? If C1 = 1:30, C2 = 1:30, C3 = 1:40, and C4 = 1:40, and the actual time is 1:33, how can it be a "combination" of C1 and C3, but not of C2 and C4?

And if you were to say that the 'contribution' of each clock is a weight in an average, then there are multiple weightings with different results: 0.2*C1 + 0.8*C2 is different from 0.4*C1 + 0.6*C2. So you'd need to take all the possible weightings into consideration, and if they are completely continuous, then you have an infinite number of possibilities.
posted by delmoi at 1:49 PM on May 31, 2012 [1 favorite]


(also, it would be 2^(20) - 1, since a 'combination' of no clocks, corresponding to the binary string 00000000000000000000, doesn't give you a time)
posted by delmoi at 1:51 PM on May 31, 2012 [1 favorite]


The problem isn't the number of clocks; it's that 3 clocks that say 1:00, 12:45, and 1:15 make you more confident that it's 1:00 than 1 clock that says 1:00 does.
posted by JPD at 2:15 PM on May 31, 2012


I should probably read the linked article, but I find Taleb's writing style unpleasant. The Black Swan remains on my bookshelf, unread. Do others find him pompous? Is he really worth reading? Are there interesting ideas behind those words? I'm not sure if I'll ever find out.
posted by DarkForest at 3:45 PM on May 31, 2012


He's totally derivative. Very few of his thoughts are original, so if you dislike his style there are better things to read.
posted by JPD at 4:13 PM on May 31, 2012 [1 favorite]


The comment about the necks above illustrates to me that the fellow may be one of those people who considers spell-checked and proof-read to be synonymous. The one that stood out the most to me:

at the time of writing the personal doctor or the late singer Michael Jackson is being sued for something that is equivalent to overintervention-to-stifle-antifragility

I don't know what to make of this. It might seem to mean something, vaguely, if the final hyphenation had instead read

overintervention-to-stifle-fragility, or
overintervention-to-stifle-frailty

but as written it makes no sense. I don't have any idea if it's an error of typing, an error of writing, or an error of thinking. Or maybe some soup of all three. In any case he has some important ideas, but the writing is not good.

It is obvious that prescribing a guy a surgical sedative for sleeplessness is way over-medicating the poor patient. But Michael Jackson was a very sick man, not a robustly healthy one.
posted by bukvich at 4:29 PM on May 31, 2012


Taleb despises editors, and it's reflected in the details of his writing. But it fits with his whole attitude, so at least he's consistent.

But all that aside, there's some real gems in this article:

"Greenspan kept an eye on such fluctuations as the sales of vacuum cleaners in Cleveland “to get a precise idea about where the economy is going”, and, of course micromanaged us into chaos."

"Consider that every day, 6,200 persons die in the United States, many of preventable causes. But the media only reports the most anecdotal and sensational cases (hurricanes, freak incidents, small plane crashes) giving us a more and more distorted map of real risks."

". . . anyone who listens to news (except when very, very significant events take place) is one step below sucker."
posted by quadog at 5:06 PM on May 31, 2012


but as written it makes no sense.

"antifragility" is the original concept espoused by the book of which this is an excerpt, which has not been published. So no, it doesn't make sense, but it's not anyone's fault (but whoever published this particular excerpt).
posted by mek at 5:50 PM on May 31, 2012


An example of overintervention-to-stifle-antifragility is over-parenting. By micromanaging your child you prevent any possible failures from occurring, therefore causing the first even minor failure to be devastating. Something which exhibits antifragility actually derives benefit from failures: eg. a child with the emotional resilience and analytical thinking skills necessary to determine why they failed, and then improve in further attempts.

Anyhew, I'm not in love with the concept (surely this has been done before?) but it's not nonsensical or even incorrect.
posted by mek at 5:56 PM on May 31, 2012


I'm surprised yall made it through the whole piece, I'm still pondering over what the dude he was talking about at the start even looks like!

"His necks moves around when he tries to express himself."


I went back like a couple of times trying to find the tonsillectomy story. Hopefully it wasn't too good, cause I didn't find it.
posted by Barry B. Palindromer at 10:29 PM on May 31, 2012


Taleb himself is an interesting case of his own point. After all, he is merely one of a few people who accurately cried wolf in an ocean full of people who inaccurately cried wolf. It is impossible to tell whether he was a prophet or merely a statistical inevitability, but he is certainly being treated as, and playing up the role of, a prophet. Was he noise or signal? Retroactively we have anointed him signal.

But here we can see that Taleb makes his own argument by being Taleb and failing to separate signal from noise. Greenspan's failure was due to too much data and overreacting based on a vacuum counting anecdote? Really? If anything Greenspan's failure was from not reacting and having too much faith. Are we to infer from an anecdote that Greenspan governed by micromanaging micromeasures from one piece of anecdata?

Also, looking at more data shouldn't alter the signal to noise ratio. The ratio is a proportion; more data changes the volume of signal and noise, but in proportion with each other. Unless you lower your data quality, but that isn't an argument he actually makes (perhaps he just assumes it). There are many instances where it is important to monitor fluctuations that matter. High-level measures of sea levels are fine when looking at climate change. Pretty crap when managing a harbour with tides. Likewise a diabetic needs to react to intraday hormone fluctuations. "Oh, it will all just average out over time" is how they die young. His analysis here is just too glib. "Pay less attention" is just not sound general advice, and he is giving us no information on deciding how much data to ingest.

He also gets his logic backwards:
A stressor is information
Too much information would be too much stress

Simply doesn't follow. Perhaps it is true but he left the entirety of his argument to support it out.

So Taleb, by his own admission, might just be some once-lucky noise with Gladwellian storytelling ability that we should ignore, given that he hasn't really made any great quality arguments since. I want to believe, because I suspect he might be right, but the arguments are just not at all compelling (or even existent, though perhaps you make an antifragile argument by avoiding being specific).
posted by srboisvert at 1:17 AM on June 1, 2012


This: The comment about the necks above illustrates to me that the fellow may be one of those people who considers spell-checked and proof-read to be synonymous.

... probably has very little to do with whether Taleb thinks spell-check is equivalent to editing, and quite a bit to do with the fact that publishers just don't edit much (for content or copy) anymore. This is a fact of the publishing industry, and doesn't have nearly as much to do with the relative rank or merit of the author as you would probably think.

And this: Taleb despises editors...

... is, in isolation from evidence that Taleb despises editors, a much less likely explanation for shortcomings than the fact that publishers just don't edit anymore.
posted by lodurr at 6:04 AM on June 1, 2012


publishers just don't edit anymore

So these publishers, panicking as they see their business contracting, have stopped reading the slush pile and editing? Which, as I recall, were the main value that they added? Which is supposed to save them from that contracting situation somehow?
posted by localroger at 7:49 AM on June 1, 2012


The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part called the signal); hence the higher the noise to signal ratio. And there is a confusion, that is not psychological at all, but inherent in the data itself. Say you look at information on a yearly basis, for stock prices or the fertilizer sales of your father-in-law’s factory, or inflation numbers in Vladivostock. Assume further that for what you are observing, at the yearly frequency the ratio of signal to noise is about one to one (say half noise, half signal) —it means that about half of changes are real improvements or degradations, the other half comes from randomness. This ratio is what you get from yearly observations. But if you look at the very same data on a daily basis, the composition would change to 95% noise, 5% signal. And if you observe data on an hourly basis, as people immersed in the news and markets price variations do, the split becomes 99.5% noise to .5% signal. That is two hundred times more noise than signal —which is why anyone who listens to news (except when very, very significant events take place) is one step below sucker.
This paragraph seems demonstrably untrue to me, or true only if you interpret the data you are getting in the most naive way. Supposing what we're talking about is my weighing myself - which tends to generate quite a lot of noise as it can vary depending on whether I've just had a four-course meal or a massive shit. If I measure myself every day, I will definitely see a lot of noisy ups and downs. However, I will also come to expect a lot of noisy ups and downs, and so long as I do not react to every up or down as if it were a genuine trend I will probably be OK. But if I only weigh myself every six months, I won't be able to tell whether the apparent loss of half a kilo is real or just a noisy artefact. In fact, the best thing for me to do would be to apply fairly basic statistics to the daily measurements - or just plot them all on a graph and visually inspect them for a trend, which should be fairly apparent.
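To make that concrete, here's a quick simulation (all numbers invented: a real loss of 0.01 kg/day buried under ±1 kg of day-to-day noise):

```python
import random, statistics

random.seed(0)

# 180 days of weigh-ins: a real loss of 0.01 kg/day plus +/-1 kg daily noise
# (meals, hydration, and so on). All numbers here are made up.
days = list(range(180))
weights = [80.0 - 0.01 * d + random.gauss(0, 1.0) for d in days]

# Two weigh-ins six months apart: the difference is dominated by noise.
two_point_change = weights[-1] - weights[0]

# A least-squares slope over all the daily points recovers the real trend.
mean_d = statistics.mean(days)
mean_w = statistics.mean(weights)
slope = (sum((d - mean_d) * (w - mean_w) for d, w in zip(days, weights))
         / sum((d - mean_d) ** 2 for d in days))

print(f"two-point change:   {two_point_change:+.2f} kg")
print(f"fitted daily trend: {slope:+.4f} kg/day (true: -0.0100)")
```

The two-point change is mostly noise; the least-squares slope over the daily data recovers the trend.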

The clocks example is also a good one, because if I have one clock I am more likely to be certain about the time, but also less likely to be right about the time. There is a really dangerous anti-intellectualism in abandoning the tools we have available to get knowledge, and relying instead on clumsy heuristics and shoulder-shrugging. The solution is not less data, it is to interpret the data in an adult, mathematically-literate way.

An exception might be made, however, for chaotic, self-referential systems like the stock exchange. But the stock exchange is crazy and unlike all other human endeavours in many, many ways.
posted by Acheman at 7:56 AM on June 1, 2012


Except that you aren't making direct decisions every day based on what your weight is. You are ignoring the data until it forms a clear trend, but if you are, say, looking at the revenues of some tech company on a day-to-day basis, it becomes harder for you to trust in the trend.

Again, this is not original thought from Taleb - there is a tremendous amount of research into decision making done by Kahneman and friends that shows that more data ≠ better decision making.
posted by JPD at 8:19 AM on June 1, 2012 [1 favorite]


antifragility

So now we're replacing perfectly good words (e.g., "robustness") with malformed neologisms?
posted by Mental Wimp at 8:46 AM on June 1, 2012 [2 favorites]


But that's about the way you interpret the data, not how much data you have. If someone were to make direct decisions based on daily fluctuations in noisy data, there would be a problem, but the problem wouldn't be that there was too much data, it would be that the person was being an idiot. Giving an idiot less information isn't a solution, the solution is to get them to stop being an idiot. If Taleb knows this, why would he advise people to get less information, rather than dealing with that information in a moderately sensible manner?
posted by Acheman at 8:47 AM on June 1, 2012 [1 favorite]


This paragraph seems demonstrably untrue to me, or true only if you interpret the data you are getting in the most naive way.... The solution is not less data, it is to interpret the data in an adult, mathematically-literate way

In practice all of us are "naive" except when we make great efforts not to be.

And in practice the amount of stuff where we can make great efforts is extremely limited.

For that matter, for a great many practically important issues, people need to make decisions way before it's possible to have a rigorous analysis, if indeed you ever can get to that, because the world changes too fast.

For the most part I see people in this thread displaying one aspect of "naive", which is the tendency to see the world through one's pre-existing notions of it, and find plausible arguments why whatever thought first popped into your head is right.

But to do anything else would take a lot more time, effort and self-awareness than we have available to throw at every passing Mefi thread.

Though it would at least show some degree of self-awareness and analytical literacy if people weren't so happy to pontificate with such certainty after so little thought.
posted by philipy at 8:48 AM on June 1, 2012


In practice all of us are "naive" except when we make great efforts not to be.

I just don't believe this. I think if you give most people a moderate amount of education in concepts like noise and regression to the mean, they will be able to make better decisions that make the most of the resources available to them.
posted by Acheman at 8:51 AM on June 1, 2012


Giving an idiot less information isn't a solution, the solution is to get them to stop being an idiot. If Taleb knows this, why would he advise people to get less information, rather than dealing with that information in a moderately sensible manner?

No - that's the key point - giving an "idiot" more information surrounding an uncertain situation, or teaching them how to "deal with" that information, doesn't make them less of an "idiot"; it makes them think they aren't idiots. It's a root cause of overconfidence.

(where "idiot" = all human beings)
posted by JPD at 8:53 AM on June 1, 2012


I just don't believe this. I think if you give most people a moderate amount of education in concepts like noise and regression to the mean, they will be able to make better decisions that make the most of the resources available to them.

Well, there is a Nobel Prize, and the research that went along with it, that says you are wrong.
posted by JPD at 8:54 AM on June 1, 2012


For a great many practically important issues, people need to make decisions way before it's possible to have a rigorous analysis, if indeed you ever can get to that, because the world changes too fast.

Exactly - and in order to accomplish that we have an innate set of evolved decision making heuristics & biases that are really good at preventing really negative outcomes, but not so good at doing most other things.
posted by JPD at 8:56 AM on June 1, 2012


Um, if I want to know the mean value of something, a single random observation allows me to estimate it, but I can't estimate my uncertainty. With two observations I can estimate uncertainty as well, but not very well and it will be large relative to the underlying variability of the value. If I get one hundred observations, I know the mean really well and the uncertainty pretty well. For a million observations, I know the mean and I have essentially no uncertainty. What the hell was Taleb talking about?
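In code (a sketch with invented Gaussian data, true mean 10 and sd 2, which is of course the well-behaved case):

```python
import random, statistics

random.seed(1)

def mean_and_se(n):
    """Sample mean and its standard error from n Gaussian draws (invented data)."""
    xs = [random.gauss(10.0, 2.0) for _ in range(n)]
    return statistics.mean(xs), statistics.stdev(xs) / n ** 0.5

# The standard error shrinks like 1/sqrt(n): more data, less uncertainty.
for n in (2, 100, 10_000):
    m, se = mean_and_se(n)
    print(f"n={n:>6}: mean ~ {m:6.2f}, standard error ~ {se:.3f}")
```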
posted by Mental Wimp at 9:03 AM on June 1, 2012 [1 favorite]


1) Assumes you know what the distribution is - for many decisions you don't really know that
2) Your proportional increase in perceived confidence with 10 data points as opposed to 1 data point is significantly greater than the degree your statistical measure of confidence has increased

1) is why algorithms are often bad at making decisions
2) is why people are often bad at making decisions
posted by JPD at 9:25 AM on June 1, 2012


1) Assumes you know what the distribution is

Voila, the central limit theorem!

And it is a very powerful theorem. It doesn't matter what distribution you're talking about, with enough data you can determine the existing moments of that distribution. The point is that more data is always better if you know what to do with it. If you don't, no amount of data will suffice.
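A sketch of the theorem at work, with an invented and deliberately skewed distribution:

```python
import random, statistics

random.seed(4)

# Exponential(1) is skewed (true mean 1, true sd 1), yet means of n=50 draws
# fall within 1.96/sqrt(50) of the true mean roughly 95% of the time:
# the sampling distribution of the mean is already close to normal.
n, trials = 50, 2000
half_width = 1.96 / n ** 0.5
hits = sum(
    1 for _ in range(trials)
    if abs(statistics.mean(random.expovariate(1.0) for _ in range(n)) - 1.0) < half_width
)
print(f"{hits}/{trials} = {hits / trials:.1%} of sample means within 1.96 standard errors")
```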
posted by Mental Wimp at 11:10 AM on June 1, 2012


The examples people are arguing about are a very narrow subset of "information".

Part of what is going on is The Law of the Instrument. ("I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.")

Math is a particularly good hammer, so it's super-tempting for mathy types to see everything under the sun as something that is best dealt with by being hammered. In fact math thinking being what it is, the first thing we'll think is the equivalent of "How do I redefine the problem so it can be viewed as a version of my hammers-and-nails model?" (Which is half of what this thread is, turning some general observations that don't lend themselves to formal analysis into some specialized subset that is tractable like that, but disregarding everything else, without even knowing you disregarded it.)

Maybe we can stipulate:

There are obviously plenty of instances in which having more datapoints is a good thing.

There are obviously plenty of instances when getting and processing more datapoints is not worth the effort and expense because you already have enough for your purposes.

There are obviously plenty of instances where "information" as it reaches us does not consist of datapoints in any reasonable statistical sense at all.

If I want to forecast the number of people that want to fly to London this summer, a zillion extra historical datapoints won't help me as much as knowing that the Olympics are on. But how much am I helped by knowing what events are happening in London this summer? Is having a longer and longer list of scheduled events always going to help me? Should I analyze Mefi to see who's planning to travel to London? Maybe if we had the Twitter data from the last Olympics and we could mine it... but the demographics of Twitter were entirely different than they are now.... and so on and on.
posted by philipy at 11:27 AM on June 1, 2012


I love when empiricists reject empirical data when it doesn't agree with their world view.

with enough data you can determine the existing moments of that distribution.

existing being the key.
posted by JPD at 11:33 AM on June 1, 2012


So now we're replacing perfectly good words (e.g., "robustness") with malformed neologisms?

Well, not quite. Anti-fragility is the same as robustness but with functionalism under the hood.

The more an anti-fragile thing is interfered with, the 'better' it works. His example is a box that, instead of breaking when you drop it, gets... better(?). Or wine, or the stress on an artist, or I guess military institutions?

Of the two ideas, Robustness is the more robust concept. The thing, mutation, structure, species, it persists, does it not? It is a pretty resilient concept, withstanding wildly varying interpretations. Robustness is only susceptible to changes in metaphysical understandings of being, whether or not we believe Heraclitus was right about rivers, and so on, but even then, it actually holds up as a decently stable idea. Never mind if it is, is it still there? No? Greg, tell the lab boys: rabbits are not robust against nuclear detonation. We need a new plan for Easter.

As for anti-fragility, it is a great deal trickier. It does not merely persist but it improves. But what does that mean? What do we even mean by improve? Maybe non-existence is an improvement; after all, nothing can withstand time.
posted by TwelveTwo at 11:40 AM on June 1, 2012


with enough data

With a distribution where mostly nothing much happens but there are very occasional massive events, you'll know pretty much nothing about the likelihood of the massive events until it's too late.

Sure if the system stays essentially the same for "enough" years you'll eventually build up a nice picture. But with many things of interest the system you care about will change so much so quickly that will never happen in reality.
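A toy illustration (numbers invented): a process that is quiet almost every day but crashes hard about once every 2000 days.

```python
import random

random.seed(3)

def one_year_has_crash():
    """One 250-trading-day year; each day crashes with probability 1/2000 (invented)."""
    return any(random.random() < 1 / 2000 for _ in range(250))

trials = 1000
quiet_years = sum(1 for _ in range(trials) if not one_year_has_crash())
print(f"{quiet_years}/{trials} simulated years contain no crash at all")
```

Most one-year samples contain no crash whatsoever, so an estimate of the tail built from them is not just imprecise, it's blind.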
posted by philipy at 11:42 AM on June 1, 2012


Your argument seems to be that he is saying something which is in fact false (having more information in a noisy environment takes you further away from the truth), because people will make better decisions if they believe it to be true. I think that's ridiculous. The kind of people who might conceivably benefit from such a deception aren't going to be self-critical enough to accept it in the first place. The kind of people who understand that humans are subject to cognitive biases would be better off trying to overcome those biases directly - the theory behind it is pretty straightforward, and even if you think applying it is psychologically difficult, well, so is deciding not to collect data which might be useful. I do not understand why one would waste psychic energy on the latter rather than the former.

With a distribution where mostly nothing much happens but there are very occasional massive events, you'll know pretty much nothing about the likelihood of the massive events until it's too late.

You risk not knowing that such events occurred in the first place if you sample your data infrequently enough to miss them.
posted by Acheman at 11:58 AM on June 1, 2012


Moreover, the first two paragraphs of TFA actually read like parody. I mean, sure, I'd rather be a mafia boss than some whiny Woody Allen type. Who wouldn't? Seems as good a reason as any to accept a completely nonsensical theory of information.
posted by Acheman at 12:01 PM on June 1, 2012


Sure if the system stays essentially the same for "enough" years you'll eventually build up a nice picture. But with many things of interest the system you care about will change so much so quickly that will never happen in reality.

Now raise that up one level and you're back to the value of increasing data. It's a Bayesian idea.

I think a categorical error is being made here: that data cannot solve all problems in no way implies that more data is not better, or that it is harmful.
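A minimal sketch of that one-level-up idea (a toy conjugate Normal-Normal update, numbers invented): precisions add, so every extra observation tightens the posterior.

```python
def update(prior_mean, prior_var, obs, obs_var):
    """One conjugate Normal-Normal Bayesian update: precisions add."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

mean, var = 0.0, 100.0                # vague prior (invented numbers)
for x in (9.8, 10.1, 10.3, 9.9):      # noisy observations of a value near 10
    mean, var = update(mean, var, x, obs_var=1.0)
    print(f"posterior: mean={mean:.3f}, var={var:.3f}")
```

Each update can only shrink the variance; more data never widens the posterior, even if it cannot fix a mis-specified model.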
posted by Mental Wimp at 12:11 PM on June 1, 2012 [1 favorite]


The more an anti-fragile thing is interfered with the 'better' it works.

On the surface, that term doesn't convey that meaning and is malformed for its purpose. There is a term from biology called "hormesis" that fits this concept, but it is technical and specific. For example, if a cell is exposed to mild heat, it temporarily becomes more resistant to damage from greater levels of heat. There is also a theory of immune system development that requires challenges early on to create a robust system later. However, this clumsy term doesn't really reflect any of the subtlety of these concepts and, I suspect, stems from a rather loose conceptualization of a "pop" idea.
posted by Mental Wimp at 12:17 PM on June 1, 2012


I am actually under the impression, from his Facebook and my personal correspondence with him, that he sincerely believes he is on to something. He discusses hormesis in the book. In fact, he builds his entire theory of nutrition on that concept. But he declares hormesis merely a species of anti-fragility. My issue with it is that it is too vague to be useful. But for his purposes, that might be a feature.

His bigger goal seems to be the annihilation of the belief that there is ever anything fool-proof, error-free, recession-free, crash-free, risk-free. Thus his first book was against accurate predictions, the next one was about the impossibility of estimating risk, and now this current one is about systems which function despite being failure-prone. My guess is that he is trying to replace one idol with another. Out with the dream of building a frictionless society to end our suffering, in with his dream of an anti-fragile utopia, where every failure is its own reward: a utopia which builds itself out of our suffering.
posted by TwelveTwo at 1:25 PM on June 1, 2012 [2 favorites]


In a Platonic world where no one ever draws any unwarranted conclusions, and where there is an infinite supply of attention and other resources to devote to handling it, it is probably true that more data is never worse.

I guess if anyone wants me to, I'll be glad to write a testimonial for any data project: "Used wisely, almost certain not to make things worse!"

Frankly I imagine that what Taleb is mostly talking about is not whether it's better to have 10,000 people in a health study rather than 5,000. It's more about how in attempting to read 100 articles about health studies you'll probably be left in the end with just a few random pieces of distorted knowledge that happened to stick and no idea what is most important for you to do.

The analogy from The Information Diet is an interesting one. There's not so much a thing as toxic substances as there are toxic doses. A big enough quantity of a necessary and normally healthy food can be very harmful if it exceeds the body's ability to process it. According to the analogy, there can be "toxic" doses of information. For example reading health blogs might be a good thing, but reading one carefully selected blog might turn out a lot better than reading a dozen, if the aim is to grasp and apply what you read.

Hypothetically in our Platonic world where people absorb, remember and properly evaluate everything they read, that wouldn't be the case. But given people's actual capabilities, whereby we already forgot half of what was said in this thread, it's likely a different story.
posted by philipy at 1:38 PM on June 1, 2012


Math is a particularly good hammer, so it's super-tempting for mathy types to see everything under the sun as something that is best dealt with by being hammered.

As a "mathy type", I would assert that when one is talking about information and data, one is talking about mathematically relevant problems. They are not all tractable, but they are all analyzable mathematically. If you want to use some squishy, vague definitions, then, by all means, knock yourself out, but the discussion will not be very helpful.
posted by Mental Wimp at 2:10 PM on June 1, 2012


existing being the key.

I meant in the mathematical sense, not in any metaphysical way. If it's a Cauchy distribution, e.g., it doesn't have an expected value. Nonetheless, its existing moments can be estimated and, hence, the distribution characterized. So, not so much a "gotcha."
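A sketch of the contrast (invented setup): sample means of Gaussian draws settle down, while sample means of standard Cauchy draws never do, because the mean they would converge to doesn't exist.

```python
import math, random

random.seed(2)

def sample_mean(draw, n):
    return sum(draw() for _ in range(n)) / n

# Standard Cauchy via the inverse CDF; Gaussian for comparison.
cauchy = lambda: math.tan(math.pi * (random.random() - 0.5))
gauss = lambda: random.gauss(0.0, 1.0)

# The mean of n standard Cauchy draws is itself standard Cauchy, for any n.
cauchy_means = [sample_mean(cauchy, 1000) for _ in range(200)]
gauss_means = [sample_mean(gauss, 1000) for _ in range(200)]

print("largest |sample mean| over 200 runs of n=1000:")
print(f"  Gaussian: {max(abs(m) for m in gauss_means):.3f}")
print(f"  Cauchy:   {max(abs(m) for m in cauchy_means):.3f}")
```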
posted by Mental Wimp at 2:12 PM on June 1, 2012


If what you are saying were true, quant managers could consistently outperform the market if the market is not purely rational.

Well, we know the market isn't rational, and we know quant managers in general aren't any better than anyone else at outperforming. The empirical data speaks against what you keep saying: when it comes to people, having more data leads to worse decision making, not better.
posted by JPD at 3:00 PM on June 1, 2012


in with his dream of an anti-fragile utopia, where every failure is its own reward, a utopia which builds itself, and building out of our suffering.

Surely he must recognize the folly behind this tho. We are hard wired not to create this "anti-fragility" - his other books are all about how we aren't wired to do that correctly
posted by JPD at 3:03 PM on June 1, 2012


Mental Wimp, your attitude is striking. It's striking because it's literally the exact same thought process that led to the creation of the most recent financial crisis.
posted by JPD at 3:04 PM on June 1, 2012


Mental Wimp: with enough data you can determine the existing moments of that distribution.

Among the things you might learn by reading Taleb instead of just sounding off about him: the problem with estimating long tail distributions is that you get very few data points out in the tail.

JPD: Very few of his thoughts are original, so if you dislike his style there are better things to read.

Nobody likes his style, so how about sharing some recommendations? Kahneman for sure, but Taleb is drawing on a bunch of other sources too.

When I saw him speak at the Long Now lecture it was weird how suddenly he switched from pompous and abrasive during the talk to almost ingratiating in the Q&A. The guy obviously has some issues. Nevertheless he's doing a surprisingly good job of getting people to read about some complex ideas and he deserves more respect than he's getting in this thread.
posted by nixt at 3:08 PM on June 1, 2012


I've not read this book - but it's Kahneman, Tversky and guys like Ariely who sort of write about behavioral finance, and then Popper et al. for the higher-level stuff, no? Other guys like James Montier do a nice job of writing about behavioral finance for casual investors. The Anti-Fragility stuff is in a narrow sense right out of the Graham "margin of safety" concept - for that read Seth Klarman - but again that's finance-y, investor-y. Sorry, my reading is pretty focused in that end of the world.

The guy obviously has some issues. Nevertheless he's doing a surprisingly good job of getting people to read about some complex ideas and he deserves more respect than he's getting in this thread.

Whatever my complaints about the guy I think "Fooled by Randomness" should be required reading at every business school in the world. I once took a job with someone I thought was an asshole, but talked myself into it partially because he had it on his bookshelf in his office. It turned out he was a huge asshole, and right before he quit I found out he had never actually read it.
posted by JPD at 3:16 PM on June 1, 2012 [2 favorites]




This thread has been archived and is closed to new comments