Human language reveals a universal positivity bias
February 10, 2015 8:41 AM

Or so say researchers in a new study in the February 9 online edition of the Proceedings of the National Academy of Sciences. Here's their paper's abstract: "Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts." And here are descriptions of the research in Science Daily and the LA Times.
posted by Sir Rinse (43 comments total) 11 users marked this as a favorite
 
We order languages from relatively most positive (Spanish) to relatively least positive (Chinese);
It's funny that these are the two ends of that spectrum. They are also extremes in syllables per second.
posted by Jpfed at 8:49 AM on February 10, 2015 [1 favorite]


I'm only three sentences in and am already not impressed. Note the lack of anybody in a linguistics department on the study. And their "linguistically and culturally diverse languages" include 6 (!) Indo-European languages (English, Spanish, French, German, Brazilian Portuguese, Russian), and then "Chinese" [sic] and "Arabic" [sic] (two descriptors for a wide range of languages and dialects) and Korean and Indonesian. For those of you playing along at home, this is, I don't know, grabbing the DNA samples from a few great apes, a frog, and a bird and saying something about the diversity of life.

Yeesh. Their models might be saying something, but it's sure as heck not anything universal. I'm sure Language Log will have a (not positive) post later on this.
posted by damayanti at 8:51 AM on February 10, 2015 [13 favorites]


Human language reveals a universal positivity bias

Yeah, right.
posted by Greg_Ace at 8:53 AM on February 10, 2015 [28 favorites]


Time to give my favorite Sidney Morgenbesser story (Morgenbesser taught philosophy at Columbia University from 1955 to 1999). He was a legendarily sharp, quick, borscht-belt inflected wit.

Morgenbesser was at a seminar by a visiting Luminary, from Oxford, the philosopher J. L. Austin. During a talk on the philosophy of language at Columbia in the 50's, Austin noted that while a double negative amounts to a positive, never does a double positive amount to a negative. From the audience, a familiar nasal voice muttered a dismissive, ''Yeah, yeah.''
posted by lalochezia at 8:54 AM on February 10, 2015 [13 favorites]


curse you, Greg Ace
posted by lalochezia at 8:55 AM on February 10, 2015 [2 favorites]


Alright, looking up the co-authors, I see one of them is a computational linguist, who does work on syntax/semantics and social networks, and a couple of others work at MITRE, which does some comp-ling intelligence work, so I'll give them that. But the majority of the people on this paper are applied mathematicians, and there's nobody who works on typology, which is critical.
posted by damayanti at 9:04 AM on February 10, 2015 [1 favorite]


damayanti: there's nobody who works on typology, which is critical.
Explain, please?
posted by IAmBroom at 9:08 AM on February 10, 2015


At one point, I "learned" that Hungarian was the only language in the world that had a two-syllable word for yes (igen). Whether or not that is true, it seems to both highlight the universal positivity of languages in general and, of course, the unique position of both the Hungarian language and mentality. A magyarnak nincsen párja!
posted by artichoke_enthusiast at 9:13 AM on February 10, 2015


This is a derail but the mention of Morgenbesser reminds me of a story reported by Raymond Smullyan. After a talk by B.F. Skinner, Morgenbesser asked "Let me see if I have this straight. Your position is that we shouldn't anthropomorphize people?"
posted by thelonius at 9:15 AM on February 10, 2015 [8 favorites]


Can "correct" be considered a two-syllable word for "yes"?
posted by I-baLL at 9:15 AM on February 10, 2015


IAmBroom, "typology" is the study of how languages are and aren't alike. We might talk about typology in sentence structure, for example-- some languages put the verb at the beginning of the sentence, some languages put it at the end, and what does all of that tell us about universals about how languages can vary.

What they're doing is making a broad claim-- languages use happy words more-- without considering the breadth of variation available in, in this case, the semantics-- the meaning systems of the language. At the very least you want somebody who's worked on looking at all of the ways that languages can vary in expressing attitudes and emotions to do something like this. They don't.

And the fact that they consider their language sample to be "diverse" shows the problem with this: 6 of the languages are genetically related (meaning that any similarity between them is likely to be due to inheritance, not any sort of universal fact about human language). That's a huge red flag.
posted by damayanti at 9:15 AM on February 10, 2015 [2 favorites]


(ah, the Skinner anecdote is in the NYT obit too)
posted by thelonius at 9:17 AM on February 10, 2015


Can "correct" be considered a two-syllable word for "yes"?

how about OK?
posted by Billiken at 9:19 AM on February 10, 2015


how about OK?

Uh-huh.
posted by kewb at 9:28 AM on February 10, 2015 [1 favorite]


Problems:

1. Their analysis is sort of technically impressive, but I couldn't find an actual working definition of "positive," just a sort of equivocation with happiness, which is measured by "paid native speakers [rating] how they felt in response to individual words on a nine-point scale, with 1 corresponding to most negative or saddest, 5 to neutral, and 9 to most positive or happiest (10, 18) (SI Appendix)."

2. And they presented these paid consultants with words based on word frequency as determined by an idiosyncratic sampling of text, excluding actual speech for no obvious reason aside from convenience.

3. Their sampling method seems both arbitrary and distinctly biased toward language that comes from commercial sources: "...spanning books (14), news outlets, social media, the web (15), television and movie subtitles, and music lyrics (16)."

This all sounds like bullshit to me, quite honestly. They might be finding something real here, but it's utterly obscured by an unaccountably peculiar and tendentious theoretical framework. They're assuming the soundness of these theoretical constructs which haven't been shown to have any external validity and then making wild, grand conclusions based on their operationalization.
posted by clockzero at 9:29 AM on February 10, 2015 [5 favorites]


What about "yeah?" Yeeeeeeeah, "yeah" can have an infinite number of syllables.
posted by Don Pepino at 9:49 AM on February 10, 2015


sorry for minor derail, but:

Can "correct" be considered a two-syllable word for "yes"?

how about OK?

What about "yeah?" Yeeeeeeeah, "yeah" can have an infinite number of syllables.


Yeah, so, the word for "yes" in English is "yes", not OK, correct, affirmative, okey-dokey, etc. Those are synonyms, and I'm sure someone who knows more about linguistics could break this down for us, but the point is that in Hungarian, "igen" is the word for yes. It does not mean OK, correct, okey-dokey, yeahhhh, etc.

That said, a quick internet search reveals that numerous languages have multisyllabic words for yes, with several having three-syllable yeses. So, as disappointed as my Hungarian teacher would be to find out, Hungarian is not alone in making it more challenging to be positive (igen) than negative (nem).
posted by artichoke_enthusiast at 10:00 AM on February 10, 2015 [1 favorite]


I-baLL:
"Can "correct" be considered a two-syllable word for "yes"?"
Damn straight.
posted by Hairy Lobster at 10:01 AM on February 10, 2015


Indubitably!
posted by The Nutmeg of Consolation at 10:12 AM on February 10, 2015


and then "Chinese" [sic] and "Arabic" [sic] (two descriptors for a wide range of languages and dialects) and Korean and Indonesian.

This is incorrect. Arabic is a language. I don't know in which way it isn't.
posted by MisantropicPainforest at 10:14 AM on February 10, 2015


Next, they paid native speakers to rate all these frequently-used words on a nine-point scale from a deeply frowning face to a broadly smiling one

This is the part I'd want to hear more about. I'd expect that some of these associations could vary situationally. How many native speakers' evaluations did they get for each word? Where did they find them?
posted by aubilenon at 10:19 AM on February 10, 2015 [1 favorite]


This is incorrect. Arabic is a language. I don't know in which way it isn't.

"Arabic" comprises many variations, not all of which are mutually intelligible with one another. No serious paper would fail to deal with this. If we're only dealing with Modern Standard Arabic, then we should say so.
posted by Sticherbeast at 10:27 AM on February 10, 2015 [4 favorites]


This is incorrect. Arabic is a language. I don't know in which way it isn't.


"Arabic" covers a wide range of things which are grouped together for various political reasons -- one language = one pan-Arabic identity. See also "German" and "Chinese", both of which have varieties which are not mutually intelligible, but are considered to be "one language", again, for political and ideological reasons.

But, in reality, there are a whole host of different Arabics. There's Classical Arabic. There's Modern Standard Arabic, which is used in formal situations -- things like school, newspapers, etc. And then a whole host of spoken dialects, many of which are, again, not mutually intelligible. If you put a speaker of Moroccan Arabic in a room with somebody from the Persian Gulf, they're going to have a hard time talking to each other. Most linguists working on Arabic would define what variety they're working on-- Classical, Modern Standard, Egyptian Arabic, etc.

For the record, it looks like what they did here was gather a lot of probably mostly Modern Standard Arabic (as it's a written corpus) and had, based on their appendix, mostly speakers of Egyptian Arabic judge those words for "Happiness".
posted by damayanti at 10:34 AM on February 10, 2015 [5 favorites]


I know, I speak Fusha and Egyptian Arabic. But Arabic is indeed a language, not a family of languages. That doesn't mean it's not a diglossia, or that it doesn't have dialects. Keep in mind there are no objective, agreed-upon criteria for judging what is and what is not a dialect or distinct language.
posted by MisantropicPainforest at 10:43 AM on February 10, 2015


I'm not positive about this.
posted by Faint of Butt at 10:45 AM on February 10, 2015 [1 favorite]


I've noticed the same bias with "Wheel Of Fortune" puzzles.
posted by grumpybear69 at 10:48 AM on February 10, 2015


I know, I speak Fusha and Egyptian Arabic. But Arabic is indeed a language, not a family of languages. That doesn't mean it's not a diglossia, or that it doesn't have dialects. Keep in mind there are no objective, agreed-upon criteria for judging what is and what is not a dialect or distinct language.

The diglossia is exactly why a serious paper would clarify beyond the general term "Arabic". As it stands, it's sort of like seeing a medical paper using the term "head cancer".
posted by Sticherbeast at 10:51 AM on February 10, 2015 [6 favorites]


I know, I speak Fusha and Egyptian Arabic. But Arabic is indeed a language, not a family of languages. That doesn't mean it's not a diglossia, or that it doesn't have dialects. Keep in mind there are no objective, agreed-upon criteria for judging what is and what is not a dialect or distinct language.


There isn't a clear criterion, but there is a continuum where we have "Things that we clearly want to call one language" and "Things that are probably more than one language". Arabic is pretty clearly towards the "probably more than one language" end. Which you showed yourself: you said you speak "Fusha and Egyptian Arabic". Most (white) English speakers wouldn't describe themselves as speaking, say "Cleveland and Metropolitan New York English".

Despite all that, the key point I was trying to make was that, as Sticherbeast noted, in a serious linguistic paper, a researcher would note what variety of Arabic they're working on. (And, on preview, beat me to it, again!)
posted by damayanti at 10:54 AM on February 10, 2015 [1 favorite]


Most (white) American English speakers wouldn't describe themselves as speaking, say "Cleveland and Metropolitan New York English".

But some American English speakers would describe themselves as speaking African-American Vernacular English and Standard American English. Just as some British English speakers would describe themselves as speaking Scots (or Glaswegian, perhaps) and Standard British English, or some German speakers would describe themselves as speaking Plattdeutsch and Hochdeutsch.
posted by Pseudonymous Cognomen at 12:27 PM on February 10, 2015


Japanese also has a two-syllable "yes": はい (hai), although it sounds more like one syllable to English speakers.
posted by bashos_frog at 4:59 PM on February 10, 2015 [1 favorite]


Averaging these, in English for example, "laughter" rated 8.50, "food" 7.44, "truck" 5.48

…"food truck" rated 9.2 when subjects were surveyed while hungry, but only 6.1 after they'd eaten…
posted by Lexica at 5:31 PM on February 10, 2015 [1 favorite]
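
As a rough illustration of how per-word ratings like those quoted above might be combined into a score for a whole text, here is a minimal sketch of a frequency-weighted average. The three ratings are the ones quoted from the coverage; the function name `average_happiness`, the whitespace tokenization, and the choice to skip unrated words are assumptions made for illustration, not the paper's actual method.

```python
from collections import Counter

# Ratings quoted above (scale: 1 = saddest, 5 = neutral, 9 = happiest).
RATINGS = {"laughter": 8.50, "food": 7.44, "truck": 5.48}

def average_happiness(text, ratings=RATINGS):
    """Frequency-weighted mean rating of the rated words in `text`.

    Words without a rating are ignored (an assumption of this sketch).
    Returns None if no word in the text has a rating.
    """
    counts = Counter(text.lower().split())
    rated = {word: n for word, n in counts.items() if word in ratings}
    total = sum(rated.values())
    if total == 0:
        return None
    return sum(ratings[word] * n for word, n in rated.items()) / total

score = average_happiness("food truck")
print(round(score, 2))  # (7.44 + 5.48) / 2 = 6.46
```

Because each word is scored in isolation, "food truck" gets the same score however hungry the reader is, which is the context problem raised elsewhere in the thread.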


No.
posted by clvrmnky at 5:35 PM on February 10, 2015


Japanese also has a two-syllable "yes": はい (hai), although it sounds more like one syllable to English speakers.

Definitely two morae, but the number of syllables is still being debated. The current mainstream thinking (after McCawley and so on) would see it as one syllable, but recently some linguists (notably Labrune) are arguing that actually the idea of a syllable isn't relevant to Japanese after all (so that the question "how many syllables does 'hai' have?" wouldn't even be well defined).
posted by No-sword at 6:45 PM on February 10, 2015 [2 favorites]


But some American English speakers would describe themselves as speaking African-American Vernacular English and Standard American English. Just as some British English speakers would describe themselves as speaking Scots (or Glaswegian, perhaps) and Standard British English, or some German speakers would describe themselves as speaking Plattdeutsch and Hochdeutsch.

I said *most white* speakers, but in any case, to clarify my position:

All languages have fuzzy edges. For English, this includes most of the "World Englishes" and other varieties that have had significant substrate influence, including Scots and AAVE. I also mentioned German upthread as another one of those varieties that (like Chinese and Arabic) is often mentioned as a key "language" that makes it hard to define "language".

However, Arabic represents a more extreme example than English or German. See, for example, the Wiki page on Arabic Phonology, which explicitly states that it's dealing with MSA and redirects to other pages for other varieties. On the other hand, the page for English Phonology also has a disclaimer, but it's a softer one, noting that the phonology across varieties is relatively stable.

I'll readily admit that the authors should have also indicated Standard American English, High German, etc. But the exclusion of any modifiers for "Arabic" is much, much more noteworthy (and lowers the credibility of the authors far more than the exclusion of the others).
posted by damayanti at 7:00 PM on February 10, 2015


Arabic has been the lingua franca across a large area for 1300 years, English for less than 400, so it's understandable that Arabic is more diverse than English - though Arabic is less diverse, perhaps, than if it hadn't been the language of a major unifying religion. The inclusion of some closely related languages (French and Spanish, English and German) and some mostly unrelated (Indonesian) lets them control for historical and cultural effects.

But in the end, "if you can't say something nice, don't say anything at all." Few people enjoy the company of habitual complainers. So really, this finding is not surprising.

I was more interested in the analysis of e.g. Moby Dick. Computers are usually lousy at things like that. I can see someone trying to use a refined version of this dataset to help computers analyse (and eventually design) speeches. I can see marketers using it on conversations slurped up from "smart" TVs to figure out viewer reactions, although that might depend on a bit better voice recognition than we have now. Politicians and corporations will find a way to monetise it.

Hmm, maybe not so positive after all.

Weighting of words is an interesting thought. Depending on the people in the study, words like "conservative" and "liberal" could vary considerably, and the weighting between, say, Americans and Australians might be instructive.
posted by Autumn Leaf at 5:13 AM on February 11, 2015


It just amazes me how many of these big data studies there are lately that attempt to measure sentiment entirely devoid of context. And to no point whatsoever. I'm still trying to figure out *what* they are trying to say, and *why* they are trying to say it. What does it contribute to our understanding of sociolinguistics (because that is essentially what this is, but for it being stripped of all of the social and all of the linguistics)?
posted by iamkimiam at 5:33 AM on February 11, 2015 [1 favorite]


This effect has been known since at least the 1950s. Zajonc provides an excellent overview in his 1968 monograph of the mere exposure effect. This work led to an entire field of research in subliminal exposure and liking judgments.

What's new in the PNAS study is the huge cross-cultural corpus. Interestingly, they trace their inspiration only as far as the late-60s work on the Pollyanna principle, rather than digging deeper to find the true origins of the idea of linguistic positivity bias.
posted by phenylphenol at 6:46 AM on February 11, 2015 [1 favorite]


I said *most white* speakers

You still mean "most white American speakers" (white people in other countries also speak English)--and I'd argue that in any case while most Americans might not *describe* themselves as speaking "X English and Y English" in practice many do, anyway. For instance Southerners who go to university in the North are more likely to adopt a neutral dialect; they may think of it as "accent reduction", but when it includes suppression of dialectal variants (like "fixing to" for "about to", etc) then it's more than just accent.
posted by Pseudonymous Cognomen at 8:45 AM on February 11, 2015


You know what? I rescind, or at least redirect my previous skepticism.

The average IMDB score is 6.38/10. The average Yelp review is 3.8/5. The average Pitchfork review is 7.2/10. This sort of thing happens all over the place.

It kind of goes against my perception of controversy and snarking and so on, but I guess people really do prefer talking about things they like.


Actually I just tried this myself, with some negative or critical comments. So even forgetting about whatever's biasing reviews, it looks like there's a thing going on where we use positive words to express negative sentiment.

This all sounds like bullshit to me, quite honestly

Let's see. "like" "quite" and "honestly" might be said to have positive connotations. "bullshit" has negative connotations. Overall: a statement of positivity!

It just amazes me how many of these big data studies there are lately that attempt to measure sentiment entirely devoid of context. And to no point whatsoever.

Positive: { just, amazes, many, big, studies, sentiment, entirely, context, point }
Negative: { devoid, no }

Given this I'm a little surprised their average wasn't much higher than it was. The fact that they're able to get some "shape" from stories with this approach suggests that it's not just, statistically speaking, hooey. But there's lots of statistically significant patterns that aren't important or useful. (You could probably also see some structure by plotting the relative frequencies of different verb tenses over the course of a story. But so what?)
posted by aubilenon at 12:27 PM on February 11, 2015 [1 favorite]
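
The word-sorting exercise in the comment above can be sketched in a few lines: split the rated words of a sentence around the scale's neutral point and average them. The ratings below are invented for this sketch and are not values from the study; `split_by_polarity` and the punctuation stripping are likewise assumptions, so treat this as an illustration of the idea rather than the authors' instrument.

```python
NEUTRAL = 5.0  # midpoint of the 1-9 scale described in the paper

# Hypothetical ratings, invented for this sketch; NOT values from the study.
HYPOTHETICAL_RATINGS = {
    "like": 7.2,
    "quite": 5.8,
    "honestly": 6.5,
    "bullshit": 2.6,
}

def split_by_polarity(text, ratings=HYPOTHETICAL_RATINGS, neutral=NEUTRAL):
    """Return (positive_words, negative_words, mean_score) for rated words."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    rated = [w for w in words if w in ratings]
    positive = [w for w in rated if ratings[w] > neutral]
    negative = [w for w in rated if ratings[w] < neutral]
    mean = sum(ratings[w] for w in rated) / len(rated) if rated else None
    return positive, negative, mean

pos, neg, mean = split_by_polarity(
    "This all sounds like bullshit to me, quite honestly"
)
print(pos, neg, mean)
```

With these made-up numbers the sentence averages above the neutral point even though its meaning is plainly negative, which is the mismatch between word-level positivity and sentence-level sentiment that the comment describes.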


I'm only three sentences in and am already not impressed. Note the lack of anybody in a linguistics department on the study

two sentences in and same
posted by effugas at 4:32 PM on February 11, 2015


It just amazes me how many of these big data studies there are lately that attempt to measure sentiment entirely devoid of context. And to no point whatsoever. I'm still trying to figure out *what* they are trying to say, and *why* they are trying to say it.

Oh, sentiment analysis is all about mining Twitter (and maybe other corpora) for stock tips. Seriously. ALL HAIL THE MARKET
posted by effugas at 5:35 PM on February 11, 2015 [1 favorite]


effugas: "I'm only three sentences in and am already not impressed. Note the lack of anybody in a linguistics department on the study

two sentences in and same
"

I didn't even have to read the article ...

the blurb was enough. :)
posted by TheLittlePrince at 9:39 AM on February 12, 2015





