Percentiles of Vividness
July 11, 2018 7:43 AM

The dashing and grizzled reader rubbed his angular, bestubbled chin. The site had yielded its content without objection - part craven drum beating for a product, yet perhaps just compelling enough to stoke the fires of interest amongst the percipient denizens of Metafilter. Who cares about the best novels? Or the most influential? What if you could automatically rank 5000 novels by vividness?
posted by AndrewStephens (18 comments total) 6 users marked this as a favorite
 
Vividness
The most vivid writing always invokes a sensory experience, summoning a world of images, sounds, smells, flavors and textures, and bringing the reader viscerally into the hypnotic trance of the story.

So we devised a new kind of linguistic metric, which we call “vividness,” to measure the relative intensity of the sensory language in any piece of writing.

We trained our linguistic algorithms by analyzing all 270 million words of prose in our library, identifying the most vivid nouns, verbs, and adjectives in each book, and scoring each word on a scale of 1 to 10 according to the intensity of its vividness.

For example, the word “dewdrop” has a vividness score of 9.5, and the word “eyelash” has a score of 6.3.
eh?
posted by misteraitch at 7:53 AM on July 11, 2018
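The quoted description says only that each word gets a 1-to-10 score; it doesn't say how word scores are combined into a score for a whole book. Here is a minimal Python sketch, assuming the book-level number is a density (total lexicon score divided by total word count). The two lexicon entries are the scores quoted above; the names `VIVIDNESS_LEXICON`, `tokenize` and `vividness_density` are hypothetical, not the site's code.

```python
import re

# Only these two scores come from the quoted description; a real lexicon
# would hold thousands of entries. Words missing from the lexicon score 0.
VIVIDNESS_LEXICON = {
    "dewdrop": 9.5,
    "eyelash": 6.3,
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vividness_density(text: str, lexicon: dict[str, float] = VIVIDNESS_LEXICON) -> float:
    """Assumed aggregation: sum of per-word scores divided by total word count."""
    words = tokenize(text)
    if not words:
        return 0.0
    total = sum(lexicon.get(word, 0.0) for word in words)
    return total / len(words)
```

With this sketch, `vividness_density("A dewdrop clung to her eyelash")` is (9.5 + 6.3) / 6, about 2.63. Because there is no stemming, "dewdrops" would score 0; how the real tool handles inflected forms isn't stated.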


where's the "how much weed have i smoked" slider

coz thats muy, tres, very, بہت, rất, 非常, बहुत, important, bro
posted by lalochezia at 7:54 AM on July 11, 2018 [1 favorite]


I would like to see an analysis of articles - indefinite versus definite articles. That way I could determine if the author was taking a stand on the specificity of the nouns that they use.
posted by njohnson23 at 8:06 AM on July 11, 2018




where's the "how much weed have i smoked" slider

I give you a hamburger.
posted by capricorn at 8:11 AM on July 11, 2018 [3 favorites]


Are they selling analysis or the analysis software?
posted by pracowity at 8:13 AM on July 11, 2018


pracowity - the latter, I think, as part of a larger application aimed at writers.
posted by misteraitch at 8:19 AM on July 11, 2018


I am torn between “what a load” and “any list that puts Only Begotten Daughter in the 99th percentile can’t be all bad.”
posted by doubtfulpalace at 8:24 AM on July 11, 2018 [1 favorite]


and Jane Austen scores 6 of the bottom 10 spots!

I love Austen - apparently I hate vividness :)

(or maybe one can love vividness sometimes, and love arch social commentary other times.)
posted by jb at 8:28 AM on July 11, 2018


I wish they gave more details on how they developed the algorithm. Surely crowdsourcing was a part of it. The claim that an algorithm could automatically detect what makes a big sensory impact on a human reader doesn't make any sense. Also, how did they get the texts of these books? Did they purchase them all as ebooks? And what an odd mishmash of books -- everything from video game novelizations to classics. So many questions.

That said, the rankings do make some intuitive sense to me.
posted by treepour at 8:49 AM on July 11, 2018


A quick survey of some of the Most Vivid Words (diarrhea, windpipe, bikini, saliva, vomit, disgusting, jaundiced, mucus, squirts, latex, hypodermic, hindquarters) suggests that this was a system designed expressly to validate Chuck Palahniuk, possibly by Chuck Palahniuk.
posted by Iridic at 8:56 AM on July 11, 2018 [18 favorites]


As someone with aphantasia, this is a good list of books for me to avoid. Blah blah scene description. Where’s the dialogue?
posted by greermahoney at 8:59 AM on July 11, 2018 [1 favorite]


I want some actual linguists to get in on this, but it's interesting whenever I see a pitch for something like this as a product because there's this huge, huge field of computational linguistics doing this stuff essentially for research and sport.

A common example of this kind of thing is sentiment analysis, assigning positive and negative values to a bunch of words and structures and then parsing a document for the overall "sentiment" of the writing. And the thing is: it works! Kinda! And whether you want to fixate on the "it works" part or the "kinda" part probably depends a lot on whether you're trying to sell it to someone, because both of these things are true. You can make a decent guess, with a good sentiment analysis setup, of whether a given piece of writing tends toward a positive or negative sentiment, so long as there are no confounding factors like context-specific language or ironic/sarcastic tone or complicated rhetoric or all the other things human beings do with their language constantly. You could even try to control for that stuff in a given context at the expense of de-generalizing your approach, e.g. a sentiment analysis of a routinely snarky chatroom would need to be tuned differently than sentiment analysis of a customer complaint line.
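To make the "it works! Kinda!" point concrete, here is a toy lexicon-based sentiment scorer in Python. The word weights are invented for illustration and the function names are hypothetical; real systems use far larger lexicons and try to handle negation and phrases. It labels plain statements correctly and a sarcastic one wrongly.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (positive or negative).
SENTIMENT_LEXICON = {
    "love": 3.0, "great": 3.0, "wonderful": 3.0, "fine": 1.0,
    "delay": -1.0, "broken": -2.0, "terrible": -3.0,
}


def sentiment_score(text: str) -> float:
    """Sum the polarity weights of every lexicon word found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(SENTIMENT_LEXICON.get(word, 0.0) for word in words)


def label(text: str) -> str:
    """Map the summed score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this phone, it is wonderful"))        # positive (correct)
print(label("The screen arrived broken. Terrible."))      # negative (correct)
print(label("Oh great, another delay. Just wonderful."))  # positive (misses the sarcasm)
```

The last line scores +5 because "great" and "wonderful" outweigh "delay"; a human reader hears the opposite, which is exactly the kind of confounding tone a word-weight approach can't see.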

A more general take on the idea is to treat language as a high-dimensional vector space. That is, pick every meaningful attribute you want to analyze in a piece of writing. That's a dimension you're tracking. Maybe you're tracking two (e.g. a simple sentiment binary, "positive" and "negative", or as has frequently and unconvincingly been done, "masculine" and "feminine"). Maybe you're tracking a handful (maybe a broader range of emotions, "anger", "sadness", "joy", "confusion", and so on). Maybe you're tracking ten thousand different narrow semantic concepts ("car", "hoofed mammal", "corporateness", "diaphaneity", etc).

Whatever the dimensions, you go through your text and you count up each instance of each dimension. Each dimension can be thought of as a vector: as, literally, an arrow sticking out in some random direction from a center point, and the length of that arrow is equal to the count of items that fit that dimension. In the end you have a ball of arrows of various lengths pointing out in every direction. Every piece of writing will end up with a different ball of vectors, a kind of signature for the content of that piece.
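A rough Python sketch of that counting step. The dimension names come from the examples above, but the member word lists and the `signature`/`as_vector` helpers are hypothetical. Fixing the order of the dimensions is what turns the ball of arrows into a single vector whose components can be compared text to text.

```python
import re
from collections import Counter

# Hypothetical dimensions, each defined by a small set of member words.
DIMENSIONS = {
    "car": {"car", "truck", "engine", "tire"},
    "hoofed mammal": {"horse", "deer", "goat", "hoof"},
    "corporateness": {"synergy", "stakeholder", "quarterly", "leverage"},
}


def signature(text: str) -> dict[str, int]:
    """Count how many words in the text fall into each dimension."""
    word_counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        dim: sum(word_counts[word] for word in members)
        for dim, members in DIMENSIONS.items()
    }


def as_vector(sig: dict[str, int]) -> list[int]:
    """Lay the counts out in a fixed dimension order (the order of DIMENSIONS)."""
    return [sig[dim] for dim in DIMENSIONS]
```

For instance, `as_vector(signature("The horse kicked the truck, and the goat ate a tire."))` gives `[2, 2, 0]`, while a line of business jargon like "Quarterly synergy will leverage every stakeholder." gives `[0, 0, 4]`.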

Maybe, as they suggest in this case, you attach a weight to each instance ("horrific" gets a 10 for negative, say, while "meh" gets a 1), and so the counting is a little more subtle. But in any case, you run the text through, you get a set of scores, et voilà: you've converted your text into a mechanical assessment of the various quantified aspects of the text. This one's got a long vector labeled "positive"; this other one has a short one. That history text there has a small "vivid" vector; yon pulp novel has a long one.
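Adding the per-instance weights is a small change to the counting. In this Python sketch the weights for "horrific" (10, negative) and "meh" (1, negative) follow the example above; every other entry, including giving "horrific" a "vivid" weight as well, is a hypothetical illustration, as is the name `weighted_scores`. A single word can push on several dimensions at once.

```python
import re
from collections import defaultdict

# word -> list of (dimension, weight) pairs. Only the negative weights for
# "horrific" and "meh" come from the comment; the rest are hypothetical.
WEIGHTED_LEXICON = {
    "horrific": [("negative", 10.0), ("vivid", 6.0)],
    "meh": [("negative", 1.0)],
    "delightful": [("positive", 7.0)],
    "blood": [("vivid", 8.0)],
}


def weighted_scores(text: str) -> dict[str, float]:
    """Accumulate each word's weights into per-dimension totals."""
    scores: dict[str, float] = defaultdict(float)
    for word in re.findall(r"[a-z]+", text.lower()):
        for dimension, weight in WEIGHTED_LEXICON.get(word, []):
            scores[dimension] += weight
    return dict(scores)
```

Both "The food was meh." and "A horrific scene, blood everywhere." contain one negative word, but the first scores `{'negative': 1.0}` and the second `{'negative': 10.0, 'vivid': 14.0}`.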

It's a useful approach for doing the specific thing it does: getting a rough picture of some semantic dimensions of a text according to whatever specific semantic biases you throw at it. It's not so useful for attaching any meaning or context to that rough picture; it certainly can't tell you if the writing is any good, and that feels like one of the classic traps of "use this tool to be a better writer" that gets pitched with this sort of thing. This tool will absolutely help you become a more vivid writer for the definition of "vivid" that means "uses a high density of words that we assigned high vividity weights to in our training data", and...that's all? By itself that's not gonna get you anywhere.

Like, I think the general idea of "vividness" produced here is a decent bit of semantic sorting, as far as it goes; Chuck Palahniuk, for sure! Books full of visceral spectacle. But nobody would like Chuck Palahniuk's books (him showing up on this list would be meaningless, a "what? by who?" reaction) if they were all vomit and no compelling narrative conceits or off-kilter characters, and the vividity and vomit of his stories were almost certainly central to his writing style and goals in the first place.
posted by cortex at 9:21 AM on July 11, 2018 [6 favorites]


(Sees Ray Bradbury featured prominently at the top of the vividness scale. Nods approvingly.)
posted by DrAstroZoom at 12:21 PM on July 11, 2018


For example, the word “dewdrop” has a vividness score of 9.5, and the word “eyelash” has a score of 6.3.

So if your hero has tears glistening under his eyelashes like dewdrops, your writing will be like 100% vivid and you can start clearing your shelves for all the awards that will follow. Awards, meaning kudos on AO3.
posted by betweenthebars at 1:32 PM on July 11, 2018


I would be more encouraged by this device designed to tell me when I’m overusing the passive voice, if the creators had any idea what the passive voice actually was.

On their explanation page, they give a page of a novel by Nick Hornby in which all the forms of the word “to be” are highlighted as examples of the “passive voice.” None actually are. (The closest thing is “the drawing was hung over the fireplace”, and I would argue that “was hung” is not a passive voice verb here; it’s just a copula and a participle.) But even if we grant that as a real passive, that’s one out of 11 flagged items which actually represents what it’s supposed to represent.

Even if you grant that using the passive voice is a problem, which not everyone would.
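The distinction being drawn here can be shown with two crude Python heuristics. The site's actual detector isn't shown, so neither function is its code: `naive_flags` mimics what the highlighted Hornby page appears to do (flag every form of "to be"), while `passive_candidates` requires a form of "be" immediately followed by something that looks like a past participle. That participle test (an "-ed" ending or a tiny hypothetical list of irregular forms) is an assumption and is still far from a real syntactic analysis.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
# Tiny hypothetical sample of irregular past participles; a real tool would
# need a complete list or a part-of-speech tagger.
IRREGULAR_PARTICIPLES = {"hung", "written", "taken", "given", "seen", "made", "done", "known"}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_flags(text: str) -> list[str]:
    """Flag every form of 'to be', participle or not."""
    return [word for word in tokens(text) if word in BE_FORMS]


def looks_like_participle(word: str) -> bool:
    """Crude guess: '-ed' ending or a listed irregular participle."""
    return word.endswith("ed") or word in IRREGULAR_PARTICIPLES


def passive_candidates(text: str) -> list[str]:
    """Flag adjacent 'be' + participle pairs such as 'was hung'."""
    words = tokens(text)
    return [
        f"{first} {second}"
        for first, second in zip(words, words[1:])
        if first in BE_FORMS and looks_like_participle(second)
    ]
```

On "He was happy. The drawing was hung over the fireplace. They were at home." the naive version flags `was`, `was` and `were`, while the pair heuristic flags only `was hung`. It still can't settle whether "was hung" is a true passive or a copula plus a participle, and it misfires on predicate adjectives like "was tired"; telling those apart takes a parser, not a word list.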

I appreciate the creator’s effort and enthusiasm but before you do tremendous amounts of calculation on tremendous amounts of literature, maybe talk to an actual English professor for a few minutes? I bet a lot of them would be super enthusiastic about learning things this way and would be very helpful, if cautious in drawing conclusions!

Or is this kind of technology supposed to obviate English professors?
posted by edheil at 5:22 PM on July 11, 2018 [2 favorites]


You can make a decent guess, with a good sentiment analysis setup, of whether a given piece of writing tends toward a positive or negative sentiment, so long as there are no confounding factors like context-specific language or ironic/sarcastic tone or complicated rhetoric or all the other things human beings do with their language constantly. You could even try to control for that stuff in a given context at the expense of de-generalizing your approach, e.g. a sentiment analysis of a routinely snarky chatroom would need to be tuned differently than sentiment analysis of a customer complaint line.

Oh no, Kimmy, the Internet doesn't talk like that. The Internet talks like Chandler.
posted by jonp72 at 7:50 PM on July 11, 2018


So vividness is a fixed, objective property of individual words, irrespective of structure, context, meaning, style, audience...? Doesn’t matter what I do with the words, only which ones I concatenate?
posted by Segundus at 9:40 AM on July 12, 2018

