Word search
December 16, 2010 11:34 AM

Google's new Ngram Viewer lets you track the history of words in six languages, including several flavo(u)rs of English. Whether it's the rise and fall of a single word, the evolution of technology, or the mysterious seventeenth-century proliferation of fart jokes, there's a lot to play with. More at the Official Google Blog.
posted by theodolite (74 comments total) 49 users marked this as a favorite

 
A slightly more detailed description of the engine (as well as an explanation of what the hell "English One Million" is) is available here.
posted by theodolite at 11:36 AM on December 16, 2010 [1 favorite]


Sad to see the slow decline of the word 'the' during the 20th century.
posted by tavegyl at 11:38 AM on December 16, 2010


This is really, really interesting.

Uhhh...until it died.
posted by DU at 11:41 AM on December 16, 2010


Well that's pretty cool, but L. Ron Hubbard's Engram Viewer lets you hold soup cans and remove Thetans.
posted by Mayor Curley at 11:42 AM on December 16, 2010 [3 favorites]


Women declining in popularity

What happened ~1815?

Probably means something

Thanks, postwar period, but kids today, amirite?
posted by DU at 11:46 AM on December 16, 2010


How interesting... going through my lists of archaic words.
posted by BlackLeotardFront at 11:48 AM on December 16, 2010


metafilter vs. Metafilter
posted by swift at 11:49 AM on December 16, 2010


In both cases the current rise is still fairly modest. I'm going to change my rating to "Buy".
posted by Kabanos at 11:50 AM on December 16, 2010


What happened ~1815?

The initial decline from their peak use corresponds with the first Great Awakening, but I have no idea if there's a causal connection there. Interesting, though.
posted by aught at 11:50 AM on December 16, 2010


Oh man, this is gonna be great for us historians.
posted by nasreddin at 11:51 AM on December 16, 2010


Folks in the '30s keepin' it real, yo
posted by John Millikin at 11:54 AM on December 16, 2010 [1 favorite]


What happened ~1815?

People started getting the vapors.
posted by swift at 11:54 AM on December 16, 2010 [2 favorites]


This is awesome.
posted by DU at 11:57 AM on December 16, 2010


Seems pertinent
posted by found missing at 11:59 AM on December 16, 2010


People started getting the vapors.

You may be on to something.
posted by DU at 11:59 AM on December 16, 2010 [1 favorite]


Look about right
posted by DU at 12:01 PM on December 16, 2010


I'm really enjoying tracking when words entered popular culture: cowboy, gangster, rock and roll.

It's like seeing history graphed.
posted by quin at 12:01 PM on December 16, 2010


Okay.
posted by quin at 12:03 PM on December 16, 2010


The technology graph is even better if you capitalise Internet (the search is case-sensitive)
posted by copley at 12:04 PM on December 16, 2010


See also.
posted by blue_beetle at 12:08 PM on December 16, 2010


Looks like the "free market" didn't start generating a whole lot of heat until the 1950s...
posted by saulgoodman at 12:09 PM on December 16, 2010 [1 favorite]


It seems that there is a lot more love and desire around these days, but always accompanied by a constant underlying level of hatred!
posted by copley at 12:11 PM on December 16, 2010


"Capitalism," too, for that matter.
posted by saulgoodman at 12:11 PM on December 16, 2010 [1 favorite]


I don't understand the Y-scale. 'The' only appears in 6% of books?

homosexual,gay,catamite,sodomite,pederast. The bump in "homosexual" around 1945 is distinctive: Kinsey?
posted by Nelson at 12:12 PM on December 16, 2010


Occasionally one phrase simply replaces another.
posted by theodolite at 12:12 PM on December 16, 2010 [3 favorites]


fuck,shit,damn,cunt,dick,asshole,piss,poop

interesting to see how fuck and shit are making a run on damn, while poop floats steadily along.
posted by mrgrimm at 12:12 PM on December 16, 2010 [1 favorite]


Looks like douchebags are a leading indicator of major world conflict (WWI, WWII, Gulf War). I'm afraid of what the current trend suggests.
posted by Kabanos at 12:12 PM on December 16, 2010 [2 favorites]


How do you search for a phrase with more than one word? All mine are coming up zero.
posted by ZenMasterThis at 12:13 PM on December 16, 2010


And 1960 was the date that Americans began to abuse grammar in the most unforgivable way!
posted by copley at 12:14 PM on December 16, 2010 [2 favorites]


Gangsta, of course, begins to rise much later than gangster. Also, check out the usage of the long s: yefterday.
posted by snofoam at 12:15 PM on December 16, 2010


Looks like we're overdue for a comeback...
posted by uncleozzy at 12:17 PM on December 16, 2010


United States, England, and China
posted by swift at 12:18 PM on December 16, 2010 [1 favorite]


Definitive proof that it ain't what it used to be
posted by AFII at 12:18 PM on December 16, 2010 [1 favorite]


I am delighted that words typical of very dignified English seem to be making a comeback in the internet era, among them "lest," "whom" and "whence."
posted by Dee Xtrovert at 12:18 PM on December 16, 2010


It also turns out that extensive experience is not overused in books.
posted by snofoam at 12:21 PM on December 16, 2010 [1 favorite]


ain't what it used to be

Indeed.
posted by uncleozzy at 12:22 PM on December 16, 2010 [1 favorite]


Fart in Danish, Norwegian and Swedish means "speed." In Latin it means "filling." In French it apparently means "wax." In Catalan it means "sick."

This required a bit of Google-fu (searching for the usage of the word, researching the author's home country, and then using Google Translate to find the definition...)

And yes, 19th-century English dictionaries referred to it as breaking wind.
posted by Nanukthedog at 12:24 PM on December 16, 2010


"I don't understand the Y-scale. the only appears in 6% of books?"

I think it means that 'the' makes up 6% of all the words in the corpus, which ties in with the data on this page: the word 'the' appears 62,000 times in 1,000,000 words (see list 5.6).
posted by copley at 12:25 PM on December 16, 2010 [1 favorite]
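
As a quick check of copley's reading: if the y-axis is the n-gram's count divided by the total number of words in that year's books, the figures he cites work out to about 6%. A minimal Python sketch, assuming that definition (`relative_frequency` is a hypothetical helper, not part of any Google API):

```python
def relative_frequency(ngram_count: int, total_words: int) -> float:
    """Fraction of all words in a year's corpus that match the n-gram.

    Assumes the y-axis is count / total words for that year; returns 0.0
    when the year has no words in the corpus at all.
    """
    if total_words == 0:
        return 0.0
    return ngram_count / total_words


# Figures from the comment above: 'the' appears 62,000 times in 1,000,000 words.
share = relative_frequency(62_000, 1_000_000)
print(f"{share:.1%}")  # prints 6.2%
```

Read that way, "6%" means roughly one word in every sixteen, not 6% of books.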


i see what you did there, google.
posted by Baby_Balrog at 12:31 PM on December 16, 2010 [26 favorites]


Oh man! This is exciting! Enough so that even though my Comcast went out right about when this post went up, I am bothering to state my excitement via my phone.
posted by cortex at 12:35 PM on December 16, 2010


cromulent,embiggen,refudiate
posted by mrgrimm at 12:43 PM on December 16, 2010


lol, wtf, omg

What was so funny?
posted by Kabanos at 12:52 PM on December 16, 2010 [1 favorite]


Two girls, one cup [SFW]
posted by copley at 12:52 PM on December 16, 2010


It makes sense that shew dropped off in the mid 1800s, but the resurgence in the 1920s and in the last 20 years is interesting.

Unrelated, the bad influences of Dorothy Parker and Bart Simpson.
posted by marco_nj at 1:08 PM on December 16, 2010


Their dataset is flawed. A search for Picasso, for example, shows appearances in 1840.

This book is one example. It appears to have been published in the 1960s, not 1840.
posted by vacapinta at 1:08 PM on December 16, 2010


Not quite, John.
posted by skyscraper at 1:11 PM on December 16, 2010


Argggh, I meant this. Although the other one was mildly interesting.
posted by skyscraper at 1:13 PM on December 16, 2010


Oh dear. Did someone say case sensitive?

I'll stop now.
posted by skyscraper at 1:16 PM on December 16, 2010


heh.
posted by Cyclopsis Raptor at 1:31 PM on December 16, 2010


I don't understand the search for "facebook". Time travelers, or was there a prior use?
posted by swimming naked when the tide goes out at 1:43 PM on December 16, 2010


I don't understand the search for "facebook". Time travelers, or was there a prior use?

There was a prior use. Like the idea, the name, too, was appropriated.
posted by saulgoodman at 1:46 PM on December 16, 2010


I haven't read much about their dataset, but given the kinds of weird data we're seeing I wouldn't trust anything pre-1800. Because results are reported as percentages, I suspect their corpus is much smaller the farther back in time you go. So a few statistically improbable words turning up in their limited sample of the written word in 1750 would appear as huge spikes that aren't really that meaningful.

This is exacerbated by the use of the "smoothing" function. If you turn that off, you get the actual data back, and it reveals how incredibly noisy the data is for old books. Take a look at the fart graph with no smoothing and you'll get a decent sense of this. It's hard to know for sure without spending some time spelunking in the data, but my instinct is that the decrease we see in fart usage around 1800 happens because the corpus probably starts to grow significantly at that point, and the occasional "fart" that snuck into their pre-1800 dataset just looks huge when it wasn't necessarily disproportionate. It might also be the case that biases in what made it into the corpus make words like fart more likely to show up in that period than after 1800.

Taking a cue from earlier in the thread, you can also look at "the" with no smoothing. There are probably long-term shifts in how much we use articles, but the pre-1800 data is clearly all over the place, with lots of years reporting zero use of the word "the".
posted by heresiarch at 2:01 PM on December 16, 2010 [2 favorites]
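
heresiarch's point about small early corpora and smoothing can be illustrated with made-up numbers. Everything below is hypothetical data, not real Ngram counts, and the helper names are invented for illustration. The smoothing is a centered moving average, which is how the Ngram Viewer's info page describes its smoothing setting (a smoothing of 1 averages each year with one year on either side); the handling of the first and last years here is an assumption.

```python
from typing import List


def relative(hits: List[int], totals: List[int]) -> List[float]:
    """Per-year share of the corpus; years with no words at all report 0."""
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


def smooth(series: List[float], window: int) -> List[float]:
    """Centered moving average over `window` years on either side.

    Near the edges fewer neighbours exist, so this averages whatever is
    available (an assumption about edge handling).
    """
    out = []
    for i in range(len(series)):
        lo = max(0, i - window)
        hi = min(len(series), i + window + 1)
        chunk = series[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


# Hypothetical corpus: five tiny early years, then three large later ones.
totals = [2_000, 0, 3_000, 0, 2_500, 1_000_000, 1_200_000, 1_500_000]
hits = [1, 0, 0, 0, 2, 40, 50, 60]

raw = relative(hits, totals)
smoothed = smooth(raw, window=1)

for year_index, (r, s) in enumerate(zip(raw, smoothed)):
    print(f"year {year_index}: raw={r:.6f} smoothed={s:.6f}")
```

A single stray hit in a 2,000-word year scores more than ten times the later, well-sampled rate, years with no books show up as zero (like the pre-1800 "the" gaps), and smoothing blends those spikes and holes into something that looks like a trend.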


Add FIFA to the mix and it comes in 2nd.
posted by Kabanos at 2:14 PM on December 16, 2010 [1 favorite]


I should really not tell, but there are easter eggs.
posted by GuyZero at 2:18 PM on December 16, 2010 [5 favorites]


Anachronisms like "Internet" in 1914 are explained by (1) OCR errors, (2) metadata errors (books marked with the wrong date), or (3) someone actually using the word, but in a different sense. But mostly #1 and #2.

Some of these results I'm dubious of and wondering how much confidence we should have in what they actually mean. I'll say one thing, though: for writing etymology, this is a huge boon. Most articles on Wikipedia could use a sentence or two about when the term came into popular usage.
posted by stbalbach at 2:35 PM on December 16, 2010


Some of these results I'm dubious of and wondering how much confidence we should have in what they actually mean.

Let me explain. For example, see Middle Ages. You would look at that graph and think the term "Middle Ages" is going out of favor, since it has a steep downward curve from 1850 to the present. But that's not true at all: more books are being published about the Middle Ages now than at any time in history (I know this to be true). So why the downward trend? Because the chart is not showing absolute popularity; rather, it shows how often the term appears in a given year's books relative to everything published that year. So, as more books on all subjects are published every year, the ocean of possible results keeps getting bigger, which skews the graph. I think. Can anyone else explain this? Thus I think it's best used only in comparative mode, comparing two or more words to see trends. A single-word trend is misleading.
posted by stbalbach at 2:48 PM on December 16, 2010 [2 favorites]
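
stbalbach's explanation can be checked with hypothetical numbers (none of these are real counts, and `term_a`/`term_b` are placeholder names): if mentions of a term grow but the whole corpus grows faster, the plotted share falls. The same sketch shows why comparing terms is more robust: in a ratio of two shares from the same year, the shared denominator cancels.

```python
from math import isclose

# Hypothetical data, not real Ngram counts.
years = [1900, 1950, 2000]
term_a = [1_000, 2_500, 5_000]                        # raw mentions: rising
term_b = [800, 3_000, 9_000]                          # a second term: rising faster
corpus_words = [10_000_000, 60_000_000, 200_000_000]  # whole corpus: rising fastest

# What a single-term graph plots: mentions as a share of that year's corpus.
share_a = [c / n for c, n in zip(term_a, corpus_words)]
share_b = [c / n for c, n in zip(term_b, corpus_words)]

# Comparative mode: the year's corpus size cancels out of the ratio,
# so it equals the ratio of the raw counts.
ratios = [b / a for a, b in zip(share_a, share_b)]

for y, count, share, ratio in zip(years, term_a, share_a, ratios):
    print(f"{y}: count={count:>5}  share={share:.2e}  b/a={ratio:.2f}")
```

Here term_a's count rises fivefold while its share falls fourfold, yet the b/a ratio tracks the raw counts exactly.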


For example see Middle Ages.

It's case sensitive. "Middle Ages" has a somewhat different curve.
posted by nasreddin at 2:54 PM on December 16, 2010


Soda vs Pop
posted by Hubajube at 3:01 PM on December 16, 2010


nasreddin: "It's case sensitive"

OK, thanks, I didn't know that. But still, since 1950 it's showing a steep downward trend, which seems wrong in an absolute sense. Every year more books are published about the Middle Ages than ever before. But here's something interesting: compare Medieval,Middle Ages, which does show the right kind of trend for Medieval. Which raises the question: is "Medieval" replacing "Middle Ages" as the term of choice?
posted by stbalbach at 3:32 PM on December 16, 2010


Some of the odd numbers for older works could be from OCR errors on long s glyphs, e.g. fuck vs. suck.
posted by bdc34 at 3:52 PM on December 16, 2010 [2 favorites]


One thing I know for sure, Germans never forgot their first taste of kugel
posted by Mchelly at 7:19 PM on December 16, 2010 [1 favorite]


Some of the odd numbers for older works could be from OCR errors

I typed in "kahlua," clicked on "Search in Google Books" for 1500-1972 and from there clicked on "19th Century."

The first result was from The Complete Poetical Works of Henry Wadsworth Longfellow, Page 306:

... And the wedding guests assembled, Clad in all their richest raiment, Robes of fur and belts of wampum, .Splendid with their paint and plumage, Beautiful with beads and tassels. First they ate the sturgeon, Kahlua, And the pike, ...

So I clicked through, because seriously, who serves Kahlua with fish?
posted by merelyglib at 7:22 PM on December 16, 2010


Aether, Phlogiston
posted by OverlappingElvis at 7:52 PM on December 16, 2010


You get a very different graph if you use acronyms for all sports and not just NASCAR.
posted by batou_ at 1:13 AM on December 17, 2010


What happened ~1815?

Here's a possible clue.
posted by Short Attention Sp at 4:04 AM on December 17, 2010


Note that this tool also lets you pinpoint the origin of various phrases with disputed or apocryphal origins, such as "the whole nine yards." The result parallels the best scholarship, which suggests a 1960s origin. Note that to do this, after an initial long-timeframe search, you should narrow the window and reduce the smoothing to zero, because otherwise the smoothing can produce a spuriously early starting point. Ngram seems to have a limit of five words, so you need to truncate longer phrases. For example, "it ain't over till the fat lady sings" works if you search for "till the fat lady sings".
See also:
keeping up with the Joneses (originated in a 1913 cartoon)
piss and vinegar (from Steinbeck's Grapes of Wrath)
jazz
The Big Apple
posted by beagle at 5:45 AM on December 17, 2010
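
beagle's advice to turn smoothing off before dating a phrase can be seen in a small sketch with hypothetical data: a centered moving average spreads a term's first real appearance backwards by up to `window` years. The five-word truncation is sketched too, assuming the phrase can be split on whitespace (the Viewer's own tokenizer may treat punctuation and contractions differently). All helper names here are hypothetical.

```python
from typing import List, Optional


def smooth(series: List[float], window: int) -> List[float]:
    """Centered moving average with `window` years on each side."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window): i + window + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def first_nonzero_year(years: List[int], values: List[float]) -> Optional[int]:
    """Earliest year whose plotted value is above zero, or None."""
    for year, value in zip(years, values):
        if value > 0:
            return year
    return None


def last_n_words(phrase: str, n: int = 5) -> str:
    """Keep the final n whitespace-separated words of a longer phrase."""
    return " ".join(phrase.split()[-n:])


# Hypothetical series: a phrase that first appears in 1962.
years = list(range(1950, 1971))
raw = [0.0 if y < 1962 else 1e-9 for y in years]

print(first_nonzero_year(years, raw))                     # 1962
print(first_nonzero_year(years, smooth(raw, window=3)))   # 1959
print(last_n_words("it ain't over till the fat lady sings"))  # till the fat lady sings
```

With a smoothing of 3 the smoothed curve lifts off three years before the first real use, which is the spurious early start beagle warns about.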


"horse,bicycle,train,bus,car"
posted by Nossidge at 8:26 AM on December 17, 2010


You get a very different graph if you use acronyms for all sports and not just NASCAR.

neither one is a very good indicator of literary output. the acronyms NBA, NHL, and NFL are used much more often than MLB to refer to their sports.

baseball,football,basketball,soccer,hockey,tennis,golf

football is of course #1 because internationally it means soccer.

but baseball still likely reigns supreme as the most referenced U.S. sport.

then again, it cuts both ways. "baseball" is used much more to connote mlb than football is to mean "nfl."

interesting to see the dip in all sports 1940-1960. i suppose there were some other important things going on at that time... and as expected, not many writers give two shits about hockey.

also, tennis is the anomaly. while all the sports mostly rise and fall in parallel, tennis bucks the trend in the '80s then gets passed by golf in the '90s.

as always, I blame Tiger Woods.
posted by mrgrimm at 1:59 PM on December 17, 2010


sigh
posted by jammer at 3:00 PM on December 17, 2010 [1 favorite]


Plato,Aristotle,Descartes,Hume,Kant,Locke,Berkeley,Spinoza,Leibniz
Hume's histories used to be his most influential books, so his early popularity may be that rather than philosophy.
"Berkeley" gets a big bump at end of 20th century, but it means the place/university now too.

Leonardo,Michelangelo,Donatello,Raphael
I am surprised to see Raphael so much more popular than the others in 1800-1900; maybe there are other figures using that name that I'm not thinking of.
Not much of a bump in the early 90s from the comic

Civil War,War Between the States,War of Northern Aggression
First of all, notice where the curve shows the war starting to get written about a lot - it takes a while;
Second, "Civil War" is dominant in their corpus right from the start; (unless there's another name I'm not thinking of)
Third, check out the little bump for "War Between the States" around the time of WWII and the beginning of integration/the civil rights movement

man,Man,God
Between 1690 and 1750 or so, "man" and "Man" change places in popularity, then they change back. (I assume at least some fraction of the "man" uses are for the same sense as "Man", i.e. mankind.) So - a typographic vogue? An artifact of the data set?

nature,technology

husband,father,wife,mother

family,children,love,sex
Trends of love vs sex, and family+children vs love.

Elvis,Michael Jackson,Beatles,Marilyn Monroe
Elvis blows everyone else away, Michael Jackson surprisingly low

Mrs,Ms,Miss
Ms not making as much headway as I would have thought, although - why were Mrs and Miss declining so much from 1930-1965 or so? Are there just fewer books with female characters referred to by full title?
posted by LobsterMitten at 12:26 AM on December 18, 2010


Worrying correlation between cheese and hate.
posted by pica at 2:52 AM on December 18, 2010 [1 favorite]


Michael Jackson surprisingly low
Remember, this is searching books, not magazines, not online usage.

So, for example, in google,facebook,ebay,amazon, Facebook is "surprisingly low" — the search doesn't reflect the dominance in traffic and usage that Facebook currently has, because so far, much more has been written in books about Google, Ebay and Amazon than Facebook.
posted by beagle at 2:18 PM on December 18, 2010


Damnit, I was making a post about this (a little late!).

Notably, Mark Davies of Brigham Young University claims that size doesn't matter all too much, arguing that while his 400 million word Corpus of Historical American English (funded by the National Endowment for the Humanities) is much smaller, it is more versatile.
posted by Defenestrator at 1:32 PM on December 19, 2010 [2 favorites]


Another interesting article about the limitations.
posted by Defenestrator at 1:35 PM on December 19, 2010 [1 favorite]

