A slightly more detailed description of the engine (as well as an explanation of what the hell "English One Million" is) is available here. posted by theodolite at 11:36 AM on December 16, 2010 [1 favorite]
Sad to see the slow decline of the word 'the' during the 20th century. posted by tavegyl at 11:38 AM on December 16, 2010
This is really, really interesting.
Uhhh...until it died. posted by DU at 11:41 AM on December 16, 2010
Well that's pretty cool, but L. Ron Hubbard's Engram Viewer lets you hold soup cans and remove Thetans. posted by Mayor Curley at 11:42 AM on December 16, 2010 [3 favorites]
In both cases the current rise is still fairly modest. I'm going to change my rating to "Buy". posted by Kabanos at 11:50 AM on December 16, 2010
What happened ~1815?
The initial decline from their peak use corresponds with the first Great Awakening, but I have no idea if there's a causal connection there. Interesting, though. posted by aught at 11:50 AM on December 16, 2010
Oh man, this is gonna be great for us historians. posted by nasreddin at 11:51 AM on December 16, 2010
interesting to see how fuck and shit are making a run on damn, while poop floats steadily along. posted by mrgrimm at 12:12 PM on December 16, 2010 [1 favorite]
Looks like douchebags are a leading indicator of major world conflict (WWI, WWII, Gulf War). I'm afraid of what the current trend suggests. posted by Kabanos at 12:12 PM on December 16, 2010 [2 favorites]
How do you search for a phrase with more than one word? All mine are coming up zero. posted by ZenMasterThis at 12:13 PM on December 16, 2010
Gangsta, of course, begins to rise much after gangster. Also, check out usage of the long s: yefterday. posted by snofoam at 12:15 PM on December 16, 2010
I am delighted that words typical of very dignified English seem to be making a comeback in the internet era, among them "lest," "whom" and "whence." posted by Dee Xtrovert at 12:18 PM on December 16, 2010
Fart in Danish, Norwegian and Swedish means "Speed." In Latin it means "filling." In French it apparently means "Wax." In Catalan it means "Sick."
This required a bit of Googlefu (searching for the usage of the word, researching to find the author's home country, and then leveraging Google Translate to find the definition...)
And yes, in 19th century English dictionaries they referred to it as breaking wind. posted by Nanukthedog at 12:24 PM on December 16, 2010
"I don't understand the Y-scale. the only appears in 6% of books?"
I think it means that 'the' makes up 6% of the corpus of words. Which ties in with the data on this page. The word 'the' appears 62,000 out of 1,000,000 words (see list 5.6). posted by copley at 12:25 PM on December 16, 2010 [1 favorite]
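For what it's worth, the arithmetic behind that reading can be sketched in a few lines of Python. This assumes the y-axis is a word's share of all words in the corpus, as the comment above suggests; the function name is made up for illustration, and the numbers are the ones quoted above.

```python
def relative_frequency(occurrences, total_words):
    """Share of the corpus taken up by one word (assumed meaning of the y-axis)."""
    return occurrences / total_words


# Figures quoted in the comment above: 62,000 occurrences per 1,000,000 words.
share = relative_frequency(62_000, 1_000_000)
print(f"{share:.1%}")  # prints 6.2%
```

So roughly 6% on the y-axis would mean about one word in sixteen is 'the', not that 6% of books contain it.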
Oh man! This is exciting! Enough so that even though my Comcast went out right about when this post went up, I am bothering to state my excitement via my phone. posted by cortex at 12:35 PM on December 16, 2010
I don't understand the search for "facebook". Time travelers, or was there a prior use?
There was a prior use. Like the idea, the name, too, was appropriated. posted by saulgoodman at 1:46 PM on December 16, 2010
I haven't read much about their dataset but given the kinds of weird data we're seeing I wouldn't trust anything pre-1800. Because results are reported as percentages, I suspect their corpus is much smaller the farther back in time you go. So a few statistically improbable words showing up in their limited sample of the written word in 1750 would show up as huge spikes that aren't really that meaningful.
This is exacerbated by the use of the "smoothing" function. If you turn that off, you get the actual data back and it reveals how incredibly noisy the data is for old books. Take a look at the fart graph with no smoothing and you'll get a decent sense of this. It's hard to know for sure without spending some time spelunking in the data, but my instinct is that we see the decrease in fart usage around 1800 because the corpus probably starts to grow significantly at that point, and the occasional "fart" that snuck into their pre-1800 dataset just looks huge when it wasn't necessarily disproportionate. It might also be the case that there are biases in what made it into the corpus that cause stuff like fart to be more likely to show up in that time period than after 1800.
Taking a cue from earlier in the thread, you can also take a look at "the" with no smoothing. There are probably long term shifts in how much we use articles, but clearly the data pre-1800 is just all over the place with lots of years reporting zero use of the word "the". posted by heresiarch at 2:01 PM on December 16, 2010 [2 favorites]
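A rough Python sketch of the effect described above, with made-up numbers. It assumes the smoothing is a plain centered moving average (each year averaged with its neighbours on either side); the `smooth` helper and the yearly figures are hypothetical and are not taken from the actual dataset.

```python
def smooth(values, window):
    """Centered moving average: each point is averaged with up to
    `window` neighbours on each side (fewer at the edges)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window)
        hi = min(len(values), i + window + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Made-up data: (occurrences of a word, total words in that year's corpus).
# The first four "years" have a tiny corpus, the last two a large one.
yearly = [
    (0, 20_000),
    (3, 20_000),
    (0, 20_000),
    (0, 20_000),
    (30, 2_000_000),
    (31, 2_000_000),
]

raw = [hits / total for hits, total in yearly]
smoothed = smooth(raw, 1)

for r, s in zip(raw, smoothed):
    print(f"raw={r:.6f}  smoothed={s:.6f}")
```

In this toy data, three occurrences in a tiny corpus give a raw share ten times higher than thirty occurrences in a large one, and the smoothing spreads that spike over neighbouring years that actually had zero occurrences.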
heh.
posted by Cyclopsis Raptor
Add FIFA to the mix and it comes in 2nd. posted by Kabanos at 2:14 PM on December 16, 2010 [1 favorite]
Anachronisms like "Internet" in 1914 are explained by 1. OCR errors, 2. metadata errors (books marked with the wrong date), or 3. someone actually using the word, but in a different sense. But mostly #1 and #2.
Some of these results I'm dubious of and wondering how much confidence we should have in what they actually mean. I'll say one thing though, in terms of writing etymology, this is a huge boon. Most articles on Wikipedia could use a sentence or two about when the term came into popular usage. posted by stbalbach at 2:35 PM on December 16, 2010
Some of these results I'm dubious of and wondering how much confidence we should have in what they actually mean.
Let me explain. For example see Middle Ages. You would look at that graph and think the term "Middle Ages" is going out of favor; it has a steep downward curve from 1850 to the present. But that's not true at all: more books are being published about the Middle Ages now than at any time in history (I know this to be true). So why the downward trend? Because the chart is not showing absolute popularity; rather, it is showing how many books in any given year use the term compared to all books published that year. So, as more books on all subjects are being published every year, the ocean of possible results keeps getting bigger, which skews the graph results. I think. Can anyone else explain this? Thus I think it's really only best used in comparative mode, comparing two or more words to see trends. A single word trend is misleading. posted by stbalbach at 2:48 PM on December 16, 2010 [2 favorites]
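A toy Python illustration of that point, using invented figures rather than anything from the real corpus. Whatever the exact denominator the viewer uses (books or words), the idea is the same: the absolute count of a term can rise while its share of everything published falls. The names `years` and `share` are hypothetical.

```python
# Invented figures for illustration only.
years = {
    1950: {"term_hits": 1_000, "total_words": 10_000_000},
    2000: {"term_hits": 3_000, "total_words": 100_000_000},
}


def share(year):
    """Relative frequency: term occurrences divided by everything published that year."""
    data = years[year]
    return data["term_hits"] / data["total_words"]


for year in sorted(years):
    print(year, years[year]["term_hits"], f"{share(year):.6%}")
```

Here the term is used three times as often in 2000 as in 1950, yet its share drops because the total grew tenfold, which is the kind of curve a single-word graph would show as a decline.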
For example see Middle Ages.
It's case sensitive. "Middle Ages" has a somewhat different curve. posted by nasreddin at 2:54 PM on December 16, 2010
OK, thanks didn't know that. But still, since 1950 it's showing a steep downward trend, which seems wrong in an absolute sense. Every year more books are being published than ever about the Middle Ages. But here's something interesting, compare Medieval,Middle Ages which does show the right kind of trend for Medieval. Which begs the question, is "Medieval" replacing "Middle Ages" as the term of choice? posted by stbalbach at 3:32 PM on December 16, 2010
Some of the odd numbers for older works could be from OCR errors of long s glyphs. Ex. fuck vs suck posted by bdc34 at 3:52 PM on December 16, 2010 [2 favorites]
One thing I know for sure, Germans never forgot their first taste of kugel posted by Mchelly at 7:19 PM on December 16, 2010 [1 favorite]
I typed in "kahlua," clicked on "Search in Google Books" for 1500-1972 and from there clicked on "19th Century."
The first result was from The Complete Poetical Works of Henry Wadsworth Longfellow, Page 306:
... And the wedding guests assembled, Clad in all their richest raiment, Robes of fur and belts of wampum, .Splendid with their paint and plumage, Beautiful with beads and tassels. First they ate the sturgeon, Kahlua, And the pike, ...
Note that this tool also lets you pinpoint the origin of various phrases with disputed or apocryphal origins, such as "the whole nine yards." The result parallels the best scholarship which suggests a 1960s origin. Note that to do this, after an initial long-timeframe search, you should narrow the window and reduce the smoothing to zero, because otherwise the smoothing can produce a spuriously early starting point. Ngram seems to have a limit of five words, so you need to truncate longer phrases. For example, "it ain't over till the fat lady sings" works if you search for "till the fat lady sings."
See also: keeping up with the Joneses (originated in a 1913 cartoon); piss and vinegar (from Steinbeck's Grapes of Wrath); jazz; The Big Apple. posted by beagle at 5:45 AM on December 17, 2010
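If you want to trim long phrases before pasting them in, here is a small Python sketch. It assumes the five-word limit mentioned above and splits on whitespace only, which may not match how the viewer itself counts words (contractions in particular); `MAX_WORDS` and `truncate_phrase` are made-up names.

```python
MAX_WORDS = 5  # limit reported in the comment above; treated here as an assumption


def truncate_phrase(phrase, max_words=MAX_WORDS):
    """Keep only the last `max_words` whitespace-separated words of a phrase."""
    words = phrase.split()
    return " ".join(words[-max_words:])


print(truncate_phrase("it ain't over till the fat lady sings"))
# prints: till the fat lady sings
```

Keeping the tail of the phrase is just one choice; for some sayings the first five words might be the more distinctive part.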
football is of course #1 because it is international soccer.
but baseball still likely reigns supreme as the most referenced U.S. sport.
then again, it cuts both ways. "baseball" is used much more to connote mlb than football is to mean "nfl."
interesting to see the dip in all sports 1940-1960. i suppose there were some other important things going on at that time... and as expected, not many writers give two shits about hockey.
also, tennis is the anomaly. while all the sports mostly rise and fall in parallel, tennis bucks the trend in the '80s then gets passed by golf in the '90s.
as always, I blame Tiger Woods. posted by mrgrimm at 1:59 PM on December 17, 2010
Plato,Aristotle,Descartes,Hume,Kant,Locke,Berkeley,Spinoza,Leibniz
Hume's histories used to be his most influential books, so his early popularity may be that rather than philosophy.
"Berkeley" gets a big bump at end of 20th century, but it means the place/university now too.
Leonardo,Michelangelo,Donatello,Raphael
I am surprised to see Raphael so much more popular than the others in 1800-1900; maybe there are other figures using that name that I'm not thinking of.
Not much of a bump in the early 90s from the comic
Civil War,War Between the States,War of Northern Aggression
First of all, notice where the curve is where the war starts getting written about a lot - it takes a while;
Second, "Civil War" is dominant in their corpus right from the start; (unless there's another name I'm not thinking of)
Third, check out the little bump for "War Between the States" around the time of WWII and the beginning of integration/the civil rights movement
man,Man,God
Between 1690 and 1750 or so, "man" and "Man" change places in popularity, then they change back. (I assume at least some fraction of the "man" uses are for the same use as "Man" ie mankind) So - a typographic vogue? an artifact of the data set?
Mrs,Ms,Miss
Ms not making as much headway as I would have thought, although - why were Mrs and Miss declining so much from 1930-1965 or so? Are there just fewer books with female characters referred to by full title? posted by LobsterMitten at 12:26 AM on December 18, 2010
Michael Jackson surprisingly low
Remember, this is searching books, not magazines, not online usage.
So, for example, in google,facebook,ebay,amazon, Facebook is "surprisingly low" — the search doesn't reflect the dominance in traffic and usage that Facebook currently has, because so far, much more has been written in books about Google, Ebay and Amazon than Facebook. posted by beagle at 2:18 PM on December 18, 2010
Damnit, I was making a post about this (a little late!).
posted by theodolite at 11:36 AM on December 16, 2010 [1 favorite]