My Word
January 7, 2012 8:40 AM

The Corpus of Historical American English is a searchable index of word usage in American printed material from 1810 to 2009. Powerful complex searches let you trace the appearance and evolution of words, phrases, and even specific grammatical constructions, see trends in frequency, and plenty more. Start with the 5-Minute Tour.
posted by Miko (23 comments total) 56 users marked this as a favorite
What does it mean that the first word I looked up was "poop"?
posted by knile at 8:44 AM on January 7, 2012

I'm getting an error on that second link.
posted by jquinby at 8:59 AM on January 7, 2012 [1 favorite]

We may find this to be the answer to the frequent Meta discussions regarding words/phrases that folks object to as being some sort of -ist. Looking up some of the more recent problematic phrases/words is interesting. If nothing else, we'll have a common understanding as to where a word started and how its use has evolved.
posted by HuronBob at 9:01 AM on January 7, 2012

Fantastic resource. Thanks, Miko!
posted by rory at 9:02 AM on January 7, 2012

It's helpful, HuronBob. I also find the restricted date search on Google Books really helpful for that.

jquinby, it's working for me. Not sure what the problem may be?
posted by Miko at 9:03 AM on January 7, 2012

Second link: The page cannot be found
posted by Skygazer at 9:08 AM on January 7, 2012

IOW: The second link needs to be accessed through the first link. It's at the bottom of the lower frame.
posted by Skygazer at 9:10 AM on January 7, 2012

Ah, I see. Thanks Skygazer.
posted by Miko at 9:15 AM on January 7, 2012

So, first use of the word "Internet" was in the 1840s, but there's no evidence of it in the 1980s?

posted by Devonian at 9:20 AM on January 7, 2012

After messing around a bit, I think the Corpus is fairly unhelpful for recent slang that has been around for a long while ("MILF" yields an improbably solitary citation, for instance), but if you think politically (or feed the engine some derogatory terms), there are some interesting results. One expects the use of "socialist" during the 1930s, when the Nazis came to power. But it's fascinating that the 1980s is the second most popular decade, when "socialist" was used during the Reagan era to refer to anything even remotely liberal. Also fascinating: "groovy" was cited in 1947 in Time, years before "Orwellian" was first used (purportedly, by this Corpus) in 1952. It's a fairly decent metric if you're too lazy to consult etymological texts, but hopefully they'll work out the kinks for greater precision over time.
posted by ed at 9:21 AM on January 7, 2012

I think the main problem is that even 400 million texts is a little bit of a small sample when you compare it to all print material that has existed (every edition of every newspaper and magazine everywhere, every book, every piece of sheet music, etc.). But it's interesting and a good start. If you look at the section where this tool is compared to Google Books and some other corpora, it gives some sense of what this is good at and what the others are good at.
posted by Miko at 9:25 AM on January 7, 2012

er, make that 400 million words of text
posted by Miko at 9:25 AM on January 7, 2012

IOW: The second link needs to be accessed through the first link. It's at the bottom of the lower frame.

Or check that. As long as you access the first link first, you can then access the second link from the FPP.

I'm not sure how that works. Some sort of ASP or frames or cookie (?) thinga-ma-jiggy. Some coder will be along any minute now to explain it, I'm sure, and there's probably also a workaround, but again, I'll leave it to the expert(s).

posted by Skygazer at 9:28 AM on January 7, 2012

I think this is very interesting, but I don't know that I'd rely on this corpus for tracking word popularity in the 1800s. For example, I did two searches, the first for shoe, the second for sandal. There's an odd spike in the usage of the word shoe in the 1830s. It appears to be because of the inclusion of a single book, "Horse Shoe Robinson: A Tale of the Tory Ascendency", in the input data. "The Squire of Sandal-Side: A Pastoral Romance" causes a similar spike in the usage of sandal in the 1880s. If I'm reading their site correctly, the numbers are in usages per million words, but I'm guessing that there aren't enough 19th-century sources to prevent a single source from distorting the results. I'm sure that this corpus is meant to be used for answering questions that are much more sophisticated than asking how the frequency of an individual word has changed over the last two centuries, but that's what I naively jumped to.
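
To make the per-million arithmetic concrete, here is a rough Python sketch of how one book can move a decade's number when the decade's sample is small. Every figure in it is invented for illustration (the word counts, the hit counts, and the names `per_million` and `sources` are all assumptions, not anything taken from the corpus or its site).

```python
def per_million(hits, total_words):
    """Occurrences of a word per million words of text."""
    return hits * 1_000_000 / total_words


# Hypothetical decade sample: (hits for the search word, total words).
# These numbers are made up; they are NOT real corpus counts.
sources = {
    "everything else in the decade": (700, 13_900_000),
    "one novel with the word in its title": (600, 100_000),
}

# Rate with the single novel left out of the sample.
rest_hits, rest_words = sources["everything else in the decade"]
rate_without_book = per_million(rest_hits, rest_words)

# Rate with every source included, as a decade-level chart would show it.
all_hits = sum(hits for hits, _ in sources.values())
all_words = sum(words for _, words in sources.values())
rate_with_book = per_million(all_hits, all_words)

# Share of the decade's hits that come from the one novel.
book_hits = sources["one novel with the word in its title"][0]
book_share = book_hits / all_hits

print(f"without the novel: {rate_without_book:.1f} per million")
print(f"with the novel:    {rate_with_book:.1f} per million")
print(f"novel's share of hits: {book_share:.0%}")
```

With these invented numbers the novel is under 1% of the decade's words but close to half of its hits, which is the kind of distortion I suspect is behind the shoe and sandal spikes.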
posted by rdr at 9:42 AM on January 7, 2012

An interesting resource, thanks.

I thought I would see if words that we now consider to be from other dialects used to have more currency. I chose "arse" as a word typically associated with Commonwealth English, but not US English. However, I fear that there may be some mistakes in the transcriptions, which I'm sure are OCR:

...tends to prevent arse surplus stocks...

Axe seems to be almost as popular as ax over the entire time, although it is consistently less popular by the mid 1900s. Many of the early occurrences of ax seem to be representations of "black" speech.

Wank seems only to have appeared since the 1980s, sadly.
posted by Jehan at 9:56 AM on January 7, 2012

The evolution of the meaning and the frequency of use of the word "pussy" is quite interesting, turns out.
posted by psylosyren at 10:33 AM on January 7, 2012

This has been a subject of interest ever since I stumbled onto "Made in America" by Bill Bryson. Now it's all online - thanks, Miko!
posted by infini at 11:21 AM on January 7, 2012

If I had this growing up, I would have had all A's in school.
posted by Melismata at 11:57 AM on January 7, 2012

This is really cool, and when complemented with Google Ngram Viewer, etymology dictionaries, the OED . . . I'm getting a little flushed, aren't I? Goodness, please excuse me. *dabs at brow*
posted by exlotuseater at 1:08 PM on January 7, 2012

Why the hell did "fart" hit such a peak in the 70s and then settle into a low plateau?
posted by The Whelk at 1:23 PM on January 7, 2012

We were not counting beans on our plates back then.
posted by infini at 1:50 PM on January 7, 2012

All I know is that "broke ass" is from 1971, while "broke-ass" got me hits in 2002 and 2003.
posted by jadepearl at 1:56 PM on January 7, 2012

Why the hell did "fart" hit such a peak in the 70s and then settle into a low plateau?

He rode a blazing saddle, he wore a shining star...
posted by ROU_Xenophobe at 2:26 PM on January 7, 2012 [4 favorites]


This thread has been archived and is closed to new comments