Hate Map
May 14, 2013 5:20 PM

Researchers at Humboldt State University have mapped hateful tweets. Dr. Monica Stephens, at Humboldt State, has teamed up with undergraduate research assistants to study the geographical distribution of hate speech in tweets. The graphical map breaks down by "genre" of hate (homophobia, racism, disability) as well as by individual words flagged. Far more details are available on floatingsheep.org; the data was provided by the DOLLY Project at the University of Kentucky.
posted by obliquicity (109 comments total) 14 users marked this as a favorite

Like far too many heat map visualizations, this just tells me where the people are. Great. Not much to see here, move on.
posted by dlg at 5:25 PM on May 14, 2013 [8 favorites]

Half of the Seattle hate tweets are Dan Savage lovingly using the term fag.
posted by munchingzombie at 5:30 PM on May 14, 2013 [23 favorites]

In that case, it's awfully odd that the most populous state is so empty of people.
posted by elizardbits at 5:30 PM on May 14, 2013 [19 favorites]

So... you didn't actually bother reading enough to see that they normalized for Twitter traffic in each area?
posted by kmz at 5:31 PM on May 14, 2013 [16 favorites]

True that. I spoke too soon. Still confused why it's not normalized against total number of geotagged tweets.
posted by dlg at 5:31 PM on May 14, 2013

Their FAQ for this project gets into the spatial analysis a little more.
posted by avocet at 5:32 PM on May 14, 2013 [1 favorite]

This may be the most useful thing to ever come out of Humboldt State.

If you don't smoke weed.
posted by elsietheeel at 5:32 PM on May 14, 2013 [2 favorites]

I'm having a little trouble without the ability to zoom in further: Is that little oasis of homophobia in NM in Hatch or T-or-C? Somehow I'm envisioning it as one particularly prolific person, because I'm pretty sure there's only like six people who live between Cruces and ABQ.
posted by Sequence at 5:33 PM on May 14, 2013 [4 favorites]

Minnesota youre kind of a dick
posted by DLWM at 5:33 PM on May 14, 2013 [4 favorites]

It breaks down in an odd way if you zoom in more than twice. I'm guessing this has to do with the county-level aggregation mentioned, but it's a little hard to believe that the epicenters of tweet-traffic (hateful or not) aren't more clumpy around the major metro areas. Atlanta, for example, is blank, once you dive in once or twice.
posted by jquinby at 5:33 PM on May 14, 2013 [1 favorite]

Argh. I'm just going to lie down and shut my eyes. I can't even parse text right now, it appears.
posted by dlg at 5:33 PM on May 14, 2013

Looks like a microscope picture of bacteria.
posted by jonmc at 5:35 PM on May 14, 2013

dlg, the funny thing is that I only looked at this more closely after I made the exact same complaint on Facebook, and the OP there pointed out that I fail at reading *sigh* so at least we're in the same boat!
posted by obliquicity at 5:35 PM on May 14, 2013 [1 favorite]

3. The map is based solely on geocoded data from Twitter, and does not reflect our personal attitudes about a given place.

I can't even believe they actually have to say that. Never mind. I can.
posted by rtha at 5:35 PM on May 14, 2013 [4 favorites]

The number of tweets that are geotagged in the first place is small; who knows if that is representative of Twitter? Maybe hateful people don't change their location privacy preferences?

I don't know what this map is supposed to show at all. All of Wisconsin (which is spelled incorrectly) has 15 tweets for "fag." Apparently most of them are from Green Bay. What hypothesis are we testing? What conclusion should we draw?
posted by desjardins at 5:36 PM on May 14, 2013 [5 favorites]

desjardins is right; random Google search reveals a respondent on Quora who claims:

A random sample of 1.5 million public tweets by 10k users over 18 months revealed the following:

1.0% of tweets are geotagged in some way

87% of geotagged tweets contain exact coordinates while...
13% of geotagged tweets represent regional polygons, such as cities

95% of users never geotag a single tweet but...
~1% of users geotag the majority of the tweets they post

Very passive users (< 50 tweets per year) and very active users (> 1000 tweets per year) geotag a smaller percentage of tweets than moderate users.

The prevalence of geotagging has increased linearly from 0.6% in late 2010 to ~1.4% today.
posted by obliquicity at 5:40 PM on May 14, 2013 [2 favorites]

9. If you are a disgruntled white male who feels that the persistence of hatred towards minority groups is a license to complain about how discrimination against you is being ignored, just stop. You can refer to all of our previous commentary on this issue from November. Though we have typically refrained from deleting asinine comments to this effect - those who choose to make these comments do more to prove themselves to be fools than we ever could - we fully reserve the right to delete any and all comments we believe to be unnecessary.
posted by rtha at 5:40 PM on May 14, 2013 [2 favorites]

All the people I personally know (which is obviously not a representative sample, but hey:)

1. Only use these kinds of terms about their own groups (particularly "cripple" and the LGBT-related ones I don't have asterisked out,) and,
2. Don't geotag their tweets.

On the second one, the only time I remember geotagging a tweet myself, it was for a club that I control the Twitter account of (and the club is trying to drum up interest, etc., in a very specific region.) And it was kind of a hassle with no apparent bonus as far as followers are concerned, so I stopped doing it.

I'm also honestly fascinated that enough people in the (not-exactly-well-populated) Appalachian region geotag their tweets that you could actually include data from those areas at all.
posted by SMPA at 5:42 PM on May 14, 2013

bible belters gonna belt out the bible
posted by DU at 5:46 PM on May 14, 2013 [1 favorite]

I feel like there must be one prolific Tweeter in Hood River, OR who's examining Queer Studies or something.
posted by ErikaB at 5:46 PM on May 14, 2013 [4 favorites]

Minnesota youre kind of a dick

Yeah, that made me sad. RI does OK, except, weirdly, for "queer." WtF, RI?
posted by GenjiandProust at 5:47 PM on May 14, 2013

I live in the middle of a dark red patch. And people wonder why I haven't bothered with trying to make local friends since moving here.
posted by Jacqueline at 5:49 PM on May 14, 2013

Queer is the kind of word that LGBT people have done a pretty good job of reclaiming, though.
posted by elizardbits at 5:49 PM on May 14, 2013 [3 favorites]

Using DOLLY to search for all geotagged tweets in North America between June 2012 and April 2013, we discovered 41,306 tweets containing the word ‘nigger’, 95,123 referenced ‘homo’, among other terms

I'd like to know the total number of geotagged tweets in this time period. Did I miss it?
posted by desjardins at 5:49 PM on May 14, 2013

Are the words they show on the FAQ the only ethnic slurs they included? Kind of weird they didn't include anti-Semitism...
posted by You Guys Like 2 Party? at 5:51 PM on May 14, 2013

Here are some people talking about this map on Twitter.
posted by desjardins at 5:51 PM on May 14, 2013

1. Only use these kinds of terms about their own groups

"In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner. This allowed us to avoid using any algorithmic sentiment analysis or natural language processing, as many algorithms would have simply classified a tweet as ‘negative’ when the word was used in a neutral or positive way. For example the phrase ‘dyke’, while often negative when referring to an individual person, was also used in positive ways (e.g. “dykes on bikes #SFPride”). The students were able to discern which were negative, neutral, or positive. Only those tweets used in an explicitly negative way are included in the map."
posted by rtha at 5:53 PM on May 14, 2013 [17 favorites]

rtha, I was just going to post that....
posted by GenjiandProust at 5:54 PM on May 14, 2013

Minnesota youre kind of a dick

The governor of Minnesota signed marriage equality into law today. Even in Minnesota, the idea of gay marriage brings the worst out of some people.
posted by bfister at 5:56 PM on May 14, 2013 [1 favorite]

Apparently excess humidity makes you homophobic.
posted by jason_steakums at 5:57 PM on May 14, 2013 [5 favorites]

Yeah, that made me sad. RI does OK, except, weirdly, for "queer." WtF, RI

Isn't that, like, a big-ass clam or something?

I'm also wondering what kind of immigration or cultural shift is going on in Minneapolis to make 'chink' a thing.
posted by mudpuppie at 6:03 PM on May 14, 2013

Minnesota, specifically Minneapolis/St. Paul has a pretty big Hmong population.
posted by jason_steakums at 6:05 PM on May 14, 2013 [1 favorite]

I love me some maps, but I think it's irresponsible to put out a visualization on a charged topic without having some sort of narrative beyond "this is what we did to make the map."
posted by desjardins at 6:06 PM on May 14, 2013 [1 favorite]

I'd really like to see a heat map for all the tweets they collected -- is California low because it's an oasis of wonderful tweeterers, or is it low because people in California rarely geotag their tweets? Or vice versa for the east? It's normalised to total geotagged volume, but if the volume is low I'd be a lot warier than if geotag usage was somewhat balanced.

I don't see this answered anywhere on their site, though possibly I missed it.
posted by jeather at 6:19 PM on May 14, 2013

elsietheeel: "This may be the most useful thing to ever come out of Humboldt State.

If you don't smoke weed."

tweet this so you can be counted
posted by boo_radley at 6:24 PM on May 14, 2013

Ummm, there's a problem here; they claim that these are tweets referring negatively to some word or another, but I think they lack any context. For example, the two biggest hotspots in the nation for homophobia include northwestern Minnesota and Iowa/Illinois.

Except that the biggest hit for either is on the word "dyke." There are two red spots in the entire country for those terms, and they're on the Red River in the Fargo/Moorhead area--an area that gets flooded out regularly--and on the Mississippi between Iowa and Illinois.

So people in these areas can't spell "dike", but that's all we learned today, kids.
posted by Ickster at 6:28 PM on May 14, 2013 [11 favorites]

So people in these areas can't spell "dike", but that's all we learned today, kids.

According to a quick search, nope, they appear to mostly mean the pejorative for lesbian (or Dick Van Dyke).
posted by en forme de poire at 6:40 PM on May 14, 2013 [3 favorites]

I'm trying to figure out why the epicenter of online hate in New England is the Bangor area. Without doing any research, I'm going to speculate that the red spot is Orono and some team in Hockey East besides U. Maine has a really good black player.

That's actually a pretty good theory- I have relatives in Bangor, so I was just going to say "Hellmouth?".

I'm also wondering what kind of immigration or cultural shift is going on in Minneapolis to make 'chink' a thing.

Sadly, that's probably in reference to the Hmong and other SE Asian people who've been there since the '70s.
posted by TheWhiteSkull at 6:43 PM on May 14, 2013

Utah: America's least hateful state?
posted by Apocryphon at 6:43 PM on May 14, 2013

Dang it, I can't get the mouse-hover functionality to work.

(By the way, a lot of the methods questions raised here are answered if you click on the "Details about this map" link in the lower right-hand corner.)

Each of the tweets was manually reviewed for negative, positive and neutral context for each word, so the dyke/dike thing wouldn't be a problem.

The reason for the displacement off major population hubs is that the mappers used county centroids - so within the boundaries of each county the hot spot is placed over the point that splits the population density of the county down the middle. That means if a county line runs through a big city, only the part of the population of that city that falls in the county will be included in the calculation of the centroid, and if there are smaller regions of population density in the same county they will pull the centroid towards them, away from the city. So don't be too literal about whether the point falls over Orono or San Bernardino or whatever.
posted by gingerest at 6:46 PM on May 14, 2013 [3 favorites]
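
A minimal sketch of the centroid effect described above, for anyone who wants to see the pull numerically. It assumes a population-weighted mean center; the material quoted in this thread says only that counties were "reduced to their centroids", so whether the map used weighted or purely geometric centroids is an assumption here. The function name and the sample county are hypothetical.

```python
from typing import Iterable, Tuple

def weighted_centroid(points: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Return the population-weighted mean (lat, lon) of (lat, lon, population) points."""
    total = 0.0
    lat_sum = 0.0
    lon_sum = 0.0
    for lat, lon, pop in points:
        total += pop
        lat_sum += lat * pop
        lon_sum += lon * pop
    if total == 0:
        raise ValueError("total population must be positive")
    return lat_sum / total, lon_sum / total

# Hypothetical county: one city of 90,000 plus two towns of 5,000 each
# that happen to sit at the same spot away from the city.
county = [(44.0, -69.0, 90_000), (45.0, -68.0, 5_000), (45.0, -68.0, 5_000)]
print(weighted_centroid(county))
```

Even with 90% of the people in the city, the plotted point lands a little way outside it, which is the displacement being described.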

Utah: America's least hateful state?

Probably just its most polite.
posted by en forme de poire at 6:47 PM on May 14, 2013 [1 favorite]

Yeah, the most racist/homophobic guy I know here in California is in the Sacramento area, so this is all totally valid. Plus HSU is my alma mater, so yeah. Go, Lumberjacks!
posted by Huck500 at 6:51 PM on May 14, 2013 [3 favorites]

good on (most of) socal, but this isn't surprising:

Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because of its significant overall Twitter activity, such hateful tweets are less prominent and therefore do not appear as prominently on our map.
posted by changeling at 6:54 PM on May 14, 2013

Each of the tweets was manually reviewed for negative, positive and neutral context for each word, so the dyke/dike thing wouldn't be a problem.

This also solves the 'queer studies'/'dykes on bikes' problem.
posted by Myca at 6:56 PM on May 14, 2013 [3 favorites]

What's with the big blob of 'cripple' in east nowhere Montana?
And why is cripple the only disability they analyzed for? Is 'gimp' not a choice?
posted by Confess, Fletch at 7:06 PM on May 14, 2013

95,123 referenced ‘homo’
Wonder how many were from palaeontologists?
posted by dg at 7:07 PM on May 14, 2013

this irrelevant study is getting way too much attention.
posted by superuser at 7:07 PM on May 14, 2013 [2 favorites]

95,123 referenced ‘homo’
Wonder how many were from palaeontologists?

"In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner. This allowed us to avoid using any algorithmic sentiment analysis or natural language processing, as many algorithms would have simply classified a tweet as ‘negative’ when the word was used in a neutral or positive way. For example the phrase ‘dyke’, while often negative when referring to an individual person, was also used in positive ways (e.g. “dykes on bikes #SFPride”). The students were able to discern which were negative, neutral, or positive. Only those tweets used in an explicitly negative way are included in the map."
posted by rtha at 7:13 PM on May 14, 2013 [8 favorites]

tweet this so you can be counted

I'm just bitter because I didn't get to go to HSU like I wanted.
posted by elsietheeel at 7:15 PM on May 14, 2013 [2 favorites]

If you are a disgruntled white male who feels that the persistence of hatred towards minority groups is a license to complain about how discrimination against you is being ignored, just stop.

and I pump my fists in the air
posted by Rustic Etruscan at 7:19 PM on May 14, 2013

In response to a comment on the post, Dr. Monica Stephens wrote, "Gimp was also considered but most of the tweets referred to the software. There were very few (less than 100) used as a derogatory term towards disabled persons."

(I wish they'd put it in the FAQ because looking at the comments makes me need to shower forever.)

The FAQ addresses some of the questions about what words were used.
posted by gingerest at 7:20 PM on May 14, 2013 [1 favorite]

I think maybe this says more about regional tolerance for geotagging than about hate.
posted by mullacc at 7:41 PM on May 14, 2013 [2 favorites]

When I went to HSU, it was the only place that would take me. :-)
posted by humboldt32 at 7:48 PM on May 14, 2013 [2 favorites]

I'm also wondering what kind of immigration or cultural shift is going on in Minneapolis to make 'chink' a thing.

Civic pride compels me to point out that none of the red you see in Minnesota is in the Twin Cities whatsoever- if you zoom in you see it's all in small towns like Hutchinson and Detroit Lakes and Cloquet- the combined population of which is around 50K- this is a grossly misleading mapping.
posted by Esteemed Offendi at 7:48 PM on May 14, 2013 [2 favorites]

(I wish they'd put it in the FAQ because looking at the comments makes me need to shower forever.)

"Why have you omitted derogatory words directed against Whites such as: honkey, peckerwood, white trash, white boy and cracker?"

(facepalm)
posted by avocet at 7:59 PM on May 14, 2013 [3 favorites]

The heatmap blur radius (kernel density estimation bandwidth) seems to be constant in screen coordinates rather than in real-world coordinates, which means that if you zoom out all the way the entire eastern half of the US turns into a sea of red, while if you zoom in all the way there are just these sparse little islands of red.

This issue makes it really easy to misread the map at *all* levels except the closest (and even there I'm not sure I trust it). So we're getting reading mistakes like the idea that I-80 from Sac to Reno is a hotbed of hate.

But if you zoom in, SacTown doesn't display as any more of a hot spot than West LA. Weirdly, it's *Lake Tahoe* that's the hottest (but still not so hot) spot in 100 miles, with a secondary spot that looks like it's on Highway 49 about where Jackson is, and then little glows near Yuba City, Woodland, a few other towns in the Sierras.
posted by weston at 8:00 PM on May 14, 2013 [1 favorite]
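
A rough sketch of why a blur radius fixed in screen pixels behaves this way. It assumes the map uses standard Web Mercator tiles (256 pixels wide at zoom 0), which is typical for Google Maps-style maps but is not confirmed anywhere in the thread; the 20-pixel radius and the latitude are made-up illustration values.

```python
import math

# Ground resolution at the equator for zoom level 0 with 256-pixel tiles
# (standard Web Mercator figure), in meters per pixel.
EQUATOR_METERS_PER_PIXEL_Z0 = 156543.03392

def ground_radius_km(radius_px: float, zoom: int, lat_deg: float) -> float:
    """Approximate ground distance covered by a radius given in screen pixels."""
    meters_per_pixel = (
        EQUATOR_METERS_PER_PIXEL_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)
    )
    return radius_px * meters_per_pixel / 1000.0

# Hypothetical 20-pixel blur radius at roughly the latitude of the central US.
for zoom in (4, 6, 8):
    print(zoom, round(ground_radius_km(20, zoom, 39.0), 1), "km")
```

Each zoom step halves the ground distance the same pixel radius covers, so one county point smears across several states when zoomed out and shrinks to a small island when zoomed in.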

And, once again, my home city, San Diego, falls off the map. (Yay!)
posted by SPrintF at 8:06 PM on May 14, 2013

the combined population of which is around 50K- this is a grossly misleading mapping.

4. In order to produce this map, we took the number of geotagged hateful tweets, aggregated them to the county level and then normalized this count by the overall number of tweets in that county. This means that the spatial distributions you see for the different variables are decidedly NOT showing population density. As we mentioned above, this is clearly stated in all of the previously written material accompanying the map. And because we are specifically looking at the geographic patterns of Twitter activity, it makes more sense to normalize by overall levels of Twitter activity than by population.
posted by rtha at 8:07 PM on May 14, 2013 [6 favorites]
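
For readers who want the normalization in point 4 spelled out, here is a minimal sketch: hateful geotagged tweets per county divided by all tweets in that county. The function name, county names, and counts are hypothetical, and the study's actual weighting step may differ from a plain ratio.

```python
from typing import Dict

def hate_rate_by_county(hateful: Dict[str, int], total: Dict[str, int]) -> Dict[str, float]:
    """Divide each county's hateful-tweet count by its overall tweet count."""
    rates = {}
    for county, n_total in total.items():
        if n_total <= 0:
            continue  # no Twitter activity recorded, so no rate can be computed
        rates[county] = hateful.get(county, 0) / n_total
    return rates

# Hypothetical counts: the big county has far more hateful tweets in absolute
# terms but a lower rate once its overall volume is taken into account.
hateful_counts = {"Big County": 500, "Small County": 5}
total_counts = {"Big County": 1_000_000, "Small County": 1_000}
print(hate_rate_by_county(hateful_counts, total_counts))
```

This is the same effect the FAQ describes for Orange County: the highest absolute count, but a muted color once divided by overall activity.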

Southern Illinois is a big red blob of hate. Yup, checks out.
posted by deathpanels at 8:17 PM on May 14, 2013 [1 favorite]

To produce the map all tweets containing each 'hate word' were aggregated to the county level and normalized by the total twitter traffic in each county. Counties were reduced to their centroids and assigned a weight derived from this normalization process.

I understand the idea behind this, but it makes the map almost useless in the western part of the country.
My county is about the size of Connecticut, so whereas CT gets multiple heat dots, we get one centered in the middle, which tells me almost nothing.

County level is too coarse a mapping on anything but a zoomed out national level.

I don't envy the poor students that had to read all those tweets, though.
posted by madajb at 8:18 PM on May 14, 2013

Each of the tweets was manually reviewed for negative, positive and neutral context for each word, so the dyke/dike thing wouldn't be a problem.

Has anyone found anything on the site about the review process other than "tweets were manually reviewed by students"? I can't seem to find anything, and I have some questions about the methodology:

Were the tweets reviewed by multiple people, or just one person each?
What were the instructions to the reviewers? Is it possible the instructions biased the results?
some words no longer achieved a minimum number to be displayed on the map. What was the minimum and how was that minimum determined?
Also, I'd like to see the authors address the huge geographic variance that occurs around the middle of the country. Were the reviewers assigned tweets at random, or were they assigned tweets in batches by geography?
Were the student reviewers aware of the geography they were reviewing, or was the reviewing blind?

Apologies if these were answered somewhere on the site. I couldn't find it.
posted by Guernsey Halleck at 8:39 PM on May 14, 2013 [1 favorite]

"Queer" is consistently homophobic?! Huh? Maybe if you were doing a study 50 years ago!

This shows how likely one is to go wrong in trying to be the "hate" police. Once you decide to go to war with an abstraction like "hate," nuance and context are going to be the first casualties.
posted by John Cohen at 8:39 PM on May 14, 2013

Human beings are reading the tweets to see if they're actually negative, guys.
posted by Rustic Etruscan at 8:46 PM on May 14, 2013 [7 favorites]

RTHA...RTFA...Duuuude
posted by lordaych at 9:01 PM on May 14, 2013 [1 favorite]

/snerk

It started feeling a little axe-grindy so, eh. If people don't want to click a link and read the text, there's only so much boulder-pushing I want to do, I guess.
posted by rtha at 9:03 PM on May 14, 2013 [4 favorites]

It looks like they don't really have enough data yet to judge. For example, "dyke" appears almost totally in Iowa and Minnesota, and unless those two states are hardcore anti-lesbian and I just never heard about it before, I kinda doubt that's meaningful. Plus, as mentioned, "queer" is not even a slur anymore, I say things like "queer-positive" all the time. And what about 4chan style "ironic" racism, which is shitty but not quite as bad as actual racism?
posted by DecemberBoy at 9:46 PM on May 14, 2013

DecemberBoy:

> "In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet to determine if the given word was used in a positive, negative or neutral manner. This allowed us to avoid using any algorithmic sentiment analysis or natural language processing, as many algorithms would have simply classified a tweet as ‘negative’ when the word was used in a neutral or positive way. For example the phrase ‘dyke’, while often negative when referring to an individual person, was also used in positive ways (e.g. “dykes on bikes #SFPride”). The students were able to discern which were negative, neutral, or positive. Only those tweets used in an explicitly negative way are included in the map."
posted by desuetude at 9:58 PM on May 14, 2013 [7 favorites]

Who's next?
posted by Miko at 10:08 PM on May 14, 2013 [2 favorites]

OK, scratch that bit then. It still seems odd that "dyke" only appears in two seemingly random states, and apparently "wetback" was only found in central and southeast Texas and a little in Georgia. Every border state should be rife with that one.
posted by DecemberBoy at 10:08 PM on May 14, 2013

And incidentally I wondered about "queer" because it's the only homophobic slur that appears in my hometown of Austin, which is a very non-homophobic city.
posted by DecemberBoy at 10:09 PM on May 14, 2013

Taking a 22,000 mile step back .... what the hell east half of the country?
posted by Tell Me No Lies at 10:20 PM on May 14, 2013

(and I say that recognizing that the underlying numbers are just tiny. But statistically it's really weird that they cluster over your way.)
posted by Tell Me No Lies at 10:22 PM on May 14, 2013

@Tell Me No Lies

that's where new york is
posted by This, of course, alludes to you at 10:25 PM on May 14, 2013

Besides the fact that I just plain don’t believe their data, or agree with their methods, I don’t see the point. Otherwise, good job!

Unless the people who rated the "hate" were not privy to the location there will be inherent bias. If students were rating the "hate", what oversight was there? That’s a lot of tweets, 150,000, were people serious about rating or just getting through them? Were the raters paid? How was the data normalized?

They got on a lot of web sites though.
posted by bongo_x at 10:25 PM on May 14, 2013 [1 favorite]

I think the answers are in the links, bongo_x.
posted by Miko at 10:31 PM on May 14, 2013 [1 favorite]

rtha: "95,123 referenced ‘homo’
Wonder how many were from palaeontologists?

"In order to address one of the earlier criticisms of our map of racism directed at Obama, students at Humboldt State manually read and coded the sentiment of each tweet ... Only those tweets used in an explicitly negative way are included in the map.""

Sorry - poor attempt at a joke :-(
posted by dg at 10:33 PM on May 14, 2013 [1 favorite]

I think the answers are in the links

Where? I'm not trying to be argumentative. But, I can't find a single mention of methodology other than "students reviewed the tweets."
posted by Guernsey Halleck at 10:34 PM on May 14, 2013

I can't find a single mention of methodology other than "students reviewed the tweets."

Me neither, the actual information is either slim or well hidden.
posted by bongo_x at 10:37 PM on May 14, 2013

On the other hand, we learn that Mississippi is one of the least racist places in the country.
posted by bongo_x at 11:17 PM on May 14, 2013

Apparently excess humidity makes you homophobic.

It's not the hate, it's the humidity.
posted by three blind mice at 11:22 PM on May 14, 2013 [6 favorites]

I looked over this a bit : they had 150,000 geo-coded tweets, which was a conversion rate of 1-4% of all tweets they parsed.

There are 3100 counties.

Which comes out to an average of 60 geo-coded tweets per county. Of course this isn't the case, more populated areas tweet more, so they normalize over all tweets.

Further, I read this tweet as some areas having 1% of all tweets geo-coded, and others 4% geo-coded; I feel this is a pretty skewed range, considering that we have no idea whether people in cities may be more likely to geo-code themselves, or vice-versa. It would be nice to see a filled map of response rates, or something to show us that similar rates of geo-coded tweets are going out across the country. Again this MAY not tell us anything. This again assumes that people are acting the same (no social pressures, self-monitoring etc) across the country.

It's interesting as an experiment but it doesn't really pass any sort of... scientific muster, of course the PI wouldn't say this... Also the density heat-mapping they do at the national scale really is REALLY misleading, but once you zoom regions seem to normalize. A choropleth map would really be better, but GOOGLE MAPS.

More info on their methods at this FAQ; I don't think they address the main concerns.
posted by stratastar at 11:26 PM on May 14, 2013
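
A quick check of the per-county average, using only the two figures quoted in the comment above (150,000 geo-coded tweets and 3,100 counties). With those inputs the plain average comes out nearer 48 than 60; the variable names are just for illustration.

```python
geocoded_tweets = 150_000  # figure quoted in the comment above
counties = 3_100           # figure quoted in the comment above

average_per_county = geocoded_tweets / counties
print(round(average_per_county, 1))
```

Either way the point stands: a few dozen tweets per county on average, and far fewer in the sparsely populated ones.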

Some posters have been saying that "the tweets were manually reviewed by humans" isn't enough detail about the methodology. This confuses me. To me, it seems like this task is really easy. You really think there are lots of tweets where it's like "Wow, I can't tell if this use of the word 'wetback' is intended to be hateful"? I'm just curious: when you bring up this doubt, what kind of tweet are you imagining?
posted by a birds at 11:37 PM on May 14, 2013 [1 favorite]

From what I've read on the website, they appear to plan on publishing this work formally in peer-reviewed journals. At that point, the methodology will have to be described in detail.
posted by gingerest at 11:58 PM on May 14, 2013

How about the infamous n-word? Its use can range anywhere from vile racism to colloquialism to song lyrics. If you see a tweet that says "Thinkin every nigga is sellin narcotics" and you don't know it's from an NWA song, could you maybe think it's racist? Especially if you've been told to specifically look for racist tweets? Is it totally impossible for you to come up with a solitary sentence that could be considered racist if taken out of context?

This is why I want to see the methodology. I think there are plenty of instances where a statement, taken completely without context, is not clearly racist. Right now we don't know how the study dealt with borderline cases. Were they just one person's opinion, or did multiple people review the same tweets? If it was multiple reviewers, how were different interpretations reconciled?
posted by Guernsey Halleck at 12:11 AM on May 15, 2013
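
To make the multiple-reviewer question concrete, here is a small sketch of two simple checks: how often two reviewers agree, and how a strict majority vote could settle disagreements. Nothing in the thread says the study did either; the function names and sample labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional

def percent_agreement(a: List[str], b: List[str]) -> float:
    """Fraction of tweets on which two reviewers assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def majority_label(labels: List[str]) -> Optional[str]:
    """Return the label picked by a strict majority of reviewers, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count * 2 > len(labels) else None

# Hypothetical labels from two reviewers for the same four tweets.
reviewer_1 = ["negative", "neutral", "positive", "negative"]
reviewer_2 = ["negative", "negative", "positive", "negative"]
print(percent_agreement(reviewer_1, reviewer_2))
print(majority_label(["negative", "negative", "neutral"]))
print(majority_label(["negative", "neutral"]))  # tie, so no majority
```

A tweet with no majority is exactly the borderline case being asked about: someone still has to make the final call.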

I understand that they are normalizing the data. I don't think there's any reason to extrapolate about a large and diverse geographic region because of 19 geo-located users in Nowheresville County, population 30. Insufficient data.
posted by Esteemed Offendi at 12:13 AM on May 15, 2013 [1 favorite]

Do you have a better example, Guernsey? That snippet wouldn't have been included in the study.

"A further point is that the term ‘n----r’ is almost universally associated with negative, derogatory intent, as opposed to the more colloquialized (and appropriated by the black community) ‘n---a’, which a quick inspection of the data shows is used more positively. References to ‘n---a’ were not included in the study."

The idea that a clip of NWA lyrics a) would actually fool grad students looking specifically to distinguish hate speech from stuff like song lyrics and b) would do so often enough to call the whole project into question... you honestly, truly believe that? *shrug*
posted by a birds at 12:52 AM on May 15, 2013 [1 favorite]

And why is cripple the only disability they analyzed for? Is 'gimp' not a choice?

someone asked this in the comments on the blog. The response was
Gimp was also considered but most of the tweets referred to the software. There were very few (less than 100) used as a derogatory term towards disabled persons.
posted by jacalata at 12:59 AM on May 15, 2013 [1 favorite]

Apparently twitter bigotry cannot exist in Mountain Standard Time, except in southern Idaho.
posted by Ommcc at 1:03 AM on May 15, 2013

(or Dick Van Dyke)

Or as he was known before his Hollywood name change, Penis Van Lesbian.

I have been waiting FOURTEEN years to make that joke, please excuse me.
posted by Purposeful Grimace at 1:10 AM on May 15, 2013 [3 favorites]

The idea that a clip of NWA lyrics a) would actually fool grad students looking specifically to distinguish hate speech from stuff like song lyrics and b) would do so often enough to call the whole project into question...

That's my whole point. You don't know that and neither do I, because we do not know what the students were told to look for. Unless these students have a magic racism detector, there are always going to be borderline cases. As I said previously, we don't know what the guidelines are, we don't know how many people reviewed each tweet, and we don't know who made the final call if there was a disagreement between reviewers.

As for another example, how about this:
"F*** you, you f***ing homo."
Is that a slur, or is that a bro-ski joking with his friend? I have heard that exact phrase used by people in the latter context. Would you flag that if you were reviewing the tweets? I probably would.

As someone mentioned earlier, the data averages out to about 60 flagged tweets per county. It only takes a handful of borderline cases to skew the results.
posted by Guernsey Halleck at 1:42 AM on May 15, 2013 [1 favorite]

And to head off the inevitable response that any use of the word "homo" is automatically homophobic, yes it is. I agree 100%. But do the reviewers agree? What were the criteria they were told to use? Did anyone check? We have not been given that information. That is my main issue with this study. When you base a scientific study on someone or a few someones' judgement calls, your methods need to be transparent.

I think this is a poorly designed, opaque study that proves little more than the fact that there are some Americans who post things on twitter that are offensive, and feel the need to geotag those posts. Maybe it's because I've dealt with way too many bad statistics and tenuous inferences in my work, but I think this study should be held to a higher standard than "I know it when I see it" analysis.

I've taken up enough of this thread already. I'm signing out - good night all.
posted by Guernsey Halleck at 2:08 AM on May 15, 2013

There is zero chance that anyone, ever, sees "Fuck you, you fucking homo" and can't tell whether "homo" is being used as a pejorative. Good night.
posted by a birds at 2:09 AM on May 15, 2013 [2 favorites]

"4. In order to produce this map, we took the number of geotagged hateful tweets, aggregated them to the county level and then normalized this count by the overall number of tweets in that county. This means that the spatial distributions you see for the different variables are decidedly NOT showing population density. As we mentioned above, this is clearly stated in all of the previously written material accompanying the map. And because we are specifically looking at the geographic patterns of Twitter activity, it makes more sense to normalize by overall levels of Twitter activity than by population."

While a lot of people are indeed attacking their study for problems it does not have, this statement is indefensibly wrong. Population density, by county if not square mile, is clearly having a massive effect on their data in a way that dramatically limits the kinds of questions it can shed light on. In the United States there is a very wide distribution in the population of counties: Los Angeles County, at 9,818,605, has more people than any of the 41 least populous states in the Union, whereas Loving County, Texas and Kalawao County, Hawaii have 82 and 90 people respectively. It looks to me like most of what we are seeing is just normal statistical variance in low sample sizes rather than anything real. For example, if Loving County were to appear to live up to its name and have zero hateful tweets while its neighbor Winkler County, at 7,110 people, were to have one group of racist friends get into a shit fest that produced about 100 hateful tweets, then any hate in Loving would be invisible while Winkler would appear as hateful as a Los Angeles County with as many bigoted tweets as their total sample size. Similarly, if a single person in Loving County were to fail to live up to the name and make a single bigoted tweet, then that would appear as concentratedly hateful on the map as 120,000 bigoted tweets in LA. Indeed, Winkler County does appear to really hate lesbians, but that is likely the result of a single tweet.

150,000 is an impressive total sample size, even when divided ten ways into the various slurs, but it is still insufficient to answer the kind of question their map purports to ask of the data. There are clever statistical solutions for this kind of problem, for example grouping less populous counties together into single super counties by population, or using, say, states, which would at least be within a few orders of magnitude of each other. But even when zooming out they appear to fuzz the data by geographical area rather than population.
posted by Blasdelb at 2:13 AM on May 15, 2013 [9 favorites]
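
The small-county problem described above can be sketched with a confidence interval on each county's rate. This is only an illustration, not anything the map's authors are known to have done: the counts are made up, and the Wilson score interval is one common choice among several.

```python
import math

def wilson_interval(hits, total, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits/total."""
    if total == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = hits / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical counts: (flagged tweets, all geotagged tweets).
# Both counties have the same observed rate of 2%.
counties = {
    "tiny county": (1, 50),
    "large county": (2000, 100000),
}

for name, (hits, total) in counties.items():
    lo, hi = wilson_interval(hits, total)
    print(f"{name}: rate={hits / total:.3f}, interval=({lo:.4f}, {hi:.4f})")
```

Both counties would get the same colour on a map of raw rates, but the interval for the tiny county is far wider, which is the sense in which a single tweet can dominate the picture.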

Ah crap. Underreportation and borrowed themes.

Look, back in the day when wilderness was an issue, HSU students comprised the bulk of the trail crews. Hard working young men and women, they were. I loved to haul their supplies for them.

Memories for my dotage: Two crews at Goodale Pass, their summer's task that year was to make a trail in living granite, hard labor at 11,000 feet. So come I that evening, headed south over the pass, from a trip out to Iva Bell to pick up trash and then lead my seven mules to this camp to haul out their gear, for they are closing down for the season.

The storm cloud was 500 feet below me, thick like ocean surf. I led the string down into the storm and found their two squad tents poking up out of knee-deep fresh snow, and the blizzard wind pushing their fires in random directions. I unpack the mules and loose them all to find shelter in the trees. Spent the night with the crews, at 10,000 feet, ate supper outside, standing in the snowstorm--actual whiteout, sound-eating flakes the size of my thumb, thick as fog--standing between the two large fires, shift-footed on tingling feet, basting, while the wind roared down upon us from across the Silver Divide. We munch down on venison (supplied by me) and chili (supplied by them), and then finish off the bottle of Jack and last of the weed. Wind, stories, until late in the evening, then white noise and thumps on the tumpline, shelves of snow slide softly off the tent and pile up around the walls, a stabilizing windbreak.

Next day is two hours downhill until the snow pack lessens, fresh snow, but the mare knows how to plow, and it's downhill. The trail is lost in the drift, but the mare knows the way. The crews, carrying only daypacks, left ahead of me. They can travel much faster than a string of mules, and probably farther, if it came down to it.

Most of these guys were on contract extensions, staying in the high country until the November storms drive them down into relative civilization at the Bosillo Creek Ranger Station, above Mono Hot Springs.

theories of hate, read from afar, a practical magic for those who see.

Go HSU!
posted by mule98J at 8:47 AM on May 15, 2013

I'd be interested to see if they're using any statistical enrichment analysis (Fisher test, e.g.) to identify the redder regions. Combined with a sensible false-discovery rate correction that should solve the problems of varying county sample size because at the same threshold, small counties would need a higher proportion of hateful tweets than larger counties (where the sampling is more extensive and thus more confident).
posted by en forme de poire at 9:38 AM on May 15, 2013
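
A minimal sketch of the enrichment idea above, assuming a one-sided Fisher exact test per county against the national totals, followed by a Benjamini-Hochberg correction. Whether the authors ran anything like this is unknown; the counts and function names here are hypothetical.

```python
from math import comb

def fisher_enrichment_p(hits, total, bg_hits, bg_total):
    """One-sided Fisher exact p-value for 'this county has more flagged
    tweets than expected'. bg_* are overall totals, county included."""
    # Hypergeometric tail P(X >= hits): draw `total` tweets from bg_total
    # tweets, of which bg_hits are flagged.
    upper = min(total, bg_hits)
    tail = sum(comb(bg_hits, k) * comb(bg_total - bg_hits, total - k)
               for k in range(hits, upper + 1))
    return tail / comb(bg_total, total)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjustment monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical data: 10,000 tweets overall, 200 flagged (2%).
BG_HITS, BG_TOTAL = 200, 10_000
counties = {
    "small, 10% flagged": (2, 20),
    "large, 10% flagged": (100, 1000),
    "large, 2% flagged": (20, 1000),
}
names = list(counties)
pvals = [fisher_enrichment_p(h, t, BG_HITS, BG_TOTAL)
         for h, t in counties.values()]
for name, p, q in zip(names, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p={p:.3g}, adjusted={q:.3g}")
```

The two 10% counties have the same proportion of flagged tweets, but only the large one has enough data for that to count as strong evidence, which matches the point about small counties needing a higher proportion to clear the same threshold.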

This is science like mood rings are science. It’s not just a matter of them doing it wrong, the whole idea is fundamentally flawed (and they’re doing it wrong).

Even if they were doing the rest of it right, the whole thing hinges on a group of students making judgements on snippets, with absolutely no context, by people they know nothing about, hundreds of miles away in places they know nothing about. It’s bias all the way down.

"Going to talk to that nigger about my car"
"What are all the fags doing tonight?"
"I feel like a cripple"

Is the first one a 20 year old black man shopping for a new stereo, a 75 year old white woman visiting her mechanic, or a 30 year old Hispanic man about to commit assault? There is simply no way to know what these things mean, or where the line between poor taste and hate is drawn. But "I fucking hate black people" wouldn’t even show up here, nor would many common slurs.

How about the second one: did they say "it’s from San Francisco, it’s probably not hate speech"? If they saw the location of the tweets when they rated them there is built-in bias, and it’s completely unscientific. How many people had to judge a single tweet, just one with poor judgement? Or were they random students being paid who really didn’t give a shit?

Calling this questionable is being very generous.
posted by bongo_x at 9:40 AM on May 15, 2013 [3 favorites]

While they normalized the number of tweets per county, they did not correct for county density

Thank you Bahro, for posting the first non-presumptive issue with this study and perhaps the first one in the history of Metafilter study threads.
posted by Tell Me No Lies at 12:03 PM on May 15, 2013 [2 favorites]

Normalizing by the number of (geo-coded) tweets per county is more precise than normalizing by county population density. When you're presenting a proportion to describe behavior, you want the denominator to represent the specific population from which the numerator is drawn. The denominator for population density is everyone in the county, and the denominator for geocoded tweet density is people who use Twitter. The numerator is drawn only from the group of people who use twitter; people who don't have phones or computers, babies, and people who loathe social media of all types exist in significant numbers in many counties, but cannot contribute to the numerator. (I am an epidemiologist. My job basically boils down to counting people and sticking them into numerators and denominators. So I have feelings about this.)

About the NWA lyric - the list of terms searched includes "n---er" but not "n---a", and in the FAQ for the Obama hate tweet map this group did in November 2012, the authors state:

A further point is that the term ‘n----r’ is almost universally associated with negative, derogatory intent, as opposed to the more colloquialized (and appropriated by the black community) ‘n---a’, which a quick inspection of the data shows is used more positively. References to ‘n---a’ were not included in the study.

posted by gingerest at 4:13 PM on May 15, 2013

Gingerest, I agree that you want to normalize by the number of tweets to do the analysis (though I think you might need some notion of a significant change as Blasdelb pointed out), but specifically for visualization I think Bahro's point still holds. The visualization appears to be aggregating by geographic area, so as you zoom out the counties blend together. But since counties are closer together in the East, a given square area on the map contains more individual counties, and will thus look brighter even if the county "hotspots" are distributed without any real geographical pattern. So binning the entire US into tiles of a fixed size might have given a clearer picture. I might be missing something - will have to see what the methods say.
posted by en forme de poire at 7:22 PM on May 15, 2013
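
The fixed-tile binning suggested above could look roughly like this. It is a sketch under stated assumptions: the tweet tuples are invented, and tiles of a fixed size in degrees are not equal in area (they narrow towards the poles), so a real map would probably bin in a projected coordinate system.

```python
import math
from collections import defaultdict

def tile_rates(tweets, tile_deg=1.0):
    """Bin (lat, lon, is_flagged) tweets into fixed-size lat/lon tiles and
    return the flagged share of tweets per tile."""
    counts = defaultdict(lambda: [0, 0])  # tile -> [flagged, total]
    for lat, lon, flagged in tweets:
        tile = (math.floor(lat / tile_deg), math.floor(lon / tile_deg))
        counts[tile][1] += 1
        if flagged:
            counts[tile][0] += 1
    return {tile: f / t for tile, (f, t) in counts.items()}

# Hypothetical geotagged tweets: (latitude, longitude, flagged?)
tweets = [
    (40.2, -75.1, True),
    (40.7, -75.9, False),
    (35.5, -106.3, False),
]
for tile, rate in sorted(tile_rates(tweets).items()):
    print(tile, rate)
```

Because every tile covers the same span of the grid regardless of how many counties fall inside it, densely packed eastern counties no longer add up to a brighter patch just by being close together.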

I see what you're saying. For me, the map would be more informative and useful with county lines instead of roads (and a legend - I am old and cranky and believe if you use a visual symbol you should provide a key).
posted by gingerest at 7:43 PM on May 15, 2013

Absolutely. Actually, they could have just colored the counties like so and that would have solved the viz problem.
posted by en forme de poire at 10:45 PM on May 15, 2013 [1 favorite]

As others have said-- something is very odd with that map. Look at New England. If you stay zoomed out it is a blob of red-hot hate. Zoom in and there is almost no hate to be found except a hot blob of blue somewhere in the town of Palmyra, Maine--population 1,953--known as "Maine's friendly town" and some blue at the towns where Dartmouth (NH) and Middlebury (VT) Colleges are. Zoom in on the Northwest and our gentle people of the trees look even worse, with a red blob of hate pulsating (unlike anything in New England or even hateful New York) apparently on the river near Memaloose State Park--population 0-- and some blue blobs in Aberdeen and someplace between McCleary and Shelton. In short, let's stay together for the good of the union and to keep the people in and around Woodville, GA (population 320 and the red blob of n-word use in Georgia) from going hate China Syndrome.
posted by Cassford at 9:55 PM on May 16, 2013

Guernsey Halleck: "That's my whole point. You don't know that and neither do I, because we do not know what the students were told to look for. Unless these students have a magic racism detector, there are always going to be borderline cases. As I said previously, we don't know what the guidelines are, we don't know how many people reviewed each tweet, and we don't know who made the final call if there was a disagreement between reviewers. "

On top of that THEY WERE PAID FOR POSITIVE RESPONSES. AHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH.
posted by stratastar at 10:05 PM on May 16, 2013

In point 9 of the FAQ, the authors say that the students were paid "roughly $10 per 1000 coded tweets". Not per positive response, per response manually coded positive, negative or neutral based on a predefined rubric (according to the blurb on the map itself.)
posted by gingerest at 10:09 PM on May 16, 2013

Ah, mis-read on my part, sorry.
posted by stratastar at 10:11 PM on May 16, 2013

Cassford, the reason that the blobs don't fall directly over the major population centres is that they are attributed to the county population centroid. There are several ways to calculate the population centroid, and I don't know which one these authors used, but the easiest for me to explain is the latitude-longitude approach. Find the latitude and longitude that come closest to splitting the county into two equally sized populations. The point where the two lines meet is the centroid. If the county line runs through a major population centre (as is the case in many large cities), only the part of the population that lives in the county contributes to that county's population centroid, so the centroid will be offset towards that city but not sit over it. Likewise, if there are two or more highly dense areas in a county, the centroid will fall somewhere in between them, not directly over any of them.
posted by gingerest at 10:17 PM on May 16, 2013
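
The latitude-longitude approach described above amounts to taking the median latitude and the median longitude of the residents. A minimal sketch, with made-up towns and populations, and with no claim that this is the method the map's authors used:

```python
from statistics import median

def median_centre(points):
    """points: one (lat, lon) pair per resident. Returns the point where the
    median latitude and median longitude cross."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (median(lats), median(lons))

# Hypothetical county with three towns, one point per resident.
residents = (
    [(0, 0)] * 40      # town A, 40 residents
    + [(5, 10)] * 35   # town B, 35 residents
    + [(10, 5)] * 25   # town C, 25 residents
)
print(median_centre(residents))
```

Here the centre comes out at latitude 5, longitude 5, which is not the location of any of the three towns, so a blob drawn there would sit in what might be empty countryside.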

Mapping the global Twitter heartbeat: The geography of Twitter

More than 3% of all tweets are found to have native location information available, while a naive geocoder based on a simple major cities gazetteer and relying on the user-provided Location and Profile fields is able to geolocate more than a third of all tweets with high accuracy when measured against the GPS-based baseline. Geographic proximity is found to play a minimal role both in who users communicate with and what they communicate about, providing evidence that social media is shifting the communicative landscape.

posted by the man of twists and turns at 3:22 PM on May 17, 2013

This thread has been archived and is closed to new comments
