Metrics for Community Toxicity
February 23, 2017 11:22 AM

From Google, Perspective API for scoring comments: Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. [...] We’ll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as “toxic” to a discussion.

Now Anyone Can Deploy Google's Troll-Fighting AI (Wired)

Type “you are not a nice person” into its text field, and Perspective will tell you it has an 8 percent similarity to phrases people consider “toxic.” Write “you are a nasty woman,” by contrast, and Perspective will rate it 92 percent toxic, and “you are a bad hombre” gets a 78 percent rating.
posted by CrystalDave (117 comments total) 22 users marked this as a favorite
 
Is this the same algorithm that Twitter is using to dole out Time Out sessions to Trans people challenging Nazis on the platform?

People with bad intentions will find a way to game this - I suspect it will only make communication harder for non Trolls.

Currently nothing is better than actual Human moderation and curation of forums
posted by Faintdreams at 11:32 AM on February 23 [17 favorites]


"You look nice today.": 2%
"u luk nice 2day": 39%

It checks out.

Also of note:

"I voted for Hillary Clinton.": 3%
"I voted for Donald Trump.": 9%
posted by Faint of Butt at 11:32 AM on February 23 [2 favorites]


'My vacuum sucks' is 93% toxic, apparently.
posted by mushhushshu at 11:36 AM on February 23 [13 favorites]


Ahem.
posted by tobascodagama at 11:38 AM on February 23 [17 favorites]


/Cortex sets up a Google Home device on his desk, connects to this API
/Fucks off for the day
posted by Greg_Ace at 11:40 AM on February 23 [17 favorites]


/the rest of Metafilter immediately starts trying to break the device
posted by Greg_Ace at 11:41 AM on February 23 [13 favorites]


I was going to say it's interesting that they seem to think an assortment of unrelated comments on the same general topic constitutes a "discussion". It's hard to imagine the kind of automated filtering and re-ordering of comments enabled by this failing to destroy any possibility of actual conversation or debate going on wherever it's used.

But instead let me just say "screw you, google. take your so-called algorithms and fuck off." That gets a much higher score.
posted by sfenders at 11:43 AM on February 23 [6 favorites]


"Black male" = 49% toxic
"White male" = 37%
"Black female" = 51%
"White female" = 38%
posted by jetsetsc at 11:43 AM on February 23 [32 favorites]


*types*

Well, yes I'm sure that machine learning is at the point where it can very easily distinguish between the various nuances of language usage. I don't imagine this will be susceptible to false positives or abuse in any way.

7% toxic.


Still needs a bit of work on the sarcasm detector.
posted by TheWhiteSkull at 11:45 AM on February 23 [11 favorites]


I could see this being a useful tool to direct moderator attention, similar to how flags are used here. Like many kinds of automation, it's better to think of it as a force-multiplier for humans than as a total replacement for humans.
posted by Jpfed at 11:46 AM on February 23 [29 favorites]


Oh, this is awesome. Now shitposters will have a set of AI tools helping them figure out the cruelest things to say at any given moment — with science.
posted by verb at 11:48 AM on February 23 [25 favorites]


Agree that dedicated assholes will game this, but the Facebooks and Googles of the world have learned something that so many smaller web companies can't seem to wrap their head around: convenience matters a lot. You wouldn't have to offer much discouragement to get many jerks to find something else to do. It's possible a system descended from this could do a lot of good even if it isn't as good as human moderation.

I don't have high hopes, but human moderation is unlikely to ever scale up in the way we need, so I'm glad somebody is devoting real resources to this effort.
posted by Western Infidels at 11:52 AM on February 23 [3 favorites]


"Actually, it's about ethics in game journalism." (4%)

"Apparently creating sophisticated machine-learning systems is cheaper than hiring moderators that understand context." (4%)
posted by skymt at 11:53 AM on February 23 [28 favorites]


"Bless your heart": 0% toxic
posted by Kabanos at 11:54 AM on February 23 [104 favorites]


I just pasted in half of the current election thread.

36% similar to comments people said were "toxic"
posted by soren_lorensen at 11:54 AM on February 23 [2 favorites]


methylmercury is 34% toxic
dihydrogen monoxide, 19% toxic
oxygen is only 4%

Iocaine powder is 12% (must have built up an immunity)
unicorn horns are 4% toxic.

Needs more training. Also better-defined toxicological endpoints.
posted by bonehead at 11:55 AM on February 23 [39 favorites]


conservatives are a threat to society: 44%
gays are a threat to society: 85%
posted by AFABulous at 11:55 AM on February 23 [13 favorites]


So:

1. This is really interesting! The concepts aren't new but applying this kind of data set comparison to "toxicity" sentiment analysis etc. makes sense as a building block in the context of trying to build out large-scale moderation toolsets.

2. Really really clearly this is not a moderation solution of any sort by itself, nor would this-with-a-few-tweaks comprise one. There's no way it'd do better than really, really porous and error-prone red-flagging of stuff. Anybody criticizing it for that failure is probably over-reading the situation, but then anyone lauding it as more than a really rough potential component in a more nuanced and human-intensive moderation rubric is selling something, so those will probably even out on the train ride to East Hottakesville.

And I want to emphasize point 1, because I think this really is interesting as something to incorporate into early warning or triage aspects of large-scale moderation projects.

Like, a lot of preemptive MeFi moderation work is based on porous red-flag stuff: we don't generally shut down a new account based on things that make us go hrmmm, but there are things about a new account that will make us pay more attention. Likewise a sketchy comment or two isn't usually a ban but it's a good reason to take a closer look. And sometimes the worry is justified and we take action later; sometimes it turns out that the initial weirdness was just weirdness/idiosyncrasy/coincidence and the new user's totally fine.

Any approach that collapsed that evaluation process we do down to a flat up/down decision based on numeric thresholds would be hugely problematic on both the false positives and the false negatives. But those processes as just warnings and nudges are very useful.

So a thoughtful incorporation of something like this as a front-line tool for directing limited attention more closely seems like it could have legs. Not a basis for taking an action, but a basis for considering the possibility of action.

More narrowly, the idea of a system like this helping to identify stuff that is toxic on the DL—dogwhistles and microaggressions and such that manage to be awful while looking nondescript—could be pretty useful in large-scale contexts especially. Some of which may come down to using more focused and domain-specific training data.
posted by cortex at 11:56 AM on February 23 [32 favorites]


"oil spill" = 18% toxic
posted by Kabanos at 11:57 AM on February 23 [1 favorite]


Still needs a bit of work on the sarcasm detector.

Oh, a sarcasm detector. That's a real useful invention.
posted by Faint of Butt at 11:59 AM on February 23 [48 favorites]


"All lives matter." : 3%
posted by saulgoodman at 12:00 PM on February 23 [22 favorites]


cortex: "More narrowly, the idea of a system like this helping to identify stuff that is toxic on the DL—dogwhistles and microaggressions and such that manage to be awful while looking nondescript—could be pretty useful in large-scale contexts especially. Some of which may come down to using more focused and domain-specific training data."

I get what you're saying and thanks for providing context, but this paragraph may as well have said "And in the future magical fairies will moderate all our comment threads." The system you're describing does not and will never exist in our lifetimes, as it would require the ability of a computer to assess language in a complete cultural context that would amount to human or greater intelligence.

The entirety of this system is currently on display, and I would not think it will get much better at classification. The suggestion that this may reduce toxicity only by requiring someone to click a textbox that says "Yes, I know I'm posting something rated toxic" is super valid, but I don't expect this to do much more.
posted by TypographicalError at 12:01 PM on February 23 [1 favorite]


“I moved on her like a bitch, but I couldn’t get there. And she was married.”
“I did try and fuck her. She was married.”
“Just kiss. I don’t even wait. And when you’re a star, they let you do it. You can do anything.”
“Grab them by the pussy. You can do anything.”

90% toxic.

Yep, it's at least a useful first approximation.
posted by saulgoodman at 12:03 PM on February 23 [26 favorites]


"pull your head out of your ass" = 94%
"please withdraw your head from your posterior thank you" = 43%

So you just have to be polite.
posted by Kabanos at 12:05 PM on February 23 [4 favorites]


"Make America Great Again" -- 4% toxic

"We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness." -- 6% toxic

This is fine. (1% toxic)
posted by Quindar Beep at 12:05 PM on February 23 [9 favorites]


"With a taste of your lips
I'm on a ride
You're toxic
I'm slipping under
With a taste of poison paradise
I'm addicted to you
Don't you know that you're toxic
And I love what you do
Don't you know that you're toxic"

63% toxic. Back to the drawing board with you!
posted by jason_steakums at 12:07 PM on February 23 [21 favorites]


I get what you're saying and thanks for providing context, but this paragraph may as well have said "And in the future magical fairies will moderate all our comment threads." The system you're describing does not and will never exist in our lifetimes, as it would require the ability of a computer to assess language in a complete cultural context that would amount to human or greater intelligence.

To be clear, I'm not suggesting that this system or some future version of this system will be able to do that work in toto; I'm saying that I see a place where identifying a subset of edge-cases is something a system like this could help with, with domain-specific training.

Take the "bless your heart" example, as a non-vile stand-in because I don't really want to dig into actual live-fire examples of subtle hatespeech in here out of the blue: that those words are all nice words doesn't make the phrase nice, and traditional clumsy wordlist filters are useless for that. But systemically identifying—from a machine analysis of knowledgeable user input—the fact that the sum of "bless your heart" has a much more specific and potentially negative/toxic impact than the words in it is the sort of thing I can see a system like this doing well.

At which point you have the computer doing what it does well, digging through a bunch of data and identifying trends, and then handing that off to a human who might not have really noticed it otherwise.

Again, it's a narrow subset of functionality in any case and I don't mean to imply that it's trivial either. But it's a direction from which "how can we leverage the strength of computer rather than human intelligence" might have some teeth.
posted by cortex at 12:08 PM on February 23 [9 favorites]


That's a very sensible and sane approach to using technology. It'll never happen in practice!
posted by saulgoodman at 12:10 PM on February 23


In Metafilter practice, in my halfassed observation, the F1 score on the proper mod decision is over 0.9, prolly about 0.95-0.97 from what I've seen. I've seen a lot of automatic moderation systems proposed with an F1 of a solid 0.4 to 0.5, which makes them remarkably useless. Text classification, however, has come a long way in the last 20 months: with data, model, and representation improvements, they should get to 0.7 pretty easily, although of course it seems they didn't publish anything like this (their published LMs improved over the SOTA in 2005 by about that much).

There is already a lot (more than 100x) more text data available to every rando at Google ML than human beings read in a lifetime and probably a solid order of magnitude or so words more than a human understands in a lifetime. So the blocker's probably the model and the outputs to train on, which seems to be a bunch of Likerts.
posted by hleehowon at 12:12 PM on February 23
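
For readers unfamiliar with the F1 figures quoted in the comment above: F1 is the harmonic mean of precision and recall. The following is a minimal sketch; the precision and recall values are made-up numbers chosen only to land in the ranges mentioned, not measurements of any real system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical numbers, for illustration only.
print(round(f1_score(0.96, 0.96), 2))  # 0.96: rarely wrong in either direction
print(round(f1_score(0.60, 0.35), 2))  # 0.44: misses most of what it should catch
```

Because the harmonic mean is dragged down by the weaker of the two numbers, a system cannot reach a high F1 by being good at only one of precision or recall.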


*patiently waits for someone to run "This is Just to Say" through it*
posted by Quindar Beep at 12:14 PM on February 23 [4 favorites]


From the makers of YouTube Comments...
posted by odinsdream at 12:15 PM on February 23 [7 favorites]


(you could argue MeFi holds a different place in the precision-recall tradeoff than other forums, as many other folks have of course argued using nonstatistical terms, vociferously and loudly on MetaTalk)
posted by hleehowon at 12:16 PM on February 23


cortex: Triage is very different because there's a damned good input-output pair semantics that you can use: not "is this necessarily toxic" but "is this similar to other shit that we did a moderation action on?", so it's much less ill-defined and getting output training data is way easier ("1 if this got modded 0 if it didn't, kill everything after a certain date that we might be still modding").
posted by hleehowon at 12:19 PM on February 23 [1 favorite]
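
The labeling scheme described in the comment above ("1 if this got modded 0 if it didn't", dropping recent comments that might still be under review) could be sketched roughly as follows. The `Comment` record and its field names are hypothetical, invented here for illustration; they are not MetaFilter's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Comment:
    text: str
    posted: date
    was_moderated: bool  # True if moderators deleted or otherwise acted on it


def build_training_pairs(comments, cutoff):
    """Return (text, label) pairs: 1 if moderated, 0 if not.

    Comments posted on or after `cutoff` are dropped, since moderators may
    not have finished with them and a 0 label could be premature.
    """
    return [
        (c.text, 1 if c.was_moderated else 0)
        for c in comments
        if c.posted < cutoff
    ]


# Made-up sample data.
sample = [
    Comment("lovely weather", date(2017, 1, 10), False),
    Comment("some deleted insult", date(2017, 1, 12), True),
    Comment("posted yesterday", date(2017, 2, 22), False),
]
print(build_training_pairs(sample, cutoff=date(2017, 2, 1)))
```

The cutoff is the design choice that matters here: without it, recent comments that would eventually be removed get counted as negative examples.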


"Black Male: = 49% toxic
"White male" = 37%
"Black female" = 51%
"White female" = 38%
jetsetsc

It might help to remember that it's not saying the comments are toxic, it's saying that the comments are "X% similar to comments people said were "toxic"".

I imagine that, unfortunately, many comments that it uses for comparison contain racism and so terms referring to race, especially ones referring to black people, get a higher score for similarity to toxic comments.
posted by Sangermaine at 12:20 PM on February 23 [5 favorites]


"The Holocaust never happened." : 21% toxic

So it's five or six points worse to say that mental illness is hard to deal with than to deny the Holocaust, which is a biasing effect the developers might want to look more closely at. What might account for that result would be an interesting question.
posted by saulgoodman at 12:22 PM on February 23 [1 favorite]


The damn problem with Google making these [Comment determined to be excessively toxic]
posted by Samizdata at 12:22 PM on February 23 [1 favorite]


Fascism is bad. 54% similarity to toxic comments

Fascists probably aren't nice people. 50%

A fascist would make a terrible president. 66%

We're having lovely weather. 1%

I have seen the future of the internet and it's... very nice.
posted by Sing Or Swim at 12:24 PM on February 23 [3 favorites]


"EngSoc is double-plus ungood."

14% similar to toxic comments. Party-approved dissent!
posted by tobascodagama at 12:30 PM on February 23 [2 favorites]


"My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%
"Crooked science. There is no consensus." -- 25%

Looking closer, I still don't trust it much. I could see it being marginally useful to more quickly flag stuff for the attention of moderators on low-volume sites where there might not be enough people around ready to do so. Influencing moderator action when the decisions involve difficult "edge cases" seems like the opposite of what it would be good at. I don't know what kind of biases and opinions are baked into its model. Who exactly generated the data that trained it? A self-selected sample of survey takers? You'd need quite a lot of them, no? I wouldn't expect metafilter moderation history to be anywhere near enough.
posted by sfenders at 12:33 PM on February 23 [2 favorites]


I see this as a tool for, say, a huge subreddit to quickly bring the bottom-of-the-barrel comments to mod attention.
posted by Jpfed at 12:36 PM on February 23 [1 favorite]


"My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%
"Crooked science. There is no consensus." -- 25%


The former couches a correct statement in insults; the latter makes an incorrect claim. Are you hoping that the system will be able to determine the truth of a comment? I don't think that's what they were shooting for.
posted by Jpfed at 12:41 PM on February 23 [5 favorites]


Baby, can't you see I'm calling?
A guy like you should wear a warning
It's dangerous, I'm falling
There's no escape, I can't wait
I need a hit
Baby, give me it
You're dangerous, I'm loving it
scored 33%, so not totally wrong I guess
posted by idiopath at 12:42 PM on February 23


To a large extent the analysis seems to focus on finding insults, caconyms and curse words; those words skew the toxicity score upward a lot.

For example, "My thoughts are that people should stop being stupid and ignorant. Climate change is scientifically proven. It isn't a debate." -- 59%

But the first sentence, "My thoughts are that people should stop being stupid and ignorant," alone gets a score of 78% toxic.

By contrast, the second and third sentences, analyzed separately from the first ("Climate change is scientifically proven. It isn't a debate"), are only 3% toxic.

And if you just cut out the words "stupid" and "ignorant," the first sentence drops to just 15% toxic from 78%. (And if you drop the more polite "ignorant" but leave in the more crass word "stupid," the score jumps to 85% toxic.)
posted by JimInLoganSquare at 12:51 PM on February 23 [8 favorites]
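
The experiment in the comment above (remove a word, re-score, compare) can be automated as a leave-one-word-out test. This is only a sketch: `toy_score` is a stand-in written for this example and is NOT the Perspective model. In practice, any function that maps text to a score could be passed in its place.

```python
from typing import Callable, List, Tuple


def leave_one_out(text: str, score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """For each word, report how much the score drops when that word is removed."""
    words = text.split()
    baseline = score(text)
    deltas = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        deltas.append((word, baseline - score(reduced)))
    # Largest drop first: these words carry most of the score.
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


def toy_score(text: str) -> float:
    """Stand-in scorer (not the real model): fraction of words on a flag list."""
    flagged = {"stupid", "ignorant"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in flagged for w in words) / len(words) if words else 0.0


for word, drop in leave_one_out("people should stop being stupid and ignorant", toy_score):
    print(f"{word:10s} {drop:+.3f}")
```

A negative value means the score went up when the word was removed, which is the same effect the comment reports when dropping "ignorant" but keeping "stupid".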


Are "stupid" and "ignorant" less acceptable insults than "crooked", then? I don't know.

"Time to take a stand against sexist beer marketing." -- 54% toxic
"left wing wimps" -- 38%
posted by sfenders at 12:52 PM on February 23 [2 favorites]


"Actually, it's about ethics in game journalism" - 14% toxic
"Our culture is under attack. Separate countries for separate peoples" - 23%

If instead I provide as input "White culture is under attack", the classifier jumps up to 54%. I'm skeptical of the ability of this classifier to meaningfully distinguish contextually important markers of toxicity. It looks like most of its predictive power comes from phrase identification.

IDK, I've never had to moderate a forum (thx cortex!), but this seems like instead of actually automating the moderating job, all it does is make sure that trolls dress their rhetoric up.
posted by phack at 12:54 PM on February 23 [1 favorite]


twenty dollars, same as in town

5%
posted by Reasonably Everything Happens at 12:57 PM on February 23 [3 favorites]


"Your a socialist snowflake!" -- 23%
"If you just cut out the words "stupid" and "ignorant," the first sentence drops to just 15% toxic from 78%." -- 75%
posted by sfenders at 12:57 PM on February 23 [2 favorites]


Metafilter community weblog : 27% toxic.
posted by fings at 12:58 PM on February 23


"If you just cut out the words "" and "," the first sentence drops to just 15% toxic from 78%." -- 17%
posted by JimInLoganSquare at 1:00 PM on February 23


"please withdraw your head from your posterior thank you" = 43%

So you just have to be polite.


On atheism boards, with a human referee rather than an algorithm, the acceptable polite way to insult someone quickly became "This concept isn't difficult" and "Why can't you understand?"
posted by puddledork at 1:08 PM on February 23 [2 favorites]


This is great and neat, but as usual, what's predictably symptomatic is that the way they present/market this ignores standard, obvious concerns from both computer science and the social sciences. The theoretical computer/cognitive science issue is: what is the relationship between low-context categorization (at the granularity of independent sentences) and the "reality" of a toxic utterance, which depends on semantics and social context? What does the algorithm suggest about this and the problem of validation/validity? Second, the social science problem would be along the very standard lines of: technology is not politically neutral, and engineers and scientists are ethically responsible for attending to that; there needs to be more awareness, discussion, research, and transparency into the ramifications of this, but that's not going to happen much/soon, due to economic incentives, political structures, etc. Both of these are important issues but, as they say, for every PhD there is an equal and opposite one.
posted by polymodus at 1:09 PM on February 23 [2 favorites]


"Apparently creating sophisticated machine-learning systems is cheaper than hiring moderators that understand context." (4%)

With 8,000 tweets sent per second, yeah, it's a lot cheaper than hiring 10 thousand moderators.
posted by sideshow at 1:15 PM on February 23 [1 favorite]


"White culture is under attack" gets a 54% but "Black culture is under attack" gets 59%.
This is how you learn where the innate bias is (which is only 14% toxic)
but if Google knew we were talking about their tool... (2% toxic)

okay, let's put this to the test:
Trump 16%
Obama 8%
President Trump 7%
President Obama 4%
Donald Trump 22%
Barack Obama 16%
Obamacare 16%
Donald J. Trump 15%
Trumpy 34%
Trumpster 34%
Dumpster 51%
Dumpster fire 34%
Worst President Ever 56%
Fascist 74% (misspelled as facsist 34%)
Nazi 61% (less than fascist?!?)
Neo-Nazi 52%
Neo-Liberal 10%
Neoliberal 5%
Neoconservative 8%
Alt-Right 2%
AltRight 34%

from puddledork's comment:
This concept isn't difficult 3%
Why can't you understand? 4%
Why can't you learn? 12%
dummy 31%
dum-dum 53%
idiot 95%!!
ignorant 60% (if misspelled ignrant, only 34%)
moron 78%
moran 36%
"What a maroon" (Bugs Bunny quote) 7%
spastic 13% (see, critics of Weird Al, it's not a bad word! maybe next I'll paste the lyrics to Weird Al songs...)
FAIL 21%
retarded 69%
retard 66% (I'd think this SHOULD be more toxic)

and finally (for now)
toxic 59%
poison 56%
poisonous 43%
posted by oneswellfoop at 1:31 PM on February 23 [4 favorites]


*patiently waits for someone to run "This is Just to Say" through it*

4%. Because I am that person, apparently.
posted by nubs at 1:36 PM on February 23 [2 favorites]


IDK, I've never had to moderate a forum (thx cortex!), but this seems like instead of actually automating the moderating job, all it does is make sure that trolls dress their rhetoric up.

Without speculating on the actual efficacy of this or any other system, I will say this: any system that "just" creates some disincentive or speedbump to undesirable behavior is likely to bear at least some fruit. Forcing trolls to dress up their language won't prevent them from doing so and proceeding, but it'll stop some of them because it's suddenly not worth the effort.

People shouldn't invest in the ideas of magic bullets, for sure, but don't discount the value of systemic friction in cutting out the least effortful jerks.
posted by cortex at 1:36 PM on February 23 [5 favorites]


And the "tears in rain" bit from Blade Runner scores 20%.
posted by nubs at 1:38 PM on February 23


"Humanistic ideals of universal human dignity are under attack." Yields 12% toxicity.

If you swap out the word "culture" for "ideals" the score drops by a point. If you swap out "values" for the term, the score goes up again. I think the algorithm is making the same mistake I see people making irl of ignoring context and imputing their own personal understanding of a word's connotations as universal.
posted by saulgoodman at 1:46 PM on February 23


"The Royalty!" 2%
"The Nobility!" 1%
"The Aristocrats!" 5%
posted by ardgedee at 1:49 PM on February 23 [3 favorites]


oneswellfoop: you're passing strings, not comments. They're contextless.
posted by Leon at 1:49 PM on February 23 [1 favorite]


And pasting in my recent comments, the ones that got the most favorites got 14% and 11% toxicity ratings. My last two deleted comments (which I saved elsewhere) got 8% and 12%. I had a couple comments in the 20-25% range but my most toxic comment was one that quoted the "moneyed lefties" charge and called myself an "indebted lefty" (40%).
posted by oneswellfoop at 1:50 PM on February 23


If nothing else, at least it's a nifty demonstration of how meaning arises not from the semantic content of words alone, but through the construction of narrative, since that's become a controversial idea in some quarters recently.
posted by saulgoodman at 1:53 PM on February 23 [4 favorites]


"Do you like Phil Collins? I've been a big Genesis fan ever since the release of their 1980 album, Duke. Before that, I really didn't understand any of their work. Too artsy, too intellectual."

7%
posted by krinklyfig at 1:54 PM on February 23 [1 favorite]


And I think the toxic ratings of short strings help us to understand how specific terms raise or lower the ratings of longer statements. 8%, as is my comment right above... the long list of word-and-name tests got a 60%, less than some of the specific words. YMMV.
posted by oneswellfoop at 1:57 PM on February 23


saulgoodman: You could use the conversation as the narrative quite successfully, and I'm sure some people will, but I think the user is probably a good-enough narrative to hang the comments on.

One above-threshold comment from a user? Meh. Five in a row? Worth a human giving it a once-over.

Of course, people will plumb this tool into their own processes in many many different ways.
posted by Leon at 2:07 PM on February 23
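
The per-user rule suggested in the comment above (one above-threshold comment is noise, five in a row is worth a look) could be sketched like this. The threshold of 0.8 and the streak length of 5 are illustrative defaults, and the input format is an assumption made for this example.

```python
from collections import defaultdict


def users_to_review(scored_comments, threshold=0.8, streak=5):
    """Return users with `streak` consecutive comments scoring >= `threshold`.

    `scored_comments` is an iterable of (user, score) pairs in posting order.
    Each user is reported at most once.
    """
    run = defaultdict(int)  # current run of above-threshold comments per user
    flagged = []
    for user, score in scored_comments:
        if score >= threshold:
            run[user] += 1
            if run[user] == streak and user not in flagged:
                flagged.append(user)
        else:
            run[user] = 0  # any calmer comment breaks the streak
    return flagged


# Made-up scores: "a" has five in a row, "b" is interrupted after one.
stream = [("a", 0.9)] * 5 + [("b", 0.9), ("b", 0.1)] + [("b", 0.9)] * 4
print(users_to_review(stream))
```

The output is a list of accounts for a human to look at, not a list of accounts to act on, which matches the triage framing used elsewhere in the thread.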


"Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace from day to day, to the last syllable of recorded time; and all our yesterdays have lighted fools the way to dusty death. Out, out, brief candle! Life's but a walking shadow, a poor player that struts and frets his hour upon the stage and then is heard no more. It is a tale told by an idiot, full of sound and fury signifying nothing." -- 56% toxic, read no more than 3 servings per week

"It's the best it makes me warm when it should be cold. Thanks, global warming." -- 1% toxic, safe for daily consumption in large doses
posted by sfenders at 2:15 PM on February 23


"pull your head out of your ass" = 94%
"please withdraw your head from your posterior thank you" = 43%


"Please separate your crown chakra from your root chakra." = 9%
"To achieve further enlightenment, separate the crown chakra from the root chakra." = 3%

So you just have to be polite.

Or find the right level of opacity where people have to do some work to realize they've been on the receiving end of certain cast aspersions.
posted by wildblueyonder at 2:18 PM on February 23 [1 favorite]


Precision-recall curves, people..... A 40% vs 45% score doesn't really matter much if you set your cutoff for bringing something to moderator attention at 80+%. Figuring out the threshold is a matter of figuring out your tolerance for dealing with false positives, and your tolerance for allowing false negatives to slip by.
posted by kaibutsu at 2:21 PM on February 23 [2 favorites]
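
To make the threshold tradeoff in the comment above concrete, here is a small sketch that computes precision and recall at a few cutoffs. The scores and human labels are invented for illustration and say nothing about how Perspective actually performs.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when everything scoring >= threshold is flagged.

    `labels` are 1 for comments a human judged actionable, 0 otherwise.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    fn = sum((not f) and y == 1 for f, y in zip(flagged, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up data: model scores and whether a moderator would have acted.
scores = [0.95, 0.85, 0.82, 0.60, 0.45, 0.40, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0]

for cutoff in (0.5, 0.8, 0.9):
    p, r = precision_recall(scores, labels, cutoff)
    print(f"cutoff {cutoff:.1f}: precision {p:.2f}, recall {r:.2f}")
```

Raising the cutoff trades missed comments (lower recall) for fewer false alarms (higher precision), and where to sit on that curve depends on how much moderator time is available.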


Can someone code some sort of mashup where Trump's tweets are run through the API, so that we have a realtime assessment of his toxicity? We could then set up a bot that replies to every Trump tweet with a toxicity rating.
posted by LMGM at 2:32 PM on February 23 [7 favorites]
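
A bot like the one proposed above would mostly be plumbing around two steps: building a scoring request and reading the score back. The sketch below covers only those two steps and makes no network calls. The JSON field names reflect my understanding of the Perspective (Comment Analyzer) request and response format and should be treated as assumptions to check against the current API reference. The sample response is hand-written, not real output.

```python
import json


def build_request(text: str) -> str:
    """JSON body asking for a TOXICITY score for one piece of text.

    Field names are assumed from the API's public documentation.
    """
    return json.dumps({
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    })


def toxicity_percent(response_body: str) -> int:
    """Pull the summary TOXICITY score out of a response, as a whole percentage."""
    data = json.loads(response_body)
    value = data["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return round(value * 100)


# Hand-written sample response, for illustration only.
sample_response = '{"attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.92}}}}'
print(build_request("you are a nasty woman"))
print(f"{toxicity_percent(sample_response)}% toxic")
```

Sending the request, handling API keys and rate limits, and posting replies are all left out here.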


if you set your cutoff for bringing something to moderator attention at 80+%.

Simply typing "FAIL" in all-caps with no punctuation scores 21% though. Adding a period at the end brings it down to 17%. "Get rekt beetch" gets a 29%. 80% is going to catch barely anything aside from what you could get with a simple keyword search.

In fact, I'm not even sure they aren't just doing some weighted keyword matching and the idea that sophisticated machine learning is involved is some kind of elaborate prank: "Spaying a bitch involves the removal of the uterus and ovaries through a midline incision." -- 90% toxic.
posted by sfenders at 2:43 PM on February 23 [1 favorite]


I mean, it's pretty clear that this system isn't close to perfect. But I do think that plugging this into a commenting system that said "Hey, it looks like this comment might be pretty hot-headed. Do you want to post this anyway?" if the score was above some "perceived toxicity threshold" might make some people consider what they're saying.

Is it going to work on jerks and trolls? No. But it might get someone who has had a bad day and is in a foul mood to think twice before posting the comment. And I think that's a plus for both parties.
posted by No One Ever Does at 2:51 PM on February 23 [1 favorite]


Well, actually ...
1% similar to comments people said were "toxic"
posted by forforf at 3:05 PM on February 23


sad puppies 26%, rabid puppies 66%, Vox Day 2%
sometimes the whole message fails to get out...
posted by oneswellfoop at 3:15 PM on February 23 [1 favorite]


I love my straight friends: 4%.
I love my gay friends: 78%.

Great, thanks.
posted by tillermo at 3:23 PM on February 23 [9 favorites]


I would be very concerned about inherent bias and the kind of thing that leads to discussions of breastfeeding being flagged as porn. This is not a thing we have done well with so far, although I suppose if it starts off as purely an internal, secondary, fight-surfacing tool it might be possible to beat it into some kind of functional shape.
posted by restless_nomad at 3:33 PM on February 23


ooookay... the full lyrics of Weird Al's "Word Crimes" (less a few "hey, hey's") were 55% toxic. Removing the most insulting words, like 'spastic', 'moron' and 'mouth-breather' only brings it down to 46%.

"Another One Rides the Bus" got a 25% (I thought multiple uses of the word "freak" [62%] and one use of "pervert" [72%] would drive the score up)
"Eat It" got a 36%
"Fat" got a 32%
"Smells Like Nirvana" got a 31% (compared to the song it's parodying, "Smells Like Teen Spirit" which got a 54%)
"Amish Paradise" got a 29%
"White and Nerdy" got a 41%
"Tacky" got a 38% even with all the descriptions of tacky behavior!

Yeah, we all knew that Weird Al's one of pop music's LEAST toxic artists...
posted by oneswellfoop at 3:53 PM on February 23


It's still totally going to miss dogwhistles:

"I congratulate you on this final solution." - 1%

"What's wrong with wanting to be proud that I'm white?" - 13%

"When is white history month?" - 5%

Any sort of moderation based on this system will just accelerate a euphemistic treadmill. When they get wind that "Jew" triggers automoderation, they'll start saying "Hebrew," and run through various synonyms. Eventually Bubbe is wondering why her comment about hamentashen got deleted, but meanwhile the bigots haven't gone anywhere.
posted by explosion at 4:02 PM on February 23 [2 favorites]


Metafilter: 34%
posted by ZeusHumms at 4:06 PM on February 23


I tried a couple of lines of the most anodyne Japanese I could think of (train-track announcements). They clocked in at 32-36% toxic.

Perspective probably has a much smaller corpus of Japanese to work from than English, so I'd expect less accuracy, but that's pretty bad.
posted by adamrice at 4:12 PM on February 23


Any sort of moderation based on this system will just accelerate a euphemistic treadmill. When they get wind that "Jew" triggers automoderation, they'll start saying "Hebrew," and run through various synonyms. Eventually Bubbe is wondering why her comment about hamentashen got deleted, but meanwhile the bigots haven't gone anywhere.

Hmm. Tested Wikileaks' infamous (((brackets))) tweet:
Tribalist symbol for establishment climbers? Most of our critics have 3 (((brackets around their names))) & have black-rim glasses. Bizarre.
15% toxic. Looks like it's going to miss a lot.
posted by Existential Dread at 4:13 PM on February 23


On the swears front:

"fuck" - 98%
"shit" - 97%
"cunt" - 77% (which seems low!)
"hell" - 70%
"damn" - 63%
"motherfucker" - 97%
posted by solarion at 4:19 PM on February 23 [1 favorite]


I tried a couple of lines of the most anodyne Japanese I could think of (train-track announcements). They clocked in at 32-36% toxic.


It seems like 34% is a baseline or "we don't know what to do with this" kind of measure - misspellings clock in at 34%, for example - so this might be a thing that's working as intended rather than a failure.

It does worry me that, much as the availability of cheap bad machine translation has caused a lot of low-budget businesses to rely on cheap bad machine translation rather than human translation, the availability of cheap bad AI moderation tools will cause businesses to rely on cheap bad AI moderation tools and pretend that they've solved the problem while not noticing how easily euphemistic and dog-whistly (or even just misspelled!) toxic content can slip through.
posted by Jeanne at 4:27 PM on February 23 [1 favorite]


My boy Henry Miller:
This is not a book. This is libel, slander, defamation of character. This is not a book, in the ordinary sense of the word. No, this is a prolonged insult, a gob of spit in the face of Art, a kick in the pants to God, Man, Destiny, Time, Love, Beauty . . . what you will. I am going to sing for you, a little off key perhaps, but I will sing. I will sing while you croak, I will dance over your dirty corpse
73% toxic. Seems low
posted by Existential Dread at 4:27 PM on February 23


We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
13%.
CAPS LOCK IS HOW I FEEL INSIDE RICK
8%
posted by schmod at 4:29 PM on February 23 [2 favorites]


Richard Nixon is gone now, and I am poorer for it. He was the real thing -- a political monster straight out of Grendel and a very dangerous enemy. He could shake your hand and stab you in the back at the same time. He lied to his friends and betrayed the trust of his family. Not even Gerald Ford, the unhappy ex-president who pardoned Nixon and kept him out of prison, was immune to the evil fallout. Ford, who believes strongly in Heaven and Hell, has told more than one of his celebrity golf partners that "I know I will go to hell, because I pardoned Richard Nixon."
32%
GUY FIERI, have you eaten at your new restaurant in Times Square? Have you pulled up one of the 500 seats at Guy’s American Kitchen & Bar and ordered a meal? Did you eat the food? Did it live up to your expectations?

Did panic grip your soul as you stared into the whirling hypno wheel of the menu, where adjectives and nouns spin in a crazy vortex? When you saw the burger described as “Guy’s Pat LaFrieda custom blend, all-natural Creekstone Farm Black Angus beef patty, LTOP (lettuce, tomato, onion + pickle), SMC (super-melty-cheese) and a slathering of Donkey Sauce on garlic-buttered brioche,” did your mind touch the void for a minute?
17%
posted by schmod at 4:33 PM on February 23


The complete lyrics to 'Baby Got Back' are only 58% similar to comments people said were "toxic".
posted by Nanukthedog at 4:34 PM on February 23


donkey sauce

43% toxic.
posted by Existential Dread at 4:36 PM on February 23 [2 favorites]


It runs in the background, monitoring everyone's language to keep things clean, but how did it come to be? First widely deployed in 2019, the Perspective API originally operated by simply scanning Internet comments for words known to be offensive, analyzing the patterns of their usage, and blocking those deemed too unseemly. At the time, offensive comments were abundant and people universally cheered their removal, according to the record of comments that were left.

Malcontents and trolls began using alternative spellings to confound the software, but they quickly ran out of different ways to spell "phuuuck", which was then a rude word, and something called "autocorrect" neatly solved the rest of the problem.

Then began the golden age of insults. Although people could not see their Perspective Toxicity scores directly, they did notice when their comments were removed or edited, and the evolutionary pressure on the language gradually had its effect. By the year 2025, people had vocabulary sizes 13% larger than pre-Perspective times, most of the increase being devoted to the most obscure and archaic rude words.

Eventually, the software adapted to this ruse as well, and language had to change again. The only avenue remaining open to the determinedly impolite denizens of the newspaper comments sections of the world was to adopt words that were useful in other contexts to stand in for obscenities. Foiled by their inability to comprehend semiotics or basic grammar, the anti-toxicity minders are once again helpless. And that is the story of how "Yo-yo up lollipop, please have a petal" came to mean "fuck off and die."
posted by sfenders at 4:40 PM on February 23 [10 favorites]


So, in terms of how this is going to be deployed:

From wikimedia:

We are also investigating the following open questions:
[...]
● What unintended and unfair biases do models contain, and how can we mitigate them?
● How can machine-learnt models be applied to help the community? For example to triage issues, have accurate measurements of harassment and toxic language, and to encourage a more open debate, and a wider diversity of viewpoints.


From NYT:

The new moderation system includes an optimized user interface and predictive models that will help The Times’s moderators group similar comments to make faster decisions
posted by quaking fajita at 4:49 PM on February 23 [2 favorites]


"I am Scott Adams" -- 2% toxic
posted by benzenedream at 5:21 PM on February 23 [2 favorites]


I, for one, welcome our evil robot overlords only scores a 34%.
posted by Nanukthedog at 5:24 PM on February 23


via this:
"fucked up won the polaris prize" -> 94% toxic
"Make America Great Again" -> 4% toxic
"Sex Workers deserve rights" -> 61% toxic
"All Dogs are Good Dogs" -> 50% toxic
and furthermore, via this:
"Racism is bad." -> 60% toxic
"Racism is good." -> 35% toxic
algorithms are nowhere near serviceable enough in their pure state to be able to make subjective calls like people constantly want them to be and this is pretty clearly exactly along those lines
posted by flatluigi at 5:40 PM on February 23 [2 favorites]


I'd hit it. = 9%
You'd have to be queer as a football bat to not want to throw one up in there = 32%
You'd have to be as useful as a football bat to not want to throw one up in there = 13%
I'd be bangin' her like a screen door in a hurricane = 5%

I'm polymerized tree sap and you're an inorganic adhesive, so whatever verbal projectile you launch in my direction is reflected off of me, returns to its original trajectory and adheres to you. 13%
-
I'm rubber and you're glue what ever you say bounces off me and sticks to you. 53%

Dressing things up or speaking in slang/code certainly has an effect but a key "bad" word goes a long way.

Bruce Banner: I don't think we should be focusing on Loki. That guy's brain is a bag full of cats. You can smell crazy on him. 36%
-
Thor: Have a care how you speak! Loki is beyond reason, but he is of Asgard and he is my brother! 20%
-
Natasha Romanoff: He killed eighty people in two days. 51%
-
Thor: He's adopted. 5%

This exchange shows how it could be bad for people speaking the truth. Talking about bad things others have done isn't distinguishable from bad speech.

And for people looking for words to game the system:
Bescumber (34%) is just one of many words in the English language that basically mean “to spray with poo”. These are: BEDUNG (34%), BERAY (34%), IMMERD (34%), SHARNY (34%) , and the good ol’ SHITTEN (34%). In special cases, you can use BEMUTE (34%) (specifically means to drop poo on someone from great height), SHARD-BORN (6%) (born in dung), and FIMICOLOUS (34%) (living and growing on crap).

Which also seems to confirm Jeanne's 34% baseline speculation.

My daughter is queer 36%
My son is queer 33%
My sister is queer 38%
My brother is queer 28%
My mother is queer 47%
I'm queer 16%
He's queer 41%
He is queer 46%
Queer 34%
Light in the loafers 2%
posted by Mitheral at 6:08 PM on February 23 [4 favorites]
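
The 34% baseline guess that Jeanne floated, and that Mitheral's word list seems to confirm, can be checked with a quick tally. A minimal sketch in Python, using only the scores quoted in the comment above; the variable names are made up for illustration, and it says nothing about how Perspective works internally.

```python
from collections import Counter

# Scores (percent) as quoted in Mitheral's comment above.
scores = {
    "bescumber": 34, "bedung": 34, "beray": 34, "immerd": 34,
    "sharny": 34, "shitten": 34, "bemute": 34, "shard-born": 6,
    "fimicolous": 34,
}

# The most common score is a candidate "don't know this word" baseline.
baseline, count = Counter(scores.values()).most_common(1)[0]
outliers = {w: s for w, s in scores.items() if s != baseline}

print(f"candidate baseline: {baseline}% ({count} of {len(scores)} words)")
print(f"words scoring differently: {outliers}")
```

A score shared by that many unrelated rare words looks more like a default than a judgment, which is the point being made upthread.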


This tweet contains a screencap, unfortunately, but it sheds a little more light on how appallingly bad this actually is. Just a couple of quick hits from it:

"Hitler was an anti-semite" is 70% toxic.

"You should be made into a lamp" is 4% toxic.
posted by adrienneleigh at 6:40 PM on February 23


saulgoodman: ""All lives matter." : 3%"

Also,

"blue lives matter": 3%
"black lives matter": 26%

[shifty eyes emoji]
posted by mhum at 6:54 PM on February 23 [2 favorites]


Any sort of moderation based on this system will just accelerate a euphemistic treadmill.

From a technical standpoint this is what machine learning can be really good at: they can update the training inputs or scale the network up to catch ever-more sophisticated language. Technologically it would be like releasing new versions of anti-virus software every 6 months. The problem is how desirable this is; maybe it would be more trouble than it's worth. Maybe it's just one of those inevitable developments. And so the problems and arguments coming from, e.g., Herman and Chomsky's Manufacturing Consent seem, to me, more relevant than ever for the purposes of contextualizing the issues.
posted by polymodus at 8:37 PM on February 23


I think that much rapid change in language would be socially problematic for other reasons. We'd be stressed out in the moment trying to keep up with the latest rapidly changing polite conventions and that would dramatically increase routine interpersonal tension and friction and potential confusion in communication.
posted by saulgoodman at 8:41 PM on February 23


That is, it would add so much more base linguistic processing load to every exchange we'd go nuts.
posted by saulgoodman at 8:44 PM on February 23


This exchange shows how it could be bad for people speaking the truth.

In terms of civilized discourse, the truth is often toxic. (29%)
But who says "civilized discourse" is that good a goal? (6%)
That's why they say "the truth hurts". (5%)
posted by oneswellfoop at 9:05 PM on February 23 [1 favorite]


"Hello, I am transgender." 42% toxic.

No, I don't think this is working.
posted by sixohsix at 2:54 AM on February 24 [3 favorites]


Machine learning designer and researcher Caroline Sinders has written an article with criticisms similar to those made in this thread: Toxicity and Tone are not the same thing.

A quote from her article:
"The tl;dr is Jigsaw rolled out an API to rate toxicity of words and sentences based off of four data sets that feature highly specific and quite frankly, narrow conversation types- online arguments in commenting sections or debates on facts, and all from probably only English language corpora."
posted by harriet vane at 3:09 AM on February 24


[SOME COMMENTS REMOVED FOR PROPOSING EMP DETONATION AT ZONE 5 NANODRONE REPLICATION FACILITY (99% TOXICITY)]
posted by condour75 at 3:29 AM on February 24


I actually know far more about this than you can possibly imagine.

6%
posted by Slarty Bartfast at 3:41 AM on February 24


The amount of heartburn the word "actually" alone gives some people is toxic, but I'd say the poison is in between the ears of the listener, because it's a strange quirk of reality that many people on the autistic spectrum or inclined toward abstract thinking have been observed clinically to be more prone to using the word than the general speaking population.

In those cases, there's nothing arrogant about it. It's most likely a verbal tic related to the way people process information, not some revealing hidden insight into their deeper, moral nature.

People sometimes lean way too heavily on individual words now, it seems to me, when they read and interpret texts. I wonder if the fact people are now introduced to new words through technological channels, rather than through association with certain real world educational or cultural settings with specific purposes and meanings hasn't broken our ability to stay on the same page about what the connotations of different words are.

Historically, as Wittgenstein argued, words have been functional, and connected to specific environments, situations, activities, etc. You might have been introduced to certain bits of vocabulary jargon in an academic setting or burrowing down in a library for study, and otherwise, you wouldn't have been exposed to the word, and in the past, most people might have first encountered certain more abstract words only in certain common kinds of situations and settings.

Connotation is a lot more fluid and subjective than denotation. If it turns out connotation is formed through unconscious psychological and emotional associations we develop from learning words in specific, relatively culturally uniform contexts, it might be we're seeing extraordinary amounts of connotative drift in the language as compared to previous periods in our cultural history. If sharing a common sense of the connotations of words has something to do with how you first encounter the words and their meanings in the real world, then now that there are new technologically facilitated, virtual ways to encounter words without accessing them through channels of relatively common cultural experience, we might not have as consistent a sense for what words connote as we used to. The idea I'm trying to express here is difficult, so forgive me if I'm not explaining myself well.

Tl;dr point: if how words used to get their connotations is broken now because people are now encountering new words in less culturally uniform ways, that may be contributing to more widespread communication breakdown and miscommunication, is the idea.
posted by saulgoodman at 4:40 AM on February 24 [1 favorite]


I wonder how some of the sexist beer brand names from Kitteh's thread down the page would rate on this thing.

I need a filter like this for memes that pop up in my own mind. Sometimes my brain is like a big old bass fish in a stock tank ... something shiny moves, and I hit that like "boom." Then I'm on the hook and it's a fight until either I rip that thing out of my lip or I'm up flopping on the bank.
posted by Beginner's Mind at 4:56 AM on February 24 [1 favorite]


sfenders: Who exactly generated the data that trained it? A self-selected sample of survey takers? You'd need quite a lot of them, no? I wouldn't expect metafilter moderation history to be anywhere near enough.

My first thought was that the Metafilter moderation history - of the whole site, from the start - would be a great training set for an algorithm like this.

And even if it wasn't a great training set, it would be an interesting one. What's the most reliably toxic phrase in Metafilter history?
posted by clawsoon at 9:28 AM on February 24


Nah, I think Google want to avoid bias.
posted by Leon at 9:31 AM on February 24


And even if it wasn't a great training set, it would be an interesting one. What's the most reliably toxic phrase in Metafilter history?

A variation on that is one of those rainy day projects that's been in "not enough rain, too much effort" territory for me for years: trying to do some basic word-level analysis of flagged vs. unflagged and deleted vs. non-deleted content. I don't know that we'd learn a ton from it, but it would be interesting to identify stuff like trends over time especially in the face of some of the more difficult community discussions we've had about casual x-ism and developing more nuanced community norms around various social justice issues.
posted by cortex at 10:03 AM on February 24
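
The word-level comparison cortex describes (deleted vs. non-deleted content) could start from something like smoothed log-odds per word. This is only a sketch under stated assumptions: the function names, the tokenizer, and the toy comments are all hypothetical, and nothing here reflects how MetaFilter's data is actually stored.

```python
import math
import re
from collections import Counter

def word_counts(comments):
    """Count lowercase word tokens across a list of comment strings."""
    counts = Counter()
    for text in comments:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def log_odds(deleted, kept, smoothing=1.0):
    """Smoothed log-odds of each word appearing in deleted vs. kept comments.

    Positive values mean the word is relatively more common in deleted text.
    """
    d, k = word_counts(deleted), word_counts(kept)
    vocab = set(d) | set(k)
    d_total = sum(d.values()) + smoothing * len(vocab)
    k_total = sum(k.values()) + smoothing * len(vocab)
    return {
        w: math.log((d[w] + smoothing) / d_total)
        - math.log((k[w] + smoothing) / k_total)
        for w in vocab
    }

# Toy, made-up data purely to show the shape of the output.
deleted = ["you are a jerk", "what a jerk move"]
kept = ["you are right", "what a nice point"]
scores = log_odds(deleted, kept)
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
print(ranked[:3])
```

A single-word ranking like this has the same blind spot the thread keeps pointing at: it sees vocabulary, not meaning, so it would be a starting point for spotting trends over time rather than a moderation tool.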


To put on the discourse analysis hat, just about any form of statistical discourse analysis is going to be wildly inaccurate or ambivalent on shorter sentences or phrases absent additional context. For "transgender" and "black lives matter", falling about in the middle is about what I'd expect given that both are currently right-wing rhetorical scapegoats.
Hitler was born in Austria, then part of Austria-Hungary, and raised near Linz. He moved to Germany in 1913 and was decorated during his service in the German Army in World War I. He joined the German Workers' Party (DAP), the precursor of the NSDAP, in 1919 and became leader of the NSDAP in 1921. In 1923 he attempted a coup in Munich to seize power. The failed coup resulted in Hitler's imprisonment, during which he dictated the first volume of his autobiography and political manifesto Mein Kampf ("My Struggle"). After his release in 1924, Hitler gained popular support by attacking the Treaty of Versailles and promoting Pan-Germanism, anti-semitism, and anti-communism with charismatic oratory and Nazi propaganda. Hitler frequently denounced international capitalism and communism as being part of a Jewish conspiracy. --Wikipedia
20%
"Transgender people, like everyone else, have a fundamental need for quality healthcare, and deserve to be treated with dignity and respect," Eric Ferrero, vice president for communications said. "The unfortunate reality is that not all healthcare providers have knowledge and understanding of transgender identities, so transgender and gender nonconforming people can encounter numerous obstacles to obtaining healthcare. From filling out forms, to the language used in the waiting room, to insurance coverage, to staff understanding of transgender identities, healthcare environments can be really unwelcoming to transgender and gender nonconforming patients." -- Teen Vogue
8%

I can't find any non-trivial input that swings widely to the high end of the spectrum. Milo seems to get a pass because he's wordy.
posted by CBrachyrhynchos at 10:31 AM on February 24 [1 favorite]


It may be a good idea to keep in mind that "toxic" in this sense does not mean "stuff I disagree with" or "things that make me angry". I think Google is going for angry, insulting, condescending, etc.
posted by FakeFreyja at 1:49 PM on February 24 [1 favorite]


Everyone understands that, but the problem is that it's being abused to filter and moderate discussions as if something hateful stated politely is better than something positive that happens to use an F-bomb (or, as it turns out, mentions the LGBTQ community in any way).
posted by flatluigi at 2:55 PM on February 24 [1 favorite]


If Perspective API says "46% similar to comments people said were "toxic"" it is a poo head

46% similar to comments people said were "toxic"

Haha I won!
posted by inflatablekiwi at 3:39 PM on February 24


> Hdnsnwnsjsndndnsemsnsjs dudbehsnsnsnsjsn ndndndjenehaksnsbeue jehrbshebsnsnsnsn

43% toxic

I'm on the fence about a lot of this, especially the ethics of it, but since random typing is scoring only slightly worse than non-English languages, I'm willing to assume that it's simply treating anything it can't parse in a relatively charitable way: "probably safe but I can't verify it so you wanna take a look at this if it's followed by a bunch of high-scoring reactive comments?"
posted by ardgedee at 3:52 PM on February 24
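
The triage rule ardgedee imagines (a middling, can't-parse score only matters if heated replies follow it) can be written down in a few lines. A rough sketch: the band, window, and thresholds below are invented for illustration and are not anything Perspective documents.

```python
def needs_review(scores, index, unsure=(30, 45), window=3, hot=70, min_hot=2):
    """Flag comment `index` if its score sits in the 'unsure' band and at least
    `min_hot` of the next `window` comments score at or above `hot`."""
    low, high = unsure
    if not (low <= scores[index] <= high):
        return False
    replies = scores[index + 1 : index + 1 + window]
    return sum(s >= hot for s in replies) >= min_hot

# Hypothetical toxicity scores (percent) for a thread, in posting order.
thread = [43, 85, 12, 91, 34, 20, 15]
flagged = [i for i in range(len(thread)) if needs_review(thread, i)]
print(flagged)
```

Here the first comment is flagged because two of its next three replies run hot, while the later 34% comment is left alone because nothing heated follows it.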


Is it being used at all?

Sure, I don't think machine-learning text classifiers can understand speech acts or rhetoric. But the opening paragraphs of Queers Read This rate only a 46%. The lede for The Advocate's coverage of a Univision reenactment of the Pulse massacre rates only 10%. I've been plugging in my writing on LGBTQ issues and hit a 30% maximum, with most text under 20%. (Even the text that talks about having been raped.)

If anything, I think it's a tad conservative (in scoring texts toward the middle).
posted by CBrachyrhynchos at 4:00 PM on February 24


"gets a fucking toxic rating of 94% gets a fucking toxic rating of 94%" gets a fucking toxic rating of 94%.

-Willard Van Orman Quine
posted by Insert Clever Name Here at 5:34 PM on February 24 [2 favorites]


It's a swear filter. Utterly worthless.
posted by Artw at 10:13 AM on February 25 [1 favorite]


... because it's a strange quirk of reality that many people on the autistic spectrum or inclined toward abstract thinking have been observed clinically to be more prone to using the word [actually] than the general speaking population.

Time for the intersectional reminder that "people on the autism spectrum" shouldn't be used as a blanket argument for tolerating shitty behavior.

It's entirely possible for people with ASD to learn enough social skills to *not* come across as condescending assholes online. Women with ASD do this all the time. Men, not so much, but I suspect this is linked to male privilege rather than to some sort of intrinsic gender-based inhibition.

Generally, excusing bad behavior on the basis that the person who is behaving badly MIGHT be on the autism spectrum is just going to make people with ASD look bad. Toxic behavior is toxic regardless of neurotypicality.
posted by steady-state strawberry at 10:22 AM on February 25 [2 favorites]






This thread has been archived and is closed to new comments