😬 🀣 😍 ? πŸ’–πŸ’–πŸ’–! πŸ—„οΈ πŸ“» πŸ“±? πŸ’€πŸ’€πŸ’€
November 8, 2019 7:19 PM

New Emoji Are So Boring – but They Don't Have to Be: A new data set on the popularity of emoji reveals a problem with Unicode's approval process, along with a way to fix it.
Many of the most popular existing emoji would not have passed Unicode's search criteria if they'd been in place at the time: smiling face with smiling eyes, face with tears of joy, loudly crying face, sparkle heart, eggplant, smiley poo, devil face, see-no-evil monkey, party popper, bicep, crossed finger, and shrug. None of these have anywhere near the benchmark 500 million results when you search for them in Google, even in 2019 when those results have been juiced by many pages about the emoji themselves – instead, they got in by being on Japanese phones before Unicode started taking over the decision-making process.

Article by Internet Linguist Gretchen McCulloch, who has been featured on MetaFilter previously (anti-SEO), previously (negation), previously (gender), previously (doggo), and also previously (slashfic tagging). You may also know her from the All Things Linguistic tumblr and the NYT-bestselling linguistics smash hit Because Internet.

Unicode Consortium page about the emoji frequency data.
posted by Not A Thing (38 comments total) 14 users marked this as a favorite
 
Maybe the real technological Singularity is when Unicode emoji combinations go exponential.
posted by They sucked his brains out! at 7:33 PM on November 8


Having emoji in Unicode has always been a bit weird, because the Unicode consortium isn't merely codifying an existing script, but actively creating one as it goes. So unlike, say, Cuneiform or Latin where there's a clear stopping point (the point at which every glyph has been assigned a codepoint), when will the 'emoji' script be 'complete'? It's doubly weird because unlike a real script where you can point to a corpus of text as evidence that a particular glyph exists, the new emoji come into existence by the very act of standardization itself.
posted by Pyry at 7:47 PM on November 8 [12 favorites]


Oh damn they are letting disabled people be represented in emoji, time to get really mad about the selection process.
posted by Space Coyote at 7:55 PM on November 8 [1 favorite]


AFAICT the new disability-related emoji are too recent to be reflected in the data, but there does seem to be an underlying issue that McCulloch doesn't address -- that this kind of usage data, if put to the use she envisions, might skew things (further) against symbols used by groups that are either few in number or underrepresented among smartphone users (or among customers of the particular companies that are contributing data).
posted by Not A Thing at 8:13 PM on November 8 [1 favorite]


when will the 'emoji' script be 'complete'?

There's over a million code points to fill. Does it matter if it ever will be?
posted by Your Childhood Pet Rock at 8:15 PM on November 8 [3 favorites]
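For the record, the "over a million" figure checks out. A quick Python sketch of the arithmetic, based on Unicode's code space of U+0000 to U+10FFFF (the variable names are just illustrative):

```python
# Unicode's code space runs from U+0000 to U+10FFFF.
PLANES = 17                      # plane 0 (the BMP) through plane 16
PER_PLANE = 0x10000              # 65,536 code points per plane
total = PLANES * PER_PLANE       # 1,114,112 code points in all

# U+D800..U+DFFF are reserved for UTF-16 surrogate pairs and can never
# be assigned to characters, emoji included.
surrogates = 0xDFFF - 0xD800 + 1         # 2,048
scalar_values = total - surrogates       # 1,112,064

print(f"{total:,} code points, {scalar_values:,} Unicode scalar values")
```

Most of that space is still unassigned, so raw capacity is not what limits new emoji.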


Maybe the real technological Singularity is when Unicode emoji combinations go exponential.

And at that point humanity will revert to using actual words as a protest against the oppressive preponderance of emojis, and the cycle will start all over again.
posted by Greg_Ace at 9:20 PM on November 8 [3 favorites]


The best time for the Consortium to ban emoji from Unicode was after including the original codepoints as a legacy of Japanese character sets. The second best time is now.
posted by save alive nothing that breatheth at 9:27 PM on November 8 [9 favorites]


It takes tools for me to even attempt to understand WTH is going on with emoji. Next thing you know, we'll all be communicating via Drawception - Picture Telephone Drawing Game. I'm sure that will make everything much better.
$ xclip -o | raku -ne '.comb».uniname».say'
GRIMACING FACE
SPACE
ROLLING ON THE FLOOR LAUGHING
SPACE
SMILING FACE WITH HEART-SHAPED EYES
SPACE
QUESTION MARK
SPACE
SPARKLING HEART
SPARKLING HEART
SPARKLING HEART
EXCLAMATION MARK
SPACE
FILE CABINET
SPACE
RADIO
SPACE
MOBILE PHONE
QUESTION MARK
SPACE
SLEEPING SYMBOL
SLEEPING SYMBOL
SLEEPING SYMBOL
posted by zengargoyle at 10:24 PM on November 8 [3 favorites]
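For anyone without Raku handy, a rough Python equivalent using the standard `unicodedata` module (the function name `char_names` is made up for this sketch). One difference: Raku's `.comb` splits text into graphemes, while iterating a Python string walks individual code points, so the file cabinet emoji also reports the VARIATION SELECTOR-16 that follows it.

```python
import unicodedata

def char_names(text):
    """Return the Unicode character name of each code point in text."""
    # unicodedata.name() raises ValueError for code points without a
    # name (controls, private use), so fall back to a U+XXXX label.
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]

title = "\U0001F62C \U0001F923 \U0001F60D ? \U0001F5C4\uFE0F"
for name in char_names(title):
    print(name)
```

To read from the clipboard the way the one-liner does, you could pass `sys.stdin.read()` to `char_names` and pipe `xclip -o` into the script.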


People use raku?
posted by Pruitt-Igoe at 12:17 AM on November 9 [1 favorite]


There's over a million code points to fill. Does it matter if it ever will be?

A million emoji are going to be a pain in the ass to draw, especially if you want to make them all look like part of a set.

I get the feeling the Unicode Consortium was not really set up to be the official emoticon standard.

The best time for the Consortium to ban emoji from Unicode was after including the original codepoints as a legacy of Japanese character sets. The second best time is now.

I feel like the horse has bolted here, in that now that a standard of some kind exists, people are going to try and conform to it even if it's not official or shouldn't be used. ASCII's code points live forever. I think, if you banned emoji, the standard would slowly drift (because a lot of people will not react kindly to someone trying to take their emoji away) and there'd be a lot of frustration that everyone else wasn't conforming to Apple's representation of emoji.
posted by Merus at 1:04 AM on November 9


So there was a study that found that the images reflecting different emotional states are the most popular emojis? Hmm, it sure feels like there just might have been an easier way to figure that out...


(Yeah, I know, emoji is allegedly only coincidentally related to the word emoticon, but c'mon)
posted by gusottertrout at 1:37 AM on November 9 [3 favorites]


I came in to mention Because Internet (which is well worth a read), and then I saw that McCulloch wrote TFA. Which makes perfect sense.

So, I guess my point is: if you're interested in linguistics and/or digital culture, then go read Because Internet. It's about a lot more than just emoji.
posted by escape from the potato planet at 4:07 AM on November 9 [6 favorites]


But, yeah – I do feel like emoji shouldn't be part of Unicode's mission. I have nothing against emoji, but little doodles which have no specific semantic value are a fundamentally different thing than letters, numerals, punctuation, musical and mathematical notation, etc.

Unicode should add glyphs which are already used in the wild. They should not have gotten involved in emoji, where there's an incentive to keep making up new glyphs to add to the standard, so that people can use them. That puts it ass-backwards.
posted by escape from the potato planet at 4:13 AM on November 9 [3 favorites]


I have nothing against emoji, but little doodles which have no specific semantic value are a fundamentally different thing than letters, numerals, punctuation, musical and mathematical notation, etc.

𓂀𓁝𓂺𓀐
posted by Your Childhood Pet Rock at 4:53 AM on November 9 [9 favorites]


It does seem like Unicode has gotten itself stuck between a rock and a hard place.

On the one hand, there's clearly a felt need from users and tech companies for emojis, and in particular more emojis. And to be effective communication tools (i.e. substitutes for gestures), emojis need to be part of a uniform standard shared between vendors and not just something that Apple/Google/Xiaomi makes up as they go. So Unicode is fulfilling a much-needed role in standardizing emojis before they enter actual use.

But on the other, deciding which new symbols to create is almost the opposite of Unicode's core competency -- and that's probably why they've ended up with such nonsensical inclusion criteria. (Google search counts? seriously?!)

I wonder if it could make sense to farm this kind of standardization out to something like an Emoji Consortium? Maybe they could develop a standard interchange system for GIFs too...
posted by Not A Thing at 7:15 AM on November 9 [3 favorites]


I'm with Pyry; with emojis the Unicode consortium is creating a new script, not encoding an existing one. The ship has sailed on this complaint; they deliberately decided to go this direction. I just think it was a mistake.

Also I'm a grumpy old man and kind of hate the idea of emoji, would much rather that there was a more free-form system that got common usage. Something like Slack and Discord's custom emoji, they work well. The key there is anyone can invent an image but its use tends to stay bound within a subculture or community. It works great as a per-server thing in Slack and Discord, but I think you could adapt the idea to something more open like Twitter or Facebook.
posted by Nelson at 7:20 AM on November 9 [1 favorite]


We got 95% of the Internet on UTF-8. Why split it back out again when Unicode can do the job?

It's throwing out the baby with the bathwater.
posted by Your Childhood Pet Rock at 7:29 AM on November 9


A modest proposal: rather than allow the Unicode standardization process to become derailed with which emoji to add, the UC should simply define a set of codepoints for SVG drawing operations, allowing a user agent to encode any imaginable emoji in a text stream.
posted by skymt at 9:10 AM on November 9 [1 favorite]


What about when a device that doesn't support this SVG method encounters one of those emojis? You'd get dozens of squares in a row, one for each drawing command, instead of an emoji.
posted by Your Childhood Pet Rock at 9:24 AM on November 9


Like the 80% of Android handsets on obsolete versions of Android that are out there. Instead of somewhat degrading gracefully with a square it would break the web for these people.
posted by Your Childhood Pet Rock at 9:27 AM on November 9


I used to be an old grump when it came to emoji. But the fact that some people interact with this information through a screen reader changed my mind. Should those people hear "colon dash closing parenthesis" or should they hear "smiling face"?

I have no idea what a screen reader would do if it encountered a bunch of SVG-like commands. The nice thing about most current emoji is that they do have some kind of meaning. Considering the diverse ways people may need to consume information, the closer we can get to faithfully representing the meaning itself, rather than creating it through syntactic coincidences, the better.
posted by Jpfed at 10:46 AM on November 9 [3 favorites]


Should those people hear "colon dash closing parenthesis" or should they hear "smiling face"?

My opinion is that internet screen reader software in 2019 should know what a smiley is. :-) "Colon dash close parenthesis" constitutes an embarrassing mispronunciation of a common "word."

But I've learned not to be surprised by the pathetic lack of effort put into accommodation tools for disabled people by major corporations.
posted by Harvey Kilobit at 11:05 AM on November 9 [2 favorites]


My opinion is that internet screen reader software in 2019 should know what a smiley is.

Apple's VoiceOver has pronounced :-) and :) as "Smiley" since at least 2010 and supports smiley, wink, and frown out of the box. It's easy to add additional custom pronunciations (e.g. to make :D "big smile" or whatever).
posted by jedicus at 11:24 AM on November 9 [1 favorite]
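To make the screen-reader point concrete, here is a sketch of the two lookups discussed above: a user-editable emoticon table (in the spirit of the custom pronunciations jedicus mentions, but not VoiceOver's actual data or API) and a fallback to each emoji's standard Unicode name. The table contents and the `speakable` function are hypothetical.

```python
import unicodedata

# Hypothetical, user-editable pronunciations for ASCII emoticons.
EMOTICONS = {":-)": "smiley", ":)": "smiley", ";-)": "wink",
             ":-(": "frown", ":D": "big smile"}

VS16 = "\uFE0F"  # emoji presentation selector; not worth speaking

def speakable(text):
    """Turn emoticons and emoji into words a speech engine can say."""
    words = []
    for token in text.split():
        chars = [ch for ch in token if ch != VS16]
        if token in EMOTICONS:
            words.append(EMOTICONS[token])
        elif chars and all(unicodedata.category(ch) == "So" for ch in chars):
            # Most pictographic emoji are category "So" (Symbol, other)
            # and carry a standard name such as "SLIGHTLY SMILING FACE".
            words.extend(unicodedata.name(ch, "unknown symbol").lower()
                         for ch in chars)
        else:
            words.append(token)
    return " ".join(words)

print(speakable("see you then :-) \U0001F642"))
```

This only recognizes emoticons that stand alone between spaces, so something like "great:-)" slips through; a real screen reader would need a smarter tokenizer.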


A modest proposal: what if we standardized some kind of additional system for annotating or "marking up" text into a kind of higher-level or "hyper" text that would then have a syntax for embedding arbitrary images for example. These images could even live on different servers, and the annotation language could use specially formatted names to "locate" these images (and perhaps other "resources") in a standardized or "uniform" way. I realize this sounds very ambitious, but with modern computers I think it might just be achievable.
posted by Pyry at 12:34 PM on November 9 [5 favorites]


They used to be <IMG> tags on the au network in Japan. They stopped because they realized sending a few hundred bytes of data when they could have sent two was quite ridiculous.
posted by Your Childhood Pet Rock at 1:26 PM on November 9 [2 favorites]


I must admit, I've been quite surprised to see how far to the ends of the Earth some people will go to stop a practical process that's working fairly well because it's just not proper.
posted by Your Childhood Pet Rock at 1:29 PM on November 9 [3 favorites]


Why split it back out again when Unicode can do the job?

I feel like people are not proposing to remove emoji from unicode-the-standard, but to remove responsibility for deciding what emoji to add from unicode-the-organization. Different issue.
posted by vibratory manner of working at 2:40 PM on November 9 [1 favorite]


Hmm that's a good point, a 4-byte emoji is fewer bytes than some hypothetical markup like ":crying_face:". And it's shorter because Unicode is kind of acting as a codebook that turns a long piece of data into a shorter piece of data.

Modest proposal #2 (I realize this one is "out there" but hear me out): What if we had a way to automatically generate a codebook for a given document, or even corpus of documents, that would let us represent those documents in fewer bytes by identifying repeated substrings and replacing them with shorter codewords. Then we could have the benefits of both a flexible markup language and a reduced bytecount for transmission.
posted by Pyry at 2:40 PM on November 9 [1 favorite]
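Both of Pyry's points can be checked with the Python standard library: an emoji code point above U+FFFF takes 4 bytes in UTF-8, and the "automatically generated codebook" is roughly what general-purpose compressors such as DEFLATE (the algorithm behind zlib and gzip) already do with repeated substrings. The `:crying_face:` spelling and the sample sentence are made-up examples, not any particular service's shortcodes.

```python
import zlib

emoji = "\U0001F622"             # U+1F622 CRYING FACE
shortcode = ":crying_face:"      # hypothetical markup for the same picture

print(len(emoji.encode("utf-8")), len(shortcode.encode("utf-8")))  # 4 13

# A toy document that leans heavily on the markup spelling.
doc = ("that build failed again :crying_face: " * 200).encode("utf-8")

# DEFLATE replaces repeated substrings with back-references, then
# Huffman-codes the result using code tables built for each block.
packed = zlib.compress(doc, 9)
assert zlib.decompress(packed) == doc  # lossless round trip
print(len(doc), "->", len(packed), "bytes")
```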


You mean like the private use areas of Unicode?
posted by Your Childhood Pet Rock at 2:52 PM on November 9
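For reference, the Private Use Areas are three ranges the Unicode Standard sets aside with no assigned meaning, so any group can agree among themselves on what a code point there stands for, at the cost of it meaning nothing (and usually rendering as nothing) anywhere else. A small Python sketch; the "crow" assignment is purely hypothetical.

```python
import unicodedata

# Private Use Area ranges as defined by the Unicode Standard.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]

def is_private_use(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

# Hypothetical private agreement: our community uses U+E000 for "crow".
CROW = "\uE000"

print(is_private_use(CROW))             # True
print(unicodedata.category(CROW))       # Co ("Other, private use")
print(unicodedata.name(CROW, None))     # None: no standard name or glyph
```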


Cmd+Ctrl+Space, "FACE DISMAYED BY SNARKY MODEST PROPOSALS THAT IGNORE THE BENEFITS OF SOMETHING THAT REALLY SMART NERDS WHO KNEW ABOUT HYPERLINKS AND DATA COMPRESSION DECIDED WAS STILL USEFUL"

Hm. Guess I'll settle for 🙄
posted by tonycpsu at 2:56 PM on November 9


Unicode was a good idea. Even the first Unicode emoji, which were simply standardizing an existing encoding, were a good idea. Is continuing to grow the Unicode emoji set indefinitely, and with an opaque, unaccountable process to boot, a good idea? I don't think so.

Services like Slack, Discord, Twitch, and Mastodon demonstrate that using markup is in fact the better approach, both in terms of technical flexibility, and in terms of giving users the freedom to decide what tiny images they can embed themselves.
posted by Pyry at 3:09 PM on November 9


Services like Slack, Discord, Twitch, and Mastodon demonstrate that using markup is in fact the better approach

Up until the point where someone tries to copy and paste the text.
posted by Your Childhood Pet Rock at 3:30 PM on November 9


When you copy text it turns the emoji back into the markup, so you get something like "here is a crow, which is woefully missing from Unicode: :crow:". When you paste that text, the markup turns back into the image again. I don't see the issue. If you paste into a service that doesn't have markup, then at least you get a semi-useful text tag like ":crow:" instead of a character-not-found box like you would if your font didn't have that particular emoji.
posted by Pyry at 3:40 PM on November 9
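The round trip Pyry describes is easy to sketch: on display, known shortcodes become images and unknown ones stay as plain text, with the shortcode kept in the image's `alt` attribute so the markup is not lost. A minimal Python sketch; the table, the URL, and the shortcode pattern are assumptions, not how any particular service implements it.

```python
import html
import re

# Hypothetical per-service table: shortcode name -> image URL.
CUSTOM_EMOJI = {"crow": "https://example.com/emoji/crow.png"}

# Assumed shortcode syntax: lowercase letters, digits, _ + - between colons.
SHORTCODE = re.compile(r":([a-z0-9_+-]+):")

def render(text, table):
    """Swap known :shortcodes: for <img> tags; leave unknown ones as text."""
    def swap(match):
        url = table.get(match.group(1))
        if url is None:
            return match.group(0)  # degrade to the literal ":raven:" text
        return f'<img alt="{match.group(0)}" src="{html.escape(url)}">'
    # Escape the user's text first; html.escape leaves the characters a
    # shortcode can contain untouched, so they still match afterwards.
    return SHORTCODE.sub(swap, html.escape(text))

print(render("a :crow: is not a :raven:", CUSTOM_EMOJI))
```

Unknown shortcodes fall back to readable text rather than a missing-glyph box, which is the advantage Pyry is pointing to; the catch, as the replies note, is that every service keeps its own table.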


But then every service has its own markup, and you have to remember which one has :crow: and which one has :raven:. Unicode support for a given emoji might not be universal when it's first introduced, but over time, assuming people keep up with OS updates, it is.

And how are services like Slack and Twitch any more accountable or transparent here? Unicode Consortium is a bit less open than, say, IETF, but they still do publish their standards, and there's advance notice when new stuff is coming. I don't remember Slack telling me about their new markup things, or offering me a chance to vote on them. And if I did, I'd only be changing it for Slack, one service out of many.

Unicode is far from perfect, but you really seem to be downplaying the disadvantages of markup here.
posted by tonycpsu at 3:52 PM on November 9


What if we had a way to automatically generate a codebook…

If you’re feeling deflated, then gee, zip it – don’t get in a huff, man.
posted by musicinmybrain at 6:34 PM on November 9 [1 favorite]


Slack doesn't have to be accountable; users define custom emojis on a per-Slack basis. Say we had a Metafilter Slack. Any authorized user (possibly any user at all) can add a :plate-of-beans: custom emoji. The downside is it only works on that Slack. The upside is every Slack community has its own set of customized images.

I'm not saying it's a perfect solution, Unicode emoji are amazingly good for their universality. But precisely because they have to be universal, there's arguments about which emoji should be in the one codeset we all share, and whether they're boring, and whether they're inclusive, and... Slack's custom emoji allow users to do a different thing with different tradeoffs. Even better, they coexist nicely alongside Unicode emoji. It's a neat system.

The hard part is generalizing it. Slack's custom emoji work precisely because they are per-server things. I don't know how you'd do something similar in Twitter. You could go full "inline custom images in your tweets", but that lacks the brevity of emoji.
posted by Nelson at 7:33 PM on November 9
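Per-workspace custom emoji living alongside standard ones can be modeled as a layered lookup: the community's own table first, then a shared table of standard Unicode emoji, then plain text. A Python sketch; the tables, workspace names, and lookup order are illustrative assumptions rather than Slack's documented behavior.

```python
# Hypothetical shared table: shortcode -> standard Unicode emoji.
STANDARD = {
    "crying_face": "\U0001F622",  # U+1F622 CRYING FACE
    "eggplant": "\U0001F346",     # U+1F346 AUBERGINE
}

# Hypothetical per-workspace tables; any member can add entries.
WORKSPACES = {
    "metafilter": {"plate-of-beans": "https://example.com/beans.png"},
    "elsewhere": {},
}

def resolve(shortcode, workspace):
    """Return ("image", url), ("emoji", char), or ("text", literal)."""
    custom = WORKSPACES.get(workspace, {})
    if shortcode in custom:
        return ("image", custom[shortcode])    # local to this workspace
    if shortcode in STANDARD:
        return ("emoji", STANDARD[shortcode])  # the same everywhere
    return ("text", f":{shortcode}:")          # unknown: keep the markup

print(resolve("plate-of-beans", "metafilter"))
print(resolve("plate-of-beans", "elsewhere"))
print(resolve("eggplant", "elsewhere"))
```

The same :plate-of-beans: that renders as an image in one workspace degrades to literal text in another, while the Unicode entries behave identically everywhere; that is the tradeoff described above.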


> What if we had a way to automatically generate a codebook…

My own Modest Proposal, although I intentionally omit Emoji because I couldn't work out how to talk about them without unacceptable negativity: OmniCode
posted by nickzoic at 2:19 PM on November 10


OmniCode's cool and all, but drawing lines is hard... what if we replaced emoji with a system where you could have a two-dimensional array of pixel elements, each set with 24-bit color and an 8-bit alpha channel?
posted by Nelson at 4:10 PM on November 10 [1 favorite]

