strawberry in Kanji?
March 8, 2006 7:31 AM

A history of computer character sets in Japan. JIS X 0208 (originally JIS C 6226) of 1978 was the first JIS character set to include kanji. It specified 6,335 kanji, arranged by frequency into two levels ... Many bizarre mistakes were made in transcribing names, resulting in several new kanji coming into existence.
posted by delmoi (14 comments total)
I knew there was a lot of history behind it all but didn't actually know what it was. Thanks.

Since this liberalization, hundreds of new name kanji have been suggested ('strawberry' seems to be a common one).

I hope they bring out official kanji for all the ridiculous katakanaised foreign loan words that are absolutely everywhere. They, IMO, cause the most headaches when trying to read Japanese.
posted by Jase_B at 8:01 AM on March 8, 2006

Fascinating example of how many levels of indirection arise to describe what initially seems a simple problem.
Unicode was generally built on the following paradigm.

One Character has many Glyphs

However, for Han characters it is often better to think of it this way:

One Character has one or more Variants, which have many Glyphs
posted by orthogonality at 8:01 AM on March 8, 2006
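
The two paradigms quoted above can be pictured as a small data model. This is only a rough sketch under stated assumptions: the class names, the "form A" / "form B" labels, and the font names are hypothetical and do not come from the Unicode standard or the linked article. The only real datum is the code point U+845B (葛), a kanji commonly cited as having two competing written forms.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    """One recognised shape of a character (labels here are hypothetical)."""
    label: str
    glyphs: list[str] = field(default_factory=list)  # font-specific drawings


@dataclass
class Character:
    """An abstract character identified by its code point."""
    code_point: int
    variants: list[Variant] = field(default_factory=list)

    def glyph_count(self) -> int:
        # Glyphs hang off variants, not off the character directly.
        return sum(len(v.glyphs) for v in self.variants)


# Illustrative data only: the font names are made up for the example.
katsu = Character(
    code_point=0x845B,
    variants=[
        Variant("form A", ["hypothetical Mincho font", "hypothetical Gothic font"]),
        Variant("form B", ["hypothetical Mincho font"]),
    ],
)

print(chr(katsu.code_point), len(katsu.variants), katsu.glyph_count())
```

In the simpler "one character has many glyphs" model, the `Variant` layer would disappear and the glyph list would sit directly on `Character`, which is exactly what makes it hard to say which shape of a name kanji is meant.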

This kind of esoteric techno-ridiculosity makes me very happy. Thanks.
posted by Floach at 8:39 AM on March 8, 2006

I don't know much about Japanese, or character sets, or unicode. But this was fascinating. Thanks.
posted by klangklangston at 8:41 AM on March 8, 2006

Yeah I've been spending hours lately trying to organize my mp3s, a lot of which are encoded with Shift-JIS. I've been using this program mp3tag to "convert codepage" which makes the tags readable in Media Player, Tag & Rename, and Quintessential, but not Winamp or Foobar2000. I'm not even sure what format my mp3 tags are now, if it's UTF-8 or UTF-16 or some mix. Anyway the post is informative as to why everyone doesn't just use Unicode.
posted by bobo123 at 10:32 AM on March 8, 2006
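
For anyone wondering what a "convert codepage" step might be doing, here is a minimal Python sketch. It is not how mp3tag or any particular player works; it assumes the tag bytes are really Shift-JIS and were wrongly displayed as Windows-1252, and the function name `repair` is made up for the example.

```python
def repair(garbled: str, wrong: str = "cp1252", right: str = "shift_jis") -> str:
    """Undo a wrong decode: get the original bytes back, then decode properly.

    Assumes the text was decoded with `wrong` although the bytes were
    written in `right`. Raises an error if the bytes cannot be recovered.
    """
    return garbled.encode(wrong).decode(right)


title = "いちご"                      # "strawberry" in hiragana
raw = title.encode("shift_jis")      # the bytes as stored in the tag
garbled = raw.decode("cp1252")       # what a player with the wrong codepage shows

print(garbled)                       # unreadable accented characters and symbols
print(repair(garbled))               # the original title again

# The same title takes a different number of bytes in each encoding,
# which is one reason a tag in "some mix" of formats is hard to guess.
for encoding in ("shift_jis", "utf-8", "utf-16-le"):
    print(encoding, len(title.encode(encoding)))
```

The round trip only works when every byte survives the wrong decode. Some Shift-JIS bytes have no Windows-1252 mapping, so real tags can fail here, and in that case the original bytes have to be read from the file again.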

oh man, this is great. thanks. i have several shift-jis keyboards for testing various things. now this gives me something to point people at and say go read, instead of trying to endlessly explain this crap.
posted by 3.2.3 at 10:37 AM on March 8, 2006

People who are into this sort of thing may want to download the cjk.inf file created by Ken Lunde. Ken is the authority on Asian script encodings. If you're really into it, buy his book "CJKV Information Processing."
posted by adamrice at 10:43 AM on March 8, 2006

"V" for Vietnamese, adamrice? I thought Vietnamese was written in the Roman alphabet nowadays, albeit with loads of diacritics. Or does the "V" stand for something else? (Not quibbling, just curious. This is neat stuff.)
posted by nebulawindphone at 11:29 AM on March 8, 2006

Yes, V for Vietnamese--it's true that it is written in the Roman alphabet now, but it used to be written in Chinese characters.
posted by adamrice at 11:44 AM on March 8, 2006

Thanks for posting this, I've been dealing with Japanese encoding issues at work lately. Much more informative than staring at character sets over at
posted by A dead Quaker at 4:01 PM on March 8, 2006

Wow, great stuff! (I missed this originally, and was directed to it by No-sword.)

I have one question. This sentence has been cut off at a crucial point:

[hentaigana] are absent from Unicode (and it would be very complicated to add them), but even if they were present there would still be

Anybody know (or have an educated guess) how that should go on from there?
posted by languagehat at 8:42 AM on March 12, 2006

LH--Why don't you e-mail the author? My guess is that he might have meant "difficulties in processing hentaigana input" or possibly "a strong case that modern Japanese can be fully represented in Unicode" (I've been translating Japanese since '89, and only learned of the existence of hentaigana this year, doing a little exploring into the history of Japanese orthography).
posted by adamrice at 3:32 PM on March 12, 2006

Good idea. I thought at first that the author was hiding (no byline), but now I notice the "e-mail me" bit.
posted by languagehat at 4:48 PM on March 12, 2006

'Hentaigana' sounds rather perverted... :P
posted by delmoi at 7:19 PM on March 12, 2006

