June 27, 2000
6:24 PM   Subscribe

I've never actually seen this in use (probably because I'm totally amerocentric in my browsing), but apparently you can get some wäĉkŷ characters in your domain name if you want to, assuming you want a dot-nu domain. If your browser speaks Japanese, you can even have a kanji domain. That's pretty neat.
posted by endquote (12 comments total)
Neat? I'd rather call it pretty impractical. I mean, the web was meant as a way to make information accessible, right? I doubt using weird character sets in your domain name helps. I understand some people want to use their native language on the web (well, actually I don't, but we'll discuss that some other time), but this is ridiculous.

(not to mention the fact that it's confusing the hell out of my bind-utils).
posted by fvw at 6:48 PM on June 27, 2000

And, indeed, it violates RFC 1035.


Ok. RFC 1035 doesn't *quite* say it's illegal to use non-ASCII characters. It just says you're a damned fool if you do.
posted by baylink at 8:04 PM on June 27, 2000
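
For the curious, RFC 1035's "preferred name syntax" -- labels made of letters, digits, and hyphens, starting with a letter, ending with a letter or digit, and at most 63 octets each -- is easy to check mechanically. A minimal sketch in Python (the function name is my own, not from the RFC):

```python
import re

# RFC 1035 "preferred name syntax": each label starts with a letter,
# ends with a letter or digit, and contains only letters, digits,
# and hyphens in between.
LABEL = re.compile(r"^[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?$")

def is_preferred_syntax(name: str) -> bool:
    labels = name.rstrip(".").split(".")
    return all(len(label) <= 63 and LABEL.match(label)
               for label in labels)

print(is_preferred_syntax("example.nu"))   # True
print(is_preferred_syntax("björk.nu"))     # False: "ö" falls outside A-Za-z
```

So a high-bit character doesn't break the grammar of a DNS message, but it does fall outside the syntax the RFC recommends -- hence "not quite illegal."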

fvw, "weird" character sets are a language-specific concept. To Americans, björk (Swedish for birch, and a common family name) looks like it has a "weird" character, but I assure you that to a Swede, the word bjork is equally weird. It just isn't an "o". (But I guess you understand that well enough, since you point out English isn't your native tongue.)

Of course, they're used to it, but then they put up with a lot from an anglocentric world.

Anyway, there has been a furious push to internationalize internet standards that were mostly written by English-speaking Americans back in the day. In particular, there's been an International Domain Names working group working through the IETF for a while now, and they've reached the point of an internet working draft on international characters in the DNS system -- one step short of an RFC standard. As Jay found, RFC 1035 doesn't explicitly forbid such characters (and it seems that many DNS systems support them anyway).

The key problem is that a simple extension from 7-bit ASCII to 8-bit UTF-8 (i.e. including high-bit characters) breaks DNS's case-insensitive matching of CaPiTaLiZaTiOn, leading to ibm.com and IBM.com being two different domains. To solve this, they actually had to come up with a new UTF-5 encoding standard that allows high-bit characters but not capitals. (Actually, it's Unicode, not ASCII, but English Unicode maps close enough to ASCII anyway that it's a convenient shorthand.)

So, this will be solved shortly. Meanwhile, if you're so concerned about interoperability, how come you block IE browsers at your home page? :-P

posted by dhartung at 1:10 AM on June 28, 2000
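
As it turned out, the working group's eventual answer was a different ASCII-compatible encoding than the UTF-5 draft dhartung mentions: the "punycode" scheme of IDNA (RFC 3490/3492, 2003), which case-folds a name to lowercase *before* encoding, so the capitalization trap never reaches the wire. A sketch using Python's built-in `idna` codec, which implements IDNA 2003:

```python
# The stdlib "idna" codec nameprep-folds the name (lowercasing it)
# and then punycode-encodes each label into pure ASCII, so DNS
# servers never see high-bit bytes or mixed-case non-ASCII.
ascii_form = "björk.nu".encode("idna")
print(ascii_form)   # an xn--...-prefixed ASCII byte string

# Case variants fold to the same wire form, preserving DNS
# case-insensitivity for non-ASCII names:
assert "BJÖRK.nu".encode("idna") == ascii_form

# And it round-trips back to the Unicode name:
assert ascii_form.decode("idna") == "björk.nu"
```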

Of course, I should probably point out that not everyone WANTS to keep all their legacy alphabets. Germany, for instance, trimmed spelling rules from 212 to just 112, and virtually eliminated the ß character (replaced by ss in most cases). The Russian Cyrillic alphabet was simplified in 1917, and Swedish itself was simplified in 1906.

Aldho we culd uze sum chanjes in English az weL.
posted by dhartung at 1:17 AM on June 28, 2000

this is an interesting topic which came up in the typographic community not too long ago. the whole thing got divided into two camps. one said that including characters such as the c with cedilla and the eszett were outdated conventions which should probably be discarded, since they're legacy forms from way back when the roman alphabet got splooged into typographic systems. the other camp says that disincluding such characters was no less of a linguistic splooge, and in fact was a form of bigotry.

i'm more partial to the second argument, especially since these characters indicate native sounds which aren't duplicated in english. but god, it'd be nice to be able to create a font someday and NOT have to create 256 custom drawings of letters. until then, i'm more than happy to support non-english cultures. it may be impractical for english-speaking surfers to use such character sets, but i'd rather see respect paid to linguistic legacies besides our own.
posted by patricking at 9:15 AM on June 28, 2000

I caught enough grief about the underscore in the URL for my old weblog at [http://c_harmful.pitas.com]. Seems that some browsers, firewalls, etc. choked on that minor a variation from letter/numbers/hyphens.
posted by harmful at 9:23 AM on June 28, 2000

dhartung: Blocking IE is not the same. It is a conscious decision, where I feel it is worth the loss of interoperability.... Now if I started blocking a random browser each day... (That would be fun actually. I'll have to implement that this weekend :-) ).
posted by fvw at 10:15 AM on June 28, 2000

I'd say that ASCII-centrism is a kind of bigotry: I was thinking of the way that Atatürk single-handedly marshalled the conversion of the Turkish language from Arabic script to a modified Roman alphabet in order to improve literacy across his fledgling state... dump the legacy system when it doesn't suit the language, but don't be confined to existing alternatives.

Unicode ain't gonna happen soon, and I can't see IPA becoming the world's alphabet, so until then, it's important to acknowledge that "é" isn't "e", "ñ" isn't "n", and "æ" isn't "ae" for the people who matter.
posted by holgate at 11:01 AM on June 28, 2000
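
holgate's point -- that these are distinct letters, not decorated ASCII -- is visible in Unicode itself. A small illustration with Python's `unicodedata` module (my example, not from the thread): "é" canonically decomposes into "e" plus a combining accent, so the two are *related*, but "æ" has no decomposition at all -- there is no Unicode rule equating it with "ae".

```python
import unicodedata

# NFD splits "é" (U+00E9) into base "e" plus a combining acute
# accent -- related to "e", but a different string:
assert unicodedata.normalize("NFD", "é") == "e\u0301"

# "æ" has no canonical decomposition: it is its own letter,
# not an abbreviation of "ae":
assert unicodedata.normalize("NFD", "æ") == "æ"
print(unicodedata.name("æ"))   # LATIN SMALL LETTER AE
```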

apparently adobe's work on unicode font standardization has stalled *again*, which is hardly a surprise...they've been banging at it with microsoft since 1996. and they've not made the discussions open, so that means there'll be (of course) proprietary adobe creation tools which (of course) don't work very well. i can hardly wait.
posted by patricking at 11:40 AM on June 28, 2000

One blessing, perhaps, is that Linux is making inroads in China. With that kind of critical mass, I18N efforts may be appropriated and pushed on by the people who need unicode support on a day-to-day basis.
posted by holgate at 2:49 PM on June 28, 2000

With more characters available, the number of possible URLs becomes even larger. Most likely it will never exhaust itself -- the number of combinations is too great -- but increasing the infinite number of possible URLs is like a story from Borges. The library is limitless -- only in as much as we can catalog it.
posted by birgitte at 4:10 PM on June 28, 2000

If anyone's still interested, here's what my friend at .cc had to say...

Technically it is rather simple, as they are expanding the namespace from 7 bits to 8 bits. Browsers and email clients, which represent 95% of DNS traffic, are predominantly compatible with this, but issues arise where traffic flows across older internet-connected legacy systems (older machines / OSes - Stratus VOS, DEC VAX, IBM System/36 & 38, etc. -- generally at educational institutions) that may not process names wider than 7 bits per character.

There are also litigious TM issues that arise from it, like the need for companies to protect their TM by registering the resulting exponential variants:

posted by endquote at 5:21 PM on June 28, 2000
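
The 7-bit worry above is straightforward to test for. A hedged sketch (function name mine) that flags which labels of a name would trip software assuming one byte per character with the high bit clear:

```python
def high_bit_labels(name: str) -> list:
    """Return the labels of `name` whose UTF-8 bytes exceed 7 bits --
    the ones an old 7-bit-only resolver could mangle."""
    return [label for label in name.split(".")
            if any(byte > 0x7F for byte in label.encode("utf-8"))]

print(high_bit_labels("björk.nu"))     # ['björk']
print(high_bit_labels("example.nu"))   # []
```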


This thread has been archived and is closed to new comments