I could probably fake fluency in Bengali
December 1, 2004 9:05 PM

I never realized how great Wikipedia was for quick-and-dirty guides to languages. For example, did you know that Esperanto uses affixes to cut the number of adjectives one must learn in half? Or that Finnish has fifteen noun cases, including six locative declensions? Or that Vedic Sanskrit was tonal? How about that Cherokee verbs each have 21,262 inflected forms? I could play with this forever.
posted by borkingchikapa (35 comments total) 1 user marked this as a favorite
They even have the wiki itself in Cherokee. (It doesn't look right though, because Windows doesn't ship with a Unicode-compatible Cherokee font.)
posted by smackfu at 9:16 PM on December 1, 2004

I should take this internet moment to confess: every night I get drunk, high and I read Wikipedia for hours.
posted by orange clock at 9:21 PM on December 1, 2004

I'm glad I'm not the only one, OC. Thanks for sharing.
posted by gwint at 9:40 PM on December 1, 2004

Can someone explain to me why Wikipedia is so reliable, given that anyone can edit most pages? And is it in fact as reliable as it appears to be?
posted by Pretty_Generic at 9:56 PM on December 1, 2004

posted by Pretty_Generic at 9:57 PM on December 1, 2004

Yes, this is fascinating stuff. Thanks, borking.

As for Finnish having fifteen noun cases, the ones listed in the table look more like suffixes to me. The noun ("talo" in the table) doesn't seem to be inflected at all. So it's apparently more like suffixes as in other agglutinative languages like Hungarian, Japanese etc.
posted by sour cream at 9:59 PM on December 1, 2004

Pretty_Generic, it seems to me like there's a fairly large Wikipedia community dedicated to keeping it reliable and "balanced". Check out nearly any article about current events (specifically the Iraq war), they're constantly having their neutrality disputed. I'm sure there's some incorrect information in there, but I've never noticed any while reading about something I knew.

Sour cream, as far as Finnish noun declension goes, I would argue that they are in fact true cases. Look at the Wikipedia entry for inflection: "talo" is indeed inflected, but weakly. A declension doesn't necessarily have to change the root of the noun; it doesn't in Arabic or Esperanto (though I believe it usually does in German). Japanese doesn't have anything in the way of noun suffixes/affixes or cases, only particles. I can see where you're coming from, though, since the particles usually can't be separated from their nouns. "Watashi wa mizu to biiru o nomimashita," "I drank the water and the beer," can be written (weirdly) "Mizu to biiru o watashi wa nomimashita" and still make sense, because the particles don't move relative to their nouns. I don't know anything about Hungarian, but I assume it's the same deal as Finnish, considering that they're at least distantly related.
posted by borkingchikapa at 10:31 PM on December 1, 2004

"Wikipedia" is not a single thing, it's 400,000 different articles of varying quality. Some are excellent, some are crap.

It's reliable because the effort to vandalize a page exceeds the effort to fix a vandalized page. And the ratio of "fixers" to "vandals" is at least 10 to 1, if not more. Thus vandals make very little impact; most vandalism is fixed within hours, if not minutes.

Also, in this day and age we have a lot of "pro-ams," meaning professional amateurs: people who know something at a professional level but pursue it as a hobby. The open source software movement is one example. It's a result of changing demographics, as people have more free time, live longer, and have access to better-quality information and tools, better education, etc.
posted by stbalbach at 10:34 PM on December 1, 2004

Actually, Wikipedia is a single thing. The fact that article quality, accuracy and comprehensibility vary wildly is indicative of the current underlying process. Ultimately, Wikipedia presents (or at least ought to present) a singular image as a successful open-source reference, with the emphasis on the alleged superiority of the [many eyeballs --> bugs shallow] paradigm. If users have to deal with uncertain and varying accuracy, then it fails its purpose as an assembled collection of information and knowledge.

It's my opinion that Wikipedia's appointed management could use an audit to ascertain the quality of the project. My rough idea is: pick the 10 most popular articles, 10 random articles of moderate-to-high traffic and 10 random articles of low traffic, then compare and contrast them against 'reputable' references. Then check those references (and Wikipedia) against primary sources where they exist (journals, textbooks, etc. for medical facts). It would be a better, quantified metric of quality, acting as a rough indicator of where Wikipedia stands.
posted by Gyan at 10:53 PM on December 1, 2004
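For what it's worth, the sampling step of Gyan's proposed audit could be sketched roughly like this in Python. The function name `audit_sample`, the page-view dictionary it takes as input, and the decision to split "moderate-to-high" from "low" traffic at the median of the remaining articles are all assumptions for illustration; the proposal above doesn't specify thresholds or a data source.

```python
import random


def audit_sample(traffic, n=10, seed=None):
    """Hypothetical sketch: choose articles for a three-tier quality audit.

    traffic: dict mapping article title -> page views (assumed input).
    Returns the n most popular articles plus n random picks from the
    upper and lower halves of the remaining articles.
    """
    rng = random.Random(seed)
    # Rank titles from most to least viewed.
    ranked = sorted(traffic, key=traffic.get, reverse=True)
    top, rest = ranked[:n], ranked[n:]
    # Assumption: split the remainder at its median into two traffic bands.
    half = len(rest) // 2
    upper, lower = rest[:half], rest[half:]
    return {
        "top": top,
        "moderate_to_high": rng.sample(upper, min(n, len(upper))),
        "low": rng.sample(lower, min(n, len(lower))),
    }


if __name__ == "__main__":
    views = {f"Article {i}": i * 100 for i in range(200)}  # made-up numbers
    for tier, titles in audit_sample(views, seed=42).items():
        print(tier, titles)
```

Each of the 30 sampled articles would then be checked by hand against the 'reputable' references; the code only covers picking them.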

I knew that Basque was a language isolate, with no known living genetically related languages. I didn't know that Japanese was too.

There is also some cool linguistic experimentation with a constructed "woman's" language called Láadan. (Though most of the interesting stuff is only accessible through external links.)
posted by ontic at 10:59 PM on December 1, 2004

Gyan, it's been done, sort of, recently. Related /. thread.
posted by ..ooOOoo....ooOOoo.. at 11:17 PM on December 1, 2004

borkingchikapa: Yes, I was thinking of Japanese in particular. If Finnish has 15 noun cases, shouldn't Japanese also be considered to have 20 noun cases (or whatever the number of noun-modifying particles is in Japanese)? How are the Finnish postpositions different from Japanese particles?

ontic: Well, there's a lot of discussion going on about Japanese being a language isolate, like the Wiki article says. Interestingly, Korean grammar is amazingly similar to Japanese grammar, yet the vocabulary is completely different (if you discount loan words from Chinese). It's almost like someone stripped away the words to leave only the grammar and then filled in new words.
posted by sour cream at 11:53 PM on December 1, 2004

Wikipedia:Replies to common objections.

After making the odd IP-only edit for a couple of years, I finally created an account and now contribute something daily. (Sorry, MeFi.) It's amazing how easily a bad article can be improved. There are loads of crews devoted to clean-up duties (usually after getting bored with the research for new articles), from fixing grammar and punctuation, to reorganizing, to improving articles from "good" to "great". Once you get into the community spirit there's a great team effort and pride in accomplishment. That isn't to say the site is perfect; I was disappointed by yesterday's front-page feature due to the C+ grammar, and I've already gotten myself up to the neck in a point of view dispute which I won't link to here. I think the site needs a team that will actually go over the writing of articles, once they're at a certain point, and make them sound less committee-voiced. Right now the site is considerably heightening the emphasis placed on sources and attributions. But yes, it's easier to "revert" an article than to vandalize it in the first place, and just changing the text back and forth is severely frowned upon; habitual vandals get banned by IP if necessary. And the improvement in quality over the last year and a half has been tremendous. I certainly never expected it to start getting this good, this fast, but it has. In the end, Gyan, Wikipedia will, by its convenience, breadth, and depth, more than meet the "principle of good enough" test, even if individual articles and factoids have problems.
posted by dhartung at 12:29 AM on December 2, 2004

Some nights, I get drunk and edit Wikipedia for hours...then I do it again at work, when I'm not building the corporate wiki my boss is in love with.

How does everyone else explain Wikipedia to the uninitiated? I keep trying, using punchy marketing-speak ("If you can do email, you can do the Wikipedia") or geeky word-mounds ("It's an open-source collaborative knowledge management and distribution system...").

But once I get them to try it, they go as nutty for it as I am...even my completely non-technical parents keep talking about it.
posted by paul_smatatoes at 12:42 AM on December 2, 2004

How are the Finnish post-positions different from Japanese particles?

Do the Japanese particles ever change the form of the word they modify? Certain Finnish cases do, one of the most trivial being pankki 'bank' vs. pankissa 'in a bank'.
posted by oaf at 1:09 AM on December 2, 2004

Do the Japanese particles ever change the form of the word they modify? Certain Finnish cases do, one of the most trivial being pankki 'bank' vs. pankissa 'in a bank'.

Japanese particles don't modify nouns, no; they're postpositional, and postpositions don't cause nouns to change. Osushi is the same whether it's "osushi o taberu" or "osushi ga suki da" (I eat sushi and I like sushi, respectively).

The Esperantist in me did a little jig here, just because any exposure the language gets is a good thing from my point of view. The Esperanto Vikipedio (here) now has 18,566 articles, including the only one on Zamenhof's philosophy of Homaranismo (which means "Human-group-member-ism" using the affixes).
posted by graymouser at 2:51 AM on December 2, 2004

This is so good. Thanks so much, borkingchikapa. I've just got back from six months in Japan, which gave me the opportunity to learn a second language properly for the first time in my life. For the last couple of weeks I've considered studying linguistics, and this site has opened up a whole new world for me.
posted by Jase_B at 3:37 AM on December 2, 2004

Finnish has fifteen noun cases, including six locative declensions?

This is an oft-cited statistic aimed at making Finnish seem scary and complex (and Hungarian, likewise), but it's silly. I mean, is remembering '-n, -ssa, -sta' harder than remembering 'into, in, out of'? (Hint: no, not really.)

The hardest thing about Finnish is that it has changed a lot over the years, leaving behind many irregularities in the ways endings attach to words. Hungarian has been stable for longer and seems to have relatively few exceptions. Another Finnish idiosyncrasy is the existence of multiple infinitives for a given verb.

The '15' cited are all endings you have to learn how to stick onto a word. Finnish also has postpositions which come after the word unattached (though they require the word to be in a particular case):

pankissa, in the bank (locative case ending)
pankin takana, behind the bank (genitive + postposition)

Cherokee verbs each have 21,262 inflected forms
I can't track down the table I'm looking for but I'm guessing that's still a drop in the bucket compared to the Georgian verb.
posted by Wolfdog at 3:51 AM on December 2, 2004

The only Wikipedia article I have ever been disappointed with is Dentistry. I chalk it up to the universal hatred and fear of dentists.
posted by blacklite at 4:24 AM on December 2, 2004

sour cream, adding on to what Wolfdog said, they are not mere suffixes but real declensions; the word talo was probably chosen for the example just because it is simple and does not confuse the learner with inflection. But there are quite a few words that really do change their root form; let me know if you want me to dig those up.
posted by keijo at 5:36 AM on December 2, 2004

For the most part stems don't change too much when you add endings; "consonant gradation" is common, but predictable, like in my example: pankki is the nominative, but some endings force the -kk- to become a single -k-, as in pankin.

There are also some oddities involving final -i and -e; for example, ovi, door; oven vieressä, by the door.

Then again there are some truly irregular stems, like mies + n = miehen.
posted by Wolfdog at 5:49 AM on December 2, 2004

Yesterday I attended a talk given by Jimmy Wales, Wikipedia's founder, and when asked that exact question he said they couldn't guarantee (nor claimed) that Wikipedia's content is accurate (see their disclaimer), but the constant "peer review" encouraged by the wiki format means quality keeps improving all the time, and changes and corrections can be made in a matter of hours, unlike in any other encyclopedia.

He jokingly added that Wikipedia's disclaimer is not unique and many well-known encyclopedias have similar disclaimers...

On the other hand, he pointed out that a harder problem than accuracy (and therefore more of a worry) is the "systemic bias" caused by the similar profile of most contributors: Western, computer-savvy, in their 20s-30s, male.
posted by blogenstock at 6:27 AM on December 2, 2004

The other thing with wikipedia is that you think you're off in a tiny corner of it changing stuff and nobody is watching. But there are lots of people watching the "most recently changed pages" list looking for blatant abuse. So you can't hide.
posted by smackfu at 6:47 AM on December 2, 2004

Esperanto uses affixes to cut the number of adjectives one must learn in half

posted by alasdair at 7:04 AM on December 2, 2004

Another language in the same vein as Láadan is Lojban: the logical language. It was also developed for testing the Sapir-Whorf hypothesis, but without a particular focus on women. It's pretty interesting, kind of like Newspeak, actually: there are only 1,300 root words. It also has special words called attitudinal indicators, which specify how you feel about what you're talking about.
posted by nTeleKy at 7:38 AM on December 2, 2004

Esperanto uses affixes to cut the number of adjectives one must learn in half


It's actually been documented that Newspeak was based on Basic English and not Esperanto per se. Now, the proponents of Ido have always been critical of forming adjectives from affixes (like malbona, malgranda, malsana for bad, small, and ill), but it makes sense from a learning-unit standpoint. There was actually a study that showed students could recall the Esperanto words much more easily than Ido equivalents like mala, mikra, etc., but I don't know where I saw it cited. In all, Esperanto is pretty straightforward and fairly easy to read once you get through the big hurdles.
posted by graymouser at 7:46 AM on December 2, 2004
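To make the "learning-unit" point concrete, here is a tiny Python sketch of the mal- pattern graymouser mentions. The helper name `adjective` and the choice of three sample roots are illustrative assumptions; the only rules it encodes are that Esperanto adjectives end in -a and that the prefix mal- flips the meaning, so one root yields both members of the pair.

```python
def adjective(root: str, opposite: bool = False) -> str:
    """Hypothetical helper: build an Esperanto adjective from a root.

    Adjectives end in -a; prefixing mal- gives the opposite meaning.
    """
    prefix = "mal" if opposite else ""
    return prefix + root + "a"


if __name__ == "__main__":
    roots = {"bon": "good", "grand": "big", "san": "healthy"}
    for root, meaning in roots.items():
        # One learned root covers both words in the pair.
        print(f"{adjective(root)} ({meaning}) / {adjective(root, opposite=True)}")
```

Running it prints pairs such as "bona (good) / malbona", which is exactly the halving of vocabulary the original post alludes to. Ido's separate roots (mala, mikra) would need a lookup table instead.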

There are 27 Cherokee words for "Jeep"--all of them derisive.
posted by weapons-grade pandemonium at 8:50 AM on December 2, 2004

If users have to deal with uncertain and varying accuracy, then it fails its purpose as an assembled collection of information and knowledge.

First off, that is not Wikipedia's purpose. Second, every source has the problem you describe. Third, the same argument could be made about any collaborative effort. Basically, you seem to be making the "authority" argument: Wikipedia has no authority, unlike, for example, something published by Harvard University Press, which you can trust to be accurate because it came from Harvard. There are problems with that position as well.
posted by stbalbach at 9:08 AM on December 2, 2004

stbalbach: From Wikipedia: It is, or aims to become, primarily an encyclopedia. From the encyclopedia link: An encyclopedia (alternatively encyclopædia) is a written compendium of knowledge. I know, this is getting kinda anal, but that is Wikipedia's purpose, as stated.

Second off, every source has the problem you describe. Third, the same argument could be made about any collaborative effort.

The other sources (attempt to) have quality control over inputs and peer review of submissions. Wikipedia doesn't have the first, and the second is contingent on the number of (knowledgeable) eyeballs that fall on the article. Wikipedia is a decentralized, often single-pass collaboration. The popular, high-traffic articles will tend to be much better, but most of the articles aren't. Make no mistake, I support Wikipedia and have contributed/referred a lot. I just think an audit, like the one I suggested, will clear the air on how it really compares.
posted by Gyan at 9:40 AM on December 2, 2004

i have seen some wikipedia articles that are guilty of the sin of omission: giving some pieces of information, but leaving out others. however, this is of course only when i go looking for things i already know about [because if i didn't know about it, how could i tell something important was missing?] i don't trust it entirely, but it's certainly more than reliable enough as a starting point on a topic.
posted by grapefruitmoon at 12:13 PM on December 2, 2004

Wikipedia itself is perfectly clear on its unreliability:
Equally, we have articles that are stubs, that are inaccurate, misspelt, biased, poorly written, or just plain rubbish. That comes with our ambitious goals, and the way we work. And on many of these articles, such as stubs, we do actually have under construction signs!

Wikipedia is both a product and a process. Even if the product is not yet perfect, the process ensures that at the end of every day, the encyclopedia is higher quality than it was at the beginning of the day. That doesn't ensure we will eventually attain perfection (if such a thing is even possible), but it's something to believe in.
(From dhartung's "common objections" link.) I find it a useful resource, but I think of it as a well-informed but often unreliable blog rather than as a reference source. Sure, all references have mistakes and omissions, but a printed encyclopedia is orders of magnitude more reliable than a wiki, and probably always will be.

Gyan, I don't really see how the audit thing would work. How will it do anything other than improve the quality of 30 articles? Yeah, if you find fewer mistakes six months later in another batch of articles, that's a good sign, I guess, but it doesn't do anything to help the site in and of itself.
posted by languagehat at 1:44 PM on December 2, 2004

languagehat: How will it do anything other than improve the quality of 30 articles?

If you can find systematic types of errors or deficiencies, it helps you identify them. It also makes explicit the kind of non-error differences between WP and paper refs. Other than that, it's just meant to be an assessment tool (for the management). Given good results, it can be an advertisement as well.

Sure, all references have mistakes and omissions, but a printed encyclopedia is orders of magnitude more reliable than a wiki, and probably always will be.

That's the thing. The aim is to make the above statement (gradually) false. A systematic analysis of a pseudorandomized sample can't hurt.
posted by Gyan at 2:08 PM on December 2, 2004

Gyan, you could do the audit. Therein lies the difference that, with time, could make WP as good as, if not better than, traditional works.
posted by stbalbach at 9:02 AM on December 3, 2004

stbalbach, I half expected that suggestion. I suppose I could do it over the winter break. I'd need to pitch the idea to WP in order to get the list of articles. I'd need someone to suggest 3 'reputable' sources (I can think of Britannica, World Book and Encarta?). It seems my local library has all of them. I'll think about it. It seems like a lot of reading for one person alone (barring mad grad studs).
posted by Gyan at 11:29 AM on December 3, 2004

It seems a German computer magazine conducted a similar audit to what I proposed. Archived Wikipedia mailing list discussion thread here.
posted by Gyan at 9:08 PM on December 23, 2004
