Linguistic Time Travel
February 11, 2013 9:10 PM

"The discovery advances UC Berkeley’s mission to make sense of big data and to use new technology to document and maintain endangered languages as critical resources for preserving cultures and knowledge. [...] it can also provide clues to how languages might change years from now."
posted by batmonkey (21 comments total) 21 users marked this as a favorite

 
Just out of curiosity, have they tested this with known languages and proto-languages (i.e., reconstructing Latin from the Romance languages)?
posted by empath at 9:42 PM on February 11, 2013 [1 favorite]


A cromulent discovery that will certainly embiggen the corpora. Hopefully, they'll reconstruct Kwyjibo!
posted by not_on_display at 9:57 PM on February 11, 2013 [1 favorite]


I too am wondering how much massaging the data going in requires, since from the article it looks like they're just feeding it cognate sets, and haven't even done the sort of testing empath asks about. "Future plans" include North American languages, and the whole thing that's being reported is a first pass through Austronesian, a language family whose reconstruction is already well-documented and particularly well-suited to independent branching changes, as the geographical distances involved between, say, Hawaii and Samoa prevent a lot of linguistic contact.

So they've developed algorithms for reproducing Proto-Austronesian in a way that is...85% accurate to what we have now? I hate to sound dismissive, but it would be nice to read the article instead of the breathless press release.
posted by Earthtopus at 10:01 PM on February 11, 2013


Hey, I saw Cloud Atlas, I know in the future we'll all speak like Forrest Gump!
posted by Catblack at 10:29 PM on February 11, 2013


EGO operor non votum meus lingua muto. Gratias ago vos summopere usquam.

.. ..-. -.-- --- ..- -.- -. --- .-- .-- .... .- - .. -- . .- -. .-.-.- .- -. -.. .. -... . - -.-- --- ..- -.. ---
posted by Splunge at 10:41 PM on February 11, 2013


The true test of its mettle will be if it can help Young Earthers scientifically prove the existence of the Tower of Babel.
posted by XMLicious at 10:42 PM on February 11, 2013


Nos adhuc non curo loqui ad professor Hawking.
posted by Splunge at 10:49 PM on February 11, 2013


Sadly, even Big Data can't reconstruct actual science from Science Reporting.
posted by b1tr0t at 11:03 PM on February 11, 2013 [2 favorites]


The paper in question: Automated reconstruction of ancient languages using probabilistic models of sound change.
posted by RichardP at 11:14 PM on February 11, 2013 [3 favorites]


I too am wondering how much massaging the data going in requires, since from the article it looks like they're just feeding it cognate sets

Nope:
Cognate Recovery. Previous reconstruction systems (22) required that cognate sets be provided to the system. However, the creation of these large cognate databases requires considerable annotation effort on the part of linguists and often requires that at least some reconstruction be done by hand. To demonstrate that our model can accurately infer cognate sets automatically, we used a version of our system that learns which words are cognate, starting only from raw word lists and their meanings. This system uses a faster but lower-fidelity model of sound change to infer correspondences. We then ran our reconstruction system on cognate sets that our cognate recovery system found
posted by empath at 11:20 PM on February 11, 2013


I am a linguist. I know someone who works on this type of automated reconstruction. It took him and his project team 10 years to construct the sort of cognate set compilation that they "just feed it". Then it only took the program a few hours to produce reconstructions, so a win I guess...

There are people doing this sort of reconstruction without first carefully checking and pruning and partially reconstructing the cognates in the way quoted by empath, but general consensus is that their systems pretty much produce junk.
posted by lollusc at 11:26 PM on February 11, 2013 [1 favorite]


Clever to call it a "linguistic time machine", though! That's the sort of approach that keeps the grant funding rolling in.

(Just ask my husband, who works on "quantum teleportation". [Spoiler alert: has nothing to do with teleportation.])
posted by lollusc at 11:29 PM on February 11, 2013 [1 favorite]


There are people doing this sort of reconstruction without first carefully checking and pruning and partially reconstructing the cognates in the way quoted by empath, but general consensus is that their systems pretty much produce junk.

That quote was referring to this system. They did not use cognate sets; they used raw word lists, and reproduced cognate sets with 85% accuracy.
posted by empath at 11:46 PM on February 11, 2013


Now they're reconstructing language, sir. First the purple desert orbs and now the language. They're progressing faster than our estimates...

You have no faith in the plan.

I have faith in the plan, but the boundaries must remain intact.

What do you know of boundaries?

I...I...am frightened.

The plan pushes all of us beyond fright. Beyond boundaries. Do not call again.

I...I...
posted by Lipstick Thespian at 4:48 AM on February 12, 2013 [1 favorite]


Oh, wow, thanks, RichardP! I looked for that last night and didn't find it, but was admittedly exhausted and shouldn't even have been up.

I know what I'm reading for lunch today!
posted by batmonkey at 7:55 AM on February 12, 2013


i'm wondering how many years it will be until sports arena announcers start the night with, "good evening, bitches, and welcome to..."
posted by joeblough at 9:55 AM on February 12, 2013


85% accuracy (for a definition of accuracy including "within one segment of the right answer")... but only 62% recall. And they do address my concern about not all language families conforming to the phylogenetic tree model, if only to say it might be helpful anyway.
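
For anyone unsure how "within one segment" accuracy and recall differ, here is a rough sketch of both measures. It is only an illustration under my own assumptions, not the paper's evaluation code: the function names are hypothetical, a "segment" is taken to be a single character, and the word forms and cognate pairs are invented toy data.

```python
from typing import Sequence, Set, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of segments."""
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(prev[j] + 1,          # delete seg_a
                            curr[j - 1] + 1,      # insert seg_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def within_one_accuracy(predicted, reference) -> float:
    """Share of predicted forms at most one edit away from the reference form."""
    hits = sum(edit_distance(p, r) <= 1 for p, r in zip(predicted, reference))
    return hits / len(reference)


def recall(found: Set[Tuple[str, str]], true: Set[Tuple[str, str]]) -> float:
    """Share of the true cognate pairs that the system actually found."""
    return len(found & true) / len(true)


# Invented toy forms, NOT real reconstructions.
predicted = [list("mata"), list("lima"), list("tolu")]
reference = [list("mata"), list("lime"), list("telo")]
accuracy = within_one_accuracy(predicted, reference)  # 2 of 3 forms qualify

# Invented cognate pairs: the system finds 2 of the 4 true pairs, plus 1 false one.
true_pairs = {("a", "b"), ("a", "c"), ("b", "c"), ("d", "e")}
found_pairs = {("a", "b"), ("d", "e"), ("a", "d")}
pair_recall = recall(found_pairs, true_pairs)
```

The point of the toy numbers: a system can look good on the lenient accuracy measure while still missing half of the cognate pairs a linguist would have listed, which is what a low recall figure means.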

It would be nice to see the results of feeding it closely-related languages, or languages with borrowings, or creoles, in order to see its actual limitations, as opposed to just discussing them hypothetically. I suppose it's just jealousy at not having the database access to play with word groups of this size...

I agree that the automation of cognate assignment is an important advance, and hope it can be refined to capture more cognates more accurately, even in situations of wildly divergent surface forms. Thanks for the actual paper. It would have been nice to have this first rather than the science journalism, which triggers GRAR in me.
posted by Earthtopus at 9:59 AM on February 12, 2013


sorry. I was excited, and I didn't find the paper in my first searching. if a mod wants to delete for a better, less angry-making post by someone else, that would be fine with me.
posted by batmonkey at 10:08 AM on February 12, 2013


That came out harsher than I intended it. I'm happy to have been pointed at the paper, and meant no anger in your direction. Thanks for the article, without which I wouldn't have gotten to read their paper, which I do hope lives up to at least some of the hype!
posted by Earthtopus at 12:09 PM on February 12, 2013


>I suppose it's just jealousy at not having the database access to play with word groups of this size...

Now you can, Earthtopus. They used the publicly-available Austronesian Basic Vocabulary Database.
posted by fontor at 5:27 AM on February 15, 2013


well, then it's just a matter of funding and spare time, I suppose.

...crap. ;)
posted by Earthtopus at 12:31 PM on February 19, 2013

