Thirty crore speakers, seven lakh words
March 31, 2015 11:16 AM

A group of volunteers has broken the record for most words added to Google Translate in a day.

Bangladesh's Ministry of Information and Communication Technology (ICT) coordinated with Google to organize a one-day event, timed to coincide with স্বাধীনতা দিবস (Bangladesh's independence day). Over 4,000 volunteers added 700,000 Bengali words, handily exceeding their target of four lakh (lakh = 100,000). Bengali counts around 30 crore speakers around the world (crore = ten million) and is the official language of Bangladesh.

Google Translate runs on Google's proprietary statistical machine translation algorithms but also relies heavily on volunteer assessments via a community site, http://translate.google.com/community, where anyone can contribute translations or check existing ones.
posted by tractorfeed (12 comments total) 7 users marked this as a favorite
 
Yay for the voluntariat!

Some days, I feel like the metaphorical frog in the pot.
posted by NoxAeternum at 11:20 AM on March 31, 2015 [1 favorite]


You know, the modern world is miraculous. There was a time when we had to invest money in having our military actually enter faraway nations and force their leaders to capitulate. By the 1980s and 1990s, we'd gotten to the point where we figured out that you could just pay organized gangs to do it for you, and we didn't have to directly use our military much anymore at all for that sort of thing. Up to a few years ago, it was apparent that if you forcibly repress small nations and then "kindly" offer the destitute poor a few cents per hour to do arduous labor, they'll pretty much be forced to. But now, we don't have to involve the government or sweatshops in any way, and it's easier than ever before – we can use the magical sheen of a brave new world of cooperation and friendship to convince foreign workers to make money for our companies without any compensation whatsoever, and they'll be so happy to do it that they'll actually celebrate while they're doing it. The amount of progress we've made in forty years is truly astounding.

God bless America.
posted by koeselitz at 1:42 PM on March 31, 2015 [7 favorites]


(Sorry if I seem pessimistic. I truly hope this helps Bengalis everywhere. If so, it would be wonderful. I've just gotten to the point where I don't trust Google to use this kind of data for fine and wonderful humanitarian purposes. Are all these encoded words available free to the public? I don't see where they are, sadly. So I'm not feeling distinctly optimistic about it. But, again, I'd love it if this turned out to be a fine and helpful thing in the end.)
posted by koeselitz at 1:55 PM on March 31, 2015


koeselitz: "God bless America."

You might be better off saying "God bless Google" ...
posted by TheLittlePrince at 4:01 PM on March 31, 2015


Are all these encoded words available free to the public? I don't see where they are, sadly.

Um, yes? https://translate.google.com/.

If your question is the raw data available to the public, I would imagine not. But the general public doesn't care about raw data, they care about functionality... And now Bengali translations are more available to anyone, which makes their culture more available to the general public.

I don't think of myself as a google 'fanboy' or anything, but there seems to be a strange amount of hate-on in this thread.
posted by el io at 7:49 PM on March 31, 2015 [1 favorite]


As a native Bengali speaker, thank you for this post! This is a great start.

I've spent some time tinkering with the Google Translate tool since coming across this post. It is pretty hit and miss. After an hour or so of inputting different phrases and coming up with either gibberish or something that was understandable but grammatically incorrect, I finally got a result that was grammatically correct both in English and Bengali.

The thing about Bengali (and it is not only Bengali, of course, but it's the only language I speak apart from English, so the only one I can comment on with any knowledge) is that it's so difficult to translate basic concepts sometimes because the word order is totally different from English and there's the added complication of verbs changing depending on who is being spoken about, like some European languages. Intonation plays a huge part in communicating meaning - the same sentence, when said with a certain intonation, denotes different meanings: formality, affection, irritation. In my family we rarely use "please" - it's all in the intonation.
posted by Ziggy500 at 6:45 AM on April 1, 2015 [2 favorites]


There are two sides to Google's translation service: the free side, which covers online translation and translation of webpages, and the more robust commercial option, which competes with other automated and personal translation services.

On one hand, this is great because Bengali internet resources are now more available to people who speak other languages, increasing the value (as a possible resource) of Bengali language sites and pages. On the other hand, Google was gifted a huge translation dictionary, which they can also monetize. But for a massive service like Google's website and snippet translation to be free, someone has to pay for the back-end maintenance and support, and there are only so many philanthropists who can support the mechanisms necessary for making information free.
posted by filthy light thief at 7:09 AM on April 1, 2015 [1 favorite]


If your question is the raw data available to the public, I would imagine not. But the general public doesn't care about raw data, they care about functionality... And now Bengali translations are more available to anyone, which makes their culture more available to the general public.

I don't think of myself as a google 'fanboy' or anything, but there seems to be a strange amount of hate-on in this thread.


Google is a multi-billion dollar multinational corporation. I'm pretty sure they could find the money to pay for the labor involved.
posted by NoxAeternum at 12:12 PM on April 1, 2015


Google is a multi-billion dollar multinational corporation. I'm pretty sure they could find the money to pay for the labor involved.

Not that I'm generally a fan of late-stage capitalism, but keeping to this seems like a short path to Google going "Fine, I guess we don't need to offer our services in Bengali, it's not worth *that* much to us". Google Translate is pretty close to an international public good by this point.
posted by CrystalDave at 2:23 PM on April 1, 2015


Not that I'm generally a fan of late-stage capitalism, but keeping to this seems like a short path to Google going "Fine, I guess we don't need to offer our services in Bengali, it's not worth *that* much to us". Google Translate is pretty close to an international public good by this point.

So in other words, it's okay that we devalue this labor, because it's for the greater good.

Am I the only person who sees the problem with this logic?
posted by NoxAeternum at 3:09 PM on April 1, 2015


I'm pretty sure they could find the money to pay for the labor involved.

Would there be a business case for doing so? I mean google does a ton of things that don't necessarily have a direct return on investment... But if they didn't get this for free, chances are it just wouldn't happen.

I assume you're against yelp and tripadvisor as well, because they crowdsource free labor to generate their content (well, I'm not a fan of yelp either, but not for that reason).

Look, I agree that it would have been 'better' if all these people donated their time to an open language translation project... But shouldn't you be angry at the government that encouraged this, rather than the corporation that sponsored this?
posted by el io at 9:30 PM on April 1, 2015 [1 favorite]


Would there be a business case for doing so? I mean google does a ton of things that don't necessarily have a direct return on investment... But if they didn't get this for free, chances are it just wouldn't happen.

And the "for the greater good" argument raises its head again. At what point do we say that devaluing labor is wrong, even if being done "for the greater good"?

I assume you're against yelp and tripadvisor as well, because they crowdsource free labor to generate their content (well, I'm not a fan of yelp either, but not for that reason).

Yes, I'm opposed to the concept in general, because you wind up with a situation where a few connected insiders make bank off of the labor of many individuals. There's also the issue that amateur crowdsourcing systems tend to be less experienced and less savvy than professionals (see Kickstarter and Greenlight for examples here.)

Look, I agree that it would have been 'better' if all these people donated their time to an open language translation project... But shouldn't you be angry at the government that encouraged this, rather than the corporation that sponsored this?

Yes, I am angry at them as well. But blame is a lot like fertilizer - you have to spread it where it belongs. And while the government is at fault here as well, the root cause is that Google is looking for handouts instead of paying for labor.
posted by NoxAeternum at 6:58 AM on April 2, 2015



