Translating the U.S. census form into Arabic
June 6, 2016 9:26 AM

The fastest growing language in America provides some special challenges for designers of the 2020 U.S. census form.
posted by clawsoon (8 comments total) 13 users marked this as a favorite
 
Localization is always difficult, and working with databases capable of handling right-to-left languages is also difficult, even if you use one that can handle full Unicode.

At least they're trying, which is a good thing. Too many outfits don't even bother trying; look at Facebook for a prime example. But yeesh is it a non-trivial problem, both on the front-end design side, as the article here talks about, and on the back-end database engineering side.
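
To give a rough sense of why "full Unicode" support does not settle things on the back end: Unicode text is stored in logical (typing) order, and each character carries a bidirectional class that display software uses to work out visual order. The sketch below only inspects those classes with Python's standard `unicodedata` module. The helper names `bidi_classes` and `has_rtl` and the sample string are made up for illustration and do not come from the article or from any census system.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs in logical (storage) order."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def has_rtl(text):
    """True if any character has a strong right-to-left class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)

# Hypothetical mixed-direction field value: Latin label, Arabic name, digits.
# The escapes spell the Arabic letters meem, hah, meem, dal.
sample = "Name: \u0645\u062d\u0645\u062f 42"

for ch, cls in bidi_classes(sample):
    print(repr(ch), cls)

print(has_rtl(sample))
```

A single stored string mixes left-to-right letters, Arabic letters, and digits, so anything that truncates, pads, or concatenates fields without regard to direction can scramble what the reader eventually sees.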
posted by sotonohito at 10:45 AM on June 6, 2016 [2 favorites]


Glad to see they are thinking of adding a "Middle East/North Africa" category. It's always been a conundrum for my wife what to fill in on these kinds of forms. Apparently Arabs are supposed to fill out "White/Caucasian," but she's (North) African looking enough that she gets called N***** when we are in the South.
posted by jackbrown at 11:26 AM on June 6, 2016 [1 favorite]


Also, Arabic optical character recognition (especially for handwriting) is just not as sophisticated as it is for other scripts (so far).

But also—is there no equivalent census in any Arabic-speaking country from which the census could gain insight?

For that matter—surely forms exist in Arabic-speaking countries?
posted by Dr and Mrs Eaves at 11:32 AM on June 6, 2016 [5 favorites]


Though the information in the article is correct, it's odd that it's presented as a new thing... PDFs have supported Arabic input for years; doesn't the government have this functionality already? Plus the things about capitalization, and people's names, apply to Chinese, which the Census already supports.
posted by zompist at 2:38 PM on June 6, 2016 [1 favorite]


Dr and Mrs Eaves: "surely forms exist in Arabic-speaking countries?"

I think the real issue here is that the census sorta wants to do data collection in two distinct codings. Getting the database to go along isn't the showstopper. It's getting the data into the system that's the hassle.

I used to do industrial grade tax form processing. I think the thing here is finding the balance between scanning/automated OCR and MANUAL INDEXING. Worst case scenario, every form needs to be reviewed by TWO clerks (in parallel; they don't know they're cross-checking each other) and each field keyed by hand. (They look at an image and do data entry on a split screen. No actual 'forms' are involved after data capture.)

That's something a lot of people have spent a lot of time and money moving away from. It's labor intensive, and filing via the web is a whole lot simpler.
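
The two-clerk cross-check described above can be sketched roughly as follows. This is a minimal illustration, not how any real tax or census pipeline works: the function name `compare_keyings`, the field names, and the sample values are all hypothetical, and sending mismatches on for another look is an assumption about what would happen next.

```python
def compare_keyings(first, second):
    """Return the field names where two independent keyings disagree.

    Each argument maps field name -> the text one clerk typed
    while looking at the scanned image of the same form.
    """
    fields = sorted(set(first) | set(second))
    return [
        name for name in fields
        if first.get(name, "").strip() != second.get(name, "").strip()
    ]

# Hypothetical entries keyed by two clerks from the same scanned form.
clerk_a = {"last_name": "Haddad", "household_size": "4", "zip": "48126"}
clerk_b = {"last_name": "Hadad", "household_size": "4", "zip": "48126"}

mismatches = compare_keyings(clerk_a, clerk_b)
print(mismatches)  # fields that would need another look
```

Fields where both keyings agree are accepted as-is; only the disagreements cost extra human time, which is still a lot of labor compared with filing via the web.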
posted by mikelieman at 2:38 PM on June 6, 2016 [2 favorites]


This reminds me of, of all things, a WWDC talk Apple gave a year or two ago about how they'd done sweeping system-level changes to RTL languages' localizations for when you set an iOS device to Arabic or Hebrew, putting the "back" button on the right side of the display and swiping right to left to unlock and just generally mirroring the entire implied hierarchy of screens.

This stuff is serious business and it is FASCINATING
posted by DoctorFedora at 4:19 PM on June 6, 2016 [4 favorites]


jackbrown: "Glad to see they are thinking of adding a "Middle East/North Africa" category"

It might help out a bit with the current "Black, African American, or Negro" category, as well. Growing up I knew a guy whose dad was (white) Algerian and whose mom was (white) American, which made him pretty much 100% "African American" yet at the same time 0% "African American".
posted by Bugbread at 6:52 PM on June 6, 2016 [3 favorites]


Visual layout consistency across languages is one of the biggest problems with translating survey instruments for right-to-left languages. I research epistemic assumptions embedded in survey interfaces that might increase bias in data collection for RTL language-speaking respondents. There is a phenomenon called spatial agency bias that causes a directional orientation based on written language script direction. It becomes important in survey instruments when visual scan patterns are oriented LTR but a respondent's spatial agency bias is RTL.

That this is coming up in the context of the census is interesting, because it will give us more data to work with in the future to develop design heuristics that do not privilege one linguistic or cultural orientation over another and thereby generate biased data that might not be very trustworthy.
posted by paco758 at 1:28 PM on June 7, 2016 [1 favorite]

