

Artist's Notebook: Ramsey Nasser
May 5, 2014 1:44 PM

"Arabic programming languages with the honest goal of bringing coding to a non-Latin culture have been attempted in the past, but have failed without exception. What makes my piece قلب different is that its primary purpose was to illustrate how impossible coding in anything but English has become."
posted by invitapriore (46 comments total) 39 users marked this as a favorite

 
I find it apt that the tile piece he created was "large, heavy, and impossible to move." Sounds a lot like what he ran into in the software realm.
posted by rouftop at 1:53 PM on May 5


tl;dr:

1) He's using Mac OS and his terminals are running with LANG=C instead of a Unicode setting like LANG=en_US.UTF-8 or whatnot.
2) git might have a path-handling bug that should probably get fixed? But I think that's actually still just his language being set wrong.
3) GitHub apparently doesn't like non-Latin names.
4) Tile calligraphy is neat, but I don't see how it actually solves any of the problems. It's too bad he didn't explain that either.
posted by atbash at 1:57 PM on May 5 [4 favorites]
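The LANG=C point is easy to demonstrate in miniature: the C locale implies ASCII, and Arabic text has no ASCII representation at all, which is why non-Latin filenames come out mangled in such an environment. A small sketch (the filename here is just قلب, the name of Nasser's language):

```python
# Why LANG=C breaks non-Latin names, in miniature: a UTF-8 locale can
# round-trip Arabic text, while ASCII (what LANG=C implies) cannot.
name = "قلب"  # "qalb", Arabic for "heart"

print(name.encode("utf-8"))  # fine under a UTF-8 locale

try:
    name.encode("ascii")     # what a LANG=C environment effectively demands
except UnicodeEncodeError:
    print("ASCII (LANG=C) cannot represent this name at all")
```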


Ooh, fascinating. Reminds me (sort of) of Alif the Unseen, and now I'm mentally trying to translate the code into Malay (which can be written both in Roman letters and in Jawi/modified Arabic).

k = 0
pusingkan
    k += 1
    jika k > 5 jadi
        cetak "Hai Dunia!"
    jika tidak
        cetak "Sekejap..."
    tamat
tamat
posted by divabat at 1:58 PM on May 5 [3 favorites]


More transliterations please.
posted by PMdixon at 1:59 PM on May 5


Lisp and tilework? This guy has an appreciation not just for the functionality of programming, but for its elegance.
posted by daveliepmann at 1:59 PM on May 5 [7 favorites]


This is very cool.
posted by b1tr0t at 2:07 PM on May 5 [1 favorite]


It used to be the case that many high schools here in Argentina introduced a Spanish transliteration of LOGO to teach programming and problem-solving skills. Now that I've peeked at Wikipedia for curiosity's sake, it seems translated LOGOs were not as unusual as I originally thought.
posted by Iosephus at 2:10 PM on May 5


It is indeed fascinating. My fave bit is either the way he managed an Arabic recursive acronym for the name, or the bit about code as poetry.
posted by Sebmojo at 2:11 PM on May 5 [2 favorites]


I've really appreciated Ramsey Nasser's efforts to program in Arabic. I'm a fan of Kufic calligraphy and think ASCII-only software is bullshit, so more power to him. Also he's uncovering a lot of bugs. But I wonder if at least some of it is the extra challenge of right-to-left scripts, not just Unicode. And also if part of it is his environment. Not trying to diminish his frustration, but maybe it's easier to solve than it looks at first blush.

FWIW, here's a non-ASCII diff on GitHub and a screenshot of how it looks in my Mac console (LANG=en_US.UTF-8). Symbols like π and φ work just fine. The D3.js code takes advantage of JavaScript's Unicode support to write math formulas in math symbols, and it's in pretty wide deployment. But this is a pretty simple demonstration of non-ASCII programming.

Filenames are worse, although Linux does pretty well at least with ISO-Latin-1 encoded as UTF-8.
posted by Nelson at 2:40 PM on May 5 [4 favorites]
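The π-and-φ trick Nelson mentions carries over to Python 3 as well, which accepts Unicode letters in identifiers. A quick sketch (the Arabic function name is my own illustrative pick, not from Nasser's code):

```python
import math

π = math.pi
φ = (1 + math.sqrt(5)) / 2  # the golden ratio, as in the D3.js source

def محيط(radius):  # "muheet" -- circumference; an illustrative name
    return 2 * π * radius

print(round(محيط(1), 3))
```

The code is perfectly valid; the catch, as the article shows, is that bidirectional rendering of a line like the `def` above is where editors start to struggle.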


An interesting experiment, especially considering the etymology of "algorithm".
posted by Jon Mitchell at 2:52 PM on May 5


I guess fixed-width fonts suck for Arabic more than proportional fonts suck for programming...
posted by aubilenon at 3:03 PM on May 5


Thank you for expanding my expectations for what is possible in the world.
posted by Captain Chesapeake at 3:06 PM on May 5


This is impressive, thanks for posting!
posted by JoeXIII007 at 3:07 PM on May 5


Neat. It made me wonder — in that dangerous "well maybe we need to generalize this solution" way that technical people do — whether there's a way to separate out the human-language aspects from the computer programming language.

In other words, we think of a computer programming language as having statements like "printf" or "if" and "else" but really those are just arbitrary tokens. It seems like you ought to be able to tell the compiler what human language you're writing the code in, and it could use a different set of arbitrary symbols to represent the same computer-language constructs. The compiler shouldn't care whether the 'if' statement is represented as the values 0x49 0x46 or something else; why can't it just have a lookup table that it uses based on some language specifier you put into the source file? That's in addition to the actual file-encoding ASCII / UTF-8 issue; this is all one level up from that.

For compatibility with existing programs you'd need to default to assuming ASCII and English, but it seems like you could bolt on different human-language vocabularies to existing computer languages fairly easily.

And it also seems like (comments aside) it would be possible for a machine to translate source code from one language to another fairly easily (assuming the code compiles). So source control might even be able to provide source code to people in the language they prefer, or maybe their IDE could even perform the conversion. How useful that would be if the comments were all in a different language is a valid question, as well as what the real demand for such a feature is, but it seems like it should be doable.

Probably it'd be easier in some languages than others — something like Prolog, which IIRC was used heavily by the Soviets for the Buran program and other defense applications, and presumably not by people who necessarily spoke English, doesn't have any human-language constructs; it's all based on predicate logic. I've never seen many non-trivial Prolog programs but I suspect (again, comments aside) they're probably quite close to human-language-neutral already. They just have the character-set issue.
posted by Kadin2048 at 3:29 PM on May 5 [5 favorites]
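Kadin2048's lookup-table idea can be sketched in a few lines. This is a toy pre-processing pass, not any real compiler feature, and the Spanish keywords are my own stand-ins:

```python
# Toy version of the idea: map per-language surface keywords onto one
# canonical set before the compiler proper ever sees the source.
KEYWORDS = {
    "en": {"if": "if", "else": "else", "print": "print"},
    "es": {"si": "if", "sino": "else", "imprime": "print"},
}

def normalize(source: str, lang: str) -> str:
    """Swap localized keywords for canonical ones, leaving other tokens alone."""
    table = KEYWORDS[lang]
    return " ".join(table.get(tok, tok) for tok in source.split())

print(normalize("si x imprime y", "es"))  # -> "if x print y"
```

A real version would need a proper tokenizer so that keywords inside string literals and comments are left untouched, which is exactly where the scheme starts to get hairy.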


More transliterations please.

His Hello World example was actually a valid translation, not just a transliteration, as are the rest of his language's primitives.

This is really great. I've spent a hugely disproportionate amount of time struggling with non-Latin fonts, hopefully he can find some decent workarounds.
posted by xqwzts at 3:35 PM on May 5


The big issue is variable names (including function names). Generally it's possible to figure out what well-written code is doing if it has meaningful variable names but no comments, while the alternative (heavily commented code where the variables are named like variable_38234 and function_352) would still be a nightmare.
posted by Pyry at 3:40 PM on May 5 [1 favorite]


Neat. It made me wonder — in that dangerous "well maybe we need to generalize this solution" way that technical people do — whether there's a way to separate out the human-language aspects from the computer programming language.
It is absolutely possible. One of the early steps in the compilation process is tokenization - the process of converting strings of text into symbols that can then be processed by the compiler.

The bigger problem, I suspect, is variable naming in multilingual teams. If everyone speaks Arabic, then there is no problem. But if you have two Arabic-speaking developers, three Mandarin-speaking developers, and a Spanish speaker, the problem becomes more difficult.

On preview, what Pyry said.
posted by b1tr0t at 3:42 PM on May 5
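The tokenization step b1tr0t describes is easy to sketch: once source text is reduced to (kind, text) pairs, the compiler no longer cares what alphabet the names came from. A minimal, hypothetical lexer:

```python
import re

# Order matters: NUMBER must come before NAME so digits aren't
# swallowed by the identifier rule. \w matches Unicode letters,
# so an Arabic identifier tokenizes the same way a Latin one does.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<NAME>\w+)|(?P<OP>[+\-*/=<>])|(?P<SKIP>\s+)")

def tokenize(src):
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(src) if m.lastgroup != "SKIP"]

print(tokenize("k = 0"))    # [('NAME', 'k'), ('OP', '='), ('NUMBER', '0')]
print(tokenize("قلب = 5"))  # the Arabic name is just another NAME token
```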


xqwzts: "His Hello World example was actually a valid translation, not just a transliteration."

I noticed he translated Hello World to Assalamu Alaikum. Is this typical of Arabic programming tutorials? It makes me want to change try-catch blocks to Inshallah blocks.
posted by hanoixan at 3:44 PM on May 5 [31 favorites]


Wikipedia has a page about non-English programming languages. Some interesting stuff there. As far back as 1960 when APL was developed it wasn't clear yet if we would be programming in symbols or in English.

The working interpreter is buried at the bottom. Though I think antirez explains better what it is like to be a non-native-English speaker in technology.
posted by RobotVoodooPower at 3:47 PM on May 5 [2 favorites]


My first thought was "this seems like a solved/solvable problem," but then I went down a Google hole when I saw the tiles, which I now want to cover my entire house with.
posted by Dr. Twist at 3:50 PM on May 5 [1 favorite]


It makes me want to change try-catch blocks to Inshallah blocks.

Counterproposal: Inshallah blocks are for unit testing, equivalent to assertions.
posted by Tomorrowful at 4:03 PM on May 5 [4 favorites]


In other words, we think of a computer programming language as having statements like "printf" or "if" and "else" but really those are just arbitrary tokens. It seems like you ought to be able to tell the compiler what human language you're writing the code in, and it could use a different set of arbitrary symbols to represent the same computer-language constructs. The compiler shouldn't care whether the 'if' statement is represented as the values 0x49 0x46 or something else; why can't it just have a lookup table that it uses based on some language specifier you put into the source file? That's in addition to the actual file-encoding ASCII / UTF-8 issue; this is all one level up from that.

Microsoft's VBA (Visual Basic for Applications) did this for a little while. That's the language you use for macros in Word, Excel, and other Office programs. They stopped doing it because it caused a lot of problems:

You could only run a macro written in Polish VBA if your version of Excel had the Polish localization installed.

It made it a lot harder for people to share code. Even if I have the Polish localization installed, I still won't be able to understand or modify the macros. Are they doing scary things? Are they safe?

You can't use keywords as variable names, but the list of keywords depends on your locale. What words are safe to use?

It probably worked better for some languages than for others. Like how do right-to-left locales work for this?

I'm sure there were lots of internal reasons why it was expensive for Microsoft as well. Documenting, testing, and supporting a programming environment is difficult, and this could only make it a lot worse.

It's an interesting proposal at an abstract level, but at the nuts-and-bolts level of actually doing it, it's a nightmare.
posted by aubilenon at 4:14 PM on May 5 [1 favorite]


The generally small list of keywords could easily be translated - here are the Java keywords:
abstract continue for new switch
assert default goto package synchronized
boolean do if private this
break double implements protected throw
byte else import public throws
case enum instanceof return transient
catch extends int short try
char final interface static void
class finally long strictfp volatile
const float native super while
and some of them aren't even used.

A problem beyond teamwork is that libraries are also in English. C, which has even fewer keywords, has a pretty minimal standard library which could be translated readily, but then you end up using a lot more non-standard libraries generally in English too. Java throws just about everything and then some in - here is a list of standard Java packages, each containing classes containing fields and methods, all in English.

I dunno about ripping out all the curly brackets. I haven't even seen one of those in English, only in math and computing.
posted by save alive nothing that breatheth at 4:39 PM on May 5 [1 favorite]


It reminds me in a way of Damian Conway's Lingua::Romana::Perligata (a canonical example of things that are possible in Perl which shouldn't be). Instead of just translating keywords and such, he asked "what features of the natural language grammar could be used to express programming language grammar?" You end up with things like noun case replacing sigils/data types.

It's certainly an interesting thought experiment.
posted by sbutler at 4:45 PM on May 5 [5 favorites]


Keywords being in English is an issue, but not an insurmountable one. There's still a limited number of them, and you don't have to understand the entire etymology in order to recognize and use "if" or "throw". Typing them might be a pain on non-English keyboards, but I'm sure there's an autocorrect tool out there that you could use to create more comfortable shortcuts.

The "core" libraries are more of a concern, because there's going to be a lot more in them and they'll be covering much more complex and nuanced concepts. However, in the end you could still just create a "wrapper" library that simply renames the methods, classes, etc. to your familiar language.

What gets me is the stuff that imposes language constraints on the users' code. Go and Haskell have a lot of merits, but they both use capitalization as a syntactical construct which instantly makes a second-class citizen out of any human language that doesn't have caps. You could always prepend an "X" or something to your identifiers, but you shouldn't have to pollute your own code to work around something like that.
posted by Riki tiki at 6:03 PM on May 5
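Riki tiki's wrapper-library idea is trivially demonstrable in Python, since functions are first-class values that can be rebound under any Unicode name (the Arabic aliases below are my own illustrative picks):

```python
import math

# A "wrapper" is just a module of aliases: same objects, new names.
جذر = math.sqrt  # "jathr" -- square root
جيب = math.sin   # "jayb"  -- sine

print(جذر(9))  # -> 3.0
```

Fittingly, the English word "sine" itself descends from Arabic jayb, by way of a Latin mistranslation.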


But I wonder if at least some of it is the extra challenge of right-to-left scripts, not just Unicode.

I am eagerly awaiting the first Arabic coding style debate.
posted by Tell Me No Lies at 6:21 PM on May 5 [2 favorites]


Keywords being in English is an issue, but not an insurmountable one.

Back in 2000 a new VP came into Cisco who wanted us to internationalize (i.e. make it easy to slide in different words for different languages) the user interface. It took quite a few people to make him understand that while the base keywords were English the other 95% of the interface was in a dialect of Geekese that would not change no matter what you did.

(which isn't to say there weren't problems. We had to switch an entire section of config from "master" and "slave" to "primary" and "secondary" due to some pushback from Asia.)
posted by Tell Me No Lies at 6:32 PM on May 5 [2 favorites]


It makes me want to change try-catch blocks to Inshallah blocks.

That. That would be awesome.
posted by odinsdream at 7:11 PM on May 5 [2 favorites]


I can't wait to convert one of our more complex Ruby-on-Rails models and associated controller and views at work into Arabic in a branch and then send it out for formal code review. I know exactly which two coworkers I want to name as reviewers: they're the ones who bitch most about I18n.
posted by double block and bleed at 7:13 PM on May 5


Semitic languages have a cool feature where certain groups of three consonants represent a topic, and changing vowel sounds transform it into different words.

For example: k-t-b represents writing:
kataba he wrote
katabû they wrote
katabat she wrote
katabnâ we wrote
yaktubu he writes
yaktabunâ they write
taktubu you write
naktubu we write
'uktub write!
kâtib writer
kitâba the act of writing
kitâb some writing, book
kutub books
kutubî bookdealer
kutayyib booklet
maktûb letter
maktab school, office
maktaba library, literature
maktabî individual office
miktâb typewriter
mukâtaba correspondence
iktitâb registration
istiktâb dictation
Ever since I first read about it, I've thought that it would be an interesting thing to apply to programming languages.
posted by empath at 7:47 PM on May 5 [3 favorites]
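empath's root-and-pattern system is concrete enough to simulate: interleave a three-consonant root with a vowel template. A toy sketch, with the template positions 1/2/3 standing for the root consonants:

```python
# Toy model of Semitic root-and-pattern morphology: a root like k-t-b
# plus a vowel template yields a family of related words.
def derive(root, pattern):
    """Fill the 1, 2, 3 placeholders in `pattern` with the root consonants."""
    c1, c2, c3 = root
    return pattern.replace("1", c1).replace("2", c2).replace("3", c3)

root = ("k", "t", "b")
print(derive(root, "1a2a3a"))  # kataba  "he wrote"
print(derive(root, "ma12a3"))  # maktab  "office"
print(derive(root, "1u2u3"))   # kutub   "books"
```

One could imagine a language where a root names a data type and the patterns derive its standard operations, which seems to be the kind of thing empath is gesturing at.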


You can do all sorts of interesting things with Semitic inflections (or Latin ones), et cetera. But English is one of the most suitable languages for programming precisely because the grammar is nearly all positional rather than inflected. Chinese would be great, but the logograms don't help. Alphabetic or abjadic writing is easier.

Perhaps Afrikaans would be an interesting alternative.
posted by ocschwar at 8:08 PM on May 5 [1 favorite]


Regarding curly brackets, he was using ruby and lisp as bases, and neither uses curly brackets heavily in standard dialects.
posted by idiopath at 8:16 PM on May 5


Riki Tiki: Go and Haskell have a lot of merits, but they both use capitalization as a syntactical construct which instantly makes a second-class citizen out of any human language that doesn't have caps. You could always prepend an "X" or something to your identifiers, but you shouldn't have to pollute your own code to work around something like that.

Prolog uses initial capitalization to indicate that an identifier is a logical variable instead of an atom. The developers of SWI-Prolog came up with a method to allow identifiers in scripts that don't have the case distinction. They expanded the rule to say that an identifier that either begins with an upper-case character or an underscore is a logical variable, and otherwise it is an atom. It's probably not quite aesthetic to prepend underscores to words in some scripts, but that is at least a consistent solution to a problem the original design of the language didn't address. (I have no idea how easy they are to enter on non-Latin keyboards though.)

Ref: Unicode Syntax in SWI-Prolog

On another note, I recall from my limited reading of linguistics that the word 'if' is not a universal feature of natural languages. Some languages express the idea with something like verbal inflections or suffixes, but these suffixes cannot stand alone as words. It seems possible there could be similar problems with other common keywords lacking a clear and unambiguous counterpart in some natural language.
posted by tykky at 1:48 AM on May 6
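The SWI-Prolog rule tykky describes is a one-liner to model: an identifier is a variable if its first character is upper-case or an underscore, which gives caseless scripts the underscore as an escape hatch:

```python
# Model of the expanded SWI-Prolog rule: variable iff the identifier
# starts with an underscore or an upper-case letter; otherwise an atom.
def is_prolog_variable(name: str) -> bool:
    return name[0] == "_" or name[0].isupper()

print(is_prolog_variable("X"))     # True:  classic Prolog variable
print(is_prolog_variable("_قلب"))  # True:  underscore marks a caseless-script variable
print(is_prolog_variable("atom"))  # False: a plain atom
```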


Ramsey Nasser & code? My heart skipped a beat! Alas, wrong Ramsey Nasr, former poet laureate of The Netherlands.
posted by ouke at 2:34 AM on May 6


On another note, I recall from my limited reading of linguistics that the word 'if' is not a universal feature of natural languages.

It's not a universal feature of programming languages, either.

Concepts which don't have a specific word in some language can generally still be expressed in that language in some other way.
posted by empath at 2:47 AM on May 6


Perhaps Afrikaans would be an interesting alternative.

Isn't Afrikaans essentially a dialect of Dutch, i.e., a fairly vanilla Germanic language?
posted by acb at 4:13 AM on May 6


Part of the problem is using language as a metaphor - it's not essential to programming. Everything, from ASCII on up, is a user interface, syntactic sugar. I once read an article about an engineer who programmed the microcode on a processor, which by necessity was done in binary, with two buttons mounted on rings he could wear, one for each hand, that he had wired into his workstation and integrated with his editing software.

At its core - variables and other data containers, loops, incrementers/decrementers, conditionals and comparators - programming can be reduced to logograms. Most of it emerged from mathematical set theory and formal logic, which absolutely have their own logography. To make code accessible across cultural boundaries like language, perhaps it should be - we don't live in a world of typewriters and paper. Increasingly, we're no longer living in a world where keyboards are common. If you look at Python, where you declare special methods with double underscores, or JavaScript, where you have fun stuff like "===", the limitations of using the punctuation on a 19th-century typewriter to represent modern computer science concepts are becoming silly. It's also kind of alien and a barrier to entry for those who don't read alphabetic languages or use European symbols like "#", "." or "(", or even have concepts for them. ";" is syntactic sugar, remember - provided you read English or another language that uses semicolons.

More, these core elements of programming can even be represented as visual GUI elements of one type or another - we've been refining human-machine interfaces via computer gaming for almost a half century, now. Maybe it's time to leverage some of that know-how into other areas of computer interface - including programming.
posted by Slap*Happy at 5:14 AM on May 6 [1 favorite]


Ever since I first read about it, I've thought that it would be an interesting thing to apply to programming languages

The problem when you apply that to programming is that a lot of those differences, which you can see when transliterated, aren't written, they're just spoken.

But English is one of the most suitable languages for programming precisely because

My guess is that the reason that English is suitable for programming is that a lot of programming was designed by English speakers?
posted by MisantropicPainforest at 6:37 AM on May 6 [2 favorites]


Isn't Afrikaans essentially a dialect of Dutch, i.e., a fairly vanilla Germanic language?

Precisely. And the syntax is more simplified than that of English, for much the same reasons.
posted by ocschwar at 6:39 AM on May 6


The problem when you apply that to programming is that a lot of those differences, which you can see when transliterated, aren't written, they're just spoken.

Which is also interesting to think about in terms of programming languages, no?
posted by empath at 7:09 AM on May 6


Maybe it's time to leverage some of that know-how into other areas of computer interface - including programming.

On some further thinking and an interesting discussion with some work colleagues, I think the challenge that you run into when working on this problem — and it's something that seems to come up every few years, probably when a new generation of computer scientists decides to try their hand at it — is that you can definitely create a programming environment that isn't at its core an ASCII text editor, and in fact such things exist, but they're not popular and thus never take off except as research projects and tend as a result not to influence the trajectory of software development as a discipline or an industry. The history of CS is littered with Good Ideas that fail to offer enough of an advantage over the worse-is-better C/Unix/ASCII/Qwerty mainstream to have much of an influence.

Real-world adoption and use of programming languages in non-academic contexts is driven largely, when there is a choice not dictated by compatibility concerns, by efficiency. And it is apparently quite hard to beat the typewriter-keyboard + text editor combination.

There have been improvements over the years in the dominant code-editing/writing paradigm, but they've all been evolutionary. Compared to writing code in ed in a hardware terminal, a modern IDE gives a programmer a lot of support. E.g. mouse, windowing, syntax highlighting, autocomplete, in-editor error detection, etc. Although some of those features are still controversial, they've gotten included (presumably) because enough people find them useful to justify them as incremental improvements.

I don't know how you get from there to a more visual system. The only real successes I'm aware of in visual programming languages are those aimed at beginners or non-programmers, and as a result they tend to not be regarded as "serious" programming languages and as sub-par, probably because they're often used by beginners and non-programmers who are writing sub-par code with them. (If your language is designed to be used by business analysts rather than developers, it's probably doomed as a "serious" language from the outset.)

So far, nobody has managed to find a niche that offers an actual advantage to people who are already familiar with text-editor-driven software development. Doesn't mean it can't be done — eventually, maybe enough programmers will want to do serious development from their iPads that they'll feel the need to scratch the itch and develop a touch IDE that doesn't just give you an onscreen keyboard — but it's yet to be done successfully.
posted by Kadin2048 at 8:52 AM on May 6 [1 favorite]


As a Russian programmer, I've seen a couple non-English programming languages: 1C:Enterprise's built-in scripting language (Visual Basic-style) and the School algorithmic language (ru.wikipedia.)

And people love to joke about them instead of taking them seriously :-)
posted by floatboth at 10:03 AM on May 6 [1 favorite]


Just for kicks, I cut and pasted one of his strings into Emacs. Wow—moving the point becomes extremely confusing when mixing left-to-right and right-to-left text. (Emacs, correctly I think, treats the left arrow key as as moving you logically backwards in the text, and the right arrow key as moving you logically forwards, so that when the point is over a character in a right-to-left alphabet, the right arrow key actually moves the point leftwards on the screen. But when you cross boundaries between left-to-right and right-to-left, and especially when you start inserting text at those boundaries, weird things happen.)
posted by enn at 10:40 AM on May 6
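The confusion enn describes comes from the gap between logical order (how characters are stored in memory) and display order (how the bidi algorithm lays them out on screen). Python's unicodedata module exposes the per-character directionality classes that drive the reordering:

```python
import unicodedata

s = "x = قلب"  # Latin then Arabic, stored in logical order
for ch in s:
    # 'L' = left-to-right, 'AL' = Arabic letter (right-to-left),
    # 'ON'/'WS' = neutral; the neutrals at the boundary are exactly
    # where caret movement and insertion get weird.
    print(repr(ch), unicodedata.bidirectional(ch))
```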


The only real successes I'm aware of in visual programming languages are those aimed at beginners or non-programmers, and as a result they tend to not be regarded as "serious" programming languages and as sub-par, probably because they're often used by beginners and non-programmers who are writing sub-par code with them.

That's probably not as big an issue as you think - see, for instance, JavaScript, Tcl and Perl. A much larger issue is that these visual programming environments are usually toys, meant strictly for some ill-defined educational purpose. With Tcl (and Lua and JavaScript), the expectation was there that you could perform some serious, Turing-complete computing with this environment. It was lighter (much lighter) than (Objective-)C(++) or Java or Fortran, while having a lot of the grunt-work of setting up a running application taken care of by the language and its environment, and therefore more nimble.

Stuff like Google's App Creator and almost anything on this Wiki page (the stuff that's not actually macro recorder/generators rather than visual programming, that is) are either incomplete and dinky and otherwise poorly implemented "educational tools", or hyper-specific extensions of a particular software package that is of limited or no general use, even in its host app's own field.

It will take an interesting and otherwise intractable problem that can best be solved by a new general purpose programming environment, and someone deciding to implement a visual programming paradigm as that solution, before we'll see something like it take off.
posted by Slap*Happy at 10:46 AM on May 6


> I cut and pasted one of his strings into Emacs. Wow—moving the point becomes extremely confusing when mixing left-to-right and right-to-left text ... when you cross boundaries between left-to-right and right-to-left, and especially when you start inserting text at those boundaries, weird things happen.

This feature is by no means limited to Emacs.
posted by nangar at 11:17 AM on May 6


This post answered a long-standing question in my mind: Do programming languages get translated? I mean, I understand how it might be difficult to write
10: Print "Hello, world."
in Mandarin, but it seemed to me plausible that at least somewhere there was a version of QBASIC that allowed
10: Imprime "Bonjour, le monde."
Nope, it appears no one is attempting anything that foolish... at least not for serious purposes. QBASIC, of course, is intrinsically and intentionally English-like, but when you get to something as abstract as (in R)
cat ('Hello world!')
or (in PL/SQL)
SET SERVEROUTPUT ON;
BEGIN
DBMS_OUTPUT.PUT_LINE('Hello, world!');
END;
it quickly becomes unsightly at best, intractable if you aren't careful.

However, while searching for some Hello-World examples, I ran across Ezhil:
Ezhil, in Tamil language script (எழில்), is compact, open source, interpreted, programming language, originally designed to enable native-Tamil speaking students, K-12 age-group to learn computer programming, and enable learning numeracy and computing, outside of linguistic expertise in predominantly English language based computer systems.
And from there, a list of Non-English-based programming languages, many of which (but not all) were developed as teaching tools to introduce students to programming. Among this list, I noticed Jeem ج, "Arabic programming language, based on C++ with simple graphics implementation", which seems to be completely unrelated to the language that opens this FPP.
posted by IAmBroom at 12:43 PM on May 6 [1 favorite]




This thread has been archived and is closed to new comments