Regular Expressions Can Be Simple and Fast
March 17, 2010 5:55 AM

Russ Cox, one of the people behind Google's new programming language Go, has written a three-part series on regular expressions. It's a nice mix of computer science theory, programming, and history: Regular Expression Matching Can Be Simple And Fast, Regular Expression Matching: the Virtual Machine Approach, and Regular Expression Matching in the Wild.
posted by chunking express (57 comments total) 69 users marked this as a favorite

A somewhat related article I saw yesterday:

'Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.'
posted by dammitjim at 5:58 AM on March 17, 2010 [11 favorites]

Ten bucks says ole Russel Cox has used regular expressions to parse HTML.
posted by xmutex at 6:04 AM on March 17, 2010 [5 favorites]

I just let my 3 year old bang the keyboard for a bit, and whatever comes out that's the RegEx string.
posted by Artw at 6:05 AM on March 17, 2010 [2 favorites]

Dammitjim - I'd say that bit about Perl having the readability of PostScript is an overestimation of Perl's readability in RegEx-heavy cases.
posted by Artw at 6:15 AM on March 17, 2010

'Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.'

I came across this the other day, actually. It was an existing regexp and I discovered it wouldn't work against some new data. Or rather it would, but only if I did a ton of work to go through the pattern and escape all the special characters (a user-entered pattern, essentially). I eventually decided to search through the data myself which was not only less programming work but also less CPU work, it turned out.

The only downside is now I have a long convoluted comment left there to explain why the regexp solution won't work. The comment is longer than the function.
posted by DU at 6:18 AM on March 17, 2010
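
The trade-off DU describes, escaping a user-entered pattern versus searching for it directly, can be sketched in Python. This is only an illustration, not DU's code: the function names and sample strings are made up, and `re.escape` stands in for Perl's `quotemeta`.

```python
import re

def find_with_regex(needle: str, haystack: str) -> bool:
    # re.escape backslash-escapes the regex metacharacters in the
    # user-supplied text, so the text is matched literally.
    return re.search(re.escape(needle), haystack) is not None

def find_plain(needle: str, haystack: str) -> bool:
    # A plain substring search needs no escaping at all.
    return needle in haystack

user_input = "price (USD) + tax?"          # hypothetical user-entered text
data = "total = price (USD) + tax? maybe"  # hypothetical data to search

print(find_with_regex(user_input, data))
print(find_plain(user_input, data))
# Used unescaped, the same text is read as a pattern and means something else.
print(re.search(user_input, data))
```

When the input is never meant to be a pattern, the substring test says what is intended and skips the regex engine entirely.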

Wow, that Thompson NFA approach is awes.
posted by DU at 6:25 AM on March 17, 2010

NFAs totally destroy DFAs in speed? this is great news! i had always imagined that simply because you can convert NFAs into DFAs they would perform the same....

(no mention of where DFAs might still be faster, i imagine for small expressions with low numbers of states DFAs might give better locality.)
posted by dongolier at 6:35 AM on March 17, 2010

I just let my 3 year old bang the keyboard for a bit, and whatever comes out that's the RegEx string.

If you'd taught him that when he was two, he could have configured sendmail.
posted by eriko at 6:44 AM on March 17, 2010 [12 favorites]

TBH that's probably a better match for what I do with RegEx, as I'm a web dev and mainly use it for validation of form field inputs. I wonder if there is an argument to be made for grabbing length and switching between the two.
posted by Artw at 6:44 AM on March 17, 2010

This link probably comes up any time regexes are mentioned, but just in case anybody hasn't seen it: full proper regular expression for email addresses.
posted by kmz at 6:50 AM on March 17, 2010 [7 favorites]

Eriko - if I hassle her too much to be a programmer it will probably end up like the time I bought a really awesome book for her on how she could be an astronaut one day, read it to her, then a little later she got a worried look on her face and wailed "Daddy, I don't want to go into Space" - not a great success.
posted by Artw at 6:54 AM on March 17, 2010 [7 favorites]

(Later it was explained that she didn't have to go to space if she didn't want to, and that astronauts only go for a little bit and (mostly) come back, and everything was smiles again.)
posted by Artw at 6:56 AM on March 17, 2010 [2 favorites]

There have been many cases where bright-eyed young computer science professionals have tried to create faster regex engines than Perl's. Invariably they look at the underlying code, which is pretty scary-looking, and think wow, wouldn't this be better expressed as c++ classes, or whatever.

And indeed, the listed author is apparently testing only one condition which is quite rare in regular expressions -- looking for long strings in gigantic corpuses. And then he handwaves about the other 99% of regular expression use. So that's a red flag.

What it has historically come down to is that perl's regex engine fits neatly in the cache of modern computers, while still providing a rich and wide set of functionality; and while you can do better at one task or another task, generally speaking the reason why perl and dynamic languages became a gigantic success is the tight optimization of the regex engine and its applicability to an overwhelming majority of data searching and manipulation tasks in real world scenarios.
posted by felix at 7:01 AM on March 17, 2010 [1 favorite]

'Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.'

Actually, it is three problems because nowadays you always have to skip past this comment.
posted by srboisvert at 7:32 AM on March 17, 2010 [11 favorites]

...looking for long strings in gigantic corpuses. And then he handwaves about the other 99% of regular expression use.

But this is specifically for use by Google, if I read the articles correctly. I frequently, daily, do web searches which are more than 30 characters. If the suggestions that Google makes on the searches are indications of other common searches, then a lot of other people are doing the same thing.

I think you misestimate the statistics and the use-case he's concerned with. He isn't trying to solve the problem of validating field input on a single core machine, he's talking about long string searches (with no need for lookback/forward) in very big corpuses, in other words, web searches.
posted by bonehead at 7:42 AM on March 17, 2010

The corollary to the Zawinski quote.
posted by zamboni at 7:48 AM on March 17, 2010 [9 favorites]

I believe that Google's v8 javascript engine incorporates the regex library (re2) on which Cox has worked. In the current language shootout results for regex-dna, it wins by a slight margin over even C.
posted by melatonic at 8:18 AM on March 17, 2010

Google's new programming language Go

Ha. I can't wait for the "Everything must Go" ads.
posted by octobersurprise at 8:27 AM on March 17, 2010

re:re, they shall pry Regexp-based URLs in Django, cold dead hands, etc.
posted by signal at 8:41 AM on March 17, 2010

bonehead: interestingly, the example you choose (searching for actual fixed strings) has the same implementation in perl DFAs or Cox's NFAs---it's the trivial example which runs fast anywhere.

however, when Google adds regular expression support to the search engine, we can crown them Supreme Rulers of the Nondeterministic Automata... (and be concerned for other reasons).
posted by dongolier at 8:53 AM on March 17, 2010

NFAs totally destroy DFAs in speed?

Looks as if the answer is really "NFAs (implemented by simulating being in multiple states simultaneously) totally destroy other NFAs (implemented with backtracking)" in speed. As the article says, "Awk, Tcl, GNU grep, and GNU awk" use DFAs, and they perform a lot more like the Thompson NFA algorithm than the backtracking algorithm (still slower, but they also probably have more cases to consider since you can do more with the results, and not so much slower that they're "destroyed" by the NFAs).
posted by kenko at 9:02 AM on March 17, 2010
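
The distinction kenko draws can be sketched in a few lines of Python. This is a toy under stated assumptions, not Cox's code: the NFA is built by hand for the article's pathological pattern a?ⁿaⁿ rather than parsed from a regex, and all function names are made up for illustration.

```python
def build_nfa(n):
    """Hand-built NFA for n optional 'a's followed by n required 'a's.

    Every state i < 2n can consume an 'a' and move to i + 1.
    States i < n can also skip ahead to i + 1 without consuming anything.
    State 2n is the accepting state.
    """
    char_edges = {i: i + 1 for i in range(2 * n)}
    eps_edges = {i: i + 1 for i in range(n)}
    return char_edges, eps_edges, 2 * n

def closure(states, eps_edges):
    """Add every state reachable through skip (epsilon) edges."""
    stack, seen = list(states), set(states)
    while stack:
        target = eps_edges.get(stack.pop())
        if target is not None and target not in seen:
            seen.add(target)
            stack.append(target)
    return seen

def match_simulated(n, text):
    """Track the set of all states the NFA could be in; one pass over text."""
    char_edges, eps_edges, accept = build_nfa(n)
    current = closure({0}, eps_edges)
    for ch in text:
        if ch != "a":
            return False
        moved = {char_edges[s] for s in current if s in char_edges}
        current = closure(moved, eps_edges)
    return accept in current

def match_backtracking(n, text):
    """Follow one path at a time, backing up on failure. Returns (matched, steps)."""
    char_edges, eps_edges, accept = build_nfa(n)
    steps = 0

    def walk(state, pos):
        nonlocal steps
        steps += 1
        if state == accept:
            return pos == len(text)
        # Try consuming a character first, then fall back to skipping.
        if pos < len(text) and text[pos] == "a" and walk(char_edges[state], pos + 1):
            return True
        return state in eps_edges and walk(eps_edges[state], pos)

    return walk(0, 0), steps

for n in (4, 8, 12):
    matched, steps = match_backtracking(n, "a" * n)
    print(n, match_simulated(n, "a" * n), matched, steps)
```

Both matchers agree on the answer. The set-based one does a bounded amount of work per input character, while the step count of the backtracking one grows roughly exponentially with n on this input.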

(It wouldn't really make sense to say "NFAs destroy DFAs in speed" without considering what the implementation does with them, since NFAs can be converted, as the article goes on to observe, into DFAs. I have ~~fond~~ memories of doing this by hand for class.)
posted by kenko at 9:06 AM on March 17, 2010
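
The by-hand conversion kenko remembers is the subset construction. A minimal sketch, assuming an NFA stored as nested dicts and an empty string as the label for epsilon edges (both are choices made here, not taken from the articles):

```python
from collections import deque

EPS = ""  # label assumed here for epsilon (empty) transitions

def eps_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon edges."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def nfa_to_dfa(nfa, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = eps_closure({start}, nfa)
    table, queue = {}, deque([d_start])
    while queue:
        cur = queue.popleft()
        if cur in table:
            continue
        table[cur] = {}
        for sym in alphabet:
            moved = {t for s in cur for t in nfa.get(s, {}).get(sym, ())}
            nxt = eps_closure(moved, nfa)
            table[cur][sym] = nxt
            if nxt not in table:
                queue.append(nxt)
    d_accept = {s for s in table if s & accepting}
    return table, d_start, d_accept

def dfa_accepts(table, d_start, d_accept, text):
    """Run the DFA: exactly one table lookup per input character."""
    state = d_start
    for ch in text:
        state = table[state][ch]  # raises KeyError for symbols outside the alphabet
    return state in d_accept

# Hand-built NFA for (a|b)*ab: strings over {a, b} that end in "ab".
nfa = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
}
table, d_start, d_accept = nfa_to_dfa(nfa, 0, {2}, "ab")
print(len(table))
print(dfa_accepts(table, d_start, d_accept, "bbab"))
```

The DFA can have exponentially more states than the NFA in the worst case, which is one reason an implementation might build it lazily or simulate the NFA directly instead.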

...so to correct the above:
NFAs converted to DFAs totally destroy NFAs and naive (large state) DFAs in speed?

(and the good people at perl, ruby, python really should convert their NFAs to equivalent DFAs like Cox recommended back in Jan2007 and like the old unix folks did 30 years before.)
posted by dongolier at 9:29 AM on March 17, 2010

[glad to see awk and grep get a pat on the back.]
posted by dongolier at 9:30 AM on March 17, 2010

Weird, I was just thinking about making a Regex post. I'll just do a link dump instead.

Javascript Regex Editor
Regex Editor for PHP
Regex Explorer
Flash-based Regex Editor
posted by gwint at 9:52 AM on March 17, 2010 [5 favorites]

Eriko - if I hassle her too much to be a programmer it will probably end up like the time I bought a really awesome book for her on how she could be an astronaut one day, read it to her, then a little later she got a worried look on her face and wailed "Daddy, I don't want to go into Space"

Then, for god's sake man, get her the Bat Book *right now*. Do you want her to grow up to be a Sysadmin?
posted by eriko at 10:03 AM on March 17, 2010

Yes, in all likelihood Perl's regex engine performs much better. But as Cox explained on Open Source at Google, RE2 is designed to perform well on worst-case input because it runs on Code Search, a code search engine that (astoundingly) takes regular expressions as queries, in addition to their Sawzall and BigTable algorithms. So you can imagine how hooking up your search engine, which is publicly available for nonmalicious and malicious users alike, to a regex engine that might have exponential run time would make you a little antsy.

This also isn't the first regex engine Google has pushed out. Apparently there's a lot to be said about optimizing your regex software for your own problem domain.
posted by shadytrees at 10:06 AM on March 17, 2010 [1 favorite]

Yes, in all likelihood Perl's regex engine performs much better

Some might argue that this test is unfair to the backtracking implementations, since it focuses on an uncommon corner case. This argument misses the point: given a choice between an implementation with a predictable, consistent, fast running time on all inputs or one that usually runs quickly but can take years of CPU time (or more) on some inputs, the decision should be easy.

Is it even anywhere established that the Perl RE engine performs better on the non-malicious cases?
posted by kenko at 10:21 AM on March 17, 2010

There have been many cases where bright-eyed young computer science professionals have tried to create faster

Jeez Felix, the guy wrote a good hunk of Plan 9 back at Bell, if you consider him to be in the bright dewy eye'd camp of programmers, I'd be curious to see who makes it past journeyman.
posted by PissOnYourParade at 10:24 AM on March 17, 2010 [2 favorites]

I just let my 3 year old bang the keyboard for a bit, and whatever comes out that's the RegEx string.

Funny, that's how I used to use TECO. [/codger]
posted by The Bellman at 10:35 AM on March 17, 2010 [1 favorite]

kenko: "Is it even anywhere established that the Perl RE engine performs better on the non-malicious cases?"

To be fair, Perl's backtracking allows it to handle non-regular languages. I think Larry Wall even suggests that nobody should call them regular expressions, but rather use the term 'regex' to refer to their bastardized version.

Russ's first article is now over three years old; I wonder if Perl ever went with the 'backtracking engine only when necessary'.
posted by pwnguin at 11:10 AM on March 17, 2010

"It has been observed that a TECO command sequence more closely resembles transmission line noise than readable text"
posted by MtDewd at 11:14 AM on March 17, 2010 [3 favorites]

Artw: /x. DU: quotemeta maybe?
posted by nicwolff at 11:38 AM on March 17, 2010
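
For readers who don't know Perl's `/x` flag: it lets a pattern be spread over several lines with comments. Python has the same feature as `re.VERBOSE`. A small sketch, using a made-up US-style phone number pattern written both ways:

```python
import re

# Terse form: everything on one line.
terse = re.compile(r"^\(?(\d{3})\)?[ -]?(\d{3})-(\d{4})$")

# Verbose form: whitespace outside character classes is ignored,
# and everything after an unescaped # is a comment.
verbose = re.compile(r"""
    ^ \(? (\d{3}) \)?   # area code, optionally in parentheses
    [ -]?               # optional separator: space or hyphen
    (\d{3})             # exchange
    -
    (\d{4}) $           # line number
""", re.VERBOSE)

sample = "(555) 123-4567"
print(terse.match(sample).groups())
print(verbose.match(sample).groups())
```

The two compiled patterns accept the same strings. Only the verbose one can still be read six months later.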

'Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.'

Am I the only one who LIKES writing regular expressions? I'm in the middle of some big coding project that involves hundreds of classes, and this little gem of a logic problem crops up. I get a break from my main language and get to spend a few minutes solving an interesting problem using a specialized tool.

I've always thought it would be cool if regular expressions were taught in school. They are good puzzles, and they are useful outside of CS. My wife is a legal secretary, and every once in a while she has to do some complex parsing with a Word document, and I know her task would be easier if she understood regular expressions.
posted by grumblebee at 11:49 AM on March 17, 2010 [4 favorites]

I like it when I'm done with it and it works... the bit before that, not so much.
posted by Artw at 11:57 AM on March 17, 2010

I've always thought it would be cool if regular expressions were taught in school. They are good puzzles, and they are useful outside of CS. My wife is a legal secretary, and every once in a while she has to do some complex parsing with a Word document, and I know her task would be easier if she understood regular expressions.

Instead of teaching them in school, it would be better to design a decent UI so that you wouldn't have to "learn" them. I think Word already has something when you do find where you can insert character classes that match special characters. They just need repeat blocks, optional blocks, and disjunctions and then we'd have regular expressions.

Good UI is way better than learning esoteric syntax. (Someone please tell Linux programmers.)
posted by esprit de l'escalier at 12:59 PM on March 17, 2010

Looks as if the answer is really "NFAs (implemented by simulating being in multiple states simultaneously) totally destroy other NFAs (implemented with backtracking)" in speed.

There's no logical difference between an NFA that's in multiple states and a DFA, except I suppose it might have a more compact representation. The speed should be the same.

Anyway, I have actually parsed HTML with regexps in Java, to extract hyperlinks. It actually works pretty well, for the most part. But an early version would choke (as in, lock up the CPU and die) on URLs with single quote marks (such as javascript links), so I assume Java uses one of those backtracking NFAs or something like that. I rewrote the RegExp and it worked. Since I just wanted the tag, and not the entire element, the nesting issue didn't come up.

I've also written plenty of hand-coded state machines.

Instead of teaching them in school, it would be better to design a decent UI so that you wouldn't have to "learn" them.

State machine diagrams are certainly easier to read and understand than regexps.
posted by delmoi at 1:25 PM on March 17, 2010
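
A sketch of the kind of link extraction delmoi describes, in Python rather than Java. The pattern is an assumption made here, not delmoi's rewritten regex. It only looks at the opening `<a ...>` tag and handles double-quoted, single-quoted, and unquoted `href` values as separate alternatives.

```python
import re

# Each alternative is bounded by its own quote character, so a stray quote
# inside one style of value cannot be mistaken for the end of another.
HREF = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)

def extract_links(html):
    links = []
    for m in HREF.finditer(html):
        # Exactly one of the three groups is set, depending on quoting style.
        links.append(next(g for g in m.groups() if g is not None))
    return links

sample = (
    '<p><a href="http://example.com/">one</a> '
    "<a class=x href='javascript:go(\"two\")'>two</a> "
    "<a href=three.html>three</a></p>"
)
print(extract_links(sample))
```

This is fine for pulling tags out of reasonably tidy markup. It does not try to handle comments, scripts, or a `>` inside an attribute value, which is where a real HTML parser earns its keep.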

Am I the only one who LIKES writing regular expressions?

I'm also a fan. I try not to get carried away when they're not well-matched (heh) with the problem domain (say, markup, or parsing large files, or both), or when a simpler function will do. But wow can they save time and reduce code size when they're used right. I was looking at this 20 line sequence in WordPress MU the other day (wpmu-settings.php, though this is a hacked installation so I don't know if it's original WPMU code or not). When I realized it was essentially trying to pull domain and subdomain information out of a URL with a dozen calls to strpos and substr, I wondered why they hadn't just used a single call to preg_match and maybe two lines of followup code. Maybe they didn't know regexes? No; the ironic flourish is that one of the last things the coder at the end of this sequence did was call preg_replace three times to strip out some information their previous substr and strpos calls missed.

In that situation? Give me regexes any day.

I've always thought it would be cool if regular expressions were taught in school. They are good puzzles, and they are useful outside of CS.

I agree. If I ever teach math in a public school again, they'll be going into my curriculum. I think they're a fantastic exercise in symbolic thinking and so potentially useful in editing text / processing documents that it's a half-crime they're not taught now.
posted by weston at 1:45 PM on March 17, 2010
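
The single-call approach weston has in mind might look like this in Python (the original is PHP's `preg_match`). The pattern and the `split_host` name are assumptions for illustration. It expects at most one subdomain label in front of a two-label domain and is not a general URL parser.

```python
import re

HOST = re.compile(
    r"^(?:https?://)?"              # optional scheme
    r"(?:(?P<sub>[^./]+)\.)?"       # optional single subdomain label
    r"(?P<domain>[^./]+\.[^./:]+)"  # domain plus top-level label
    r"(?:[:/]|$)"                   # then a port, a path, or the end
)

def split_host(url):
    """Return (subdomain or None, domain), or None if the URL does not fit."""
    m = HOST.match(url)
    if m is None:
        return None
    return m.group("sub"), m.group("domain")

print(split_host("http://blog.example.com/wp-admin/"))
print(split_host("example.com"))
```

One call plus two group lookups replaces the chain of position arithmetic, and the pattern doubles as documentation of which URL shapes are accepted.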

State machine diagrams are certainly easier to read and understand than regexps.

The thing is that presenting it as an extension to 'find' functionality is where normal people would look for regular expressions and also makes regular 'find' a special case, so you wouldn't need any new dialogs. Also, making a state machine to find n copies of some pattern means making three copies of the nodes, (or can you think of good UI for that?)
posted by esprit de l'escalier at 1:56 PM on March 17, 2010

State machine diagrams are certainly easier to read and understand than regexps.

De gustibus non est disputandum!
posted by phliar at 2:01 PM on March 17, 2010

Ten bucks says ole Russel Cox has used regular expressions to parse HTML.

Nothing wrong with it when in small doses.
posted by Blazecock Pileon at 3:26 PM on March 17, 2010

There's no logical difference between an NFA that's in multiple states and a DFA, except I suppose it might have a more compact representation.

Well … right. Which is why the claim that the backtracking NFA beat DFAs is false.
posted by kenko at 4:10 PM on March 17, 2010

xmutex: "Ten bucks says ole Russel Cox has used regular expressions to parse HTML."

Everybody stand back. I know regular expressions.
posted by Rhaomi at 5:49 PM on March 17, 2010 [1 favorite]

A bad UI can make easy problems hard. But a hard problem doesn't become an easy problem due to a good UI. So when you talk about offering a friendly UI to regexps, just what do you propose? Just how limited a subset of regexps' capabilities do you propose to offer in the name of usability?

It's like the frequently-promised never-delivered programming language in plain human-friendly language that anyone can read and write. You can arrange things such that some programs can be written in an easy to read fashion. But, ultimately, it's still code; it still has some precisely defined syntax and semantics, even if its design has tried to disguise this, and something will ruthlessly and literally interpret your program without humor, sense, or mercy. If you don't understand this syntax and semantics, or if you haven't thought ahead to edge cases, it's going to blow up in a way that mystifies you sooner or later. The illusion of plain language in your programming language doesn't make you any better at thinking about edge cases.

Regexp syntax hasn't evolved despite usability; it has evolved for usability, because it's a concise, specific way to say exactly what you mean. And you still have to say exactly what you mean, no matter what syntax or interface you're using. And if you don't understand that, you're not going to be able to use them, no matter what you're using. (Some tiny, tiny, subset, probably, but nothing slightly resembling their full power.)

If you think I'm wrong, I'm all ears. Just what is your easy to use regexp interface that's going to give people who can't get their heads around regexp syntax access to regexps' power?

As for Perl's regexp speed, here's Tim Bray concluding the jury's out on whether Perl's regexp strategy is better or worse than Java's.
posted by Zed at 10:51 AM on March 19, 2010 [1 favorite]

I agree with you, Zed. I know that Regular Expressions can look like alien writing, but when people say they could be replaced (or enhanced) with a good UI, it makes me suspicious that they don't use them very much.

The thing that makes them powerful is how much you can tailor them to specific situations. Let's say you need to find every occurrence of the number nine in a document, but only if it's preceded by two other numbers, a period, and a letter that's either Q, N or S. And only if it's followed by any character except a question mark -- OR a question mark by itself. And let's say that once you do find those matches, you want to replace the question mark, if it's there, with a semi-colon and the number before the 9 with a 2.

What's a good UI for that?
posted by grumblebee at 11:28 AM on March 19, 2010
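
For what it's worth, grumblebee's puzzle fits in one pattern and a short replacement function. The wording leaves room for interpretation, so this sketch assumes one reading: the text looks like `47.Q9` (two digits, a period, the letter, then the nine), "the number before the 9" means the second of the two digits, and "a question mark by itself" means a `?` that is not followed by another `?`.

```python
import re

PATTERN = re.compile(r"""
    (?P<first>\d) \d      # two digits; the second one will become 2
    \.                    # a period
    (?P<letter>[QNS])     # Q, N or S
    9
    (?:
        (?P<qmark>\?) (?!\?)   # a lone question mark, which is consumed
      | (?= [^?] )             # or any other character, which is only peeked at
    )
""", re.VERBOSE)

def rewrite(match):
    # Rebuild the matched text with the second digit and the ? replaced.
    tail = ";" if match.group("qmark") else ""
    return f"{match.group('first')}2.{match.group('letter')}9{tail}"

def fix(text):
    return PATTERN.sub(rewrite, text)

print(fix("47.Q9? and 13.N9x but 55.S9?? and 66.X9!"))
```

Under this reading a nine at the very end of the text is left alone, because nothing follows it. A different reading of the puzzle would change the pattern, which is rather the point being argued in the thread.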

UIs are useful when we're only going to approach an item in a few specific ways. For instance, the average person only needs to turn a TV on, off, change channels and change volume. So it makes sense to give him a UI. But what's the UI for the TV-repairman? ANYTHING in it could be broken. He needs to be able to roam freely under the hood.

When "he needs to be able to do anything" or "I need to be able to do anything" comes into play, that's a good sign that a UI won't work. You need a language. The repairman's language is his tools and how they interact with the parts inside the TV. There's an almost infinite way he can mess with things -- and all those options need to be available to him.

Since the text you're searching with regular expressions could be ANY sort of text, and the way you want to search it could be ANY imaginable way, I don't see how anything other than a full-blown language can work.
posted by grumblebee at 11:35 AM on March 19, 2010

The UI that I'm imagining is this. You have a regular text box that is your search. You can highlight sections of text and do the following (using buttons):
- Create a repeat group. Then, a little basket |____| would appear under those letters, and under the basket would be the symbols: "0, infinity". If you right-click the basket, you can delete it, or click to change the numbers. So, you could change the numbers to "2, 5" or whatever. If you change the numbers to "0, 1", the numbers change to say "optional". The basket also has an option called "non-greedy" that, if checked, causes the group to match as little as possible.
- create an optional group
- create a capture group: a different colour basket, which has the name of the capture group underneath it.
- create a disjunction (whose first option is the highlighted text.) The options in the disjunction are listed as a vertical list of regular expressions. (Each item in the list can have a corresponding look-ahead matching filter, like in PERL)
You should also be able to insert character classes.

That's it. Basically, I've replaced the ?: that no one can remember with options, and parentheses that need to be escaped sometimes, depending on the whim of the implementation, with baskets. In fact, nothing needs to be escaped since the special items are atomically inserted by buttons rather than typed in. Also, a UI like this would save me from ever having to read documentation since everything is accessible in the UI, like it should be.

Obviously, I would not suggest better UI if I didn't use regular expressions.
posted by esprit de l'escalier at 8:04 PM on March 19, 2010
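
One way to picture what such a UI would emit behind the scenes is a handful of builder functions, one per basket. This is a hypothetical sketch in Python: none of these function names come from the thread or from any library, and the output uses Python's regex syntax.

```python
import re

def literal(text):
    """Plain text typed into the box; escaped so nothing in it is special."""
    return re.escape(text)

def repeat(inner, low=0, high=None, greedy=True):
    """A repeat basket: low..high copies, with high=None meaning no upper limit."""
    upper = "" if high is None else str(high)
    suffix = "" if greedy else "?"
    return f"(?:{inner}){{{low},{upper}}}{suffix}"

def optional(inner):
    """The '0, 1' basket that the UI would label 'optional'."""
    return repeat(inner, 0, 1)

def capture(name, inner):
    """A named capture basket."""
    return f"(?P<{name}>{inner})"

def either(*options):
    """A disjunction: the vertical list of alternatives."""
    return "(?:" + "|".join(options) + ")"

DIGIT = r"\d"  # one example of an insertable character class

pattern = capture(
    "price",
    literal("$") + repeat(DIGIT, 1) + optional(literal(".") + repeat(DIGIT, 2, 2)),
)
print(pattern)
print(re.search(pattern, "cost: $12.50 (approx)").group("price"))
```

Because literals are escaped at the moment they are inserted, the user never has to think about which characters are special, which is the property the basket UI is after.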

Since the text you're searching with regular expressions could be ANY sort of text, and the way you want to search it could be ANY imaginable way, I don't see how anything other than a full-blown language can work.

Regular expressions are great, but they're not that powerful. They're the least powerful language in the Chomsky hierarchy.

My suggestion is to replace the character representation of the constructs with UI that is easily accessible and impossible to forget (because it has tooltips, and a clear presentation.) You could have that without losing rapid text entry.

I certainly can't remember how to do any complicated RE stuff. Everyone has a different implementation (vim, PERL, python, boost, etc.) It would be a lot easier to just have standard UI in the OS, just like there's a standard tree-view.

Do you honestly know, without checking a reference, how to specify named capture groups? Non-greedy repeat blocks? A block that can repeat 5-10 times? lookahead filtering?

Maybe you know these things, but then why should regular people have to learn them? Why not make it easy?
posted by esprit de l'escalier at 1:45 AM on March 20, 2010
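
For reference, here are the four constructs from that list in Python's `re` syntax. Other engines (Vim in particular) spell some of them differently, which is part of the complaint. The sample strings are made up.

```python
import re

# Named capture group: (?P<name>...)
date = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "posted 2010-03-17")
year, month = date.group("year"), date.group("month")

# Non-greedy repeat: put ? after the quantifier
first_tag = re.search(r"<.+?>", "<b>bold</b>").group()

# A block that repeats 5 to 10 times: {5,10}
seven_digits_ok = re.fullmatch(r"\d{5,10}", "1234567") is not None

# Lookahead: (?=...) requires what follows without consuming it
called = re.findall(r"\w+(?=\()", "foo(1) + bar + baz(2)")

print(year, month, first_tag, seven_digits_ok, called)
```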

I guess.

It doesn't sound like much of a UI to me. It's like if you're trying to program in a standard C-family language, and instead of typing in a common construct, you have a dropdown that lists things like if, while, do, for, etc. Pick one and it appears in the editor. It's really more of a typing aid than anything else. I don't have a problem with it. It's just not what I'd call a UI. A UI -- in the sense I generally know the word -- is a major abstraction away from how things really work under the hood.

Regular Expression is a tiny language. After a week or so of working with it, I had most of it memorized without trying. But I have a little cheatsheet on my office wall, next to my computer, that I glance at sometimes. I guess, for me, a good "UI," would just be that cheatsheet -- maybe it would pop up if I clicked a button next to wherever I type the regular expression.
posted by grumblebee at 12:49 PM on March 20, 2010

Do you honestly know, without checking a reference, how to specify named capture groups? Non-greedy repeat blocks? A block that can repeat 5-10 times? lookahead filtering?

Some of those, yes, at least using the Perl syntax. Though I'll admit that when I'm trying to do a regex in Vim the different escaped/literal conventions drive me nuts and I have to use a reference to get almost anything at all done.

But even so, I don't know that's a good test of whether or not they're good tools. There's all kinds of stuff I don't keep in my head that I just look up when I need it. Part and parcel of programming, I think.

And while I think your UI idea sounds like a good one for a regex builder, I don't know how it fits into a language context... wouldn't you either have some intermediate representation (probably that would have a lot in common with current symbolic regexes), or have a non-text-oriented language?
posted by weston at 2:12 PM on March 20, 2010

A UI -- in the sense I generally know the word -- is a major abstraction away from how things really work under the hood.

A UI is just a clear presentation. An options dialog is a UI, and it's much clearer and easier to change than a configuration file, and with tooltips, it supplants help files. It's not a major abstraction because it doesn't need to be.

The point of my RE UI is this:
- the time to find functionality is reduced because everything is presented in the interface
- the time to learn functionality is reduced because documentation is replaced by tooltips
- the time spent debugging is reduced because you don't have to worry about matching parentheses or escaping
- the UI is standard, so you don't have to worry about learning for every environment.

So, this is more than just drop-down if, while, etc.

Any expert can say that text-only entry is ideal. I'm not an expert, and I don't see why I should sink my life into committing this stuff to memory.
posted by esprit de l'escalier at 2:59 PM on March 20, 2010

Some of those, yes, at least using the Perl syntax. Though I'll admit that when I'm trying to do a regex in Vim the different escaped/literal conventions drive me nuts and I have to use a reference to get almost anything at all done.

exactly.

But even so, I don't know that's a good test of whether or not they're good tools. There's all kinds of stuff I don't keep in my head that I just look up when I need it. Part and parcel of programming, I think.

...part and parcel of programming today. It doesn't have to be like this.

And while I think your UI idea sounds like a good one for a regex builder, I don't know how it fits into a language context... wouldn't you either have some intermediate representation (probably that would have a lot in common with current symbolic regexes), or have a non-text-oriented language?

Yeah, you're right. Either would work. The intermediate representation would be okay for now, but in the long run, I'd like programming languages to move away from a completely text-based representation. I realize that it would be a big change, but it would open up a lot of possibilities.
posted by esprit de l'escalier at 3:02 PM on March 20, 2010

I would love to experiment with a programming language that is not text-based. Have you ever seen an example of one that worked? That was either easier to use than a text-based one or had key features that a text one lacked?

By not-text-based, I'm assuming you mean a language that's not a linear arrangement of symbols, because HEART-SYMBOL SUN-SYMBOL CLOVER-SYMBOL is, to me, just as much text-based as "if then."

I'm guessing you mean some UI in which you drag tokens around and arrange them in a spatial relationship with each other that is not necessarily linear -- and then maybe drew lines and arrows connecting them.

It's a cool thought, but I don't get how this could ever have the same expressive power (or speed of construction) as a text-based language. I admit I may be thinking inside a box from which I can't escape.

(Various IDEs, e.g. X-Code, allow you to drag and drop to build your app's UI, but they don't extend this approach to business logic.)

Incidentally, I'm guessing this is not what you're talking about, but Macromedia (Adobe) has tried -- and failed -- for years to make programming easier. There's a lot of money in it for them if they succeed, because Director and Flash are marketed to non-programmers. If they could come up with a simple UI to help the non-programmers program, they'd quadruple their sales.

Their first attempt was Lingo, Director's scripting language, which expressed formal constructs in a lazy-looking, English-like syntax. Instead of writing for ( x=0; x < 10; x++), you'd write something like "Do the following ten times." In the end, the casual language so obscured the logic (and was so verbose), they had to rewrite the language to make it more C-like.

Then, in Flash, they tried letting people code via forms. You'd pick which statement you wanted via a dropdown, and then, after you'd chosen, say, "for," you'd get a form that allowed you to fill in the details: starting value____, conditional test____, ending value_____.

No one used it.

When I taught Flash programming, the beginners would say, "Do we really have to type everything? Isn't there a way to compose the code by choosing commands from menus or something?" So I showed them the form-thing. They were overjoyed but stopped using it after half an hour or so. It was just way too slow.

Adobe has removed it from Flash. Now, their version of programming for non-programmers is canned scripts.
posted by grumblebee at 7:07 PM on March 20, 2010 [1 favorite]

I would love to experiment with a programming language that is not text-based. Have you ever seen an example of one that worked? That was either easier to use than a text-based one or had key features that a text one lacked?

Have you seen so-called procedural languages? One place they work really well is visual effects software, like Houdini. The nodes in the visual effects software are stand-ins for functions, which used to be implemented as individual Unix programs in the very early days. Having a visual interface meant that they could be connected with wires (instead of Unix pipes) and visualized. The network topology, which is a directed acyclic graph, allows lazy updating, so that changing one of the parameters on a node only causes nodes below that node to update. The language is Turing complete, but you'll probably need to download the software from their website to get a feel for the language. Animation is pretty tricky to learn though, even for a programmer.

A text-based language wouldn't work because these nodes (functions) have so many parameters that it would be impossible to keep track of them: f(blah = [5,2,3], ............

It's a cool thought, but I don't get how this could ever have the same expressive power (or speed of construction) as a text-based language. I admit I may be thinking inside a box from which I can't escape.

I don't mind text entry, but

1. the code should not be stored as text:
Why can't it parse my entered text, and then store it as parsed XML? Then, I could rename variables without search-and-replace (which is error prone), factor out functions automatically (I know that there are scripts, but they don't work super well), etc.

2. the code should not be displayed exactly as it's stored
having control of the display means that I don't need to type out the class definition apart from the declaration. It means, hiding comments or viewing them, or displaying the code in the myriad of ways I might want it.
posted by esprit de l'escalier at 8:51 PM on March 20, 2010
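
The "rename without search-and-replace" idea can be tried today on a parsed tree. A minimal sketch using Python's standard `ast` module: the `Rename` class and `rename` function are made up for illustration, scoping is ignored for brevity, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

class Rename(ast.NodeTransformer):
    """Rename every use of one identifier. Ignores scoping for brevity."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Variable reads and writes.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Function parameters.
        if node.arg == self.old:
            node.arg = self.new
        self.generic_visit(node)
        return node

def rename(source, old, new):
    tree = ast.parse(source)
    Rename(old, new).visit(tree)
    return ast.unparse(tree)  # regenerate source text from the tree

source = '''
def total(n):
    note = "n is not renamed inside this string"
    return n + len(note)
'''
print(rename(source, "n", "count"))
```

The string literal is untouched because it is a different kind of node, which a textual search-and-replace cannot know. The round trip through the tree also drops comments and original formatting, which hints at why storing code only as a parsed tree is harder than it sounds.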

I would love to experiment with a programming language that is not text-based.

I experimented with something called Prograph a long time ago. Simple programs were interesting. I didn't get deep enough to figure out how hard complex programming would be. Based on my limited experience with MAX and puredata, I suspect that while they're generally capable, you actually start to miss text after a while. You know the feeling of fluidity software has compared to patching hardware together? That's what I start to feel like when I explore dataflow languages after a while. Maybe there's something better down that road.

Scratch is kind of interesting. It's still pretty much statements/symbols, but they're sort of like statement-legos... there's a literal block-quality to them and how you snap them together. I feel like it's limited (intentionally) but I wonder if you could explore this. Still, after a while, I suspect a programmer would probably prefer just text.

I'm a fan of autocompletion and inline hints offered by an editor, but I don't know how much better things are going to get than that. For a lot of human-computer interaction, typing is the highest-bandwidth option, particularly if we're talking about symbolic work like programming.
posted by weston at 9:00 PM on March 20, 2010

Yeah, I've worked with stuff like that. I agree it's okay for SFX. But I'm talking about business logic (though I fucking hate that term). Do you see any way that a non-text-based UI would be helpful for, say, writing a sort algorithm, creating a complex DB query or validating a credit-card number? Or for creating a macro structure, such as a Model-View-Controller pattern (not just the view part)?

1. the code should not be stored as text:
Why can't it parse my entered text, and then store it as parsed XML? Then, I could rename variables without search-and-replace (which is error prone)

SEARCH AND REPLACE????

No one should be doing that! Work in an IDE with refactoring tools. If I change an identifier name in one part of Eclipse, it updates all references to that identifier.

Why can't it parse my entered text, and then store it as parsed XML?

You might want to look at Adobe Flash Builder (a.k.a. Adobe Flex). It allows you to code interfaces using either drag and drop or xml. At compile time, the xml is converted to Actionscript (Flash's C-family language), but you never see the Actionscript (unless you want to). It's sort of a reverse of what you talked about. I'm not a fan of it. I prefer to code in straightforward Actionscript, but many people like using the xml.
posted by grumblebee at 9:06 PM on March 20, 2010
