Waiting for Godel
May 26, 2013 2:44 PM   Subscribe

Perl Cannot Be Parsed: A Formal Proof In the man page for PPI, Adam Kennedy conjectures that perl is unparseable, and suggests how to prove it. Below I carry out a rigorous version of the proof, which should put the matter beyond doubt.
I've become interested in the question because I've just released an alpha version of a general parser (Parse::Marpa) on CPAN, which I think will allow static parsing of large portions of Perl 5, and I wanted to know what is achievable. Parse::Marpa accepts any BNF that's free of infinite loops. The BNF can be recursive, have empty productions or even be ambiguous. If Marpa works for parsing Perl 5, it will do it with a lot less cruft than ad hoc solutions. Parse::Marpa is based on new research into combining LR(0) precomputation with Earley's algorithm and so far speed seems good -- quite acceptable for utility purposes.
posted by jenkinsEar (70 comments total) 16 users marked this as a favorite
 
So what does parsableness mean vizavis computer language and what does this have to do with Godel?
posted by ishrinkmajeans at 2:52 PM on May 26, 2013 [1 favorite]


I think Perl was modeled after human language, so I'd guess the solution lies in there. If my memory is correct, Larry Wall wanted perl to be like spoken languages in that one doesn't need to know all the grammatical rules or vocabulary before a person can communicate. So I guess that makes Perl easy to assemble but hard to disassemble?
posted by drowsy at 2:56 PM on May 26, 2013


So I guess that makes Perl easy to assemble but hard to disassemble?

Perhaps hard to manage, depending on whose idioms you're reading.
posted by Blazecock Pileon at 3:00 PM on May 26, 2013 [1 favorite]


I think this is related to the distinction between a Context Free Grammar and a Context Sensitive Grammar.

A context free grammar is linearly reducible by a turing machine based on a set of regular rules. A turing machine is a set of states and rules, each state having a set of rules for a given input. A context sensitive grammar has multiple rules for the same state, so the parsing is not linearly deterministic - what will differentiate the application of the overloaded rules is if one of them leads to a parsable input, while the other leads to an error.

A context sensitive grammar can be simulated in a context free grammar with the addition of backtracking and extra state variables, but the parsing is very inefficient, compared to a context free grammar that can be read left to right without having to back up and try other parsing options.
posted by idiopath at 3:13 PM on May 26, 2013 [2 favorites]


The halting problem means that if you are going to work with Perl and you don't immediately halt, you have a problem
posted by East Manitoba Regional Junior Kabaddi Champion '94 at 3:14 PM on May 26, 2013 [31 favorites]


So I guess that makes Perl easy to assemble but hard to disassemble?

That Perl is a write only language has been a programmer joke for a very long time.
posted by hanoixan at 3:17 PM on May 26, 2013 [12 favorites]


I think Perl was modeled after human language, so I'd guess the solution lies in there. If my memory is correct, Larry Wall wanted perl to be like spoken languages in that one doesn't need to know all the grammatical rules or vocabulary before a person can communicate. So I guess that makes Perl easy to assemble but hard to disassemble?

Well, except that parsing natural languages is decidable. In fact, natural languages are widely believe to be in the set of "mildly context sensitive" languages [pdf], which means not just that parsing is decidable but there are also guaranteed to be efficient (polynomial time) parsing algorithms. So if parsing Perl is not even decidable, then someone screwed up.

(Incidentally, as a linguist, I have always thought that it is a terrible idea to try to make the syntax of programming language like that of a natural language...)
posted by advil at 3:20 PM on May 26, 2013 [9 favorites]


As a programmer, I agree with you
posted by RustyBrooks at 3:22 PM on May 26, 2013 [9 favorites]


This post is basically a scroll of Summon Chromatic.
posted by boo_radley at 3:30 PM on May 26, 2013 [6 favorites]


In popular culture terms: perl is Skynet.

Hmm, maybe perl is part of Skynet, the parts that we can't integrate with.
posted by Mario Speedwagon at 3:31 PM on May 26, 2013


How does it do its job if it isn't parsable?
posted by gjc at 3:32 PM on May 26, 2013


Marpa is actually pretty neat and the papers and other academic-ish stuff that go along with it is well worth a read if you're into parsers and such.

And queue the Perl hating trolls....
posted by zengargoyle at 3:32 PM on May 26, 2013 [1 favorite]


Can anyone match the date SkyNet comes online, with the date of any Perl 6 preview versions? I think it would be very instructive.
posted by wenestvedt at 3:33 PM on May 26, 2013


Being theoretically unparseable or equivalent to the halting problem is not a particularly interesting property for a designed system. If anything it's evidence that Perl just has a design wart, no particular news to anyone who's worked with it. The Hacker News discussion may be useful too.

In other Perl news, I was surprised to learn recently that the rough consensus among hackers is apparently that Perl 6 is at a dead end, and that the future of Perl is Perl 5. I switched from Perl to Python 10 years ago and have never looked back, so I must have missed the memo.
posted by Nelson at 3:34 PM on May 26, 2013 [1 favorite]


gjc: "How does it do its job if it isn't parsable?"

Parsing a programming language is the act of figuring out what it means syntactically without carrying out the instructions invoked.

Perl is designed in such a way that you cannot know what a symbol will mean (whether "/" will be interpreted as a regular expression delimiter or a division operator in this case) without evaluating the code. This is as compared to languages that have formal definitions such that a static analysis can be carried out without executing the program. This is significant because static analysis allows the compiler to optimize the program - produce an alternate version of the program that requires fewer instructions to execute, but is guaranteed to produce an identical result.
posted by idiopath at 3:38 PM on May 26, 2013 [13 favorites]


perl is a glorious mess
posted by Annika Cicada at 3:39 PM on May 26, 2013 [11 favorites]


How does it do its job if it isn't parsable?

Well, the fact is that most real world implementations of _any_ programming language don't do exactly what the spec says, for reasons that are typically a little more trivial than this and seldom arise in practical life. (E.g. the spec is weird on some obscure point, something is hard to implement right, a bug in a reference implementation becomes standardish, etc. For an example, see this (pdf) interesting, if slightly outdated, discussion of the sml spec.) So presumably if you tried to use the relevant construction in that proof to write a halting-detection function, some deviation from what theoretically should happen given the spec will prevent such a function from working. I can only imagine what it must be like trying to actually write a reference implementation of perl.
posted by advil at 3:48 PM on May 26, 2013


Knowing perl got my career jumpstarted pretty damn fast, and for that, I'm grateful.

Many other people can say the same thing.
posted by seanmpuckett at 4:00 PM on May 26, 2013 [5 favorites]


That Perl is a write only language has been a programmer joke for a very long time.

Joke?
posted by DU at 4:00 PM on May 26, 2013 [7 favorites]


I still love Perl. Now I'm writing code for no-one but me, I love the way it can be made to sing, shout, gibber, chuckle, scream and gurgle. Sometimes you get these big ugly data structures you're really never going to have to deal with again, so some judicious shotgun debugging with Data::Dumper::Simple and some random permutations of \, @, {} and [] will pull out the stuff you need right now.

I don't care if it's not pure. I don't care if it's not clean. It works.

(the only thing I wish it had was RPN syntax, because some problems explain themselves so much better with stacks. Plus, the potential for code lulz would be immense!)
posted by scruss at 4:02 PM on May 26, 2013 [5 favorites]


A proof for what we've all known all along.
posted by deathpanels at 4:02 PM on May 26, 2013


I have never been able to understand why anyone would want to code in Perl.

(C++ all the way.)
posted by Foosnark at 4:11 PM on May 26, 2013


that the future of Perl is Perl 5. I switched from Perl to Python 10 years ago and have never looked back, so I must have missed the memo.

Yeah I think this is pretty common. I can't remember the last time I did anything in Perl (unless you count PHP, which is a whole other kettle of excrement). Python is increasingly being turned to as the stitch-other-things-together-and-make-them-work language.
posted by Jimbob at 4:12 PM on May 26, 2013


Foosnark: "(C++ all the way.)"

It is possible to write a C++ program that invokes an infinite loop at the parsing stage while compiling (in practice it will either crash the compiler or make it complain about the parsing stack being blown).
posted by idiopath at 4:14 PM on May 26, 2013 [1 favorite]


Looks like we all misunderestimated it.
posted by blue_beetle at 4:36 PM on May 26, 2013


I LOVE Perl, even though I know all the cool kids use Python for everything. Its hackishness and difficult to read-ness and messiness and weirdness is what drew me to it in the first place (ok, so all the scripts being written in perl at my first job is what actually drew me to it, but that's what KEPT me using it).
posted by pravit at 4:43 PM on May 26, 2013 [1 favorite]


Now I'm writing code for no-one but me

Future you is gonna hate current you if they have to maintain your code.
posted by flaterik at 4:46 PM on May 26, 2013 [7 favorites]


just re-write it?
posted by Annika Cicada at 4:55 PM on May 26, 2013 [1 favorite]


Academics really need to stop using engineering language.
posted by effugas at 4:58 PM on May 26, 2013


> Future you is gonna hate current you if they have to maintain your code.

What is… maintain? Is that like going easy on the self-modifying code because someone else might not understand your definition of “works”? Or is it putting your code in a shell wrapper which does some things in sed so that you can massage a new input format to be the same as the old one?
posted by scruss at 5:14 PM on May 26, 2013 [1 favorite]


True fact: my personal blog still runs on Blosxom, this 10 year old Perl hack that runs as a CGI. It still kind of works but I'm scared to even try to touch it. I had to go modify it a few years back because some of the code rotted and I still have the mind-welts to $#{ show $_> for ${obj}it.
posted by Nelson at 5:23 PM on May 26, 2013


(oh, and while we're on the subject of Python: that language has no soul. Cold, dead shark eyes and a "fitter, happier" robovoice. Syntactically-relevant whitespace? C'mon! The only other language that did that was Acme::Bleach, and it was meant to be a joke.)

A sign I still have above my desk:
 __________________________
 |IN CASE OF PYTHON ATTACK|
 |                        |
 |        s/^\s+//;       |
 |________________________|
posted by scruss at 5:27 PM on May 26, 2013 [14 favorites]


So now I realize that the worst thing about Perl is that its advocates are willing to have the same whitespace conversation in perpetuity whenever the topic of Python comes up.
posted by invitapriore at 5:31 PM on May 26, 2013 [21 favorites]


Though I grant that your sign is pretty funny, scruss.
posted by invitapriore at 5:33 PM on May 26, 2013 [1 favorite]


On the other hand, Perl's detractors also get to have the same linenoise conversation in perpetuity. Balances out overall, I find.
posted by We had a deal, Kyle at 5:34 PM on May 26, 2013 [4 favorites]


Fair enough!

(Also, as far as the cool kids go, I think they're all reimplementing their Python components in Go.)

Let it be known additionally that this is my comment no. 1337.
posted by invitapriore at 5:36 PM on May 26, 2013 [2 favorites]


I know several cool python kids taking the Go path, but I am not yet one of them. I left Perl in disgust over a decade ago. It just stopped being worth the energy trying to guess at punctuation strings to pull a data structure apart, when python just had dots and square brackets.
posted by migurski at 6:07 PM on May 26, 2013


People who don't understand why anyone would ever want to write something in perl are most likely people who have never written a shell script with awk in it. It's still hard to surpass at its original killer application, writing scripts and acting as cross-process glue code on unix-based systems. And it's actually not as hard as people say it is to write good, maintainable perl, you just have to be a little less lazy about it. (On the other hand, it is just as easy as people say it is to write awful, unmaintainable perl code.)

All that being said, these days once my own scripts get too big I rewrite them in python, not perl.
posted by whir at 6:13 PM on May 26, 2013 [2 favorites]


Guido’s rationale for omitting multi-line anonymous functions from python is why I will always love the language:
The unspoken, right brain constraint here is that the complexity introduced by a solution to a design problem must be somehow proportional to the problem's importance.



And there's the rub: there's no way to make a Rube Goldberg language feature appear simple. Features of a programming language, whether syntactic or semantic, are all part of the language's user interface. And a user interface can handle only so much complexity or it becomes unusable.
posted by migurski at 6:14 PM on May 26, 2013 [2 favorites]


From TFA: It's very evident that Perl 5 is Turing-complete. No referee at a math journal would require something that obvious and that tedious to be proved.

No, I insist. Turing-Complete doesn't just mean you can simulate a Turing Machine. It means you can simulate ANY single-tape Turing Machine. A Turing Machine is a hypothetical construct, it has an infinite supply of tape, so no computer language running on any physical computer is Turing Complete.

That sort of article convinces me of the general weakness of mathematical skills of computer programmers. A short quote from his extremely bad essay on The Halting Problem:

These articles will avoid mathematical notation in favor of Perl 5. Perl 5 has its disadvantages for this work but it is an excellent way of presenting algorithms, in many ways much less awkward than the mathematical notations.

And for reasons that are obvious (at least to me) Perl algorithms are far less rigorous than mathematical notations, and for his purpose of constructing a proof, essentially useless. You can't invoke the Entscheidungsproblem and Godel when you need a remedial class in propositional calculus.

Programmers like this bug the shit out of me. Go learn some language that is more rigorous like APL. Oh no! That would require hard math skillz. Math is hard! That's why we became programmerz, so we could do mathy stuff without actually knowing any math, the computers know all that math shit, don't they?
posted by charlie don't surf at 6:29 PM on May 26, 2013 [4 favorites]


no computer language running on any physical computer is Turing Complete.

No Turing machine existing in physical space is Turing complete either, by that argument. Perl 5 exists as a language and language specification apart from any specific realization, just like a Turing machine does, right? So even though a specific implementation in the physical world isn't Turing complete, you can still reason about the language as being Turing complete.
posted by BungaDunga at 6:42 PM on May 26, 2013 [8 favorites]


As programing languages evolve they approach perl5. If your favorite language isn't as weird as perl, it just isn't that useful.
posted by humanfont at 6:57 PM on May 26, 2013 [4 favorites]


No Turing machine existing in physical space is Turing complete either, by that argument.

Now you're getting it.

Perl 5 exists as a language and language specification apart from any specific realization, just like a Turing machine does, right? So even though a specific implementation in the physical world isn't Turing complete, you can still reason about the language as being Turing complete.

Not the way this guy does it. There doesn't seem to be any unambiguous interpretation of how to turn this hypothetical abstract Perl specification into a specific instance of a real coding language. So it is futile to try to establish whether Perl can be unambiguously parsed, when they can't even agree on how Perl is parsed.

As programing languages devolve they approach perl5.

FTFY. As programming language evolve, they approach APL, which is basically an attempt to implement pure mathematical notation as a coding language. Mathematica is a fairly good attempt.
posted by charlie don't surf at 7:09 PM on May 26, 2013 [1 favorite]


Well, that explains why everyone uses APL derivatives these days.
posted by whir at 7:38 PM on May 26, 2013 [5 favorites]


Math class is hard!

I remember when I was a young programmer, learning FORTRAN. It was originally conceived as a rigorous way of translating math formulas into code, thus FORmula TRANslation. But Comp Sci was a young science, and the few Grad Students I knew who attempted to prove these languages had any sort of mathematical rigor were soon stymied. In those days, the old style of plugboard computers were just dying out, and high speed computation starting. It was hard enough to get practical answers without glaring math mistakes like iterating a non-convergent series, or getting a cumulative rounding error. I remember times when I checked my computer program's results with my slide rule.

Then Pascal came along, and the concept of Structured Programming. Yeah, that didn't work out so well either. Pascal was an attempt to implement some very abstract concepts like Backus-Naur Form as used in Algol. This notation would supposedly enable algorithmic designs to be uniquely parsed and accurately translated into code. There were gadgets like Nassi-Schneiderman diagrams that attempted to make it practical to produce algorithms rigorous enough to be proven correct. It was easier to use the Railroad Tracks notation, I still remember having the famous Apple Pascal poster hanging over my desk, I would spend hours staring at it while coding. We used to take Algol pseudocode straight from Knuth's books, run it through Wirth's notation, and churn out code that we thought was rigorous. That was a naive misconception, but we were young and dumb enough to believe what they taught us in college.

Well it isn't getting any easier to prove rigor. You can be a reductionist and take it down to single bit flipping on a simulated Turing Machine. And then you can take it back up to a higher level language and try to parse that down to Turing Machine operations. Both directions are foolish, if you don't understand Turing. And nobody does. Nobody has even read Turing. If you read Turing, you wouldn't be trying to use the Entscheidungsproblem to prove that Perl is uniquely parseable, you'd be citing something like Computability and Lambda-Definability. But that's too hard. It uses real math notation. Math is hard!
posted by charlie don't surf at 8:38 PM on May 26, 2013 [5 favorites]


No Turing machine existing in physical space is Turing complete either, by that argument.
Now you're getting it.


Everybody who knows what they're talking about is already this taking into account when they talk about Turing completeness.

There doesn't seem to be any unambiguous interpretation of how to turn this hypothetical abstract Perl specification into a specific instance of a real coding language. So it is futile to try to establish whether Perl can be unambiguously parsed, when they can't even agree on how Perl is parsed.

Except that I guarantee you that there's a subset of Perl that is decidable, and furthermore that that subset permits variable assignment, conditionals, subroutines and loops (does Perl have goto?) and is therefore Turing-complete.
posted by invitapriore at 8:49 PM on May 26, 2013 [4 favorites]


Nobody has even read Turing.

I mean, I'm a baby at this stuff, but I'm reading 'On computable numbers' right now- well, Petzhold's "The Annotated Turing,' which contains the whole paper with annotations and examples. Understand? maybe not quite yet...

Also, that article online about how Perl (and every other language) doesn't even implement regular expressions* as finite automata which, I mean, what?!

*because perl and perl-like regular expressions aren't actually regular...
posted by BungaDunga at 8:49 PM on May 26, 2013


I'm glad I know and have used Perl to solve real problems in the real world. Still, I wish Clojure's library ecosystem was up to the standard of Perl or Python so it could occupy that space in my toolkit.
posted by ob1quixote at 9:01 PM on May 26, 2013


FTFY

No you broke it. Typical. Some guy comes in, claims to fix the Perl script that has been working fine for years; and in doing so breaks it, because who can be bothered to learn to run the Unit tests we provided. Then he complains about all the difficult to maintain Perl code; even though we seem to be making our ship dates. Eventually consultants are paid to rewrite things in Java or whatever the REAL language of the moment is. The new code is fragile, slow and lacks the features we had in Perl. Eventually we go back to the Perl script.
posted by humanfont at 9:06 PM on May 26, 2013 [14 favorites]


Well it isn't getting any easier to prove rigor.

I'm sorry Pascal didn't work out for you. Maybe try Agda or Idris?
posted by Jpfed at 9:19 PM on May 26, 2013 [1 favorite]


perl is a glorious mess

Oh God, that's so true! Except the part about it being glorious.
posted by newdaddy at 10:12 PM on May 26, 2013 [1 favorite]


Except that I guarantee you that there's a subset of Perl that is decidable, and furthermore that that subset permits variable assignment, conditionals, subroutines and loops (does Perl have goto?) and is therefore Turing-complete.

Yeah, that's The Turing Myth.

Turing Thesis: Whenever there is an effective method (algorithm) for obtaining the values of a mathematical function, the function can be computed by a TM.

Turing's thesis has since been reinterpreted to imply that Turing Machines model all computation, rather than just functions. This claim, which we call the Strong Turing Thesis, can be found in today's popular undergraduate theory textbooks:

Strong Turing Thesis: A TM can do (compute) anything a computer can do.

This bold claim is part of the mainstream theory of computation, yet turing himself would have denied it.. In fact, the Strong Turing Thesis is incorrect -- the function-based behavior of algorithms does not capture all forms of computation.


This is the biggest misconception about Turing's work. He didn't think about programs as algorithms, he thought about programs as mathematical functions. A TM algorithm is a representation of a function. If the function can be encoded to run on a TM, it is Turing Computable. This is a huge restriction that makes it very difficult to use a TM as a Universal Computer.

So none of that is going to help you prove that Perl is uniquely parsable. If you want to do that, you'll have to reduce Perl to a set of complete system of formal logic. Alas, a Turing Machine is a crappy tool for doing Lambda Calculus.
posted by charlie don't surf at 10:39 PM on May 26, 2013 [1 favorite]


I'm not really sure why you keep harping on how difficult it is to saw wood with a hammer, but anyways, the question of Turing-completeness just refers to whether a Turing machine can be simulated in a given system, right? The idea being that any system in that category can emulate any other one?

I think you've seized on an argument that nobody is making, that all computation can be modeled with a Turing machine, which ok, I believe you, but it doesn't really seem relevant to the issue of Turing-completeness.
posted by whir at 11:38 PM on May 26, 2013 [1 favorite]


TMs cannot compute all problems, nor can they do everything that real computers can do. (from the Myth paper).

But then you'd expect Perl to be more powerful than a Turing machine, so it should be able to be Turing complete and more- so why does that mean a subset of Perl isn't simulable with a Turing machine? Specifically, a subset that obeys the TM I/O restrictions.

Alas, a Turing Machine is a crappy tool for doing Lambda Calculus.

On the other hand, didn't Turing do exactly that, to show his model is as powerful as Church's? He literally builds a Lambda Calculus TM which mechanically performs reductions and so on! Why can't we do the same thing with modern languages, by building a Turing machine simulator, to show they're Turing complete? Since Turing has done the hard work of reducing his Turing machines to propositional calculus, you've shown that your computer can calculate any recursively computable function (if not more, since it's probably actually a decision machine).

At least some of Turing machines' limits hold in our computers- we haven't found a way to use them to calculate halting problems, for one. If we had an oracle that could do it, we'd have an oracle machine, but we haven't found one. Using the environment (as a decision machine can) doesn't seem to help with this either. If we had an oracle machine we'd get stuck anyway, since it wouldn't be able to calculate other oracle machine's halting problems unless you find a better oracle, etc. It seems likely that you'd have the same problem if you had a decision machine that could calculate TM halting problems.
posted by BungaDunga at 12:03 AM on May 27, 2013


The linked article uses a lot of words to formally explain something that is really very simple. Perl distinguishes between a compilation phase and an execution or runtime phase, and it allows the user to choose which phase a given block of code is run in. As a crude timeline example:
parse... parse... parse... run_some_code... parse... run_some_code...
|--------------- compilation phase ------------------|--runtime phase-|
The first run_some_code can potentially have side effects that influence any parsing that comes after it, so its execution can't be skipped. That means that anything that parses Perl must also be able to execute Perl, making tooling a lot harder to write -- there's an old saying that the only thing that can parse Perl (capital-P: the language) is perl (lower case p: the reference implementation.) And it means that in theory you could write a program that can't be parsed -- just write an infinite loop that runs during compilation phase, which would prevent the parser from ever being able to see anything past that point.

In practice, these issues don't matter too much. Being able to write a program with an infinite loop at compile time is not in any way interesting or relevant except in a very remote academic sense. It's otherwise indistinguishable from a bog standard infinite loop in any other language. Most actual Perl programs do only a minimal amount of code execution during compilation phase. The most common thing is evaluation of the use statement, which causes a module's import hook to be executed (which adds symbols to the current namespace) before parsing resumes. And for the most part, regular everyday code does not depend on pathological corner cases of parsing, such that it's possible to almost-parse Perl without having to actually execute Perl as a side effect. PPI is an example of this.

Also, this is no excuse to pile on Perl. C++ templates for instance are Turing complete and allow you to do the same kind of metaprogramming -- that is, write code that is executed at compile time rather than runtime. You could certainly create "unparseable" C++ with an infinite template instantiation loop. Most implementations would probably die with an error after a certain nesting depth is reached rather that actually looping forever, but it's exactly the same idea. The C++98 standard lists in Annex B a suggested implementation limit of 17 for recursive template instantiation depth, but this is strictly a suggestion and not a language requirement. If you argue that an infinite loop is undefined behavior then instead you could easily write something that doesn't require infinite nesting but does require a deeper level than any exiting implementation supports and it would be valid code that is theoretically parseable but not actually parseable. In any case, the point is that Perl is not special in allowing code to be executed at compile time.
posted by Rhomboid at 2:41 AM on May 27, 2013 [5 favorites]


Re: "In fact, natural languages are widely believe to be in the set of "mildly context sensitive" languages [pdf]"

My best understanding of the issue so far was that only two natural languages have been considered context sensitive so far, and only one of those, Swiss German, is widely believed to be context sensitive.

See http://www.eecs.harvard.edu/~shieber/Biblio/Papers/shieber85.pdf

So most natural languages (basically all of them) seem to be syntactically context free.

Am I mistaken?
posted by 2uo at 3:50 AM on May 27, 2013


And just to prove the point, here's a Perl program that has invalid syntax on even days of the month and valid syntax otherwise:
#!/usr/bin/env perl

BEGIN { *foo = ((localtime(time))[3] % 2) ? sub() {} : sub($) {} }
    
foo / 25 ; # / ; harbl *
Example using the faketime helper library which lets you lie to a process about the date or time:
$ date
Mon May 27 05:00:51 PDT 2013

$ perl -c foo.pl
foo.pl syntax OK

$ faketime '2013-05-28' perl -c foo.pl
syntax error at foo.pl line 5, at EOF
foo.pl had compilation errors.
posted by Rhomboid at 5:03 AM on May 27, 2013 [9 favorites]


Except that I guarantee you that there's a subset of Perl that is decidable, and furthermore that that subset permits variable assignment, conditionals, subroutines and loops (does Perl have goto?) and is therefore Turing-complete.
Yeah, that's The Turing Myth.


No, it's not. Turing completeness and the expressiveness of Turing machines vs. computers are orthogonal concepts. I'm done with this, though, it's pretty obviously cargo cults all the way down with you here.
posted by invitapriore at 6:56 AM on May 27, 2013


> I think this is related to the distinction between a Context Free Grammar and a Context Sensitive Grammar.

Sorry, this is not so. Context-sensitive grammars can be parsed by a linear bounded automaton, which means that you can parse it in a bounded amount of time.

The claim is that parsing Perl is undecidable - which means that you need the full power of a Turing machine, and that you cannot bound how long a parse will take.

The proof seems dead easy - except that I don't understand the tiny Perl snippet that produces the paradox:
    whatever  / 25 ; # / ; die "this dies!";
But I'm completely willing to believe it has the properties claimed, and if it does, then yes, parsing Perl is undecidable and cannot be accomplished by a context-sensitive grammar.
posted by lupus_yonderboy at 9:56 AM on May 27, 2013


There are two factors at play here: Perl allows you to omit the parentheses when calling a function that has already been defined, and Perl allows for optionally specifying the arity of a function definition.

If whatever has been previously defined as a function with arity zero (a function taking no arguments), then the line is interpreted as
whatever() / 25; # ...
On the other hand, if whatever has been defined as a function with arity one then it's interpreted as
whatever(/ 25 ; # /); die "this dies!"
where /.../ is the regex literal syntax.
posted by Rhomboid at 10:09 AM on May 27, 2013 [4 favorites]


Regarding Perl vs Python...

When I first saw Python, I was taken aback. I hated the significant whitespace - the language felt too "plain".

Luckily, the GOOG forced me to use it, and I changed my mind PDQ. What I realized is that I was looking for cheap thrills in exciting language syntax - but this was wasting some portion of my productive time.

And I changed 100% on the whitespace. I realized that if you presented me with, say, C++ code that was incorrectly indented I would tell you that this was "wrong" even though it compiled and ran correctly - because correct presentation of the code so that later maintainers don't make stupid mistakes IS a key part of programming.

Certainly I personally would never ever check in C++ code that had indentation errors.

So if I'm going to all this work anyway, why shouldn't it be working for me? Why do I ever need those { and }s?

The key thing I get out of Python is that I can write huge volumes of code that work correctly the first time. For me, this is the gift that keeps on giving - I'm simply much more productive in Python than any other (yes, even C++ where I have over 20 years' experience!)

The problem I have with Perl is that I cannot read other people's Perl - heck, I can't even read my own. (I mean that literally - I have a poetry mangler I wrote for the poet Abigail Child and for the life of me, I have no idea what I did... I can see that the code is clear and well-organized, but...)

There are also all sorts of traps that I fall into every time I try to Perl after not doing it for a few months. The classic one is using @ instead of $ or vice versa. The crazy variable naming offers me nothing of value, and provides numerous traps - I could never do production programming again.

As I get older, I also realize the idea that "There's more than one way to do it" is actually a terrible design philosophy. People are trying to express themselves by making cosmetic choices about their code - but it isn't getting the cows milked - and worse, each time I read a new Perl programmer's code, I have to spend a couple of hours figuring out How They Do It - even if that programmer is "me, a decade ago".

It's not that Python doesn't have multiple ways to do things - all mature programming languages do - but that there's often an "most Pythonic" (i.e. most obvious!) way to do things that nearly all programmers will take.

I was wrong. Plain is good! Nowadays, I want to express myself in my results - lots of solid functionality and design - rather than in the form of my code.

On preview: Rhomboid's explanation has the ring of truth - and also typifies everything I do not like about Perl. Why should I have to interpret a complex pun to understand how a line of code works? And while the example is clearly intended to be pathological, I've been bitten on the arity 0 vs 1 issue before...
posted by lupus_yonderboy at 10:18 AM on May 27, 2013 [8 favorites]


If you think Perl → Python is good for your sanity, try Python → Lua. Perl's motto is "there's more than one way to do it". Python is "there's one right way to do it". Lua is "there's only one way to do it, and once you figure it out, you're good". It's a lot of fun programming in a language with almost no syntax, one real data type, and simple straightforward libraries. Sometimes it's nice not to have choices. Lua's not great for general purpose programming, but it sure is awesome as an embedded extension language. The Nginx+Lua stuff is particularly fascinating.
posted by Nelson at 10:44 AM on May 27, 2013


I've looked at Lua, but I simply don't have the real need for a language that doesn't do general purpose programming.

From what I've seen, it hits the same unsweet spot as a moped - gives you enough power to get into trouble but not to get out of it.

It's likely also that as a language, it's become tainted for me because it's used for configuration files. I personally believe that having executable code in configuration files is a very bad idea...

These days I use Yaml for configuration files. It's very expressive - but still static data with no code in it (and it's backwards compatible with JSON!)
posted by lupus_yonderboy at 11:06 AM on May 27, 2013


charlie don't surf: Not the way this guy does it. There doesn't seem to be any unambiguous interpretation of how to turn this hypothetical abstract Perl specification into a specific instance of a real coding language. So it is futile to try to establish whether Perl can be unambiguously parsed, when they can't even agree on how Perl is parsed.

Most languages don't define complete operational semantics, even if they do provide an unambiguous grammar. Being able to statically parse Perl, while convenient for some meta-applications and analysis, is not necessary for its (interpreted) execution.
posted by qxntpqbbbqxl at 11:27 AM on May 27, 2013 [1 favorite]


lupus_yonderboy: "These days I use Yaml for configuration files. It's very expressive - but still static data with no code in it"

yaml is code

Yaml invokes constructors. A constructor that executes with side effects can do anything an arbitrary program with the same permissions can. Most languages allow side effects in constructors.
posted by idiopath at 1:11 PM on May 27, 2013


@lupus_yonderboy
do you have any substantial experience with Ruby? The reason I ask is because is that I'm considering investing some time in learning Python. I have several full-time years of Ruby experience and I'm interested in what people have to say about the long-term comparison between the two (I'm familiar with the bullet-point list of syntax differences, and have written a few hundred lines of Python, but without "learning" it -- I just looked at other Python code to figure out how to do what I wanted to do)
posted by lastobelus at 1:53 PM on May 27, 2013


I think the YAML thing is just Rails/Ruby being stupid, their synopsis of a fix was:
Replace our module_eval with a define_method
Change Psych to check super classes and ensure it’s a hash
Stop using normal Ruby methods to set hash key / value pairs
Pulling in data from outside your program and then eval-ing is bad, who'd have thunk that? Perl has thunk that. Perl has a taint mode that marks every bit of data that comes in from the outside and prevents you from using that data in dangerous contexts (eval, system, etc.) until you have done something specific to un-taint the data first. One of these days the other languages will pick up on this bit of Best Practice.
posted by zengargoyle at 3:49 PM on May 27, 2013 [4 favorites]


My best understanding of the issue so far was that only two natural languages have been considered context sensitive so far, and only one of those, Swiss German, is widely believed to be context sensitive.

See http://www.eecs.harvard.edu/~shieber/Biblio/Papers/shieber85.pdf

So most natural languages (basically all of them) seem to be syntactically context free.

Am I mistaken?


Well, if a language is context free, it is context sensitive. So in that response I was using the term as an upper bound; the question I was addressing the relative complexity of perl and natural languages, which should be the same given Wall's original goals for perl, and giving an upper bound is the most generous in terms of seeing how close they are.

But it isn't so simple. I think there are at least 5 or so really solid established cases where some language has been rigorously shown to be at least context sensitive and not context free (these include Bambara, Zurich Swiss German (linked above), Dutch (paywall), English (paywall, self-link), Yoruba). But the fact that there are few doesn't mean so much: it isn't trivial to really solidly develop an argument for non-CFness, such arguments aren't exactly all-encompassing of the entire language (but rather typically involve some very specific pocket that requires extensive fieldwork for non-native speakers to find) and such arguments aren't the primary concern of most linguists. Beyond that, they aren't easy to interpret. For instance, in the self-linked article above we argued that it is not the _syntax_ of English that leads to non-CFness (there is no need for a string copying operation), but the interaction of the syntax with the semantics. In general, if a language has not yet been shown to be non-CF, that doesn't necessarily mean it is strictly CF, just that no one has found the argument yet.

The general thinking, among those who think about this (not exactly a mainstream pursuit), is that any grammar formalism that expects to be sufficiently general should be able to handle at least string copying (which is a non-CF operation) and cross-serial dependencies; these are the two main ways in which natural languages have been shown to be outside the set of CF languages. I don't really expect these to be rare things that a grammar needs to handle, especially in terms of 'strong generative capacity' either, but for the stars to line up to make a really clean non-CFness argument out of such phenomena is somewhat rarer.

I'm not sure how you are judging "widely believed" but I think my beliefs, in broad terms at least, are pretty mainstream among those I know who are interested in mathematical linguistics.
posted by advil at 5:54 PM on May 27, 2013


On Formally Undecidable Propositions of Principia Mathematica and Related Systems, Kurt Gödel, Dover Publications, 1992

Gödel's Proof, Ernest Nagel & James Newman (Douglas R. Hofstadter, Editor & Foreword), NYU Press, Revised edition 2008
posted by ob1quixote at 7:23 PM on May 27, 2013


Waiting for Godel wins for best title evar.
posted by Mental Wimp at 11:24 PM on May 27, 2013


« Older “Now I feel like I’ve earned my medal."   |   Iraq's constitution has something America's... Newer »


This thread has been archived and is closed to new comments