How To Program
October 24, 2009 8:37 AM Subscribe

A free computer-programming course on reddit. Click "prev" for more lessons. 113 lessons so far.
posted by grumblebee (89 comments total) 152 users marked this as a favorite

Superb. Probably better than any of the introductory programming books I read at university.

My personal guidelines for learning how to program:
1) Learn assembly language (preferably for a simple CPU)
2) Optimize for fun and learning, not program performance. That is to say, use a language that speaks to you. Something like Ruby, Processing or perhaps Haskell. Maybe Python, but I don't like it personally. You might.
3) Do something diretly in your domain of interest. Use a thing you already know - don't be learning the code AND the idea you're trying to express at the same time.
4) Build it wrong, but build it.

Finally, theoretical computer science is both fascinating and beautiful, and borders on philosophy.
posted by krilli at 9:05 AM on October 24, 2009 [7 favorites]

Finally, something productive comes out of reddit! [/endsnark]
posted by tybeet at 9:10 AM on October 24, 2009 [1 favorite]

tybeet, I know what you mean but look closer - reddit regularly has some good stuff
posted by krilli at 9:14 AM on October 24, 2009

I liked these articles, well written introductions to the fundamentals.

However, I think this approach will leave a lot of people bored as even by the end of lesson 10 you haven't written a single line of code yet.
posted by sp160n at 9:27 AM on October 24, 2009 [1 favorite]

Learn assembly language (preferably for a simple CPU)

I am a huge fan of the SPARC32 architecture.
posted by jock@law at 9:30 AM on October 24, 2009

Holy crap, there's a use for the 'self' category.
posted by A Terrible Llama at 10:07 AM on October 24, 2009 [2 favorites]

However, I think this approach will leave a lot of people bored as even by the end of lesson 10 you haven't written a single line of code yet.

Agreed, and a lot of it is meaningless, like "convert 54 to binary." A far better way of going about this, from a pedagogical perspective, is to explain why. A far better introduction to programming is Eric Robert's Art and Science of Java. The reddit guide explains the science of computers, how they count, etc. But much more important for a beginner, I believe is to understand that "computer science is not so much the science of computers as it is the science of solving problems using computers." I know I have mentioned this before, Stanford'sthree-course Introduction to Computer Science is really a fantastic resource. After taking those three-courses you can really feel comfortable going to any programming language and have a great base to go and actually start understanding practical things like Eric Evans' Domain-driven design: tackling complexity in the heart of software and the canonical Design Patterns: Elements of Reusable Object-Oriented Software.

I don't mean to be too fan-boyish with Stanford, MIT's introductory courses are amazing, but do not have the same nice segue from course to course that Stanford has laid out. You can, however, go all the way through graduate courses. This is amazing, I think simply being aware of which texts are being taught is an invaluable service unto itself and half the battle for an autodidactic. The alternative is to simply read as much as possible and hope you hit the right book.

I'm beginning to feel like a NYT writer as I'm really beginning to shill for the establishment here, but for practical applications Harvard's extension school contain such gems as CSCIE 160 Java For Distributed Computing.

Really amazing the wealth of such high quality material available. I sort of hope this turns out to be computer science's Velvet Underground: not many people listened to it, but everyone that did went out and started a band.
posted by geoff. at 10:24 AM on October 24, 2009 [48 favorites]

Oh and I believe it is the first lecture in programming abstractions that Julie Zelenski briefly touches on why they take the approach they do to teaching computer science.
posted by geoff. at 10:27 AM on October 24, 2009 [2 favorites]

This is a good little primer. Perfect for someone who fell into programing by tripping over HTML in the art department.
posted by device55 at 10:36 AM on October 24, 2009

Eh. Not bad and certainly would get you somewhere. I do like reddit a lot, the quality is amazingly high.

But I have a lot of beefs with this presentation - the overall one being that he's not teaching you good habits from the get-go. This is "the hacker's guide to self-taught programming" - I think the fact that binary numbers are lesson 3 is a giveaway to that.

(Why shouldn't you teach binary numbers so early? Bangs for the buck. Learning binary really gets you nowhere at that early stage; every programmer will eventually need to learn it, but will have to go on and learn hex too; but you can go very far into programming and not know a thing about binary numbers.)

In fact, as I go on he's sort of teaching you facts at random. We have more binary, then in section 8 he introduces you to... include files? Most languages these days don't even have "include" files.

He's talking about code reuse. That's a key concept - but "include files" is absolutely the wrong way to think about code reuse.

OK, I'm a dozen into it and it is actually bad. Take section 8. What exactly is conveyed in section 8? He's vaguely teaching you about three concepts, sort of related, but it neither teaches anything specific, good practice, or a philosophical point.

I will end up writing such a thing myself :-D it is seeming inevitable, and here's what I might do if I wrote it today.

I'd start with a dramatis personæ: the three players are programs, data, and the physical world (it astonishes me he never really explains what a "program" is as far as I can see).

Data are simple things like letters and numbers that get put together into more complex things called data structures. Programs are specific recipes telling what to do with the data. Programs can affect and be affected by the real world, because some of that data also corresponds to real world inputs and outputs.

The next section would reveal the two dirty secrets of the trade.

The first that programming is fairly easy but writing programs that actually work is very hard. It's not a contradiction - it's quite easy to write a program that works once or twice on the data you already have - it's very hard to put a lot of such programs together so it works always on any data at all.

The second secret is that programming is fun and debugging is not - and yet most people rush through the programming part and spend disproportionate amounts of their time debugging, which can be very stressful.

At this point, I'd present the solution to all this - which is also the emphasis of the rest of the lessons - and that's automatic testing.

If you're going to become a programmer, you're going to be making all these tiny machines called programs, and if you want these machines to really work, you should be expecting to make a testbed for each one, so you can exercise it completely and show it really works.

When they build the Apollo mission, there were so many parts that if each one had worked with 99.99% correctness, there would still have been multiple failures on each flight - they had to build parts that were 99.9999% correct.

It's fairly easy to do with programs, it's fun, and it's the only way you'll ever be able to build all your little programs into a large system that works reliably.

End of introductory chapters!

At this point I'd pick a specific language, almost certainly Python because it's a free, modern language that's available on any platform, encourages good habits, is very testable, and encourages rapid development.

I'd start with four or five lessons teaching Python as simply a desk calculator. (And Python is seriously the best desk calculator ever made...) First we'd deal with numbers, then introduce named variables then strings.

A bunch of really short exercises.

Still in the desk calculator, I'd have a lesson on lists, then one on dictionaries, then two or three on using Python's built-in library functions. A bunch of short exercises after each of these lessons.

The next less would show how to actually create a "trivial" Python program. Such a thing is very short - two lines - but it would take a chapter to explain "passing parameters" and "the return statement".

The next chapter shows you how to test that tiny program. I say, "Here are specific magic instructions that will test your program and here are the results."

I'd also jump back for a philosophical point and say, "Up until now I've explained every part in detail. But you can't understand every part in detail - there simply aren't enough hours. You have to learn to follow the instructions of other people and hook up their programs and simply trust that it works that way. You can save a huge amount of time that way, but you have to be very careful to follow the directions super-carefully - and of course, you're testing your results."

There would be also be a shameful, bleh chapter here because at this point the poor sap needs to learn to use a text editor because they're starting to save and edit their programs.

I'd probably direct them to "the default text editor for their operating system" and apologize for the fact that the various programmer's editors are somewhat arcane, but that if they became serious about this they should go off and take a few evenings and get really good at emacs or some other two-letter editor.

After this necessary diversion, it'd be pretty similar to your regular programming course, where each set of lessons had a motivating example, probably taken from my own work to date (because I'm lazy...) - and also this is very tl;dr by now....

(But note that I still haven't introduced the concept of binary, and I made no attempt to avoid it!)
posted by lupus_yonderboy at 11:29 AM on October 24, 2009 [18 favorites]

Why is it that "programming" is a valid subject for this kind of lessons?

You don't expect "How to become a brain surgeon" on reddit, right? It makes it look like software development is still something that can be learned "in 118 easy lessons" and that's simply not true anymore.
posted by DreamerFi at 11:32 AM on October 24, 2009 [1 favorite]

Why is it that "programming" is a valid subject for this kind of lessons?

Pretty well anything you can learn is a valid subject for this sort of thing.

It makes it look like software development is still something that can be learned "in 118 easy lessons" and that's simply not true anymore.

I disagree entirely, as my posting above I hope demonstrates...
posted by lupus_yonderboy at 11:45 AM on October 24, 2009

You don't expect "How to become a brain surgeon" on reddit, right?

Oh, for fucking out loud. I simply want to learn a little bit more about programming. Just like I want to learn some Chinese.

I promise you: I will perform surgery on your brain in Chinese, or program a robot to do it.
posted by Dumsnill at 11:46 AM on October 24, 2009 [1 favorite]

Lesson 90 is "Introducing PUSH and POP" Lesson 81 is "Allocating memory for a data structure" Just reading through the titles, I agree this seems like a pretty arbitrary and random set of lessons. It also seems geared toward people learning to program in 1995, or earlier.

Really, if you want to learn to program, I do think it's better off to start in a language where you can "do something" right away to get a feel for it. A language like processing where you get immediate visual feedback on what you're doing would be great.

2D Graphics programming is awesome and I think, at least when I started learning programming people rarely got into it. If you think about it, these days if you're programming in Java doing really simple graphics (like drawing squares, lines, etc) using the Graphics2D API is really easy. We're not talking about low-level OpenGL or the old Win32 Device Context stuff.

If I were going to try to teach people how to program, I would probably start with some "fun" graphics stuff, teach loops and conditional statements, and functions, then I would move on to built-in data structures like Lists and maps and have them do a simple video game (maybe with level data made in XML).

One of the coolest things I learned in college was Boolean circuit design. We learned how to make logic gates from transistors, flipflops and adders and multiplexers from logic gates, all the way up to building a simple CPU (simulated on the computer, of course). I always thought it would be cool to teach people how to build a CPU, then write assembly language code for their own CPU.

But anyway, after the basics of "Programming" I would probably want to teach OO design (boring but useful), Functional programming and closures, Assembly, Algorithms, and of course underlying CS theory. For more advanced stuff I would want to teach Compiler/interpreter design, High performance computing, and Machine learning.

Honestly, this would have been better as a list of books to buy and instructions on what sections to read, what questions to do, etc. ALA MIT Open Course wear.
posted by delmoi at 12:22 PM on October 24, 2009

It makes it look like software development is still something that can be learned "in 118 easy lessons" and that's simply not true anymore.

No it doesn't. It makes it look like basic computer programming can be learned in 118 steps, which happens to be true.
posted by alligatorman at 12:23 PM on October 24, 2009

You don't expect "How to become a brain surgeon" on reddit, right? It makes it look like software development is still something that can be learned "in 118 easy lessons" and that's simply not true anymore.

I don't know that these 118 lessons are all that good, but you can certainly become an OK programmer on your own pretty quickly. I wouldn't want to maintain the code that you left behind, but you can certainly learn enough to be able to write PHP pages and little programs for yourself. Most professional software developers became programmers pretty quickly, and spent the rest of their time learning advanced techniques.

Honestly, I think learning to program today would be far easier then when I started around '92 with plain C. The programming languages are just more user-friendly. Not only with something like Java, but also Python, Ruby, PHP, Javascript, etc.
posted by delmoi at 12:26 PM on October 24, 2009

You don't expect "How to become a brain surgeon" on reddit, right?

Actually, I do expect how to become a brain surgeon, or something similar in the future, on reddit.

That's one of the great thing about true hackers, in the original sense of the word. They don't just apply the hacking philosophy to computers, but to life in general. Look at things like biohacking, where amateurs are experimenting with genetics and biotechnology. I think part of this philosophy is a certain skepticism of the exalted status that experts (like brain surgeons) have in our society.

I know this might sound absurd to some people. But the fact that you brought up brain surgeon is kind of ironic, given the original status of most hackers as AI researchers. If the brain is a computer, why not hack it?

Some might argue that brain surgeons must be licensed, for public safety. But if you're only experimenting on yourself, where's the problem? It comes back to a cognitive liberty issue, the belief that a person should have freedom over their own body, which seems to be another common hacker belief.
posted by formless at 12:31 PM on October 24, 2009

delmoi, you pretty much described my CS curriculum in college. The final project in my introductory class was to create a Breakout clone, which made me pretty excited for future projects...then second semester rolled around, we got into OOP, and we spent the entire semester creating a project scheduler. Enthusiasm: crushed.
posted by invitapriore at 12:34 PM on October 24, 2009

> 1) Learn assembly language (preferably for a simple CPU)

I don't think I'd agree with that as a universal starting point. I believe that a person who's initially aiming at higher-level programming (higher in terms of remoteness from the silicon) like web applications or basic desktop apps, can safely skip this step.

For example, there are many books around that teach most current languages (eg Perl, C, C++, Java) at the command-line or simple IDE level, starting from a "Hello World" point and building language knowledge as well as programming fundamentals. This sort of start still provides programming fundamentals, while also teaching something that will remain useful beyond the beginner level. I mean, c'mon, what percentage of all working programmers work in assembler?

If a programming student decides to veer off into more hardware- or low-level programming (eg device drivers, kernel programming, multi-threading, digital signal processing) then that's the point that assembler, and an understanding of execution at the silicon level becomes more important.

(Disclaimer: I do web applications in Java, C#, PHP, Perl for money, and microcontroller programming in assembler and C for fun)

On preview - another shout-out for hacking. New discoveries occur at the edges of the known world, and you can learn alot about things by just being curious about them.
posted by Artful Codger at 12:37 PM on October 24, 2009

and Basic still sucks, even VB.NET </personal bias>
posted by Artful Codger at 12:39 PM on October 24, 2009

Codger: I thought that was true before I learned assembly, but a cursory understanding of assembly and lower-level hardware function is actually a pretty easy thing to come by and makes it easier to optimize code effectively, because you inevitably come away from it with a much better idea of how the memory hierarchy works. I find it also makes it a lot easier to conceptualize how things like buffer overflow attacks work, which is always helpful when trying to secure your code.
posted by invitapriore at 12:47 PM on October 24, 2009

I learned programming around the age of ten by reading the examples in the Commodore 64 user's guide. Dad had actually gotten the computer several months before Christmas, and gave me the user's manual early, like July early. This was amazingly clever/cruel of him. As an obsessive-compulsive kid, I read through it so many times... it took me a week or two of puzzling to realize that there was nothing significant in the variable names in the bouncing ball demo. They were placeholders, just names. If I changed the letters "dx" throughout the whole program, it would still work. That was a breakthrough for me.

I started writing assembly because, on a Commodore 64, you are limited by what you can do without it. (Actually, I used machine code itself, since I didn't have an assembler. I looked up the opcode numbers in the C64 Programmer's Reference Guide for the instruction and addressing mode I wanted, then the numbers for the parameters. I entered these into DATA statements, then used a FOR loop to POKE them into memory, then SYSed to the start address. That wasn't nearly as burdensome as I make it sound, in fact it was a lot of fun, at least for a little while.)

C I learned a bit in college, but the Nethack source code game me an appreciation for what it, and tools yacc and lex, could do.

Python I learned on my own on the advice of someone here I think it was, and was amazed by how much it reminded me of programming back on the old C64. It really reawakened the love of programming for me again. I think new developers should probably get started on Python. Sure, it doesn't start you off doing all the little things C programmers have to do and so you later get annoyed when you have to start declaring variables and types ahead of time. But I value this annoyance. It gives me strength.
posted by JHarris at 1:06 PM on October 24, 2009 [3 favorites]

You don't expect "How to become a brain surgeon" on reddit, right?

Lighten up, Francis. I can make a metric assload of mistakes, and then some, in programming before I get within a few orders of magnitude of the type of damage that someone can do in a not-so-great day in the neurosurgery suite. (Although you can watch it online, if it do please ya.)
posted by Halloween Jack at 1:47 PM on October 24, 2009 [2 favorites]

I can make a metric assload of mistakes, and then some, in programming before I get within a few orders of magnitude of the type of damage that someone can do in a not-so-great day in the neurosurgery suite.

Bullshit. Integer overflows can be a bitch.

But in the average case, yes, you are correct. But it doesn't address the point. The point isn't about the amount of damage you'll do. It's about how insufficiency of such things for learning a deep, technical field.
posted by jock@law at 2:19 PM on October 24, 2009 [1 favorite]

jock@law:

Funny thing about people with a grounding in C and Assembly. Languages that run on VMs can behave very differently.

For example, in ruby string concatenation IS O(1). Now, this is because in ruby a string is not a C array of chars. I don't know the implementation details, but it likely is some sort of linked list. Of course a ruby string isn't as fast to work with as a C string, so this isn't necessarily better (in fact in almost every case it's likely vastly slower). Anyway, here's the code and benchmark results:

http://pastie.org/668324

running ruby 1.9.1p243 (2009-07-16 revision 24175) [x86_64-linux]
posted by sp160n at 2:21 PM on October 24, 2009

I disagree with you strongly, lupus. That's how you train codemonkeys ripe for the outsourcing. To train computer scientists, you start with RISC assembly -- or if you must, C. Otherwise you end up with a bunch of Microsoft(R) Certified idiots who don't understand why string concatenation isn't O(1).

Because you don't think they have "real" computer programmers in India?

Anyway, there is a difference between a "Computer scientist" and a really good programmer. A computer scientist can come up with new theorems without being an expert programmer. Of course it depends on the specialization.

Obviously if you're going for compiler design and low-level OS stuff you'll need to be great at bit bashing, whereas if you're just going to work on the mathematical foundational stuff like complexity classes or algorithms and so on you might not have that much programming to do.

Here's a list of CS sub-fields, they are all going to have different requirements as far as the level of low-level skill.

That said, I think low-level skill is critical to being a good programmer, or at least the experience of having learned the low level skills. If you've done lots of low-level OS programming, then you'll have a really good idea of why your OS is behaving the way it is. If you've done low-level network programming, then you'll understand how to debug your higher level web services.

I really think Assembler is key because when you learn assembler you learn what the computer is actually doing. You can probably get by without it, but Assembly makes it explicit in a way you don't get learning C. I mean, when I learned C I had no idea what "allocated on the stack" or how different types of method passing worked, at least not in a concrete. After I taught myself assembler, it all made sense.
posted by delmoi at 2:28 PM on October 24, 2009 [1 favorite]

Funny thing about people with a grounding in C and Assembly. Languages that run on VMs can behave very differently ... Now, this is because in ruby a string is not a C array of chars. I don't know the implementation details, but it likely is some sort of linked list.

That has nothing to do with the fact it's running in a VM. C++'s Standard Template Library allows you to implement strings however you choose.

There are always going to be tradeoffs. If you need a lot more random access to parts of the string, then storing it as an array is better. If you need to do a lot of concatenation and serial access, a linked list would be better.
posted by delmoi at 2:31 PM on October 24, 2009

I just realized that I was running an empty loop, doh!

Still holds mostly true however see: It's not O(1) but it's pretty close.

see this pastie for the correct code
posted by sp160n at 2:32 PM on October 24, 2009

Bullshit. Integer overflows can be a bitch.

Well, let's put that into perspective. Medical error causes 195,000 deaths in the U.S each year. Programmers screw up all the time, but their mistakes are fixable. Doctors have to go through rigorous (we hope!) training and assessments before they can practice, and they still kill a ton of people by mistake.
posted by delmoi at 2:37 PM on October 24, 2009 [2 favorites]

Programmers screw up all the time, but their mistakes are fixable.

That is, unless they are programming cancer radiation treatment timings, and they misplaced a decimal, as happened some years ago. People died.
posted by StickyCarpet at 3:06 PM on October 24, 2009 [1 favorite]

sp160n: this really looks more like O(n): basically, k₁n + k₀, with k₀ being dominated by object creation of all these strings (a fairly costly operation in a language like Ruby or Python).

If you want O(1) concatenation, you want ropes. (You could also do linked lists with a reference to the last element of the list).
posted by Monday, stony Monday at 3:30 PM on October 24, 2009 [2 favorites]

sp160n: it's not possible to do string concatenation unless you keep track of where the string ends or how long it is in addition to where it begins. Otherwise you have to iterate through each byte of the string to get to the end, in which case the runtime scales linearly with the number of bytes of the string: O(n).
posted by jock@law at 3:55 PM on October 24, 2009

Ugh. "not possible to do string concatenation in O(1)" is what I meant to say.
posted by jock@law at 3:56 PM on October 24, 2009

Monday:

Look a little closer

The benchmark shows that a single concat is O(1) no matter the string length. There's two things being tested on that benchmark, concatenating long strings vs short strings n times and by comparing the two runs with different sample sizes, showing that the number of sample sizes doesn't make individual appends any slower (that is, the 10,000th append isn't any slower than the 1st).

Concatenating a 20 char string n times takes just as long as concatenating a 5 char string n times. So that's O(1), showing that string length doesn't change the execution times.

What's O(n) is the benchmarking loop itself, the the absolute number of appends given n iterations. Of course if you iterate anything n times it's going to be at best O(n). The purpose of running the two benchmarks (n iterations and n*2 iterations) was to show that execution time depends linearly on the number of appends, not the length of the final string. That is:

"append" << "this"
is equally as fast as
"append" << "thissuperlongerstring"
and both are twice as fast as
"app" << "end" << "th" << "is"
even though strings 1 and 3 are identical in the end
posted by sp160n at 4:11 PM on October 24, 2009

To train computer scientists, you start with RISC assembly -- or if you must, C.

Utter madness! You might as well teach typing on manual typewriters!

Bear in mind that I have written, urg, at least 10^5 lines of C and C++ so I'd have a vested interest in this being true - but it is not.

Students need to understand how to think clearly about computers and computer programs before anything else. They need to understand best programming practices: how to create readable, testable, maintainable code.

You're going to turn off a lot of people and you're not actually going to teach them C because this is pretty subtle to get right. I might be able to read something like int *(f(int y, int *y)), which is pretty vanilla as pure C declarations go, but I'm sure as heck going to have a much better chance of writing bad code this way.

I assume you're in the "we learned the hard way, you should too, it makes us a particularly virtuous members of the species compared to other decadent humans who only know scripting languages" school of thought.

There is a certain brute enjoyment about learning C++, something like that of learning Chinese. Complex, arcane things like pointer arithmetic might "build character" but there's a lot of character floating around - what the world needs is better-engineered programs.

Of course, not every problem can be solved in Python or the like.

In particular C/C++ programs are more CPU-efficient. But most of the time everything I do runs "instantly" on the CPU - the slowness is the disk! Only a few things like graphics and digital signal processing really use much of the CPU at all. I literally don't care if my code runs in some scripting language. And you can definitely do real-time things - in the other window I'm working on processing my live MIDI in Javascript - so it's not like it's fall-in-the-floor slow.

For scientific work, there's numpy, a fast library for doing large calculations.

And ongoing work like Unladen Swallow shows great promise, and there's no reason to suppose that Python can't be made at least as efficient as Java, which in many cases performs as well for all effective purposes as C/C++ does.

Despite the fact that I've been working on C, C++ and Java for ten times as long as Python, I believe that I'm at least three times as productive in Python and in some important cases ten times as productive! (And the same is true for Javascript, though it's a bit primitive and doesn't have the library...)

That's a pretty strong statement but here's how it breaks down.

1. In an important number of important subcases, I can do it in Python without actually writing a program, or writing a program so trivial I just type it right in.

It's the "five minute program". If I can do something in five minutes then it's worth just doing it right now.

You simply have no idea how much more powerful an individual it makes me. If I wanted to know how many primes there were below 1000, I can sit down and write a program from scratch in 4 minutes and 25 seconds (I just timed it, and I hope the answer is 168....), and there's probably a library function that would do most of the work for me in this case. (The code is below, not particularly nice but quite readable.

2. For anything else, I get at least a three-fold improvement in productivity.

For many of my data types, I simply don't create a class at all - I have a "dictionary", a set of assignments of names to values. These are built-in to the language and I can read datafiles structured this way - they look like:

{
  in: {
    nano:  {port: 'nanoKONTROL SLIDER/KNOB'},
    wx:    {port: 'UltraLite mk3 MIDI Port'},
  },

  out: {
    live:  {port: 'from MaxMSP 1', ccmap: [2, 1]},
    vl:    {port: 'UltraLite mk3 MIDI Port'},
   },
}

And I'm getting that very readable data file "for free" - simply by writing the data as the language encourages me to.

It's amazing how much of the bulk of the code vanishes when you simply never have to write it in the first place - and how quickly you can rejigger the system.

In Python or Javascript, I'm creating a set of tests as I go - so every time I change my code, I run a battery of tests which catches the endless goofs I make, particularly during "rapid development". I often need to make violent redesigns if I'm running really fast, and again I am protected by my tests.

I will often make a series of a dozen or more small "change/run tests/check in" iterations (and yes, I run a local source control system even for personal development...)

If I make a small change and break tests, if I can't figure out in a few seconds what's wrong, I just revert the change and try it again! When the deltas between iterations are so small, it's not worth taking the time to debug goofs you've made.

You can write tests in C, and I suppose you could conceivably in RISC assembly, except you'd have to type hundreds of lines of code.... but it's so much easier to do things like mocks in a twenty-first century language like Python.

If you aren't teaching your students design for testing from the very beginning then you are teaching them to be a tinkerer, not an engineer. It is not engineering unless there is a test.

And finally, there are the libraries. I'm very familiar with the STL but it doesn't have the breadth that the Python library system has - and the universality. If you need it, it's probably there, and if someone else's third party needs it, they're probably using the Python library version too.

So Python is just a better language. You can get real work done, faster, and you can write more robust code. You learn the right habits early and not a much of weird obsolete trivia.

There are other potential languages to teach a first course in - Python is an appropriate language that I know well. But C and, shudder, RISC assembly are not at all appropriate!

--

Here's my primes program.... less than five minutes, five iterations (and that was a pretty poor showing, I rushed it and goofed a couple of times).

primes = [2, 3]

for i in xrange(primes[-1] + 2, 1000, 2):
  is_prime = True
  for prime in primes:
    if not (i % prime):
      is_prime = False
      continue
  if is_prime:
    primes.append(i)

print primes
print len(primes)

posted by lupus_yonderboy at 4:16 PM on October 24, 2009 [9 favorites]

1) Learn assembly language

Optimized C code is the new assembly language.
posted by StickyCarpet at 4:19 PM on October 24, 2009 [3 favorites]

sp160n: I'm not sure that's correct. Notice that the longer strings still take longer to concatenate. What might be happening is that the Ruby runtime may be optimizing your code. A smart runtime might see a loop that does nothing but string concatenation and start saving end-pointers after a couple runs.

http://pastie.org/private/bjohetjspaj0q4oxp2k72w

(Notably, my code sample doesn't directly prove my point that normal string concatenation is O(N). However, it does show that end-saved string concatenation is an order of magnitude faster, which easily leads to that conclusion - its not possible to be an order of magnitude faster than O(1). Also, I take no position on what Ruby does or doesn't do. My point only is that strings are harder than they look in high level languages and it makes for a world of hurt for people who don't understand why. Besides, part of the fun is the discovery of elegant solutions to problems you yourself face. You can't appreciate the elegance of the solutions if you never face the problems.)
posted by jock@law at 4:26 PM on October 24, 2009

Students need to understand how to think clearly about computers and computer programs before anything else. They need to understand best programming practices: how to create readable, testable, maintainable code.

I have to disagree with you. You don't train mechanical engineers by showing them how to use goopbegone to keep the grease off their hands. It's not about masochism and it's not about school-of-hard-luck-ism. It's about knowing what you're doing, understanding it at a technical, deep level, and it's the most important thing in any field.
posted by jock@law at 4:30 PM on October 24, 2009

"append" <>

I put it to you that you have no idea if this is true at all. It depends entirely on the language and whatever optimizations its interpreter and/or compiler are willing and able to do. The only way to prove this would be to benchmark it for a specific language and system.

And even more important, you should not concern yourself with such tiny details until the very last step, if at all. You should be concentrating on writing clear, correct, tested, documented code, see if the system you have delivered performs to specification, and in the rare case it is CPU bound, use benchmarks and a profiler to find that 10% of the code where 90% of the code's time is being consumed and optimize that.
posted by lupus_yonderboy at 4:31 PM on October 24, 2009

sp160n: just so we're clear, are these :

[andrew@fstop ~]$ ruby test.rb 10000000
                            user     system      total        real
Hello                   2.820000   0.140000   2.960000 (  2.960524)
HelloThere              2.940000   0.270000   3.210000 (  3.315655)
HelloThereMyFriendss    3.360000   0.490000   3.850000 (  4.010065)

[andrew@fstop ~]$ ruby test.rb 20000000
                            user     system      total        real
Hello                   5.630000   0.290000   5.920000 (  5.995643)
HelloThere              5.880000   0.530000   6.410000 (  6.621344)
HelloThereMyFriendss    6.370000   1.010000   7.380000 (  7.638905)

The results? If so, it really looks like concatenating "HelloThereMyFriendss" 10 million times takes more time than concatenating "Hello" the same number of times.

jock@law: I don't think you can concatenate strings in constant time if you represent them as contiguous arrays of bytes; in "s1 + s2", even if you know where s1 ends, you still have to copy s2 there, which will take O(n), where n is the length of s2.
posted by Monday, stony Monday at 4:32 PM on October 24, 2009

urg, that was supposed to refer to sp160n's code... but of course it contains HTML characters. Sorry!
posted by lupus_yonderboy at 4:33 PM on October 24, 2009

Monday, that's fair. There really are two terms. However, generally, string concatenation destination strings are orders of magnitude longer; they are the most significant term. But yes: by O(1) vs O(n) I mean O(1)+O(n) vs O(n)+O(n).
posted by jock@law at 4:35 PM on October 24, 2009

And even more important, you should not concern yourself with such tiny details until the very last step, if at all. You should be concentrating on writing clear, correct, tested, documented code, see if the system you have delivered performs to specification, and in the rare case it is CPU bound, use benchmarks and a profiler to find that 10% of the code where 90% of the code's time is being consumed and optimize that.

lupus: that's flatly wrong. algorithms that are orders of magnitude different in performance is not an implementation detail. using sound algorithms is emphatically a design phase issue. we're not talking about tweaking.
posted by jock@law at 4:37 PM on October 24, 2009

"Students need to understand how to think clearly about computers and computer programs before anything else. They need to understand best programming practices: how to create readable, testable, maintainable code."

I have to disagree with you. You don't train mechanical engineers by showing them how to use goopbegone to keep the grease off their hands.

Might I ask you if you actually work in the field? Because comparing testability to showing someone how wash their hands is........

Designing code for testability is hard. Any bozo can write a program and millions do. That is not software engineering and that isn't the code that's supporting the world's great software superstructures. If you want to write production code, code you can rely on to perform every time, you will have to do a lot of automated testing, and doing automated testing right is difficult and requires a particular mindset that takes a long time to develop.

It's not about masochism and it's not about school-of-hard-luck-ism. It's about knowing what you're doing, understanding it at a technical, deep level, and it's the most important thing in any field.

I think the key word in your head is "deep". In other words, you don't think it's really correct unless the person involved has a good idea what machine instructions are being executed.

In fact, what makes computer programming powerful is exactly that you don't have to go "deep" to make completely effective use of it. You rely on abstractions, on libraries, all the way up and down the programming world. If I had to really rely on knowing which operation was being executed at all times I'd go mad.

Certainly I don't believe you can really become a really mature programmer until you have learned C. And there's some advantage in knowing how machine language works well and if you're studying computer science you should probably take a course in it.

But these are stupid languages to start in. They teach you bad habits that take years to break. They are excruciatingly verbose and you become slow and stupid from endless typing. They have numerous, well-known and subtle traps, traps that an advanced programmer should certainly know to avoid but shouldn't be put under the feet of a journeyman programmer.

Beginners need to concentrate on correct, clear, tested, maintainable, concise code before learning large, arcane, thorny legacy languages with numerous quirks and difficulties.

They need to learn fundamental data structures like lists, stacks, dictionaries, trees - they don't need to learn about #include files and C lvalues. If they really understand the essence of programming then later they'll take the extra time when they finally learn C and do it right.
posted by lupus_yonderboy at 4:53 PM on October 24, 2009 [2 favorites]

jock@law.

You're totally right, ruby string appends are most certainly not O(n) the strings I was using were too short, the method call overhead was dwarfing the string append cost. I re-ran it with a larger disparity between strings (also, I stopped using cumulative appends as they would give me out of memory errors), here's the results with 1000000 iterations (left col is string length):

100 1.030000 0.010000 1.040000 ( 1.050520)
1000 2.470000 0.030000 2.500000 ( 2.578237)
10000 21.280000 4.140000 25.420000 ( 25.902005)
posted by sp160n at 4:56 PM on October 24, 2009

Finally, some information about programming computers on the Internet.
posted by DU at 5:27 PM on October 24, 2009 [1 favorite]

that's flatly wrong. algorithms that are orders of magnitude different in performance is not an implementation detail.

Absolutely that is an implementation detail - because nearly all the time these "orders of magnitude" performance differences are an illusion.

Suppose you have an indexed table and want to find elements in it. A linear search is O(n) but a hashmap is O(1)! So the hashmap wins.... right?

Not so fast, cowboy. It costs an awful lot just to set up a hashtable, all these buckets and pointers in memory, and then doing all the arithmetic to locate your buckets isn't free either. And in most real world cases, your collections are very small nearly all the time - sometimes you can guarantee they're never larger than a given value. It might be that the linear search is always faster than the hashmap in your actual application, and it will certainly consume a lot less memory. Often memory is a more expensive constraint than CPU anyway.

Exactly the same thing is true with the concatenation of strings being quadratic - most of the time it won't make a difference but your extensive testing should catch it before it hits the real world.

Spend the time writing clear, corrected, tested, concise, maintainable code, and if you do have to optimize at the end, these simple optimizations will jump out at you then.

Anyway, you're a lot less likely to have these "algorithms that are orders of magnitude different" if you can actually see all the code you're writing on one page.

Writing in assembly language requires literally pages of code to express a Python one-liner like:

  [f(expensive(x)), g(expensive(x)) for x in items]

It doesn't take a genius to be able to see that you can immediately eliminate that duplicate call to expensive(x) and perhaps save half your CPU off the top - it does take a genius to see this in assembly language.

The same is true with the classic "quadratic concatenation of strings" algorithm. In a language where there's every advantage to keep things as lists all the way through, you'd never even think of joining strings in a loop....

Again, everyone should know fundamental algorithms and O() notation, as I keep saying - lists, trees, hashtables, binary search and that sort of thing. But there really isn't that much to those, a few weeks of lectures perhaps, and these days your language always has a much better version of this sort of thing than you personally could ever write hidden under the hood.

What makes you a real programmer is correctly expressing your intention in code, and proving it with tests and clear and unambiguous code and interfaces. Knowing details about what the registers are doing is only of peripheral importance.

The world's servers average significantly less than 50% CPU utilization. For those of us that have multicore machines (which is most people with a newish machine these days), most of the cores sit there twiddling their thumbs nearly all the time, and yet we still wait on all sorts of calculations.

This problem isn't going to be fixed with your assembly-language bit twiddling. Instead, we'll be representing algorithms in higher-level languages like Python and then the compiler/interpreter/dispatcher will figure out how to break up the tasks amongst the available computing resources.
posted by lupus_yonderboy at 5:30 PM on October 24, 2009 [2 favorites]

VERY few professional programmers are working on anything remotely interesting. If you are getting paid, you are likely not working on anything cpu, gpu, or realtime (in the strict "interrupt" sense) intensive. You are most likely moving data around, and trying to interact with other systems, be they databases, webservices or what have you. I'm not sure if this list intends to produce good "professional" programmers though. Professionally, you are probably making CRUD applications by day; for interested hobbyists, it seems apt.

80% of the cost of software is maintenance. Maintenance means readability and modularity. I firmly believe that there are 2 things that determine whether or not a given programmer will produce maintainable software; communication skills and psychology. Optimizing and obsessing over assembly level details is unfortunately anti-thetical to communicating well. Someone with a solid grasp of grammar, and who is interested in applying it to the code is infinitely more valuable to a team than someone who speaks RISC, x86, C or C++, for most projects per my previous point above.

As far as psychology goes, it is EXTREMELY important for developers to be able to control their egos and manage the creative rush they get from building and "owning" their code. Someone who is adept at C or C++ has every motive to re-invent the wheel, and will be hard pressed to replace that snazzy custom connection pool they implemented with something OSS off the 'net.

And lupus's points about testing are bang on. That's why modularity is so important, and thats why training people to be obsessive coders who work in a vacuum terrifies me.
posted by butterstick at 5:46 PM on October 24, 2009 [4 favorites]

The world's servers average significantly less than 50% CPU utilization.

Right. Rely on the efficiency of hardware to buy you out of your bad algorithm. Meanwhile the rest of the world wonders why they need a new computer every few years even though their use cases havent changed significantly since 1998.

A linear search is O(n) but a hashmap is O(1)! So the hashmap wins.... right?

Nobody disputes that A*n will be sometimes be faster than A*1+k if A and n are small relative to k. Where your set size is insignificant compared to the overhead in the alternative, high-school algebra bears that out. Nowhere did I suggest that someone always optimize for large sets -- all software design involves knowing your use cases.

Nor do I think people should do significant application programming in assembly, as you seem to characterize my position. I never took that position and I don't believe in it.

What I DO believe in is understanding what the machine is doing when you tell it to do something. There should be no magic in the sciences.

The kind of education you're advocating leads to hourly in-house workers hacking together ugly VB forms for a backend inventory system. It's barely a step above hobbyist programming. Big shops don't hire the people you're talking about. Oracle, Microsoft, Apple, VMWare, Google. No student under your curriculum has a dream of working there. You can't write MapReduce without a deeper understanding than that.
posted by jock@law at 5:48 PM on October 24, 2009

Big shops don't hire the people you're talking about. Oracle, Microsoft, Apple, VMWare, Google.

I just retired after five years writing C++ and Python at Google, actually, and I'm pretty well spouting the gospel according to them...
posted by lupus_yonderboy at 6:07 PM on October 24, 2009 [5 favorites]

Big shops don't hire the people you're talking about. Oracle, Microsoft, Apple, VMWare, Google. No student under your curriculum has a dream of working there.

jock@law, lupus pretty much describes the curriculum at top programs. If you follow my links above you see that Stanford starts out with Java (actually Karel which is simplified Java) and MIT starts out with Python in their beginner course. In fact the real value at Google comes from the PageRank which is less to do about computer than it does with information theory. Sure there are some firms that work really close to the bare-metal, VMWare is one of them, but if you train a good engineer they can pick up and work with any tools you throw at them.
posted by geoff. at 6:13 PM on October 24, 2009 [1 favorite]

And now on to the tutorial criticism. It is VERY C specific. It is VERY procedural. It is not object-oriented at all. This is fine for a hobbyist or kernel hacker or what have you, but I question the value this provides in the real world. I haven't ever used most of this stuff since I got out of school, nor have most of my colleagues.

Lesson 5 specifically disturbed me, because it sounded like Carl is intending these lessons to be sufficient training for professional programmers. His About Me post reinforces that suspicion for me.

I'm also pretty worried about completely ignoring the linguistic aspect of programming. Why no lessons on how to choose good variable names? No lessons on how to structure code? The clarity of even his examples would be better served by adopting a "Pet Store" metaphor so he could avoid ridiculous variable names like "myPointer".

These pretty much read like he flips thru The Joy of C once in a while and rephrases from there. I'm a firm believer that a solid foundation is valuable to a programmer, but I think it is a more conceptual foundation... understanding Big Oh, Concurrency, Domain Modeling... all these things seem glaringly absent.
posted by butterstick at 6:14 PM on October 24, 2009 [5 favorites]

...and I always got my best marks for code quality on the endlessly, horrific performance evaluations that we tortured ourselves with once or twice a year....

A couple of years ago I had a Mapreduce to deduplicate US real estate listings. This is not an exact process, and in some sense you need to compare each pair of listings... which is an N^2 process, we're talking trillions of comparisons. Now, we were of course being a little clever and splitting things up so each machine ran one zipcode - but even then California was very slow, because it was N^2 on the number of listings in California.

I realized that the distribution of housing prices was logarithmic, and so if I only compared offers within a narrow price ratio, I could pretty well guarantee to be seeing all possible duplicates. And STL offers a nice datastructure called multimap that allows me to access all values within a price range in linear time!

I have to say that the fact that this thing was written in C++ made my whole task much harder. The people before me hadn't seen it, when I pointed it out there was some skepticism that I could make it work, and I didn't come up with a good way to gradually transition the code from the old to the new data structures (which I think is the mark of a good transition) but had to just throw out a chunk and rewrite.

But it worked. All the tests continued to pass and now the procedure ran as n^2 (it's always quadratic) but this time on the number of houses within a fairly small price range (there was a wrinkle because a few offers had no price but luckily there weren't very many...) - a much smaller number - so it went from 4.5 hours on 400 machines to 18 minutes on 40 machines. Bwahahaha!

Point is that all my knowledge of C++ register bit twiddling was, as usual, completely useless. I could have seen the fix much earlier if it were in Python and I could have implemented it incrementally, with more tests (though there's almost complete testing for C++), in an afternoon instead of a weekend and another day.
posted by lupus_yonderboy at 6:24 PM on October 24, 2009 [4 favorites]

I just retired after five years writing C++ and Python at Google, actually, and I'm pretty well spouting the gospel according to them...

I seriously, seriously doubt that's the case. If it were, I imagine Google would be hiring people with random Microsoft certifications instead of it basically requiring a graduate degree in CS to work on anything interesting there.

In fact, with at least some things you've mentioned, I know it's not the case. Google does not - in its flagship software - regard algorithm design as an implementation detail. Period.

And even if you were Sergey Brin talking on Metafilter yourself, well... all I'd be able to say then would be that I'm glad Google doesn't design most Comp Sci curricula.
posted by jock@law at 6:27 PM on October 24, 2009

geoff: in part he descibes it well, in that he advocates starting with a high level language. fine. but all top programs also include an architecture class at some point relatively early on, usually second year. there's also a heavy emphasis on analysis. i would not characterize, e.g., the yale cs program (of 5ish years ago, i dont know anything about it now) as leaving algorithms as trivial 'implementation details' or waiting to learn C until you got around to it one day.

i recognize that he and most people disagree with me about starting off low-level and they may be right. but some of the other things he's said strike me as very, very wrong.
posted by jock@law at 6:33 PM on October 24, 2009

If it were, I imagine Google would be hiring people with random Microsoft certifications instead of it basically requiring a graduate degree in CS to work on anything interesting there.

I'm not following this logic at all. Are you saying if Google has performant software they must hire people who pay attention to how hardware runs software? I would find that surprising for a company that uses so many dynamic or virtualized languages. It would seem to me that they are explicitly investing in quite the opposite, needing only a very few of these types to focus on VMs and runtimes.

i.e. they are leveraging modularity. Am I misunderstanding?
posted by butterstick at 6:34 PM on October 24, 2009

butterstick: im not railing against the use of highlevel languages. im railing against the idea of algorithm analysis as an 'implementation detail'
posted by jock@law at 6:37 PM on October 24, 2009

[object-oriented code]

Object-orientation is a useful code encapsulation and re-use strategy, but one that is greatly overused.

When I am tempted to create a new class these days, my first thought is to say, "Can I avoid it entirely?"

Can I use a dictionary or a list to do the job? Can I just have functions that operate on other classes? They're so much easier to test! Can I use containment instead of inheritance?

The trouble with an "object" is that the data and the behaviour are glued together... and sometimes that's exactly what you want, but most of the time what you really want to do is to reuse and share code. Sometimes you want to do things like "mixins" and "double dispatching" and inheritance is no good at all.

Having "state" that you can change is very important, but it's also dangerous, because anyone can change "your" data. Pure functions with no state are more robust and testable.

Now, pure functional code is no fun at all to write but the point is that you should keep a tight lid on your state and strongly consider keeping fairly dumb data and having a selection of statefree functions to operate on this data.

Don't get me wrong! I've created a bunch of new classes in the last 24 hours (though in Javascript, I'm doing a music programming thing). Classes are great!

Object-oriented code's a great technique which needs to be used exactly at the right time. I'd say you could go quite a long way learning how to code without actually writing a single class. You will note that I was a couple of dozen lessons into my lesson plan above and hadn't actually mentioned objects or classes yet...

An excellent call to action here with respect to C++ is here by the eminent Scott Meyers. If you write C++, please read that article. Let it trickle into your spine and think, "Non-friend, non-member function" instead of "method" wherever possible. It's counterintuitive but true!
posted by lupus_yonderboy at 6:42 PM on October 24, 2009 [2 favorites]

butterstick: im not railing against the use of highlevel languages. im railing against the idea of algorithm analysis as an 'implementation detail'

But it is an implementation detail. We try to isolate hard things as much as possible, so they can be tweaked or prodded or ripped out entirely or re-written or even left alone. You need entire teams of people to write everything else, but only one person to obsess over the specific formula. Like I said, 80% of the cost of software is maintenance, and most products are CRUD apps that just shuttle data around.

And Lupus, I'm in agreement on inheritance being bad, and gluing behavior to data is rarely useful. From an educational point of view, pure functional languages may be the way to go. I draw a sharp distinction between functional (scheme, lisp, clojure) and procedural (C, Pascal) though, and the tutorial in question is very procedural. If he covered OOP, students would at least be further along, if not as far as a functional curriculum. However, when it comes to describing a domain model in terms that conveys re-use to the business people, I still believe OOP is the way to go.
posted by butterstick at 6:53 PM on October 24, 2009

From an educational point of view, pure functional languages may be the way to go.

It's too hard. You need some place to stick your data. You need to have operations with side-effects!

Pure functional programming is hard, but side-effects are dangerous, you have to walk a fine line.

However, when it comes to describing a domain model in terms that conveys re-use to the business people, I still believe OOP is the way to go.

Amen!
posted by lupus_yonderboy at 7:11 PM on October 24, 2009

Students need to understand how to think clearly about computers and computer programs before anything else. They need to understand best programming practices: how to create readable, testable, maintainable code. -- lupus_yonderboy

Why do you think the order matters? Certainly it would be good if they knew how to write "readable, testable, maintainable" code at some point but there's no reason they should learn that first, since it would probably bore them to tears. Likewise, I don't think Assembly should be taught first. The first thing should be to learn a relatively easy language as quickly as possible, so they can get used to expressing themselves in code.

Just because you think something seems important doesn't mean it should be taught first

The other thing you brought up, using python, and claiming it makes you more efficient, well, many people believe that when it comes to writing larger programs the type safety of statically typed languages like Java or Scala make code much less buggy and easier to maintain. The amount of testing you actually need to do can decrease. I think I could probably write a prime O(N²) finder in 5 minutes in Java. (I know you're comparing Python to Assembler, and of course that's going to be far more efficient).

A long time ago I wrote a prime finder in C++ that could run in O(N) time using the Sieve of Eratosthenes, and that definitely took longer then five minutes, because I was doing everything I could to make it go as fast as possible, (I was having a contest with a friend). And I wasn't as good a programmer at the time, this was when I was a freshman in college. (Interesting side note: the Wikipedia article has a Haskell two-liner implementation. Heh)

And really, writing testable code isn't that hard. If you're doing it the right way in the first place, it isn't that hard to test.

because nearly all the time these "orders of magnitude" performance differences are an illusion.

Suppose you have an indexed table and want to find elements in it. A linear search is O(n) but a hashmap is O(1)! So the hashmap wins.... right?

Not so fast, cowboy. It costs an awful lot just to set up a hashtable, all these buckets and pointers in memory, and then doing all the arithmetic to locate your buckets isn't free either.

Picking an off the shelf collection is an implementation detail. Most of the time you just change the type deceleration to change which one you pick (i.e. Map m = new TreeMap<type> vs. Map m = new Hashtable<type>)

But what happens when you have to actually come up with the algorithm yourself and it isn't provided for you? Like if you need to use a binary space partition to do collision detection. Then the algorithm you pick is going to have a pretty big impact on how write the rest of your code. If you decide to make the algorithms switchable, then that is going to have a huge impact on how you write the rest of the code too.

No one is expecting people to write huge programs in assembly, we're simply saying that knowing assembly is a good way to understand what the computer is doing at a fundamental level, which helps out all the time when you're trying to understand why things are going wrong.

Now, if you're just writing tens of thousands of lines of database front ends at $HUGECORP then that's not going to matter too much. You're not very likely to have to deal with these issues, you're going to spend most of your time cleaning up other people's messes, and naturally it's going to seem like the most important thing other programmers know is how to write code that doesn’t give you a headache when you have to update it or it breaks.

But that's really "software engineering" which, while it requires programming, is not the whole thing. You can be great at writing programs that don't do anything but shuttle data between a user and a database without ever knowing about how the computer really works or ever knowing much about algorithms or low level stuff (because you never need anything complex) and so on.

Seriously, the argument that the orders of magnitude speed changes are 'illusory' is pretty insane. Like I said, if you're just using off the shelf data structures it's easy to change them if you need too, But if you're doing something really complex it does matter how you proceed.
posted by delmoi at 7:12 PM on October 24, 2009

butterstick, it is funny you should mention pure functional languages as the way to go. This is way out of my league (though I did use Mathematica in school) so I could be talking out of my ass, but I was talking to a friend about how business schools don't teach even the rudimentary concepts behind computing. This is strange as nearly all business systems are run on relational databases and object oriented systems. You would think that a Java/SQL course would be mandatory, especially given the capital expenditures of even a small data center, it would be important for decision makers to at least know what's going on more than a black box. My friend made the same argument you just made, that functional programming languages would probably be the way to go, even if enterprise software is primarily written in imperative languages. This seemed rather ridiculous to me, then I realized that it might not be that far into the future where you begin to see a split in computer science programs. More rigorous, academic programs would focus on functional programming and sort of morph in to the mathematics departments, while business oriented computer science would be a lot more pragmatic. One could write a lot of business apps without knowing C and having only the faintest clue what a register is. If scalability and performance becomes an issue you would need to bring over more formally trained engineers, but really for most businesses it simply is a whole lot cheaper to buy another data center in a box than hire someone who can really optimize the process.

Just a thought, could be complete bullshit, but computer science programs did emerge from math departments in what? The 80s? Given the advances since then there's no reason to believe it won't continue to change and divide.
posted by geoff. at 7:19 PM on October 24, 2009 [1 favorite]

Seriously, the argument that the orders of magnitude speed changes are 'illusory' is pretty insane.

Take any program. Pick a line at random. Wave a wand and magically make that line take 10 times longer to execute.

If I run this new program, it is most likely that you will never notice anything different.
posted by lupus_yonderboy at 7:26 PM on October 24, 2009 [1 favorite]

geoff.: functional programming languages don't correspond well to the way we think.

Humans have mental variables with names like "Laura" and "the cat" with states like "at home" and "asleep". You don't really think of applying the "at home" operation to Laura and getting a new instance of Laura... it's the same person whose undergone a state change.

Side-effects are very convenient - but dangerous. We need side-effects, we need variables, but we should keep them tightly controlled.
posted by lupus_yonderboy at 7:32 PM on October 24, 2009

But that's really "software engineering" which, while it requires programming, is not the whole thing.

It's 80% of the whole thing, and for most developers, the ones not fortunate enough to be working on the big interesting problems, it is everything. For the business people who are spending money on the software? It is everything. They don't get market advantages by having a better MapReduce than Google, they get their advantages by features. And the more features they can pile on faster, the more competitive they are in the market. The less they spend on developers and support staff, the better their margins. Most software companies simply aren't competing in those arenas that require the attention to machine detail you are referring to.

Seriously, the argument that the orders of magnitude speed changes are 'illusory' is pretty insane.

I don't think I have ever, in my 15 year career, run into a CPU bottleneck. It is always an IO issue, typically along the data serialization path.

Waste comes in all kinds of flavors, and CPUs are so fast that you run into more basic problems that are related to program structure and flow than anything else.
posted by butterstick at 7:50 PM on October 24, 2009

delmoi: (Interesting side note: the Wikipedia article has a Haskell two-liner implementation. Heh)

Actually, it doesn't.

lupus_yonderboy: Have you ever read CTM?
posted by Monday, stony Monday at 7:52 PM on October 24, 2009

But that's really "software engineering" which, while it requires programming, is not the whole thing.

In fact, now that I think about it... what you are referring to as "software engineering" is in fact the whole thing, where programming is only one aspect of it. My position is that developing software is a broad craft, and programming well is but a single skill in the toolkit.
posted by butterstick at 7:54 PM on October 24, 2009

I don't think I have ever, in my 15 year career, run into a CPU bottleneck

Photo-realistic rendering. Image motion vector tracking. Shape from shading ...
posted by StickyCarpet at 8:13 PM on October 24, 2009

And you can definitely do real-time things - in the other window I'm working on processing my live MIDI in Javascript

Whoa. What are you using? Right now I'm working on some libraries to do this using Mozilla's Rhino and it sounds like I may be reinventing the wheel.
posted by weston at 8:16 PM on October 24, 2009

@lupus_boy: As long as we're talking about software engineering as involving clarity, your code could be a lot clearer:

primes = [2, 3]
for i in xrange(primes[-1] + 2, 1000, 2):
  if all(i % prime for prime in primes):
    primes.append(i)

posted by A-Train at 8:22 PM on October 24, 2009

If I run this new program, it is most likely that you will never notice anything different.

That's probably true, if you pick a line at random in a 50k line codebase. But it's definitely not the case that if you fundamentally pick an architecture that's not going to be fast enough, it won't be fast enough. It'll be so slow you'll have to start over.

I'll give you an example that happened to me. I was working on a web spidering system. I wanted to give each spidered link a 'rank' based on some criteria like the popularity of the site and how many links for the site we already had in the system. That meant for each insert I would need to do a couple queries. I decided to try storing the data in a MySQL database. This system was really complicated, and my MySQL code ended up being about 600 lines. But when I went to actually use it, it was too slow. Even with my slow DSL connection the bottleneck was the MySQL database.

The overall system was pretty modular, so I was able to replace that component with one that just used Java collections. That was only 300 lines, but of course it wouldn't have been persistent. Anyway, it was easily able to scale up to millions of links before it filled up the memory I had allocated for it and it slowed down too much (I was expecting it to just crash, but instead it just started using the garbage collector a lot and got really slow). I was even able to use EC2 instances to do the downloading and only actually download the actual links, rather then all the content over my DSL connection.

That's an extreme example: using a database rather then storing data internally, but the choices you make can have a huge performance difference. But think about this. Suppose you have a choice of storing data in a traditional SQL database, or using something like Berkely DB. Obviously, that's going to have an enormous impact on how the rest of your software is written.

I've actually written my own little database system that uses memory mapped files. It's very, very, very fast, as long as your database doesn't get larger then system memory.

Since Byte Buffers in java can only be two gigabytes, I decided to write an abstraction layer that would let me use multiple byte buffers to store larger amounts of data. Since I was doing that, I figured I would also write an implementation that used random access files. Since I had just gotten an Intel SSD I figured it might still be pretty fast. I was wrong.

I could still do simple tasks using the non-mapped files on the SSD but when it came to using really complex software that needed to do tons of read/writes in order to work, it was just too slow (in that mode) to be practical, compared to the memory mapped mode.

Often times you don't really see the difference between a slow algorithm and a fast algorithm because the slow algorithm is so slow that the task becomes impossible using it and it's just dropped.

Oh, and I was going to mention one company where they really do code for speed: Goldman Sachs and other huge banks. Their high frequency trading platforms are a good (if also evil) example of where even microseconds can have a huge impact on value. And even though it doesn't always matter for most companies, at places like facebook or twitter, a 5-10x speed improvement means a 5-10x reduction in their main expense. It can be worth while.

Speaking of Google, here's what Peter Norvig, Google's head of research has to say in his essay "Teach yourself Programming in Ten Years"

Remember that there is a "computer" in "computer science". Know how long it takes your computer to execute an instruction, fetch a word from memory (with and without a cache miss), read consecutive words from disk, and seek to a new location on disk.

posted by delmoi at 8:29 PM on October 24, 2009 [1 favorite]

I don't think I have ever, in my 15 year career, run into a CPU bottleneck

Funny thing is that nowadays we run into nasty bugs because the production hardware is too fast.
posted by smackfu at 8:35 PM on October 24, 2009

Remember that there is a "computer" in "computer science". Know how long it takes your computer to execute an instruction, fetch a word from memory (with and without a cache miss), read consecutive words from disk, and seek to a new location on disk.

The funny thing about that is that these days, the answer to these questions is getting pretty complicated. For instance, the clock frequency of my computer's cores varies depending on how many of them are currently in use. There's also some quite opaque stuff going on with registers inside of my cpu: I can only code using 16, but the processor holds many more values in register-like slots. I also have 3 levels of cache, and my disk has cache, too.
posted by Monday, stony Monday at 9:24 PM on October 24, 2009

If you aren't teaching your students design for testing from the very beginning then you are teaching them to be a tinkerer, not an engineer. It is not engineering unless there is a test.

Oh for the love of Knuth, I am getting tired of this religion.

It isn't engineering if you don't understand wtf you're doing. Developing software should not be an exercise in trial and error, which seems to be the level of competence from which this ideology originates. A unit test is a hedge against your ignorance, against the things you can't control. The test itself does not make anything right.

And besides, fuck engineers. It's not computer science unless there is a formal proof of correctness.
posted by 0xdeadc0de at 11:46 PM on October 24, 2009 [1 favorite]

It isn't engineering if you don't understand wtf you're doing. Developing software should not be an exercise in trial and error, which seems to be the level of competence from which this ideology originates. A unit test is a hedge against your ignorance, against the things you can't control. The test itself does not make anything right.

Well, testing does have one good feature. If you write tests, then you can be sure you haven't broken anything. I think the whole idea of writing failing tests before you write your code is idiotic. It seems more like a mental trick to actually get you to it.

When I write code, I'll generally test each little bit as I go, then delete the test code. Obviously I need to know that it's doing what I think it's doing. Only rarely do I have a situation where I have to write a huge bunch of interdependent code that all relies on itself, and has to be tested as a huge unit.

There is one good use for testing, though, and that's to make sure you don't break anything. A couple of months ago I was playing around with Nutch, which uses Hadoop, an open source implementation of mapreduce in java. It was failing on me and I was able to run the unit tests that came with the implementation. Turned out the bug had to do with trying to change file permissions. I think it was actually trying to run chmod, and I was running windows. (There's no built in Java method to change file permissions now, but there will be in JDK 7) I don't know if it was vista or what, but I was able to fix the the incompatibility (by just skipping the file permission change, which wasn't even necessary), The unit tests found the problem right away.

So I think tests are a great way to make sure you don't break anything you've already written, and of course for testing in new environments, like my PC.

When you've got tens or hundreds of thousands of lines of code, the possibility of breakage is real.
posted by delmoi at 12:36 AM on October 25, 2009

Developing software should not be an exercise in trial and error, which seems to be the level of competence from which this ideology originates. A unit test is a hedge against your ignorance

Obviously, flailing around blindly until it does what you want is less than ideal. But personally, I know that I can't hold everything about most useful programs in my head. At any given moment, I'm ignorant of some aspect of how it behaves, even if I wrote it. I'm hardly dogmatic about test-driven development, but it seems to me that fundamental idea behind unit tests is a good one: some subset of expected behavior is well-defined and you test for it automatically. Good process engineering, whether or not it's software engineering.

It's not computer science unless there is a formal proof of correctness.

If I'd wanted to make formal proofs of correctness, I would've studied Math in college instead of computer science.

Wait. I did that, and I still don't want to do formal proofs of correctness.
posted by weston at 12:45 AM on October 25, 2009 [2 favorites]

I'm not saying unit tests are bad. I've written tens of thousands, for regression suites that take all day to execute. I have to, because even if I were perfect, and my team were perfect, we're dependent on APIs and compilers that are far from it.

But thumping the TDD religion into beginners is counter-productive. Their confidence should not arise from "it passes the tests". They need to write tests, but only as part of the exercise, because their tests will be as flawed as the code they're testing. Their confidence should be of the kind that comes from understanding something so thoroughly they don't need a unit test or debugger to find the errors.

Maybe I just miss the days when mistakes hurt, when it made your machine reboot.
posted by 0xdeadc0de at 9:39 AM on October 25, 2009 [2 favorites]

0xdeadc0de: i remember reading a star trek novel where a woman had to code something and while doing it she recalls her father teaching her how to write things perfectly the first time, how to debug in your head. Anyway. You reminded me of that. I should find the book again.
posted by jock@law at 11:33 AM on October 25, 2009

Not directly related to programming, but an aside to that old canard 'knowing how long things take': Check out the Non-Deterministic Hardware section of this LWN writeup, a few paragraphs in.

Apparently, measuring hardware response times gives results that are not statistically distinguishable from random numbers. There's a link to a PDF paper about attempts to benchmark the Linux kernel; their results were highly imprecise timings for things. It appears that systems are becoming too complex to be predictable anymore. Even CPUs themselves show some internal timing randomness, on the order of 10^-9.

It may already be impossible to be certain how long something will actually take in real life, at least in x86, and it appears the problem is only going to get worse as systems add more and more layers of functionality. This particular requirement to be considered hardcore may be obsolete. :)
posted by Malor at 12:05 PM on October 25, 2009 [1 favorite]

As I think I said above, I don't believe in writing tests before code, and I don't follow test-driven development.

But all your arguments baffle me. Your young programmer is supposed to develop "confidence" and "intuition" about how their program runs... how? It seems the argument I'm seeing is that you're supposed to reason out how this all works in your head. This is a lot to ask of people! This is the sort of thing I'm good at - but it's still very hard.

The whole idea of tests is that you get to see your code run. You get to test edge cases, not just design for them. When you're actually using your code, you not only see bugs, you see how easy or difficult it is for your client to use.

Let's put it back to you! You're teaching a young programmer - they've written their first program - now what? Do you pat them on the head and assume it's right? Do you get them to stare at it or run it with some "typical numbers"?

The way to develop confidence that you know that you program works is not with pure reasoning - it's by having a lot of programs fail and you fixing them and realizing how your mental model was broken.

A test is the perfect framework for this. You can make tiny changes, and see what happens - you can follow your code in the debugger, knowing exactly what the input data is.

I agree that you need to understand what you're doing; you need a strong conceptual framework; you need fundamentals; in fact, when I reject candidates for engineering positions, I'd say that at least 3 in 4 of them are rejected entirely for "fundamentals".

But I'd frankly rather use code with a complete battery of tests written by someone who I feared had a weak conceptual framework, than code with no tests written by someone conceptually advanced.
posted by lupus_yonderboy at 2:53 PM on October 25, 2009

It may already be impossible to be certain how long something will actually take in real life, at least in x86, and it appears the problem is only going to get worse as systems add more and more layers of functionality. This particular requirement to be considered hardcore may be obsolete. :)

I don't think people expect to know the exact amount of time something will take, but they should know the orders of magnitude, and how long they compare to other options. For example, doing a database query vs. opening a file vs. reading from a cache.

But all your arguments baffle me. Your young programmer is supposed to develop "confidence" and "intuition" about how their program runs... how? It seems the argument I'm seeing is that you're supposed to reason out how this all works in your head. This is a lot to ask of people! This is the sort of thing I'm good at - but it's still very hard.

Well, when I code I generally write little bits of code to test/try out the code as I'm writing it. Sometimes those blocks won't work in the final product, or there just there to make sure a bit of code does what I think it does. All the code I write is at least tested in some way, but not necessarily by writing formal test cases. Obviously it's possible to write software without writing "official" test cases and whatnot, otherwise there wouldn't be so much hype about doing it.

I'm not saying students shouldn't learn about it, certainly they should. But there's no reason why you couldn't just have a class on Test Cases that kids take at some point, rather then having it drilled into them. It's important for people in industry, depending on where they work, but it's just a bit of extra stuff, not part of the core basis of being a "programmer". Maybe for some people who don't intuitively grasp programming as well, it will help them think about how to structure their code to be more independent, and easier to test.
posted by delmoi at 5:50 PM on October 25, 2009

But I'd frankly rather use code with a complete battery of tests written by someone who I feared had a weak conceptual framework, than code with no tests written by someone conceptually advanced.

Of course, if I'm using someone else code I would rather it came with a complete set of test cases, the question is: Is it worth it to me to spend time writing test cases, or spend time writing more code? I gave the example of Hadoop earlier. It wasn't working for me and I was able to track down the incompatibility bug thanks to the test cases they had provided really quickly. On the other hand, what if they'd taken the time they spent on the unit tests to make a nice GUI front end, or basic implementations of standard algorithms? It could have been more work for me to track down the bug, but less work for the average Hadoop user.
posted by delmoi at 5:54 PM on October 25, 2009

Well, when I code I generally write little bits of code to test/try out the code as I'm writing it.

It is inevitable - and I'm proposing keeping that code and expanding it - using it to make sure that code always works.

Is it worth it to me to spend time writing test cases, or spend time writing more code?

There's an assumption there that it takes you longer to complete your task if you write unit tests. But that's simply not the case.

The reason I write unit tests is because overall it takes me less time to complete my task. True, I spend quite a lot of time writing tests - but I spend a heck of a lot less time debugging. Writing tests is more fun and less stressful than debugging and I save more time than I spend.

Frankly, I'd still do it even if it took me more time.

Let's look at what I'm doing today - a system to control a musical show through MIDI, written mostly in Javascript (I'm using the js box in Max/MSP, and to my pleasure, Javascript turns out to be quite a sophisticated modern language).

If there's a bug, it will appear in live performance, which will mean that probably something bad will happen in a live show, probably stopping the show. I really want to avoid this. I'd be willing to have a lot less functionality if I could be sure that that would never ever happen.

And at the same time I'm trying to do rapid development. I want this done and ready by Thanksgiving so I can use it next year.

There's no other way to do this than having extensive testing and a high-level language.

And in general, there really aren't that many tasks in my life for which correctness and reliability don't trump features...
posted by lupus_yonderboy at 8:56 PM on October 25, 2009 [1 favorite]

On the other hand, what if they'd taken the time they spent on the unit tests to make a nice GUI front end, or basic implementations of standard algorithms? It could have been more work for me to track down the bug, but less work for the average Hadoop user.

In many cases, the user would have been staring in baffled frustration at a more advanced program that didn't work, just like it didn't work for you. The unit tests allowed you to diagnose and fix the problem, and thus get a fully working program with fewer features. Lack of unit tests would have given you a more sophisticated product that didn't do anything at all.

Maybe I've missed something here, but it seems to me that you're citing compelling evidence for the usefulness of test cases, but drawing an odd conclusion. Your chmod problem would, presumably, have bitten most Windows users, so you appear to be arguing that it's better to have a more advanced program that doesn't actually function.
posted by Malor at 10:28 PM on October 25, 2009

Most of these arguments against testing strike me as shockingly selfish. Don't you people work in teams?
posted by butterstick at 10:39 PM on October 25, 2009 [1 favorite]

Re: Importance of algorithms, From "Introduction to Algorithms:" Although there are some applications that do not explicitly require algorithmic content at the application level (e.g., some simple web-based applications), most also require a degree of algorithmic content on their own ... Having a solid base of algorithmic knowledge and technique is one characteristic that separates the truly skilled programmers from the novices. With modern computing technology, you can accomplish some tasks without knowing much about algorithms, but with a good background in algorithms, you can do much, much more.
posted by geoff. at 7:25 AM on October 27, 2009

I'm enjoying this discussion.

Can anyone recommend good resources on programming style, as it is properly practiced, with details like lupus_yonderboy's warning:
When I am tempted to create a new class these days, my first thought is to say, "Can I avoid it entirely?"
posted by sebastienbailard at 8:59 PM on October 29, 2009

sebastienbailard, do you mean like Domain-Driven Design: Tackling Complexity in the Heart of Software or Gang of Four's Design Patterns?
posted by geoff. at 1:11 PM on November 7, 2009

« Older Heroes only slightly less evil than villains | Airport Float Newer »

This thread has been archived and is closed to new comments

MetaFilter

How To Program
October 24, 2009 8:37 AM Subscribe

Tags

Share

How To Program October 24, 2009 8:37 AM Subscribe

Tags

Share

How To Program
October 24, 2009 8:37 AM Subscribe