Architecture of Open Source Programs
May 23, 2012 8:06 PM

The Architecture of Open Source Programs is a guide to the functional implementation of major open source code bases. Notable open source projects covered include Bash, CMake, LLVM, GDB, Puppet, and PyPy, among others.
posted by Rubbstone (12 comments total) 48 users marked this as a favorite

It just needs woolly mammoths illustrated by David Macaulay.
posted by nwatson at 8:53 PM on May 23, 2012 [6 favorites]

The LLVM section doesn't seem to mention this, but GCC's code is a purposefully obfuscated hairball because RMS wanted it to be hard to incorporate it into other (potentially proprietary) projects.
posted by Jpfed at 9:20 PM on May 23, 2012 [3 favorites]

I bought the paperback of A.o.O.S.P. because it was by the editors of Beautiful Code, which was a joy and a pleasure to read, my favorite programming book of the year.

I haven't enjoyed the new one at all, and I think part of the reason is that it suffers from a problem apparent in a lot of print-on-demand books: their page layout and font size are just horrible looking. The pages get formatted so they look good on the PDF, on-screen, to a reader sitting up in front of a monitor.

It made me appreciate O'Reilly's editorial staff, those unsung heroes of the publishing biz. To be honest, software developers writing about their own designs tend towards the "mass of dense unparagraphed text" style naturally, and it takes a good editor to wrangle that into something that anyone else would want to read.

(The other reason is that Beautiful Code showed you the code, or at least described how things worked in enough detail that you could imagine writing it yourself. Architecture of Open Source wiffles on a lot about "components" and "modules" which "interact," and draws a lot of boxes with arrows.)
posted by Harvey Kilobit at 9:30 PM on May 23, 2012

PyPy is such a deeply weird beast, I feel like I need it re-explained to me on an annual basis. Python that’s compiled down to simpler Python using Python and then binary?
posted by migurski at 9:33 PM on May 23, 2012

Is this where I can vent about a horrible open source project? Good.

There's been a bug in my phone that's been driving me nuts for ages, so I decided to delve into the Android source to fix it. I actually knew exactly where the bad bit of code was, and what lines I needed to change to fix things. All I needed to do was to fetch the source tree, make my changes, recompile, and flash my phone.

Three days later, I finally came up for air, haven't fixed the problem, and have concluded that AOSP (the Android Open Source Project) is an unholy mess. Easily the messiest and worst open-source project I've ever worked with. Hardware-specific compiler flags make up half of the code, there are precompiled binary blobs (that are vital for the OS to function) scattered throughout the source tree, and it's actually fairly difficult to find a single source tree that will compile down to a working operating system; AOSP only directly supports the Nexus devices (and does so poorly at that). If you actually want the build to work, you also need to compile it using a 2-year-old version of Ubuntu (no, seriously, AOSP only supports building on Ubuntu 10.04, and I've painfully confirmed that it indeed won't compile on a more recent release). For some reason, newer OSes don't work, and nobody's really investigated why or bothered fixing it.

It's amazing that the operating system works at all, let alone actually permits developers to contribute code to it. It reminded me of Linux circa 1999, but still somehow without even a shitty form of package management. Coincidentally, I have compiled Linux from scratch (ah, the early days of Gentoo), and this was nothing like that. At least Gentoo had documentation, and presumably worked once you compiled the thing.
posted by schmod at 10:09 PM on May 23, 2012 [2 favorites]

schmod, you just need to build with the 4.4 versions of gcc and g++; the rest of the system has little impact, and the build process works on Precise (Ubuntu 12.04) at the least.
posted by jaduncan at 10:20 PM on May 23, 2012

Python that’s compiled down to simpler Python using Python and then binary?

Sort of!

PyPy is both a python interpreter and a general toolchain for building interpreters. The toolchain operates on that "simpler python" (a (very) informally specified dialect called RPython—restricted Python), but you don't compile regular python down to RPython, you write it yourself. The RPython is then analyzed and optimized by the toolchain (which is also written in python, but—AFAICT—need not be written in RPython) to produce an interpreter.

So what you do is, you write an interpreter (for any language—or really, technically, you don't need to write an interpreter at all, though the toolchain is designed with that in mind—here, for instance, is an Unlambda interpreter in RPython, which can be compiled by the pypy toolchain) in RPython, and then you can compile that into an executable (which, optionally, can have a JIT with various hints—one of the weird things about pypy (the python interpreter) is that (more or less) the interpreter just-in-time compiles itself as it runs through your python code, rather than directly JITting the interpreted python).

The translation toolchain can translate any RPython into a binary. So the pypy guys wrote a python interpreter in RPython, which they then ran the toolchain (at first under CPython) on, generating a binary that included a JIT and various other neat things. Then, they could use that binary to run the toolchain. Ta-da, self-hosted.
posted by kenko at 10:41 PM on May 23, 2012 [2 favorites]
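A minimal sketch of the idea kenko describes: an interpreter written in a deliberately restricted, RPython-like style of Python. The toy stack language, its opcodes, and the `run` function are all hypothetical, made up for illustration. The `entry_point`/`target` pair follows what I understand to be the convention the translation toolchain looks for, but treat that as an assumption and check the PyPy docs before relying on it. The code also runs unchanged under an ordinary Python interpreter, which is the point: the same source is both a normal program and input to the toolchain.

```python
# A toy stack-machine interpreter in a restricted, RPython-like style:
# no dynamic tricks, and each variable holds values of a single type.
# The language and opcode names are hypothetical, for illustration only.

def run(program):
    """Interpret a list of (opcode, int_argument) pairs; return the final stack."""
    stack = []
    pc = 0  # index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: " + op)
        pc += 1
    return stack


def entry_point(argv):
    # Computes (2 + 3) * 4 on the toy machine and prints the result.
    program = [("push", 2), ("push", 3), ("add", 0), ("push", 4), ("mul", 0)]
    result = run(program)
    print(result[0])
    return 0


def target(*args):
    # Assumed convention: the toolchain calls target() to find the entry point.
    return entry_point, None


if __name__ == "__main__":
    import sys
    entry_point(sys.argv)
```

Run directly, it just interprets the hard-coded program. Handing the same file to the translation toolchain is what would turn the interpreter itself into a standalone binary.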

After trying to build my own NDK with a less ancient version of gcc I came to the same conclusions.
posted by jeffamaphone at 12:02 AM on May 24, 2012 [1 favorite]

I've been reading up a lot on PyPy lately for my own project. (It's still alive, really!) It's definitely interesting.
posted by JHarris at 4:36 AM on May 24, 2012

These books are a real public service. There's a lot of amazing open source software out there like LLVM and Git and GDB that are nearly opaque. Just because the source is open doesn't mean you can understand how it works; a high level architectural explanation like this is good.

migurski: PyPy is awesome. I think of it as being roughly analogous to Java, except that the JVM itself is written in a restricted Java. That's not really fair, though; kenko's explanation is more accurate. What's interesting is that PyPy is doing some very aggressive computer science stuff; I've collected a few links over the years. They seem serious about using Software Transactional Memory in the actual execution environment, for instance, an idea I thought was purely relegated to academia. The PyPy status blog is good reading.
posted by Nelson at 7:56 AM on May 24, 2012 [1 favorite]

Nelson: "The PyPy status blog is good reading."

And for people who like numbers, here are some humbling numbers. I remember when Rigo's predictions for PyPy were widely derided, despite his already impressive success with psyco. These numbers mean everyone can be quiet now and just watch in awe. It's stunning, cutting edge stuff.
posted by vanar sena at 12:04 PM on May 24, 2012 [2 favorites]

For any who are interested in a slightly dated but very interesting discussion on tracing JITs, there was a great thread on LtU a couple of years ago.
posted by vanar sena at 12:32 PM on May 24, 2012
