AWK: A History
March 29, 2018 8:00 PM

 
Looking forward to subsequent sed post.
posted by leotrotsky at 8:02 PM on March 29, 2018 [13 favorites]


It's more or less the same, with some substitutions
posted by The Gaffer at 8:08 PM on March 29, 2018 [76 favorites]


What a great interview! Thanks for posting.
posted by Annika Cicada at 8:16 PM on March 29, 2018


Of course each of those two lines of AWK was 500 chars long.
posted by octothorpe at 8:21 PM on March 29, 2018 [13 favorites]


"I once saw a paper that had a 1,000-line C [program] that took less time to grok and had more immediately reusable chunks of code than the two lines of AWK I wasted 2 hours trying to explain to our [CS graduate student] intern - who was really just doing some glorified data entry."
posted by Anoplura at 8:27 PM on March 29, 2018 [30 favorites]


"One Monday morning, I walked into my office to find a person from the Bell Labs microelectronics product division who had used AWK to create a multithousand-line computer-aided design system. I was just stunned."

I'm having a hard time even imagining how such a thing might possibly work.
posted by smcameron at 9:10 PM on March 29, 2018


I'm having a hard time even imagining how such a thing might possibly work.


several thousand lines of AWK inevitably become self-aware and then you just ask it to do board layout or whatever
posted by GuyZero at 9:31 PM on March 29, 2018 [20 favorites]


jokes aside, for certain types of data processing AWK is really great. Sadly it doesn't handle CSV files without some handholding so I tend to do that kind of stuff in python these days since it has a good csv library and I can usually manage to remember python syntax better than AWK's crazy syntax that was developed before the advent of parser tools and sane program semantics. Also with the move to JSON if you want to munge that then AWK is way, way out while python, again, does a really good job of reading in json and letting you slice & dice it.
posted by GuyZero at 9:34 PM on March 29, 2018 [7 favorites]


David Wall? David Wall. I am the sad. Larry, please.

Also, there's a non-zero chance the line above could be executed by the Perl interpreter.
posted by roue at 9:34 PM on March 29, 2018 [5 favorites]


Sadly it doesn't handle CSV files without some handholding so I tend to do that kind of stuff in python these days since it has a good csv library and I can usually manage to remember python syntax better than AWK's crazy syntax that was developed before the advent of parser tools and sane program semantics.

BEGIN { FS = "," } ?
posted by invitapriore at 9:40 PM on March 29, 2018 [5 favorites]


> AWK's crazy syntax that was developed before the advent of parser tools

AWK was developed using lex and yacc, and since yacc literally stands for "Yet Another Compiler Compiler", I think it's safe to assume these were not the first parser tools. But maybe you mean something different by "parser tools."
posted by smcameron at 9:43 PM on March 29, 2018 [2 favorites]



perl -e '((David eq Wall) ? "David" : "Wall"); $I_am = the_sad; (Larry, please)'

Also, there's a non-zero chance the line above could be executed by the Perl interpreter.


Fixed that for you.
posted by alex_skazat at 10:00 PM on March 29, 2018 [5 favorites]


Not directly awk related, but I had to manipulate some CSV files recently and csvgrep from csvkit was very handy.
posted by fings at 10:06 PM on March 29, 2018 [1 favorite]


My first edition of The AWK Programming Language has sat proudly in my bookshelf for decades and is much worn...
posted by jim in austin at 10:38 PM on March 29, 2018


BEGIN { FS = "," }
10,"foo,bar",20
posted by russm at 11:59 PM on March 29, 2018 [28 favorites]
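
The failure in the comment above is easy to reproduce outside awk. A minimal sketch in Python (chosen only because its csv library comes up earlier in the thread), using the sample line from the comment: a naive split on every comma, which is roughly what FS = "," amounts to, is compared with a quote-aware parser. The variable names are made up for illustration.

```python
import csv
import io

line = '10,"foo,bar",20'

# Naive split on every comma, roughly what a bare FS = "," does
naive_fields = line.split(",")

# Quote-aware parsing with the standard-library csv module
parsed_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)   # the quoted field is cut in two
print(parsed_fields)  # the quoted field survives as one value
```

The naive split yields four fields and leaves stray quote characters behind, while the csv reader yields the three fields the author of the file presumably intended.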


obviously just stop using commas as delimiters in your CSV and you’ll be fine

use the little Unicode snowman or whatevs I’m sure your end users can roll with that
posted by Doleful Creature at 12:04 AM on March 30, 2018 [13 favorites]


If anyone else is surprised at the mention of Perl's popularity, this interview is from 2008.
posted by Pronoiac at 12:29 AM on March 30, 2018 [2 favorites]


Many CSV readers and writers can be configured to use ASCII 31 Unit Separator
posted by Phssthpok at 1:58 AM on March 30, 2018 [1 favorite]
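
As a rough illustration of the Unit Separator idea, here is a small Python sketch using the standard csv module with its delimiter set to ASCII 31. The sample row is invented, and this assumes your data never contains the 0x1F character itself.

```python
import csv
import io

US = "\x1f"  # ASCII 31, Unit Separator

# Hypothetical sample row; note the embedded comma in the second field
rows = [["10", "foo,bar", "20"]]

buf = io.StringIO()
csv.writer(buf, delimiter=US, lineterminator="\n").writerows(rows)
encoded = buf.getvalue()

# Reading it back with the same delimiter
decoded = list(csv.reader(io.StringIO(encoded), delimiter=US))

# Because commas are no longer special, a plain split works too
plain = encoded.rstrip("\n").split(US)
```

With a separator that does not occur in the data, no quoting is needed, so a simple field split (including awk's FS) is enough.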


Thus, we can create an AWK program to retrieve Naomi's phone number by simply writing $1 == "Naomi" {print $2}

Socially AWKward programmer's fantasy.
posted by pracowity at 2:00 AM on March 30, 2018 [7 favorites]


I once had someone answer an interview question in AWK, which was simultaneously absolutely the right tool for the job and completely unexpected.
posted by hoyland at 4:37 AM on March 30, 2018 [5 favorites]


Looking back, would you do anything differently in the development of _____?

One of the things that I would have done differently is instituting rigorous testing as we started to develop _____. We initially created _____ as a "throwaway" _____, so we didn't do rigorous quality control as part of our initial implementation.

Feel free to fill in the blanks as you see fit. Computer science in a nutshell.
posted by the painkiller at 4:41 AM on March 30, 2018 [31 favorites]


Quite a few years back I had a friend planning a small business presentation the next day and wanted to print up a proposal including some spreadsheet data.

I was in an AWK zone from working on another project, so using Enable for the spreadsheet data, and Ventura for the formatting (quite a few years back I said), I wrote a quick AWK script to take a CSV file from Enable and apply the Ventura formatting.

We went through a few iterations of the proposal as he kept on wanting to try different tweaks in the spreadsheet data, but it was definitely a lot easier than starting in Ventura and inputting the new numbers each time.
posted by rochrobbb at 5:22 AM on March 30, 2018


If anyone else is surprised at the mention of Perl's popularity, this interview is from 2008.

I was just surprised that in ten years nobody corrected Larry Wall's first name.
posted by fedward at 6:06 AM on March 30, 2018


I think AWK is one of those things young people don't learn anymore? Between Python and Ruby and NodeJS I don't think there's as much AWK going. Also jq, which is kind of AWK for json.

AWK is really good at two things. First is splitting records on white space or other separator, the behavior of FS. The other is the implicit "run this code as a loop over every line in a file", with optional regexp match. Put them together and you can do a lot of powerful stuff straight from your Unix command line without ever saving the awk program. E.g.: cat file | awk '/foo/ { print $3 / 10 }'. I still do that occasionally but as noted above this doesn't work great for CSV and not at all for JSON.

I've always found it weird other languages don't support this simple ad hoc text processing idiom. Perl sort of does with perl -pie, but Perl is even deader than AWK now. (Don't @ me.)
posted by Nelson at 7:01 AM on March 30, 2018
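
For anyone who has not met the idiom, here is a hedged sketch of what the one-liner /foo/ { print $3 / 10 } does, spelled out in Python. The helper name awk_like and the sample lines are invented for illustration; real awk has many more rules (numeric coercion, OFS, and so on) that this ignores.

```python
import re

def awk_like(lines, pattern, action):
    """Apply action(fields) to every line that matches pattern,
    mimicking awk's implicit per-line loop."""
    regex = re.compile(pattern)
    results = []
    for line in lines:
        if regex.search(line):
            # Default FS behaviour: split on runs of whitespace
            fields = line.split()
            results.append(action(fields))
    return results

# Hypothetical input lines
sample = ["foo a 30", "bar b 40", "x foo 55"]

# $3 in awk is fields[2] here, since Python indexes from zero
out = awk_like(sample, r"foo", lambda fields: float(fields[2]) / 10)
print(out)
```

The explicit loop, the regex test, and the field split are exactly the parts awk lets you leave out.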


I use it daily to comb through log files of network gear. Lots of my colleagues are only comfortable with Windows, and pretty much only know grep. I've got a Mac, so running awk scripts on logs to pull out the bad actors on the network and using sed to change a hundred lines of configuration with a one-liner is a regular occurrence, and it seems like black magic to them.
posted by Slap*Happy at 7:52 AM on March 30, 2018 [2 favorites]


I'm still a very frequent awk user. A couple of things to watch for if you think you want to write that one-liner just anywhere:
  1. Different systems ship with very different awk interpreters: Debian ships with mawk, Ubuntu with gawk (I think), Mac OS with an old (2007) version of otawk/bwk awk, tiny machines with Busybox awk. All have minor differences that aren't portable.
  2. awk doesn't do UTF-8 well, if at all. If your substr( … ) returns strings in quite the wrong place, you've got the unicode all up in there.
awk's a really simple model for text manipulation: for every line in the input, run the whole awk script, and everywhere a pattern matches, run its action. We taught a whole bunch of lexicographers how to do text edits with awk, and most of them got into doing almost all of their editorial computing themselves.

The default associative array handling is very natural if you throw out all compsci-assumptions about arrays you thought you knew. It just works!

When you know awk, you find it everywhere. Even the Processing and Wiring/Arduino languages have a similar ‘everything is a loop’ concept.
posted by scruss at 7:57 AM on March 30, 2018 [2 favorites]
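
The substr surprise mentioned in point 2 can be shown without awk. This Python sketch assumes the awk in question counts bytes rather than characters, which is an assumption about byte-oriented implementations and not a statement about any particular version. The sample string is made up.

```python
text = "na\u00efve caf\u00e9"   # "naïve café", with precomposed accented letters
data = text.encode("utf-8")

# Character-based: the first five characters
by_chars = text[:5]

# Byte-based: the first five bytes, as a byte-counting substr would take them
by_bytes = data[:5].decode("utf-8")

print(by_chars)
print(by_bytes)
print(len(text), len(data))
```

The accented letter takes two bytes in UTF-8, so the byte-based slice comes up one character short, and every offset after it drifts further.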


Yep, AWK's a rice cooker: if you have rice with everything you eat, then it's really handy to have in addition to your main cooker, or if you only eat rice at home, that's all you need.

But you're not going to make a good sauce in it, regardless of how fluffy and easy it makes preparing your rice.
posted by ambrosen at 8:14 AM on March 30, 2018 [6 favorites]


But maybe you mean something different by "parser tools."

Yeah, it's not exactly what I meant... awk is a perfectly good programming language but using it is like looking at Burgess shale fossils, it's clearly from a whole different world than the world of C/BCPL-derived languages. BEGIN/END blocks, the metaphor of patterns and actions that are implicitly loops, etc. Don't let all those curly braces fool you.

I once had someone answer an interview question in AWK

ugh! I mean, sure, here, watch me read the man page for five minutes.
posted by GuyZero at 8:18 AM on March 30, 2018 [3 favorites]


awk's more of a living fossil than a Burgess shale fossil: maybe a horseshoe crab: still thriving, and still somewhat useful, but very different to everything else that's around.
posted by ambrosen at 8:27 AM on March 30, 2018 [5 favorites]


I once got drafted into a project where large amounts of text were being scraped and formatted, and the architect decided that Perl was too slow, and everyone had to write their parsers in C for 50+ different text formats. I decided to use system("perl")
posted by RobotVoodooPower at 8:52 AM on March 30, 2018 [2 favorites]


I worked for a telecom company that had purchased awk from AT&T (this was a long time ago) that we extended to do call record processing for PBX systems. It became my job to maintain it and I grew to love it. Many years later I had to demo a product to Irish customs to show that the controller was a "freely programmable" computer and used awk to show how you could write a program on the fly and run it.
posted by tommasz at 9:14 AM on March 30, 2018 [1 favorite]


I have a lot of respect for AWK, and not just because it's a useful language that can be fully described in a single man page. AWK is the piece that makes Unix almost make sense. It's a plumbing tool that brings Unix's dumb pipes of text so much closer to the Lisp machine or Smalltalk idea of code freely exchanging structured data. (I guess the current incarnation of this concept is web services exchanging JSON, which... This industry, I swear...)
posted by skymt at 9:38 AM on March 30, 2018 [5 favorites]


I think AWK is one of those things young people don't learn anymore?

Yeah, I suspect the rise of non-line-oriented formats like XML and JSON have made a lot of the classic UNIX command-line utilities less useful.
posted by mhum at 10:39 AM on March 30, 2018 [4 favorites]


AWK is the one tool that I KNOW I should be using more and better, but that I have consistently avoided learning. I can write perl but I cannot understand awk. I've written perl scripts that I almost certainly could have accomplished in just a few lines of awk, but it just feels... well.. uh, awkward. sorry.

My computational work lies entirely in the realm of parsing and manipulating excessively large text files (I do genomics, so 2 Gb files of short DNA sequences and files of hundreds of thousands of assembled transcripts and matrices of counts are my bread and butter). There's no damn reason that I shouldn't be using AWK for like 80% of my everyday stuff. But I just... don't. I'll multi-pipe sed expressions with cut and tr before AWK.

I really need to come to Jesus.
posted by Made of Star Stuff at 11:09 AM on March 30, 2018 [1 favorite]


I'll multi-pipe sed expressions with cut and tr before AWK.

I've been there... there are many paths to the top of the mountain.
posted by GuyZero at 11:31 AM on March 30, 2018 [1 favorite]


Yeah I'm not sure I'd suggest learning AWK now for large text file manipulation. I mean I've known awk for 25+ years and I really only use it now for occasional one liners and often even those are better done with cut and tr. I never ever write an awk program that I save to disk.

Weak Unicode support is reason enough not to use awk. You mentioned Perl; if you don't know Python I'd put time into getting good with that, first. Both because it's a nice language and because so much scientific computing is done with it these days.
posted by Nelson at 11:53 AM on March 30, 2018


Made of Star Stuff, I have used awk to process large sequences of text to extract sub-sequences with a state machine style setup. It was trivial to code in awk, and ran impressively fast. In my case, the problem with the other languages such as perl, python, ruby was the interpreter overhead (and I say this as someone who has written a lot of perl). AWK had no additional libraries to load and was done by the time the others got started.

I learnt awk from the O'Reilly sed & awk book and it got me up to speed in a day.

Given the problem descriptions I have heard from my friends who do DNA sequence work in computational biology, I would say awk is a great fit for your genomics work, especially since you do not need Unicode support.
posted by techSupp0rt at 12:10 PM on March 30, 2018


I suspect that people like Ruby and perl for the same reason they like European cars. They have some awkward design features that force you to get to know them well. By then, you are emotionally attached to the language.

Perl gets the job done. I wouldn't call it some Euro car. It's like a Honda Civic.




Or a mountain bike.
posted by alex_skazat at 4:17 PM on March 30, 2018


Made Of Star Stuff, your multi-pipe sed expressions with cut and tr will likely be a lot faster than awk. Awk is many great things, but fast isn't one of them. otawk can be blazingly fast on regexes, and mawk with its tokenized input can run larger scripts quickly, but there's no general, high speed awk for all problems. I write awk scripts for speed of development and ease of understanding, not speed of operation.

In the late '90s, I briefly had access to a system that gave you a shell that assumed that all stdio and pipes were in SGML/XML. Its awk allowed XPath-like selectors to be used as patterns, way better than xslt. I don't think much ever came of it; istr it was something out of Edinburgh university.
posted by scruss at 4:39 PM on March 30, 2018 [1 favorite]


Young people also don’t learn SQL, in my experience. The first time they use it is the first time they have some pressing task that requires interacting with a database. And typically the trepidation of using it follows them around for a long time. I don’t get it. SQL is such a simple, beautifully expressive language that is finding new lives to this day.
posted by mantecol at 7:52 PM on March 30, 2018 [2 favorites]


Pretty much all of my own experience with awk was in the form of using it for music analysis.

A quick google shows that, yep, the Humdrum Toolkit is still out there & kicking. That is the historic Humdrum website & if you're planning to actually use it you'll want the modern Humdrum website and maybe the Github repository, with updates as recently as 17 days ago.

I will say no more except this from the Humdrum Toolkit FAQ:
Humdrum allows users to pose and answer questions such as the following:
  • In Stravinsky, are dissonances more common in strong metric positions than in weak metric positions?
  • Is there evidence of greater metric syncopation in later Gershwin than in early Gershwin?
  • What passages of the original `Salve Regina' antiphon are preserved in the settings by Tomas Luis de Victoria?
  • In Urdu folk songs, how common is the so-called "melodic arch" -- where phrases tend to ascend and then descend in pitch?
  • Which of the Brandenburg Concertos contains the B-A-C-H motif?
  • What are the most common fret-board patterns in guitar riffs by Jimi Hendrix?
  • Which of two English translations of Schubert lyrics best preserves the vowel coloration of the original German?
  • After the V-I progression, which harmonic progression is most apt to employ a suspension?
  • In what harmonic contexts does Händel double the leading-tone?
When David Huron, author of Humdrum, posted an announcement for the first release of the Humdrum Toolkit along with the examples I quoted above to a music theory listserv I was on at the time, maybe mid 1990s, it nearly caused a riot.

"It's IMPOSSIBLE for a computer to do all that, No computer brain could possibly analyze the harmony in Bach, We're being replaced by artificial intelligence, blar blar blar."

It led to weeks-long flamewars where these apparently fairly elderly and highly opinionated music theory professors would go on and on for pages with various theories about how they imagined this must work. Which were all so, so very wrong from the start.

They had an idea in their heads of how computers must work, and it resembled nothing at all that awk actually did.

I confess that when I saw that list myself I thought for quite a while that it might be some kind of hoax or, more likely, a crank--some kind of Archimedes Plutonium for the music theory world, maybe.

But no: It turns out it really is impossible for a computer to do all that, but slightly clever awk programming plus some really nice data formats Huron developed for encoding various aspects of musical composition made it sorta easy or at least possible. Especially if you were smart enough to ask the right type of research questions. The magic was always in the human who was smart enough to ask nice questions, plus the data analysis tools to tease out the answers if you were patient enough.

Plus smart enough to spend time doing the type of analysis that humans are best at, but then putting into some kind of a format that is easily readable and analyzable by awk.

Just for example, if your starting point is 371 Chorales by J.S. Bach that have already been harmonically analyzed & turned into a simple, easily analyzable text-only format, you can pretty easily see how you might be able to use awk to figure out which chords are more or less likely to follow other chords and many other similar such things.

Not actual magic, but kind of magical actually.
posted by flug at 7:57 PM on March 30, 2018 [16 favorites]
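
To make the "which chords follow which" idea concrete, here is a minimal Python sketch. The input is a hand-made, hypothetical encoding (one phrase per line, Roman-numeral chord labels separated by spaces); it is not real Humdrum data and not Humdrum's actual file format.

```python
from collections import Counter

# Hypothetical, hand-written chord sequences, one phrase per line
phrases = [
    "I IV V I",
    "I ii V I",
    "I IV V vi",
]

# Count every (chord, next chord) pair, much as an awk associative array would
transitions = Counter()
for phrase in phrases:
    chords = phrase.split()
    for current, following in zip(chords, chords[1:]):
        transitions[(current, following)] += 1

# Which chords follow V, and how often?
after_V = {nxt: n for (cur, nxt), n in transitions.items() if cur == "V"}
print(after_V)
```

The hard part, as the comment says, is the human analysis that produces the chord labels in the first place; once that exists as plain text, the counting is a few lines.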


gawk actually does pretty well with Unicode. bwk doesn't. I wonder if it's worth putting my time in to fix it, or if nobody really cares any more.
posted by enf at 8:39 PM on March 30, 2018 [1 favorite]


I suspect the rise of non-line-oriented formats like XML and JSON have made a lot of the classic UNIX command-line utilities less useful

Fight the power
posted by flabdablet at 10:57 PM on March 30, 2018 [1 favorite]


Anything AWK can do, Perl can do better
Perl can do anything better than AWK
No, it can't
Yes, it can
No, it can't
Yes, it can
No, it can't
Yes, it can! Yes, it can!
Back in the day, before Linux, when Perl was still a bit of a gleam in someone's eye, AWK/etc ported to DOS (MKS Power Tools???) was my CS/UNIX/geek savior in the hell of Windows3/DOS world. </long story deleted> I turned a week long five person (spare moments) job of writing a weekly report into one minute of carrying a floppy from one machine to another and "PRESS THE BUTTON FOOL". That got me permission to use the COLOR PRINTER and an offer to make the "your work-study (free student labor) is up but we'll "work something out"."

I did not know at the time that "we'll work something out" was a sly offer to create a position and hire me full-time-real-job. <sigh>. But still they would not buy me a real UNIX machine (say a Sun 3/50 would do) so I passed.

So, bittersweet is AWK memories. It (and k?sh/sed/grep) (and craptacular ports of awk/sed/grep/ksh for DOS) carried me through the years before Perl and Linux and OMFG Windows and Mac are still just getting barely past craptacular.... </grump>.
posted by zengargoyle at 12:29 AM on March 31, 2018 [1 favorite]


> gawk actually does pretty well with Unicode. bwk doesn't. I wonder if it's worth putting my time in to fix it

Yes please!

> Fight the power!

See also Dan Egnor's xml2.

> I would never slander a Honda by comparing it to perl.

pfft. Perl is still lovely. I still code in it whenever I can, partly because it's the language I know best, but mostly because it irks the young 'uns. It made my heart sing recently to find out how python's numeric libraries are so blazingly and uncharacteristically fast: they're just wrappers around standard FORTRAN libraries from the 1970s and 80s, and building them calls gfortran --std=legacy ... so many times.
posted by scruss at 7:32 AM on March 31, 2018 [1 favorite]


> AWK/etc ported to DOS (MKS Power Tools???)

Oh, yes. This is when I learned that simple / vs \ broke everything.

There are still smoking piles of the wreckage of the / vs \ wars in the dark corners of my computer somewhere . . .
posted by flug at 8:47 AM on March 31, 2018


Perl is still lovely. I still code in it whenever I can, partly because it's the language I know best, but mostly because it irks the young 'uns.

I'm 51 and I think you are a lunatic
posted by thelonius at 9:02 AM on March 31, 2018 [3 favorites]


I'm 56 and I know he's one of the happy kinds.
posted by flabdablet at 9:45 AM on March 31, 2018 [3 favorites]


Sadly it doesn't handle CSV files without some handholding

Some handholding is available.
posted by flabdablet at 11:47 AM on March 31, 2018 [3 favorites]


Some handholding is available

clever!
posted by GuyZero at 7:50 PM on March 31, 2018


Just wanted to pop back in here to say I used awk today in a professional context. I probably could have done the same thing with bash, but awk felt right. Also I have no fear of Perl.
posted by fedward at 8:43 AM on April 6, 2018 [1 favorite]

