How Basketball-Reference Got Every Box Score
February 1, 2012 12:35 PM

Man, that guy was like a basketball monk in the scriptorium.
posted by resurrexit at 12:40 PM on February 1, 2012 [2 favorites]

How'd they get 'em? One man, fifty years and a whole lot of microfilm. I love this story.
posted by box at 12:53 PM on February 1, 2012 [3 favorites]

Me too. I love how the Internet is helping obsessive completists assist the rest of us when we need to know something. I love it so much. I'm 99% sure I'll never need the information that Dick Pfander has collected, but it makes me so happy that it's there.
posted by MCMikeNamara at 12:55 PM on February 1, 2012 [3 favorites]

They should really put in some crowd-sourced system to get these typed in, or at least the especially interesting ones. Images are nice but basketball-reference et al are more about slicing and dicing the stats.
posted by smackfu at 12:57 PM on February 1, 2012

It's interesting how many of them are hand-corrected too. That's true love.
posted by smackfu at 12:59 PM on February 1, 2012

I had that same thought. Right now, while everyone is excited to browse around them, would be the time to implement that too.
posted by roll truck roll at 12:59 PM on February 1, 2012

Yeah, there's a lot of discussion about it in the comments on the basketball-reference post.
posted by smackfu at 1:11 PM on February 1, 2012

Wow, the basketball equivalent of Retrosheet.
posted by xmutex at 1:18 PM on February 1, 2012

I am impressed by the effort, truly an act of dedication, and I'm glad that basketball stat nerds will have the data they need.

Still hate basketball, though. However, I'm imagining a world without box scores for baseball and cricket, and ugh.

My first thought was OCR, but looking at a few samples, I think crowdsourcing is probably the better way. The question is "is there enough of a crowd?" I suspect yes -- and enough that you can actually have multiple people enter a given box score and thus have an automatic check on the data. Basically run it like does.
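To make the "automatic check" idea concrete, here is a minimal sketch in Python of how multiple independent entries of one box score could be reconciled. Everything in it is an assumption for illustration: the `reconcile` function, the `(player, stat)` keys, the `min_agree` threshold, and the sample numbers are all hypothetical, not anything basketball-reference is known to use.

```python
from collections import Counter


def reconcile(entries, min_agree=2):
    """Compare independent transcriptions of one box score.

    entries: list of dicts, one per volunteer, mapping
             (player, stat) -> integer value (hypothetical layout).
    Returns (accepted, disputed): values enough volunteers agreed on,
    and the raw vote counts for everything else.
    """
    accepted, disputed = {}, {}
    # Every cell that at least one volunteer typed in.
    keys = set().union(*(e.keys() for e in entries))
    for key in sorted(keys):
        votes = Counter(e[key] for e in entries if key in e)
        ranked = votes.most_common()
        value, count = ranked[0]
        # A tie for first place means there is no clear winner.
        tied = len(ranked) > 1 and ranked[1][1] == count
        if count >= min_agree and not tied:
            accepted[key] = value
        else:
            disputed[key] = dict(votes)
    return accepted, disputed


# Made-up sample: three volunteers type in the same scanned box score.
volunteer_a = {("Player A", "pts"): 22, ("Player A", "reb"): 10}
volunteer_b = {("Player A", "pts"): 22, ("Player A", "reb"): 11}
volunteer_c = {("Player A", "pts"): 27, ("Player A", "reb"): 12}

accepted, disputed = reconcile([volunteer_a, volunteer_b, volunteer_c])
```

In this sample the points total is accepted because two of three entries match, while the rebound count goes to a dispute queue for someone to check against the image. A real system would probably also want to cross-check that player totals add up to the team total.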
posted by eriko at 1:19 PM on February 1, 2012

I wonder -- and I'm wondering aloud so someone more knowledgeable can answer -- if there's something about the consistency of box scores (or more accurately, the lack of it) in basketball -- that would make integrating older stuff -- particularly non-NBA stuff -- with current data difficult.

I can't think of anything, but looking at the Retrosheet link above shows that there's a lot I don't know about sports I think I know, let alone pro basketball.
posted by MCMikeNamara at 1:24 PM on February 1, 2012

So if I wanted to, I could look up that game I saw in which (in my memory) Elgin Baylor of the LA Lakers did not score a point until the final seconds, when, with the Lakers one behind, he drove to the corner, chucked up a shot, and it swished as he continued to the locker room without breaking stride?

Although once I found the Grateful Dead's rendition of Sugaree from the LA Shrine Auditorium, which was the greatest moment of my listening life up to then, or since, and I had to admit that, listening to it again, it was CRAP.

So I might not look up the Laker box. . .
posted by Danf at 1:29 PM on February 1, 2012

MCMike - that's actually a pretty interesting topic, as many of the advanced stats that geeks use to analyze players nowadays (PER and Win Shares being the primary two) also depend on a few stats that were never considered standard box-score stats until the league got going (chiefly, blocks and steals, which were not recorded until '73-'74, I think, and 3-point shooting since the three-point shot was not introduced until the '79-'80 season). Assists have also been historically questionable, even to this day, as that's really a judgment call based on a number of factors.

The non-NBA stuff would be very difficult to integrate with modern NBA statistics, especially if you're looking at modern play and the myriad of European, Asian, and other professional leagues. As an example, Bonzi Wells could not crack the roster of the Timberwolves this year, after putting up some ridiculous numbers in China the year before. But Ricky Rubio, who put up relatively poor numbers for FC Barcelona last year, is currently a top candidate for Rookie of the Year in the NBA. Rubio's former teammate on that team, Juan Carlos Navarro, put up gaudy numbers (and is a fun player to watch - maybe one of the top Spanish players of all time), yet could not hack it in the NBA and only lasted a season before going home. That said, I think the success has less to do with the consistency of box scores, and more to do with style of play.

What this article really got me thinking about, though, is a project that the NBA embarked upon several years ago to digitize all their archived video footage. As a hoops diehard, this is like some kind of dream - there was talk at the time about crowdsourcing much of the footage, so fans could label plays and players in "real-time" as the action happened. The end result could have been that I could run a search for "Reggie Miller made 3-point shots", and be able to watch all 2,560 regular season makes. Then, for fun, I could add another search criterion like "within last 20 seconds of game" and get those results. This is a YouTube-mixtape maker's fantasy.
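For what it's worth, the kind of search I'm picturing is simple once the tags exist. Here is a rough Python sketch, assuming each fan-tagged play is a small record; the `TaggedPlay` fields, the `search` function, and the three sample rows are invented for illustration and do not describe any real NBA system or real games.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedPlay:
    """One fan-tagged moment of footage (hypothetical schema)."""
    player: str
    event: str          # e.g. "made 3-point shot"
    seconds_left: int   # game clock remaining when the play happened
    clip_id: str        # pointer into the video archive


def search(plays, player=None, event=None, max_seconds_left=None):
    """Return plays matching every criterion that was supplied."""
    results = []
    for play in plays:
        if player is not None and play.player != player:
            continue
        if event is not None and play.event != event:
            continue
        if max_seconds_left is not None and play.seconds_left > max_seconds_left:
            continue
        results.append(play)
    return results


# Invented sample tags, not real game data.
archive = [
    TaggedPlay("Reggie Miller", "made 3-point shot", 512, "clip-001"),
    TaggedPlay("Reggie Miller", "made 3-point shot", 9, "clip-002"),
    TaggedPlay("Reggie Miller", "missed 3-point shot", 4, "clip-003"),
]

all_makes = search(archive, player="Reggie Miller", event="made 3-point shot")
late_makes = search(archive, player="Reggie Miller",
                    event="made 3-point shot", max_seconds_left=20)
```

Each extra criterion just narrows the previous result, which is why stacking "within last 20 seconds of game" on top of the first search would be cheap. The hard part is the tagging, not the querying.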

Then the Donaghy scandal happened. This vast source of data (every game on tape, for fans to see), suddenly became a potential league nightmare. Not only could fans scour archived footage for KG's best dunks, but also for calls or non-calls that referees made. Donaghy, for those who don't know, was a referee who would tip off gamblers on certain wagers (over/under bets), then make more or fewer foul calls to reach those bets (additional fouls lead to free throws, and free throws add points to the total while the clock is stopped, pushing the point total higher). Smart owners like Mark Cuban had already been doing this to some extent in order to make their teams better, and to point out that some officials did not "like" certain teams. I could be wrong, but Cuban actually blogged that the Mavs, a powerhouse in recent league history, had a horrendous winning percentage when a certain official (Joey Crawford maybe?) was reffing a game.

Anyway, as a basketball fan, it deeply upsets me that we might never have access to this archival footage, rich with untapped data. But I'm very grateful that this man took the time to track down the boxscores, without which I would never even have a starting point.

Also, stuck in my craw (and potentially of interest to readers of this rant), is the fact that the NFL will never release their "all-22" footage (aka Madden cam).
posted by antonymous at 5:47 PM on February 1, 2012 [5 favorites]

Smart owners like Mark Cuban had already been doing this to some extent in order to make their teams better, and to point out that some officials did not "like" certain teams. I could be wrong, but Cuban actually blogged that the Mavs, a powerhouse in recent league history, had a horrendous winning percentage when a certain official (Joey Crawford maybe?) was reffing a game.

There are actually private companies that provide this kind of service to professional sports teams. They buy the raw feeds from the broadcasters and chop it up and tag it and bag it so that teams can run simple queries to get the footage they are interested in. Total sports geek heaven.

I had a pretty useless business class taught by a former junior pro hockey player who was a partner in one of these companies years ago (more than 10).
posted by srboisvert at 8:53 AM on February 2, 2012

