Infodump update: we've added a few new features.

New since the August relaunch:

1. Comment length files. For folks interested in analyzing how the general size of a comment correlates to other aspects of site activity, you can now work with number-of-characters information about each comment on mefi, askme, meta and music. These are stored in new files separate from the existing commentdata files.

2. Metatalk thread closure information. We've had a "deleted" column in the postdata files previously, listing a 0 for undeleted and 1 for deleted threads, but now that column in the metatalk file can also have a value of 2 for closed threads and 3 for (rare) threads that are both closed and deleted.

3. Contact creation dates. If you're interested in looking at networking activity over time, you can now explicitly examine contact info in that light. Some of this info is approximate, since we didn't originally track creation date in that table. Details are on the wiki.

4. ID munging. At one user's request back in August, there's now an ID-munging function built into the Infodump scripts: for any user who specifically asks to be on the munge list, it swaps out their actual userid for a unique 7-digit fake id throughout the dump. It's a very low hurdle to identification, but it's there, for whatever that's worth. Folks doing any analysis that treats userids themselves as meaningful values should be aware of this and account for it when setting up their analyses.
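
For anyone updating scripts to handle the new Metatalk values from item 2, here's a minimal sketch of one way to decode that column. The function name `decode_status` is made up for illustration and is not part of the Infodump scripts. It relies only on the four values described above, which happen to line up as two bit flags (1 for deleted, 2 for closed); treating them as flags is my reading of the numbers, not something the dump itself promises.

```python
def decode_status(value):
    """Split a 'deleted' column value (0-3) into (deleted, closed) booleans.

    Hypothetical helper: 0 = neither, 1 = deleted, 2 = closed,
    3 = both closed and deleted, as described in item 2.
    """
    value = int(value)  # accept "2" as read from a text file, or 2
    if value not in (0, 1, 2, 3):
        raise ValueError(f"unexpected deleted-column value: {value!r}")
    deleted = bool(value & 1)  # low bit set for values 1 and 3
    closed = bool(value & 2)   # second bit set for values 2 and 3
    return deleted, closed


if __name__ == "__main__":
    for raw in ("0", "1", "2", "3"):
        deleted, closed = decode_status(raw)
        print(raw, "deleted" if deleted else "live", "closed" if closed else "open")
```

Older scripts that test the column with something like `== 1` will miss threads that are both closed and deleted, so checking the two properties separately is probably the safer habit for the metatalk file.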

My to-do list is now completely clear. If folks have other Infodump additions they'd like to see in the future, let me know.

Also, there have been some interesting graphs coming out of this post-November thread, in case you're interested in datawankery but missed it somehow.
