

THOMAS
June 8, 2012 9:42 AM

Apparently, the USHOR Appropriations Committee feels uncomfortable about permitting convenient bulk access to the THOMAS database of bills.
posted by jeffburdges (34 comments total) 17 users marked this as a favorite

 
Journalism from the Washington Examiner? That's a welcome change.
posted by anigbrowl at 9:51 AM on June 8, 2012


I need a decoder wheel.
posted by Mezentian at 9:54 AM on June 8, 2012 [1 favorite]


See also: PACER charging ten cents per page to download an HTML file that shows a list of PDFs you can download for ten cents per page.
posted by Holy Zarquon's Singing Fish at 10:02 AM on June 8, 2012 [18 favorites]


The fuck is wrong with these goddamned luddite leaders of ours?
posted by symbioid at 10:05 AM on June 8, 2012


If you're a PACER user and using Firefox, please install Recap!
posted by longdaysjourney at 10:06 AM on June 8, 2012 [11 favorites]


Relevant bit from the report:
The GPO currently ensures the authenticity of the congressional information it disseminates to the public through its Federal Digital System and the Library of Congress’s THOMAS system by the use of digital signature technology applied to the Portable Document Format (PDF) version of the document, which matches the printed document. The use of this technology attests that the digital version of the document has not been altered since it was authenticated and disseminated by GPO. At this time, only PDF files can be digitally signed in native format for authentication purposes. There currently is no comparable technology for the application and verification of digital signatures on XML documents.

At this point, however, the challenge of authenticating downloads of bulk data legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues:

Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized? How would ‘‘House’’ information be differentiated from ‘‘Senate’’ information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternatives that can enhance congressional openness and transparency without relying on bulk data downloads in XML?
posted by RobotVoodooPower at 10:08 AM on June 8, 2012 [2 favorites]


That sucks. THOMAS is unwieldy, but a great resource. At least they are not planning to cut off the whole service... yet.
posted by snaparapans at 10:11 AM on June 8, 2012 [1 favorite]


This is really, really important and it's part of a nasty larger trend that's beginning to emerge: Americans are losing access to government information that they previously had access to.

Examples:
-Statistical Abstract (when Census' funding was cut, this was what they decided to get rid of. They refuse to work with advocates to bring the division back)
-American Community Survey (an incredibly large number of groups, including industry, rely on the information collected here. This one may survive)
-Sourcebook of Criminal Justice Statistics (Justice refuses to fund this any longer, but it's an important collection of crime data which doesn't overlap as much with the Uniform Crime Reports as Justice would like you to think)

I think there are a few others I'm missing (and other related issues, like Congress refusing to confirm the Public Printer and the potential move of the Federal Depository Library Program to the Library of Congress and the threatened but now rescinded cuts to FDSys, which is a gigantic source of government, especially legislative, information).

I cannot overstate the importance of open government information enough--if you're interested, the Sunlight Foundation is an important group to follow--and I also cannot overstate the wealth of what we're losing in an incredibly stealthy way.

GPO is pushing back on this because of their demand for authentic government information, which I think is hilariously short-sighted. They need to advocate for Congress to update Title 44 and they're fighting about getting XML documents digitally signed.

I am a federal government documents librarian. The cuts which we are facing are rolling back American access to government document information, back prior to the founding of the FDLP in 1813.
posted by librarylis at 10:11 AM on June 8, 2012 [56 favorites]


Journalism from the Washington Examiner?

I know you wouldn't know it from the rest of the paper's content, but the Examiner actually has some decent scoops on their Fed pages!
posted by The 10th Regiment of Foot at 10:25 AM on June 8, 2012


(And lest you think Canada is doing better, they just agreed to some crippling cuts to their federal libraries and will cease distributing documents in print through their FDLP equivalent in 2014.)
posted by librarylis at 10:28 AM on June 8, 2012 [3 favorites]


"In the spirit of Thomas Jefferson..."

There's your RDA of irony, right there.
posted by clvrmnky at 10:32 AM on June 8, 2012


librarylis, the same thing is happening with the public records that we genealogists need. The crucially important Social Security Death Index is at risk of being removed from public access (but not bank or insurance company access, oh no), and Massachusetts is now at risk of having its last century of vital records closed off for "privacy" reasons, despite having been an open access state since the founding of the colony in the 1600's!
posted by Asparagirl at 10:33 AM on June 8, 2012 [1 favorite]


It's a damn fine business model. Get taxpayers to pay for something and then get them to pay again when they want to look at what they paid for.
posted by tommasz at 10:36 AM on June 8, 2012 [7 favorites]


See also: PACER charging ten cents per page to download an HTML file that shows a list of PDFs you can download for ten cents per page.

That's totally reasonable. After all, they have to Xerox each page to create a new copy to send to your web browser.
posted by cosmic.osmo at 10:36 AM on June 8, 2012 [2 favorites]


Background reading: check out the site for RPAC, and their whitepaper "Open Access to Public Records: A Genealogical Perspective; A White Paper by the Records Preservation and Access Committee" (PDF).

There is a very strong chance we could lose access to the SSDI this year. It's infuriating.
posted by Asparagirl at 10:42 AM on June 8, 2012 [1 favorite]


Actually, this raises a legitimate question: Who the hell is responsible for archiving and disseminating the activities of Congress via the web? There are a handful of agencies that could ostensibly be responsible for this, and none who actually want to step up to the plate.

For fairly obvious reasons, the GPO is quickly fading into an agency struggling for relevance in the 21st Century. The LOC is tremendously underfunded, and in a somewhat similar boat, while the National Archives doesn't seem to be interested in cataloging and distributing any of the low-level stuff.

This is a big deal in my office, where we film almost every single Senate hearing, press conference, as well as what goes on on the Senate Floor (ie. what you see on C-SPAN 2). The only thing that's clear is that we are not responsible for archiving the Senate's activities, and therefore we purge everything that isn't claimed by a senate office or federal agency within 2 months. The disk space (and IO bandwidth) required to record uncompressed 720p HD video from 12+ sources for 6+ hours a day is tremendous, and we simply can't archive it all.
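
A rough back-of-the-envelope sketch of that storage claim. Only "uncompressed 720p", "12+ sources" and "6+ hours a day" come from the comment; the frame size of 1280x720, 2 bytes per pixel (8-bit 4:2:2) and 60 frames per second are assumptions, so treat the result as an order-of-magnitude estimate rather than the office's real numbers.

```python
def uncompressed_bytes_per_day(width=1280, height=720, bytes_per_pixel=2,
                               fps=60, sources=12, hours=6):
    """Bytes written per day for uncompressed video.

    width, height, bytes_per_pixel and fps are assumed values;
    sources and hours are the lower bounds quoted in the comment.
    """
    bytes_per_frame = width * height * bytes_per_pixel
    bytes_per_second = bytes_per_frame * fps * sources
    return bytes_per_second * hours * 3600


per_day = uncompressed_bytes_per_day()
per_second = per_day / (6 * 3600)  # sustained write rate across all sources
print(f"{per_second / 1e9:.2f} GB/s sustained, {per_day / 1e12:.1f} TB per day")
```

Under those assumptions the total comes to roughly 29 TB a day, which is consistent with the point that keeping everything is impractical; halving the frame rate or bit depth changes the figure but not the conclusion.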

Effectively, this means that the source material from every senate hearing and press conference vanishes rather quickly. The Senate Floor broadcast is the only thing that makes its way to the LOC. Unfortunately, very little interesting debate takes place on the Senate Floor these days. Lots of speeches and posturing to empty rooms. Committee hearings provide a much better insight into the "down and dirty" of the legislative process (even though these also have a tendency to include lots of posturing themselves).

Right now, we send a low-resolution copy of our committee hearings to staffers from each committee, who ironically now offer you the best chance for actually watching videos of old hearings. In turn, these videos are actually hosted by the Senate Sergeant at Arms, which is another entity that actually really shouldn't be acting as a librarian; if anything, that duty more naturally falls on the Secretary of the Senate. Unfortunately, this means that our old RealMedia and FLV stuff will permanently stay in those formats, since we can't convert them without completely destroying the picture-quality. To compensate for future technological shifts, we now record DVD-R copies of each of our hearings, just in case somebody might want one down the road. However, we're not librarians, DVD-Rs are certainly not archival, the system is by no means thorough, and it's vastly outside of our stated mission.

The LOC and Archives have both promised to develop systems that can more effectively handle this stuff, although I personally haven't seen much progress on those fronts (although I don't work directly on either of those projects).

On top of THOMAS, there's also LIS (the Legislative Information Service), which is basically a more comprehensive version of THOMAS only available to internal Congressional staffers. It's a bit rougher around the edges, but provides a lot more data (unfortunately, very little in XML). I honestly have no idea why it isn't available to the public. The House actually has a very nice XML data portal, while the Senate provides a bit less data. Internally, the Senate's started using some surprisingly-modern noSQL stuff, which should theoretically make it very easy to open up an API for legislative information, although this doesn't seem to have happened just yet....

Organizations like the Sunlight Foundation and Sunlight Labs work to collate as much open data from these services as possible, and present it via nice XML and JSON APIs. You should support them and give them your money. They do great work.

There's also all sorts of intricacies and weirdness present with this data. As with any data schema, bills, documents, senators, committees, etc. all (er... mostly) have unique identifiers. Unfortunately, there's no great place to grab a list of these primary keys (even internally!), and in some cases, there are actually multiple sets of primary keys (I know of 3 that can be used to identify Senators). Groups like the Sunlight Foundation often end up assigning their own identifiers, adding yet another dimension to the data. My favorite anecdote is that the unique 6-character identifiers used by LIS to identify Senate Committees and Senators are arbitrarily assigned by hand by a lady who's worked here since the 1970s, when we first started assigning those IDs. (The system was somehow inspired by BASIC line numbers; I didn't ask why or how...)
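
One common way to cope with several competing sets of primary keys is a crosswalk table: one record per person listing every identifier that names them. The sketch below is purely illustrative. The scheme names (`lis`, `scheme_b`, `third_party`) and every ID value are made up, and nothing here reflects the actual formats used by LIS or by any outside group.

```python
from typing import Dict, List, Optional

# One record per person, with an entry for each identifier scheme.
# All scheme names and values are hypothetical placeholders.
CROSSWALK: List[Dict[str, str]] = [
    {"name": "Senator A", "lis": "AAA001", "scheme_b": "B-0001", "third_party": "tp-1"},
    {"name": "Senator B", "lis": "AAA002", "scheme_b": "B-0002", "third_party": "tp-2"},
]


def build_index(records: List[Dict[str, str]], scheme: str) -> Dict[str, Dict[str, str]]:
    """Map each identifier in one scheme to its full record."""
    return {rec[scheme]: rec for rec in records if scheme in rec}


def translate(records: List[Dict[str, str]], from_scheme: str,
              to_scheme: str, value: str) -> Optional[str]:
    """Translate an identifier between schemes; None if it is unknown."""
    rec = build_index(records, from_scheme).get(value)
    return rec.get(to_scheme) if rec else None


print(translate(CROSSWALK, "lis", "third_party", "AAA002"))
```

The catch the comment describes still applies: somebody has to maintain the table by hand, because no authoritative list of the keys is published.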

Disclaimer: This is only my perspective, and these are my own opinions, and I'm sure that there are some factual errors in my understanding of this vast ecosystem.
posted by schmod at 10:44 AM on June 8, 2012 [22 favorites]


librarylis, is there anything in the Statistical Abstracts that isn't available in an easier, database-compatible format on either the Census website or its FTP server? I can't think of a less useful way to disseminate that information these days than in a book, or clunky PDF tables. I work with census data all the time, and never touch the Abstracts.

The proposed elimination of the ACS is absolutely horrible, and is a way for Republicans to hand off basic market demographic data to private collectors for profit. Absolute madness. It also serves as a convenient way to kill funding for social programs...if there's no public data demonstrating need or useful for funding allocation...the program dies. To hell with 'em.
posted by Hollywood Upstairs Medical College at 10:45 AM on June 8, 2012


WARNING, SELF-LINK AHEAD

Shit like this is one of the reasons I spent my free time last year building LeafSeek, a free and open source historical records database system for people and organizations and archives and libraries to use. Open source code, no corporate owners, and provides an alternative to vendor lock-in. Oh, and a built-in API and geo-coded records, among other neat things.

(MetaFilter even helped me name it!)

posted by Asparagirl at 10:49 AM on June 8, 2012 [9 favorites]


My tax dollars went to create those bills. Seems reasonable I should be able to see what they bought.
posted by Triplanetary at 10:58 AM on June 8, 2012 [1 favorite]


If you can't see the laws, how do you know they're fully funding Medicare?
posted by Holy Zarquon's Singing Fish at 11:03 AM on June 8, 2012


Funny. If you look at USAID "Democracy and Governance" programs around the world, one of the things that we're trying to promote with our tax dollars is legislative transparency.
posted by RandlePatrickMcMurphy at 11:33 AM on June 8, 2012


Skimming the thread I didn't see this mentioned: the Washington Post today had a related article on govtrack.us
posted by exogenous at 11:40 AM on June 8, 2012


There currently is no comparable technology for the application and verification of digital signatures on XML documents.

This is complete and utter bullshit. Cryptographic signing exists for any filetype.
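
A minimal sketch of why the file type doesn't matter: integrity checks operate on raw bytes. To stay inside the Python standard library this uses HMAC-SHA256, which is a shared-key message authentication code and NOT a public-key signature; a publisher like GPO would presumably want a real signature scheme (detached OpenPGP signatures, or the W3C's XML Signature, which was written for exactly this kind of document). The sample document and key are hypothetical.

```python
import hashlib
import hmac


def sign_bytes(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over raw bytes, whatever the file type."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_bytes(data: bytes, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_bytes(data, key), tag)


# Hypothetical bill text; PDF or plain-text bytes would be handled identically.
xml_doc = b"<bill><title>Example Act</title></bill>"
key = b"demo-key-not-for-real-use"  # stand-in for a real signing key

tag = sign_bytes(xml_doc, key)
print(verify_bytes(xml_doc, key, tag))            # untouched document
print(verify_bytes(xml_doc + b" ", key, tag))     # altered by a single byte
```

One real wrinkle with XML is that two byte-different files can represent the same document (whitespace, attribute order), which is why XML-specific schemes canonicalize before hashing. That is a solved problem, not a missing technology.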
posted by odinsdream at 11:58 AM on June 8, 2012 [3 favorites]


librarylis, is there anything in the Statistical Abstracts that isn't available in an easier, database-compatible format on either the Census website or its FTP server? I can't think of a less useful way to disseminate that information these days than in a book, or clunky PDF tables. I work with census data all the time, and never touch the Abstracts.

Yes, I am so glad you asked that question! Part of the reason that librarians have pushed back so hard about losing the Stat Ab is because a significant portion of it comes from private sources (scroll to the bit about 100 private sources providing 13% of the total StatAb).

That is, the Statistical Branch not only coordinates a myriad of messy government sources into one concordant set of data, it also (used to!) put together private sources to supplement that data. This will no longer happen, even though ProQuest and Brennan will continue to publish the Stat Ab (note: now it will cost money, since the government is no longer funding it. A significant move from public to private there).

Asparagirl, yes, you're so right. I forgot about the SSDI but that one's on the list as well.

I really think people are profoundly underinformed about what we're losing here; there's certainly been media presence but this is easily as important as SOPA and it's not going viral because it's piece by piece, bit by bit. I voted for Obama and I am horrified at how agencies are acting under his nominal leadership.
posted by librarylis at 12:02 PM on June 8, 2012 [5 favorites]


My tax dollars went to create those bills. Seems reasonable I should be able to see what they bought.

I have good news and bad news. Which do you want first?
posted by odinsdream at 12:03 PM on June 8, 2012 [2 favorites]


this is easily as important as SOPA and it's not going viral because it's piece by piece bit by bit

THIS, so much.
posted by Asparagirl at 12:11 PM on June 8, 2012 [1 favorite]


Hollywood Upstairs Medical College: The proposed elimination of the ACS is absolutely horrible, and is a way for Republicans to hand off basic market demographic data to private collectors for profit.

My local Fox News station the other day broadcast a "report" about the ACS under the scare heading WASTE WATCH. If anything's a waste that should be watched (or not watched), it's freaking Fox News.

librarylis: I voted for Obama and I am horrified at how agencies are acting under his nominal leadership.

I agree with every single thing you're saying a million times over, but I don't know about this. Isn't it the federal agencies that are fighting against the tide to keep these vital resources available, and the legislative branch, not the executive, that's pushing so insistently to slash them, on the pretext that they are a "waste" of taxpayer money because we really don't need to know how many flush toilets are in each American household? The GPO is doing the best it can, for the most part, but it's like fighting Goliath with a piece of lint.

What is absolutely clear is that all three branches of government, whatever lip service they pay to the notion of transparency and open access, have a strong and vested interest in working behind the scenes to make less information available, not more, and in disappearing as much information as they can get away with while replacing it with misinformation.
posted by blucevalo at 12:21 PM on June 8, 2012


Also, speaking as someone from the Eastern European focused subset of the genealogy community, it is amazing to me how well open access to historical and vital records in a country's archives correlates to the country's overall transparency and democracy. The differences in how Poland, Hungary, Ukraine, Romania, and Moldova (to name five I am very familiar with) open or close parts of their systems at will (either de jure or de facto) are very much a statement about how the country is doing politically.

Some examples, off the top of my head:

- Moldova's very long-time archives director, at least as of a few years ago but not sure if this is still the case, is ex-KGB and a total Luddite. Guess how easy it is to get records from there, or how responsive they are to requests for records from abroad (even when we want to give them money for the copies as fair payment). Guess how easy it is to even find out what the hell they have in their archives at all. Guess how open Moldova's society is compared to other European countries.

- Poland, in contrast, is relatively open and easy to deal with, only imposing a century moratorium on vital records access for privacy reasons. They have tons of websites where they proactively scan materials and put them online in DRM-free formats, and openly publish online catalogs of their holdings, and even created English versions of their websites. They are responsive and modern...and so is their society.

- Hungary used to be a very open place to get records, but they are suddenly making noises about slamming shut the door on some records access in the near future, making some archivist/researcher friends I know who live there very nervous. Guess what's going on in Hungary's political climate lately. (Hint: fascism. Boo.)

- Romania used to be the absolute worst place to get records, or attempt to even find out what existed. But then a few years ago, a new national archives director was instated and he has thrown open the doors. It's now probably the best place to get copies of vital records (more than 100 years old) in Eastern Europe! You can request up to five record books per person per day at the 41 local archives, and no longer need to get sign-off from Bucharest first, and you can digitally photograph every damn page in those five books, no matter how big they are, for no extra charge! For his troubles, the new archives director has reportedly had to move his wife and children out of Bucharest to a safer city, because this new openness was causing threats to be made against their lives. Did I mention that his Ph.D. background was in the anti-communist history of Romania? Not a coincidence he was picked to clean up their system, I suspect.

My point is that there is a definite correlation between access to a country's records and history, and their political transparency and climate. And as everyone here is noting, America's access to our own records, created and paid for with our own money in our own institutions, is waning faster and faster, all over the country...very disturbing.
posted by Asparagirl at 12:32 PM on June 8, 2012 [8 favorites]


I agree with every single thing you're saying a million times over, but I don't know about this. Isn't it the federal agencies that are fighting against the tide to keep these vital resources available, and the legislative branch, not the executive, that's pushing so insistently to slash them, on the pretext that they are a "waste" of taxpayer money because we really don't need to know how many flush toilets are in each American household?

Mmm, yes and no. It is Congress acting in the article listed in the FPP, for example, and in the case of the ACS, the Public Printer, and some other shenanigans. But it's definitely Census itself that's acting against saving, say, the Stat Ab (due to cuts stemming, again, from Congress). But then again the director of the Census Bureau was nominated by Obama. So was the head of the DOJ.

I sort of go back and forth on this: a lot of it is Republican-dominated Congress but I think it's important to examine the role of Democratic-appointees in this. I particularly called out Obama because I think a lot of tech folks remember his call for better government websites and haven't seen how it's petered out over the years (especially since Vivek Kundra resigned).

Oh, and while I'm multi-commenting in this thread: if you sneer at PACER, use RECAP (the US Courts would prefer if you didn't, though).
posted by librarylis at 1:21 PM on June 8, 2012


It is Congress acting in the article listed in the FPP, for example, and in the case of the ACS, the Public Printer, and some other shenanigans. But it's definitely Census itself that's acting against saving, say, the Stat Ab (due to cuts stemming, again, from Congress). But then again the director of the Census Bureau was nominated by Obama. So was the head of the DOJ.

I don't know all the ins and outs, but I'm not seeing how whether Obama or a Republican appointed the Census director enters into it, other than the fact that the Census director is getting pressure from the GOP in the House to eradicate the Stat Ab and other so-called superfluous Census "boondoggles."

It's all well and good to examine the role of Democratic appointees, but the main point I'm making, and a point I think can't be made strongly enough, is that it's the GOP-dominated House beating the drum for austerity that's the proximate source of the pressure on the agency to do away with these "wasteful" products.

The GOP has been spoiling for a fight about the Census Bureau ever since Sarah Palin and Michele Bachmann inveighed in 2010 against the intrusive government bureaucrats who were Census workers doing their jobs and helping enforce the law. There were 700 incidents of violence against Census workers in 2010, up from 180 incidents in 2000, including a case in my state in which a census worker's car keys were snatched and he was threatened at the point of a blowtorch. Now the GOP has found a new line of attack. But it's the same underlying principle involved -- the Census is horrific government intrusion, no matter what. And the less we have of the Census the better.

I think these pressures would be perhaps less brought to bear in one-party government, but then again in one-party government there would probably be less resistance by the agencies to the cuts.
posted by blucevalo at 2:12 PM on June 8, 2012


I hadn't realized this desire to conceal government operations ran so deep, thanks Asparagirl, librarylis, etc. There isn't any requirement that newsfilter posts link to newspaper articles. May I request that librarian-like people post more about "baseline access" type threads in future?
posted by jeffburdges at 4:45 PM on June 8, 2012 [1 favorite]


Congressional data is also really important for the burgeoning research going on in the Natural Language Processing and Machine Learning communities on political data. My brother's doing his thesis right now in this field, using text mining techniques to analyze congressional speeches to measure influence and decision making.

Removing access will remove a lot of the aggregate data that is getting generated now, and in the future. Just like the power of data mining is letting the NSA and Facebook build highly detailed profiles of citizens and users, these same techniques will give us insight into congressional activity. Information that citizens need to make informed decisions.

Frankly, if I was a congressman I'd be scared too. It's only going to make even more blatantly obvious what we all already know: the system is corrupt. Talking informally with some researchers, I've heard that some of them have found measurable indicators and factors that correlate with known corruption.

The worst part is these bastards demand this level of quantification in measuring teacher ability, but they fight when we apply it to them.
posted by formless at 8:21 PM on June 8, 2012 [2 favorites]


counterspin it:
- the government is taking away your freedom of having government data access, next step is communism;
- you can't have government data anymore, because china opposes having debt data revealed and that's how they will finally buy the country and make it a socialist paradise;
- when private companies are the only ones allowed to access-produce data, your bills will rise;

you catch the drift, hopefully
posted by elpapacito at 7:36 PM on June 9, 2012


elpapacito, that's certainly well-meant, but you really, really don't want the loose cannons of the tea party on your side, especially leading them in that particular direction.
posted by odinsdream at 10:32 AM on June 10, 2012



