Federal News Network: No one really knows why
June 24, 2020 1:56 AM

This story is nuts: "A group of eight IRS employees led by Jian Wang developed a method for extracting logic from assembly code to convert it into a modern language. They managed to convert 90% of the IRS's Individual Master File code to Java. Until..." IRS programming mystery continues
posted by kliuless (82 comments total) 30 users marked this as a favorite
 
It does sound nuts, I can't imagine how something could rewrite assembler into Java, or at least human readable Java. I could see that it would emulate the original hardware, which would have some value to enable faster modern equipment to be used.
posted by inthe80s at 5:20 AM on June 24


Can I point out, as someone who does work inside similar walls, how good this reporting is? It's factual, not sensational. It doesn't make an enormous to-do about personalities. It's aware of institutional and demographic pressures, which are all too often the largest issues.

I don't know anything about the issues or facts of the matter here, but it is amazing to me to see a piece like this on a "boring" subject that is really one of the important pieces of infrastructure in the US.

Thanks for posting it.
posted by bonehead at 5:30 AM on June 24 [22 favorites]


I feel like this story pops up every few years, and every time it does, I'm more befuddled. Yes, converting nasty undocumented COBOL into modern code is challenging and tricky, and I don't begrudge anyone the difficulty of doing it. But. It's been TWENTY YEARS, and they've spent tens if not hundreds of millions of dollars trying to convert this monstrosity. At what point do you change strategies? Train an AI on it or something? I'm sure they've got a state-transition diagram or two to show for their 8-figure outlay; the complexity is in all the weird edge cases. Luckily, they also have a massive collection of data to train their AI/test their solution on: all the tax returns submitted for the last 40 years.

Hell, they could publish the source code for the Individual Master File, and some sanitized data for it, and let the open source community go nuts. Run it like Netflix did with their recommendation engine a few years back: best solution gets some prize money.

Unless there's an angle to the story that I'm missing entirely, this seems more like a case of stubborn bureaucrats who want to do things Their Way, and less like a case of software that is so complex that it can't be modernized.
posted by Mayor West at 5:58 AM on June 24 [7 favorites]


No one really knows why

Maybe not, but it's very easy to generate plausible guesses.

If the path of least resistance toward re-implementing the Individual Master File is an exercise in assembly code to Java translation, the inescapable conclusion is that the inscrutable assembly code in question is functioning as the de facto definition of tax law, regardless of the decisions assorted legislatures have made about it over the years. Converting it to something more readable seems highly likely to make all the discrepancies (and they will be there, you can bet on it) visible to all.

I can smell huge lawsuits here, as I'm sure can IRS management. This is a rock that I suspect nobody who knows anything actually wants turned over.
posted by flabdablet at 6:05 AM on June 24 [40 favorites]


But. It's been TWENTY YEARS, and they've spent tens if not hundreds of millions of dollars trying to convert this monstrosity

It's worth noting that one part of your two party system is deeply invested in the belief that government can never be modern or efficient, and has worked hard to make it that way.
posted by mhoye at 6:43 AM on June 24 [24 favorites]


It might also be that the conversion is pretty far from working. If it's "90% done" does that mean that the easy stuff is done and the hard stuff remains? If the point of the project was to make things maintainable and not have to rely on techies with arcane knowledge, and was pretty successful, how come it was derailed by one person leaving?
posted by TheophileEscargot at 6:57 AM on June 24 [12 favorites]


If there were discrepancies, then are they replicated in the forms and tables that are used when filing by hand? Is HRBlock running what the IMF runs in terms of calculations rather than what's on the books in terms of tax law?
posted by Slackermagee at 7:00 AM on June 24 [1 favorite]


I don't even understand why an engineer would consent to undertake the translation of machine code to a higher-level language. The software is supposed to be a reflection of the tax code, not vice versa. Just spend the time and money to port the tax code to a human-readable, machine-agnostic language. An AI solution wouldn't really work because AI's goal is to be probably right most of the time and I think our goal with respect to the application of laws should be a bit more stringent than that.
posted by klanawa at 7:12 AM on June 24 [20 favorites]


I don't even understand why an engineer would consent to undertake the translation of machine code to a higher-level language.

Because that is a super cool and fun project for an engineer to do. It has nothing to do with the actual utility of the finished product.
posted by grumpybear69 at 7:16 AM on June 24 [16 favorites]


I can't imagine how something could rewrite assembler into Java, or at least human readable Java

I've seen assembly automatically translated into C; the code is fugly. There are global variables for the registers and a byte array for the original machine's RAM, and more often than not, the code is one big function with labels and gotos. Presumably translating dinosaur machine language into Java would produce some similar monolith, though encapsulated in 1980s-style OO objects. The main gain would be not having to maintain an ancient mainframe, and perhaps in the fullness of time, search parties could venture into the code with machetes in hand and gradually map out functions to reimplement in a more modern fashion.
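A minimal sketch of the shape such mechanically translated code tends to take, written in Python for brevity rather than C or Java. The tiny "program", the register names, and the memory layout are all invented for illustration and correspond to no real machine, and certainly not to the IRS code. Python has no goto, so a program-counter loop stands in for the labels.

```python
# Hypothetical illustration only: machine state kept as module-level globals,
# the way mechanically translated code often models it.
R = [0] * 4                 # "registers" R0..R3 as a global list
MEM = bytearray(16)         # the original machine's RAM as a flat byte array


def run():
    """One big function; each branch of the loop is a former assembly label."""
    pc = "START"            # stand-in for goto: the label to jump to next
    while True:
        if pc == "START":
            R[0] = 0                # accumulator
            R[1] = 0                # index into MEM
            pc = "LOOP"
        elif pc == "LOOP":
            if R[1] >= 4:           # compare and branch
                pc = "DONE"
                continue
            R[0] += MEM[R[1]]       # add the byte at MEM[R1] to R0
            R[1] += 1
            pc = "LOOP"
        elif pc == "DONE":
            MEM[8] = R[0] & 0xFF    # store the low byte of the sum back to RAM
            return


MEM[0:4] = bytes([10, 20, 30, 40])
run()
print(MEM[8])
```

All this does is sum four bytes, but nothing in the code says so. Recovering that intent is the job left for the search parties.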
posted by acb at 7:17 AM on June 24 [4 favorites]


If it's "90% done" does that mean that the easy stuff is done and the hard stuff remains?

I sometimes forget that not everyone works in software, so they don't find things hilarious the way I do. "It's 90% done!" is such an old canard in development that your project manager will hit your knuckles with a ruler if you even say it ironically. I'm not sure which came first, the cliche or the 90-10 rule: the first 90% of the project takes 90% of the time. The last 10% takes the remaining 90% of the time.
posted by Mayor West at 7:18 AM on June 24 [37 favorites]


90% sounds about right for an incomplete, unusable project that's been going on for years. I don't mean that cynically, and I'm not implying that anybody there is working in bad faith. Technical projects always stack the easiest work first and the hardest work last, and almost always for pragmatic reasons: It's difficult to tell where the most intricate, difficult problems will arise until the big, obvious problems that are easier to knock down are cleared out of the way. If it's taken this long, it's because the long tail of minor features and bugs is pretty hellish.
posted by ardgedee at 7:19 AM on June 24 [6 favorites]


Because that is a super cool and fun project for an engineer to do.

Yeah, I was a 22 year old hot-shot once too. We all learn the hard way.
posted by klanawa at 7:25 AM on June 24 [22 favorites]


I can smell huge lawsuits here, as I'm sure can IRS management. This is a rock that I suspect nobody who knows anything actually wants turned over.

You could deal with that by running the new system in parallel for the statute of limitations. Gives you time to tweak it, and time for the suits you anticipate to time out.
posted by spacewrench at 7:33 AM on June 24 [4 favorites]


I don't even understand why an engineer would consent to undertake the translation of machine code to a higher-level language. The software is supposed to be a reflection of the tax code, not vice versa. Just spend the time and money to port the tax code to a human-readable, machine-agnostic language.

I wish more people would recognize "I don't understand the problem or the motivations behind it, but here's my simple and obvious solution" for what it is.
posted by mhoye at 7:33 AM on June 24 [24 favorites]


I'm not sure which came first, the cliche or the 90-10 rule: the first 90% of the project takes 90% of the time. The last 10% takes the remaining 90% of the time.

The variation I'm fondest of is that the first half of the job is about 90% of the work, the second half of the job is about 90% of the work and the third half of the job is about 90% of the work.
posted by mhoye at 7:38 AM on June 24 [34 favorites]


It does sound nuts, I can't imagine how something could rewrite assembler into Java, or at least human readable Java.

Looking at the patent, it does not attempt to produce human readable Java. It just attempts to reproduce the logic in Java.

the 90-10 rule: the first 90% of the project takes 90% of the time. The last 10% takes the remaining 90% of the time.

The other part of that rule is that the engineers often are talking in one unit and the management is talking in a different unit. In this case the engineers are talking in lines of code converted. 90% of the code is converted. The other 10% can be found by looking at the patent, page 3 figure 2. Notice the path "complex branching -> manual conversion" The last 10% of code conversion is going to take quite a bit of time.
posted by bdc34 at 7:43 AM on June 24 [4 favorites]


Why are the IRS filing patents anyway? Does that align with their mission in some way?
posted by simonw at 7:52 AM on June 24


You could deal with that by running the new system in parallel for the statute of limitations.

Making the new system do exactly what the old one does to within the limits of anybody's ability to prove otherwise is of course completely feasible, requiring only a huge suite of test cases that I'm sure there's more than enough historical data to generate.
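A rough sketch of what that kind of test suite boils down to: feed the same historical inputs to both systems and collect every case where the outputs differ. Both tax functions, the flat rate, and the sample incomes here are made up for illustration; the only difference between the two stand-ins is an assumed rounding rule.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

RATE = Decimal("0.10")   # invented flat rate, for illustration only
CENT = Decimal("0.01")


def legacy_tax(income):
    """Stand-in for the old system: truncates to the cent (assumption)."""
    return (income * RATE).quantize(CENT, rounding=ROUND_DOWN)


def rewritten_tax(income):
    """Stand-in for the new system: rounds half up (assumption)."""
    return (income * RATE).quantize(CENT, rounding=ROUND_HALF_UP)


def find_discrepancies(cases, old, new):
    """Return (input, old result, new result) for every case that disagrees."""
    return [(c, old(c), new(c)) for c in cases if old(c) != new(c)]


# Hypothetical "historical" inputs.
historical = [
    Decimal("100.00"),
    Decimal("12345.67"),
    Decimal("999.94"),
    Decimal("50000.00"),
]

mismatches = find_discrepancies(historical, legacy_tax, rewritten_tax)
for income, old_result, new_result in mismatches:
    print(f"income {income}: legacy {old_result}, rewrite {new_result}")
```

Each mismatch is a one-cent question about which system is right, and that is the question nobody may want asked out loud.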

My point is that once you have a new system that does exactly what the old one does, written in a language that is essentially comprehensible, there will be plentiful opportunities to look at what it's actually doing - and therefore, demonstrably, what the inscrutable 1:4:9 black monolith it replaced has been doing for the last fifty years - and spot all the places where it's been doing things that, according to any reasonable interpretation of applicable tax law, it shouldn't have been.

Reparations for that could plausibly cost far more than the inefficiency losses involved in keeping the creaking old beast going for another year and another and another; plausibly enough that I would expect sufficient management fear about that possibility to perpetuate the quiet scuppering of anything that might start to feel like an actual cutover.

Anybody senior enough to be at risk of overseeing such a thing will be well familiar with the Six Phases of a Big Project.
posted by flabdablet at 8:01 AM on June 24 [8 favorites]


the first half of the job is about 90% of the work, the second half of the job is about 90% of the work and the third half of the job is about 90% of the work

Yes, I've certainly dealt with projects estimated as insanely optimistically as that.
posted by flabdablet at 8:04 AM on June 24 [9 favorites]


The software is supposed to be a reflection of the tax code, not vice versa

Aye, there's the rub.
posted by flabdablet at 8:10 AM on June 24


Wasn't it Frederick Brooks' contention that all estimates are insanely optimistic, there being no space between over-optimism and a fatalistic it'll-be-done-when-it's-done despair?
posted by acb at 8:10 AM on June 24 [1 favorite]


"... and spot all the places where it's been doing things that, according to any reasonable interpretation of applicable tax law, it shouldn't have been."

Unexpected Professions of the Future #1: forensic software archeologist
posted by thatwhichfalls at 8:13 AM on June 24 [13 favorites]


Java though :grimace emoji:
posted by GallonOfAlan at 8:17 AM on June 24 [1 favorite]


There are a lot of wrong-headed comments in this thread.

@inthe80s says "I could see that it would emulate the original hardware, which would have some value to enable faster modern equipment to be used." This is at least 20 years behind the times. Old mainframe systems are already running on emulators: nobody is keeping 1970s mainframes (or minicomputers for that matter) on life-support to run mission-critical systems. Do you know that you can download an open-source emulator that can run IBM S/370 (and 390 and Z series) for free? IDK anything about IRS's operations, but I'll bet you a steak dinner that whatever hardware that ASM was originally targeting is now running as emulation on (probably) a Linux system.

I'm pretty sure @Mayor West doesn't know anything about large software projects. Yes, there are multiple angles that they're missing entirely. It's not "stubborn bureaucrats" so much as the consequence of bureaucratic self-mutilation, which I'll get to in conclusion.

@flabdablet brings the conspiracy theories with "This is a rock that I suspect nobody who knows anything actually wants turned over."

@mhoye finally touches on the real problem with "It's worth noting that one part of your two party system is deeply invested in the belief that government can never be modern or efficient, and has worked hard to make it that way." The discussion improves from there, with the remarks about the 90/10 rule, the coolness of hard projects, and @spacewrench's observation about running the legacy system in parallel with a replacement... doing that "until the statute of limitations runs out" is a nice touch.

The real problem that this story illustrates is that the Federal Government has intentionally stripped itself of the technical expertise required to do the things it's required to do by law. Back when the IRS system was built, all that ASM code was written by civil-servant programmers who spent their entire careers immersed in the project, with institutional access to all of the relevant domain-specific knowledge. The same is true of the FAA's National Airspace System and the VA's VistA electronic medical records system. I have some familiarity with both of the latter, and have been a participant in an effort by private contractors to replace the NAS. I'm currently watching with interest the VA's attempt to replace VistA with off-the-shelf commercial software (spoiler: I expect this effort to fall flat on its ass, after exceeding its initial budget by a factor of two).

To build and maintain the kind of systems the Federal Government needs requires that the relevant agencies have in-house, Federal civil servant expertise. You can't parachute in a bunch of people who know nothing about the problem domain (and even a successful commercial vendor with related expertise, like the vendor currently working with VA, doesn't have that) and expect them to produce a system that works. We have seen this over and over and over with the NAS, for which there have been no fewer than three multi-year efforts to replace the IBM S/360 programs that make up its guts. I was involved in the 1990s in the AAS project that was the second attempt to build a replacement: that effort was supposed to take 3 years and cost about $2 billion in 1990 dollars. After 4 years and almost $4B, FAA pulled the plug and got exactly 1 piece of usable technology out of it, a serial communications device worth about $1M in engineering costs (air-traffic radars use a weird serial communications protocol that predates modern notions of 8- and 16-bit data boundaries... the relevant bit of kit is an adapter that makes it easier to hook radars to modern computers).

The real problem is that Americans don't pay nearly enough in taxes or employ nearly enough civil servants. Private contractors simply do not have what it takes to do the job, when the job is building something like a large national air-traffic control system or a national tax-collection system. I have spent most of a 35-year career in the private sector doing things that in a sane world would have been the job of civil servants. The Reagan/Bush I-era cuts to civil service were bad enough, but the Clinton administration's attempt to "re-invent Government" by having private contractors do the work of Federal employees was the larger disaster. Yes they cut Federal headcount, but it turned out that almost all of those people were doing things that needed doing, and by hiring contractors the price of getting it done was multiplied by a factor of at least two: between paying the salaries of private-sector programmers and paying for the profits of their employers, the Feds have wasted an incredible amount of money on people who can't get the job done.
posted by Aardvark Cheeselog at 8:21 AM on June 24 [84 favorites]


Unexpected Professions of the Future #1: forensic software archeologist

This is a thing in Vinge's stuff.
posted by GCU Sweet and Full of Grace at 8:25 AM on June 24 [9 favorites]


Aside from the howls of rage I would expect to come from the Right and Tax Software Companies and probably other folks I'm not thinking of, what would be the problem with starting from scratch? I mean, like pair it with some sort of simplification of our byzantine tax codes and a debt jubilee or something. It's most definitely naive on my part, but it seems like just trashing it and starting over at least means you're gonna put some thought into documentation and future-proofing and having some time to do it rather than waiting for the inevitable break in the system. Or are we expecting these cobol mainframes to just last forever?
posted by snwod at 8:26 AM on June 24 [4 favorites]


To build and maintain the kind of systems the Federal Government needs requires that the relevant agencies have in-house, Federal civil servant expertise. You can't parachute in a bunch of people who know nothing about the problem domain (and even a successful commercial vendor with related expertise, like the vendor currently working with VA, doesn't have that) and expect them to produce a system that works.

Quite so. And people who understand large systems have been saying that very thing for decades, and for decades have been ignored by the kind of ideologically driven bean counters who really do seem to believe that cutting $X off a salaries budget in order to spend $5X on contractors is a "saving" because "competition" and "private sector efficiency".

This has been happening at scale since approximately the mid to late eighties, and I would be astonished to learn that the consequences for the operation of the IRS's current assembly language monster have been negligible.

I have very little doubt that, as originally implemented by long term IRS employees, it once was an accurate reflection of the applicable tax law. The chances that it remains so after decades of contractor patching strike me as really very low, and the probable desire of those who made the decisions that led to this state of affairs to be held accountable for it strikes me as similarly low.

It's not a conspiracy if it's been happening in plain sight.
posted by flabdablet at 8:35 AM on June 24 [9 favorites]



It's most definitely naive on my part, but it seems like just trashing it and starting over at least means you're gonna put some thought into documentation and future-proofing and having some time to do it rather than waiting for the inevitable break in the system. Or are we expecting these cobol mainframes to just last forever?


You would be surprised how much work is still done with mainframes with Cobol. Both were designed with these kinds of business uses in mind. The performance is still very good and conversion costs are very high.

Also, large mature code bases must deal with an amazing amount of edge cases, hardware quirks, and just stuff people have forgotten. Starting from scratch would only work if as you said the tax system was rewritten.
posted by KaizenSoze at 8:39 AM on June 24 [5 favorites]


The author points out that "I last wrote about this technology two years ago..." and there are interesting details there:
Wang explained his solution to assembler conversion to me in some detail. It proceeds from the fact that “in theory, there’s no way to translate assembler code. The way it runs is not how it reads.” Indeed, because it is so tightly coupled to machine instruction sets, assembler looks totally cryptic to 21st century programmers.

Wang and his team nonetheless developed a logical translation component, a “technical rule language” that acts as an intermediate stage to retain the logic withdrawn from the assembler, and a data extractor. By separating out the data, Wang says it was possible to trace the assembler logic flows, then abstract it into structured code in the technical rule language. He says testing proved the three parts could result in a Java program that accurately reproduces what the assembler code does. He said this was proven using production-sized data sets.
You also get a suggestion about how Congress encourages the use of contractors:
Wang was working under streamlined critical pay authority the agency has had since its landmark 1998 restructuring. It gave the IRS 40 slots under which it could pay temporary, full-time employees higher than GS rates. Former Commissioner John Koskinen pointed out Congress did not re-up this authority in 2013, despite his entreaties to former Congressman Jason Chaffetz’s Committee on Oversight and Government Reform.
posted by Western Infidels at 8:41 AM on June 24 [4 favorites]


are we expecting these cobol mainframes to just last forever?

Unlike humans, there is no insuperable technical impediment to uploading large COBOL mainframes to immortality in an emulated universe.
posted by flabdablet at 8:42 AM on June 24 [10 favorites]


@inthe80s says "I could see that it would emulate the original hardware, which would have some value to enable faster modern equipment to be used." This is at least 20 years behind the times. Old mainframe systems are already running on emulators: nobody is keeping 1970s mainframes (or minicomputers for that matter) on life-support to run mission-critical systems. Do you know that you can download an open-source emulator that can run IBM S/370 (and 390 and Z series) for free? IDK anything about IRS's operations, but I'll bet you a steak dinner that whatever hardware that ASM was originally targeting is now running as emulation on (probably) a Linux system

Is it not more likely that they are running on a modern Z/Architecture machine? If you buy a brand new IBM mainframe today it can absolutely run original IBM S/360 code. In fact, that's basically how they sell them. It's obviously very attractive to be able to keep running all your old systems that you know work.

Obviously they are not actually running this software on 1970s hardware!

The real problem that this story illustrates is that the Federal Government has intentionally stripped itself of the technical expertise required to do the things it's required to do by law. Back when the IRS system was built, all that ASM code was written by civil-servant programmers who spent their entire careers immersed in the project, with institutional access to all of the relevant domain-specific knowledge.

Indeed. As stated in the second link, their special authority to pay people more than GS-15 wasn't renewed so all these people left to go work elsewhere. Oops.

Lots of agencies do not use the GS schedule and have their own pay banding. The SEC, the OCC, the foreign service, all have their own systems. Not the IRS though.

Notably, total budgets are much less constrained than the payscale itself. It is therefore much easier to use a combination of undercompensated (and therefore often regrettably mediocre) in-house staff and extortionate contractors than it is to just keep a small core team of software developers permanently working on a project. This does not lead to good results.
posted by atrazine at 9:04 AM on June 24 [9 favorites]


If you buy a brand new IBM mainframe today it can absolutely run original IBM S/360 code. In fact, that's basically how they sell them.

Folks who think Microsoft and Intel invented this kind of obsessive dedication to backward compatibility could certainly stand to read a bit more tech history.

What's kind of annoying is that the S/360 architecture is nicer than x86 in all kinds of ways despite being decades older, and yet it's x86 that's achieved world domination because of a relatively small number of years during which S/360 was just too big to fit in a microprocessor.
posted by flabdablet at 9:13 AM on June 24 [4 favorites]


Two thoughts:
(*) unless it's processing English-language tax code (and court decisions) into usable contemporary 'business logic', it's a band-aid on a 20- to 40-year-old problem; and
(*) there's that phrase about the funding and management of projects like this: 'this is why you don't get nice computers'.
posted by k3ninho at 9:21 AM on June 24 [2 favorites]


but I'll bet you a steak dinner that whatever hardware that ASM was originally targeting is now running as emulation on (probably) a Linux system

I can hear the real workerbees in the back room: Hey Harry did ya download that emulator?

Sure, last week.

Can we use your phone to run this year's tax? You can use my old blackberry to make those tweets of yours.
posted by sammyo at 9:33 AM on June 24 [1 favorite]


 Just spend the time and money to port the tax code to a human-readable, machine-agnostic language

But it's already written in COBOL!
posted by scruss at 9:35 AM on June 24 [7 favorites]


Back in engineering school when I still knew everything, I refused to have anything to do with COBOL because I was offended by the idea of a compiler emitting code that actually used strings of numeric digits as the internal numerics format for all its arithmetic. FORTRAN I could get behind, but not COBOL. Long-ass source code is bad enough, but long-ass object code as well? Bridge too far.

I would be a lot wealthier now if I'd been able to get past that then.

That said, business rules implemented in COBOL are a hell of a lot more readily auditable than the same rules implemented in S/360 assembler.
posted by flabdablet at 9:44 AM on June 24 [4 favorites]


I feel like Gall's Law applies here.

"A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system."
posted by Citrus at 9:54 AM on June 24 [13 favorites]


If it's "90% done" does that mean that the easy stuff is done and the hard stuff remains?

It's possible that this "90%" might even be worse than nothing, as in there might be 150% of the project left to go, because years and millions of dollars might have to be spent to unfuck what's been done. But who knows, the "reporting" on this has literally no technical details.

And, as per usual when these types of discussions come up on The Blue, many of the "geez, why don't they just...." suggestions would make whoever pulled it off the wealthiest person humanity has ever seen.

Also, looking at the top end of the GS15 scale for Virginia, he could double his total comp as even just a senior individual contributor at any number of private companies. Guessing at where he'd slot in at a place like mine, it'd be more like 4X-5X.

Aside from the howls of rage I would expect to come from the Right and Tax Software Companies and probably other folks I'm not thinking of, what would be the problem with starting from scratch?

lololol. That's even worse! Just set fire to the good part of a trillion dollars on the first day and just save everyone the trouble!
posted by sideshow at 10:00 AM on June 24 [8 favorites]


Actually, I somehow missed the patents. I'll check them out.

But, as someone who spends a decent amount of every workday pulling apart other people's Java code, I can't imagine how this wouldn't be a disaster. Even if you converted everything to Java, now you just have a black box in a different language.
posted by sideshow at 10:14 AM on June 24


Starting from scratch would only work if as you said the tax system was rewritten.

It's not just tax law. The IRS regs are way more specific on how things are handled, but also mutable based on private letter rulings and court cases. Then I'd imagine you have a lot of special cases on what should be flagged for audit, and the things a programmer thought should be flagged but then turned around and squelched when testing data showed everyone would be flagged.

It may surprise you to know too that tax practitioners are quite keen on stuffing data into the wrong part of the tax return despite no actual justification other than having gotten away with it for thirty years. The software for return prep is confusing as well. MACRS straight line depreciation in CCH is coded as MSL but then shows up on the printed return as S/L. If you code it as SL in the software you get a very slightly different result based on pre-MACRS depreciation. You will have to tattoo that special case on your staff's hands to get them to understand. I don't even want to get into all the wrong things I've seen from accountants with decades more experience than I have. I make mistakes too, but I try not to state anything as definitive unless I can cite a source.

I'm going to guess there are something like 4 billion errors in tax returns every year that the IRS should ignore because they don't substantially change the tax liability. Any system that hasn't coded in those exceptions is going to blow up the IRS. Not sure how much of that is in the software we're discussing versus some other piece of the whole system.
posted by BrotherCaine at 10:41 AM on June 24 [9 favorites]


As for extracting biz logic from legacy systems, someone probably thought there were easy off the shelf tools for doing it from java that weren't available for that assembly language. Even if you never run the java code it might be a useful map to understanding the assembly.

For starting from scratch, my guess is that it might be best to make a cutoff and handle simple returns in a new system and complex returns in the old. Then gradually build up the new system to handle returns with real estate or partnership k-1s. How you communicate things between those systems would then be the nightmare.
posted by BrotherCaine at 10:50 AM on June 24 [5 favorites]


I really think this is a political and budget problem though.
posted by BrotherCaine at 10:52 AM on June 24


All taxes can be done as lookup tables. Don’t use AI, just calculate all possible scenarios and create lookup tables for those scenarios. Then use Java to implement the lookup functionality.

Provide the tax forms as lookup tables, too. It would simplify the process. GNU grep author Mike Haertel said it best here.
posted by metasunday at 10:58 AM on June 24 [1 favorite]


All possible scenarios?

You could turn the entire mass of the universe into hard drives and not have enough storage space for all of those tables.

And I'm not exaggerating either.
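A back-of-envelope check, under loudly stated assumptions: suppose a return had only 20 independent dollar-amount fields, each a whole number of dollars from 0 to 999,999,999, and take the commonly cited rough figure of 10^80 atoms in the observable universe. Real returns have far more inputs than this, so the sketch understates the problem.

```python
# All numbers below are illustrative assumptions, not facts about real returns.
FIELDS = 20                   # assumed number of independent dollar-amount fields
VALUES_PER_FIELD = 10**9      # assumed: whole dollars from 0 to 999,999,999
ATOMS_IN_UNIVERSE = 10**80    # commonly cited rough estimate

# A full lookup table needs one row per combination of input values.
table_rows = VALUES_PER_FIELD ** FIELDS
rows_per_atom = table_rows // ATOMS_IN_UNIVERSE

# Both values are exact powers of ten, so digit count minus one is the exponent.
print(f"rows needed: 10^{len(str(table_rows)) - 1}")
print(f"rows each atom would have to store: 10^{len(str(rows_per_atom)) - 1}")
```

Even with one table row per atom you would still be short by a factor of 10^100.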
posted by Hatashran at 11:10 AM on June 24 [11 favorites]


All taxes can be done as lookup tables.

{{Citation needed}}
posted by flabdablet at 11:20 AM on June 24 [6 favorites]


See, this is all one more reason we need a flat tax!


(ducks)
posted by nickmark at 12:07 PM on June 24


I'm going to guess there are something like 4 billion errors in tax returns every year that the IRS should ignore because they don't substantially change the tax liability.

That's about 15 errors per return, and about 90% of household filers take the standard deduction, meaning their tax returns aren't that complicated so 15 errors has to be an outlier based on (I guess) business and corporate tax returns. Which means that the logic between individual and corporate taxes should be split more.

I work on mega corp tax software projects occasionally, and did a COBOL to C++/Java conversion about 10 years ago that included taxing; it wasn't the most difficult or expensive project I've ever worked on. I'm not discounting the difficulties of taxes either - as we get daily files which contain tax rate changes for cities/counties/states/federal/about 65 international jurisdictions that we work in. But it's not rocket science, it's just work.
posted by The_Vegetables at 12:19 PM on June 24


I studied computer science in the very last year when COBOL was a graduation requirement. And also the last year when assembly language was taught in the IBM 370 dialect. A professor at my school developed an emulator to stop us from running code on the same machine that was running the course registration software.

One interesting thing about 360/370 assembler is that it was designed to be written and read by humans. And, while I was taking COBOL at the same time, you could really see how the individual COBOL statements would logically translate themselves into lines of that particular assembly. Compilers were a lot more straightforward back then; as Grace Hopper said, it was the first one. All that is to say, it is not as wild a prospect as you might think to be able to translate that logic into a higher-level language.
posted by Space Coyote at 12:27 PM on June 24 [7 favorites]


Lookup tables, jump tables. I've never yet met an assembler coder that met any they didn't love.

"Knock, knock"
"Who's there?"
"Algorithm"
"Algorithm who?"
"Yeah, that's what I figured"
posted by Chitownfats at 12:53 PM on June 24


I'm currently watching with interest the VA's attempt to replace VistA with off-the-shelf commercial software (spoiler: I expect this effort to fall flat on its ass, after exceeding its initial budget by a factor of two).

This is at the very least their second attempt--I worked at the VHA back in the early-to-mid-2000s, and they spent years attempting to replace their VistA MUMPS-based system with a more modern (at the time) Java and Oracle architecture.

I spent around three years there and when I left, not a single working part of the project plan had been accomplished. I think they were still doing requirements gathering. I found out maybe a year later from someone who stayed on that they finally canceled the entire thing. No idea if they ever tried it again.
posted by Mr. Bad Example at 1:10 PM on June 24 [1 favorite]


One thing to look forward to is in 30 years they'll be trying to find retired Java programmers to translate this stuff into something more modern and less clunky.
posted by MtDewd at 1:18 PM on June 24 [10 favorites]


What's with the patent filing? I thought that all work that the government did with public money was public domain. Even if you had an ancient view of security and wanted to keep your source private, why on earth are they filing for an exclusive monopoly on these ideas? No wonder the director didn't want to fund the $250 filing fee on the other patents- this seems way out of bounds. It's odd enough that it makes me question the rest of the story.
posted by jenkinsEar at 2:15 PM on June 24 [1 favorite]


The US government holds tens of thousands of patents, and sometimes even litigates them. Thousands of these patents (although probably not any of the IRS ones) are also kept secret.

Whether any of this conforms to the constitutional purpose of the patent system is, as they say, left as an exercise for the reader.
posted by Not A Thing at 2:27 PM on June 24 [5 favorites]


Is it not more likely that they are running on a modern Z/Architecture machine? If you buy a brand new IBM mainframe today it can absolutely run original IBM S/360 code. In fact, that's basically how they sell them. It's obviously very attractive to be able to keep running all your old systems that you know work.

They're almost certainly running on current hardware. Newer Z mainframes can also run multiple Linux VMs (or even containers via Docker!) along with native mainframe code. And it's not as if COBOL is a dead language. COBOL 6.1 was just released in 2017.
posted by Uncle Ira at 2:28 PM on June 24 [4 favorites]


I spent around three years there and when I left, not a single working part of the project plan had been accomplished. I think they were still doing requirements gathering.

The result of requirements gathering phase 1, for any system on this kind of scale, is always going to boil down to "make the new one do just what the old one is already doing, only better." Everybody who uses the old system is going to be a fountain of knowledge about things it does wrong, but because these systems always diverge from their original missions over time, the chance of finding anybody - or even any reasonably tractable collection of anybodys - who knows what it actually does right is always remarkably close to zero.

And it's not a "simple" matter of starting from the legal requirements and working forward. If writing code that deals moderately sanely with countless edge cases unanticipated by regulators or glossed over with "ministerial discretion" and the like wasn't ridiculously harder than writing the regulations that give rise to them, we wouldn't need programmers.
posted by flabdablet at 2:32 PM on June 24 [4 favorites]


That's about 15 errors per return, and about 90% of household filers take the standard deduction, meaning their tax returns aren't that complicated so 15 errors has to be an outlier based on (I guess) business and corporate tax returns. Which means that the logic between individual and corporate taxes should be split more.

If I put all the stock transactions into one of my past clients' returns instead of summarizing, it would run to 1500 pages. Between farms, multi-state oil & gas partnerships, disregarded entities, partnership interests, real estate, schedule C businesses, parsonage allowances, RSUs with no clear valuation, FBAR & FinCEN reporting, and DOMA-era registered domestic partnerships, individual returns can be sickeningly complex before you even get into foreign tax treaties. Individual returns should be easy, but H&R Block and others have been lobbying for additional complexity for a long, long time.

A lot of the time things come up where there's not even a clear answer on how to handle it. When you ask people who write textbooks questions and they shrug you have to do a lot of research you'll never get to bill enough to cover.
posted by BrotherCaine at 3:15 PM on June 24 [2 favorites]


My first Silicon Valley job was automated COBOL code remediation for Y2K that was supposed to eventually get spun into a more general-case business rule extraction. Almost no one in the financial industry wanted their system handled by anyone not in-house, and that was definitely the right call.
posted by BrotherCaine at 3:24 PM on June 24 [3 favorites]


I am a professional coder who writes Java every day. I make a good living but nowhere near what I could make if I wanted to, because I work in a small shop rather than in silicon valley. The idea that the government could pay me enough to implement tax code is risible. The reason civil servant programmers could exist and implement the system being translated is an artifact of that historical era; modern pay scales for programmers means you'd have to pay so much more than they're capable of doing. The real problem is political will, not technical capability. There's no space race for implementing tax codes, and even NASA's operating at a fraction of their Apollo era budget these days.

I'd also suggest that the contractors who've been hired to do this probably never really intended to succeed, deep in their dark little hearts. They're just in it for the paycheck.
posted by axiom at 3:36 PM on June 24 [4 favorites]


Programmers being paid a lot of money is also a historical anomaly; gradually, much of coding will be automated and commodified, to the point where increasingly large proportions of programs can be generated without knowing anything about big-O notation or concurrency abstractions. What remains will be either highly specialised, alongside microcode engineers and the like, or a trade much like graphic design or carpentry. By then, the IRS of the era will surely be able to hire enough programmers to maintain their codebase and tend to the needs of the automated systems that convert specifications to logic, and possibly one or two highly-paid forensic archaeologists for the occasions they need to delve into century-old machine code to figure out exactly what it does, more often than not as evidence in a civil court case.
posted by acb at 4:26 PM on June 24


I see that sentiment often, but I think it misses the mark simply because programmers are simultaneously both very good at automating their own jobs, and also at creating more work for themselves. Assembly, and compilers, and higher level languages are all code automation, and each has been accompanied not by a reduction in the need for skilled programmers but rather an expansion of it. Whenever programmers see that languages are getting too simple and powerful they come up with something like Rust to ensure their own continued employment.
posted by Pyry at 5:30 PM on June 24 [6 favorites]


acb
I don't know if you've spent your career working in the software industry but what I've learned is that any time a system for developing software becomes more high-level, presumably to implement more with less code and time, the requirements will expand to fill up all available time and budget. The vast majority of current software is implemented by people who are barely aware of the existence of big-O notation and have never worked with concurrency abstractions (javascript). Every few years there's some sort of language or framework or new technology that "will make programmers obsolete". Not a single one of them has made much of a change to the landscape. ML and what we're calling "AI" today is, at its heart, another abstraction tool and will be used to make incredible things, but still will not fill the bottomless well of need.
posted by WaylandSmith at 5:50 PM on June 24 [9 favorites]


The main gain would be not having to maintain an ancient mainframe

This has never been necessary. IBM will happily sell you a modern machine that will run your old code and data without modification. These systems even have tools to help migrate different pieces to more modern languages without having to do the entire project at once.

You see, IBM's mainframes and midrange systems have been (in a sense) virtualized since the 1970s. The physical architecture has changed dramatically, but the runtime environment can be exactly the same as it has been since the System/360 was new.
posted by wierdo at 6:04 PM on June 24 [2 favorites]


All taxes can be done as lookup tables.
{{Citation needed}}
posted by flabdablet at 1:20 PM

Are taxes Turing Computable, tonight at 9:00.
posted by symbioid at 6:11 PM on June 24 [1 favorite]


All taxes can be done as lookup tables.

Well, you're right, they should be, but there are many instances in the current tax code where that is impossible. For example Form A affects the result of Form B and Form B affects the result of Form A. This results in a circular dependency that has no closed form solution and requires iteration to asymptotically approach a result with a difference of less than one dollar.

One common case is the interaction between calculation of ACA health insurance subsidies and the deduction of health insurance premiums.
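
A toy sketch of that kind of iteration. The subsidy rule, the 8% `contribution_rate`, and the function name are all invented for illustration and are not the actual ACA formulas. This toy rule is linear, so it could be solved directly; the real rules have thresholds and caps, which is where iteration comes in.

```python
def solve_circular(gross_income, premium, contribution_rate=0.08,
                   tolerance=1.0, max_iter=100):
    """Iterate deduction <-> subsidy until the deduction moves by less than $1.

    Made-up rules:
      adjusted income = gross income - premium deduction
      subsidy         = premium - contribution_rate * adjusted income (clamped to [0, premium])
      deduction       = premium - subsidy
    """
    deduction = 0.0
    for _ in range(max_iter):
        adjusted_income = gross_income - deduction
        subsidy = min(premium, max(0.0, premium - contribution_rate * adjusted_income))
        new_deduction = premium - subsidy
        if abs(new_deduction - deduction) < tolerance:
            return new_deduction, subsidy
        deduction = new_deduction
    raise RuntimeError("did not converge")


deduction, subsidy = solve_circular(gross_income=50_000, premium=10_000)
print(round(deduction), round(subsidy))
```

Each pass shrinks the error by roughly the contribution rate, so with these numbers it settles within a dollar after a handful of rounds.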
posted by JackFlash at 6:32 PM on June 24 [5 favorites]


Programmers being paid a lot of money is also a historical anomaly

It is a 'historical anomaly' because the first programmers were women and were drastically underpaid for the skills needed to do the job. People expressing this sentiment will go on to say 'eventually a business analyst will be able to describe the precise requirements for what the computer should do, and the computer can figure out how to do it.' Which is... programming. They are describing programming.
posted by Space Coyote at 7:19 PM on June 24 [12 favorites]


I don’t know from taxes and I don’t know from software or coding. But I do know from Medicare/Medicaid, and I know about “fiscal conservatives” and the GOP and the pro-Wall Street Democrats.

Anything that makes the civil service less good at doing the job of collecting taxes and distributing benefits is not a bug. It is a feature. That’s not going to change unless there is a substantial shift in power and ideology in both parties, and a turn to government as a public good. When the loopholes get closed and the regulations get simplified and the wealthy get the snot taxed out of them, I bet the software is going to get a whole lot better.
posted by skookumsaurus rex at 7:21 PM on June 24 [1 favorite]


My point is that once you have a new system that does exactly what the old one does, written in a language that is essentially comprehensible, there will be plentiful opportunities to look at what it's actually doing - and therefore, demonstrably, what the inscrutable 1:4:9 black monolith it replaced has been doing for the last fifty years - and spot all the places where it's been doing things that, according to any reasonable interpretation of applicable tax law, it shouldn't have been.

That's not how taxes work, though. It's not as simple as feed some numbers into the black box and whatever it spits out is what you owe in taxes. For one, every taxpayer also does their own computation of what they owe in taxes based on the tax laws, and if there's a discrepancy between what the IRS thinks you owe in tax and what you think you owe in tax, you hash it out.

This means that if the system was doing something it shouldn't have been doing according to the tax laws, it would be ferreted out pretty quickly because the IRS would arrive at a different answer than the taxpayer regarding the tax that is owed.

Also, when the IRS goes to court, it can't just say you owe $X in tax because that's what the black box says. It has to show how it arrived at that answer and that it's in accordance with the tax laws
posted by ultraviolet catastrophe at 7:57 PM on June 24 [2 favorites]


Plan for VistA seems to be replacing functions one at a time rather than replacing the whole thing. So now I get to use a clunky web interface for timecards and leave requests, but still log in to VistA to approve financial transactions. They finally gave us a web-based room scheduling system too, instead of VistA (it’s sharepoint based and kind of awful, yet better than VistA). It’s still needed for health records but that’s supposed to be transitioning to electronic health records elsewhere in the near future.

So much for the VA, tell me more about the IRS...
posted by caution live frogs at 8:34 PM on June 24


when the IRS goes to court, it can't just say you owe $X in tax because that's what the black box says. It has to show how it arrived at that answer and that it's in accordance with the tax laws

But when it doesn't go to court, which is the overwhelmingly general case, it doesn't.

My point is that it seems to me that there's a fairly high likelihood that overhauling the black box provides an opportunity for somebody skilled in the art of preparing class actions to identify many, many millions of cases in which the IRS would have lost if it had needed to defend its rulings in court.
posted by flabdablet at 11:00 PM on June 24 [1 favorite]


By then, the IRS of the era will surely be able to hire enough programmers to maintain their codebase and tend to the needs of the automated systems that convert specifications to logic

Two dots to join:

The Last One (spoiler: not the last one)
Freeway congestion
posted by flabdablet at 11:04 PM on June 24


I am a professional coder who writes Java every day. I make a good living but nowhere near what I could make if I wanted to, because I work in a small shop rather than in silicon valley. The idea that the government could pay me enough to implement tax code is risible. The reason civil servant programmers could exist and implement the system being translated is an artifact of that historical era; modern pay scales for programmers means you'd have to pay so much more than they're capable of doing. The real problem is political will, not technical capability. There's no space race for implementing tax codes, and even NASA's operating at a fraction of their Apollo era budget these days.

The issue is that the United States has decided that it will not pay long term civil servants competitively. Temporary positions, yes. Contractors, yes. Career people, no.

There is no reason why either government pay or technical competence would be less than non-government. This is a political choice.

The playbook is simple: by keeping pay and conditions where they are, they make the government less competent. That being the case, it is easy to justify not paying them more because they're "incompetent".

In some countries, civil servants are highly compensated and highly regarded and recruiters have their pick of candidates.
posted by atrazine at 12:58 AM on June 25 [10 favorites]


There is no reason why either government pay or technical competence would be less than non-government. This is a political choice.

Not to gild the lily, but you've already found the reason. It's political, we agree. That's not to say that someone with a political mandate to fix the IRS' computational systems, if put in charge tomorrow, could fix it either, because it turns out that even under favorable circumstances this stuff is hard, but also hi this is not remotely the position we find ourselves in. Regulatory capture is real. Republicans are real and, it turns out, control at least 2 branches of our government. Some countries might do better (I think, though citation needed) but this one that I live in certainly isn't one of them. Luckily all of this is sort of academic because climate change will likely kill us all before the vagaries of the tax code and its translation to machine language become a bigger problem.

Why yes, I am told I'm fun at parties.
posted by axiom at 1:12 AM on June 25 [2 favorites]


2/3 of the work abandoned? Sounds
posted by Nanukthedog at 6:47 AM on June 25 [3 favorites]


two highly-paid forensic archaeologists for the occasions they need to delve into century-old machine code to figure out exactly what it does, more often than not as evidence in a civil court case.

'Forensic programmer' is already a job, it's not a job of the future.

The idea that the government could pay me enough to implement tax code is risible

This seems like an odd comment. I mean, the median salary of 'programmer' is like $50-100k in the US, which is basically the median salary range of every city in the US. Ok, so the low end of that is people fresh out of school, but still boat loads of them exist. I mean, only the top 15% or so of households even earn $100k, and that often includes 2 earners. And not every one of those people is a programmer. My point is that 'programmer' doesn't automatically mean a super highly paid person, outside of a few locations, and those people are writing code way more complex than tax software.


when the IRS goes to court, it can't just say you owe $X in tax because that's what the black box says. It has to show how it arrived at that answer and that it's in accordance with the tax laws
Maybe when they go to court they say this, but the letter you get first (way before you go to court) is a bunch of nonsense with a vague code useless to anyone (barely even googleable) and vague statements that you did your taxes wrong and telling you how much more you owe (if you owe; I'm assuming they also send a letter saying you paid too much). And then Step 2 is go to an IRS office, where they also don't have people capable of calculating taxes correctly or able to tell you what the problem is. They do them by hand in this office, not using software at all.
Step 3 is go back to your accountant, where they recheck and resend to the IRS, and usually it disappears.

Step 4 is court.
posted by The_Vegetables at 8:03 AM on June 25 [2 favorites]


It's genius, they are letting the patent expire so they can't be accused of stopping private enterprise from finding more ways to make Americans pay money to pay taxes.
posted by parmanparman at 11:09 AM on June 25


This seems like an odd comment. I mean, the median salary of 'programmer' is like $50-100k in the US

I was talking less about compensation (though yeah, of course that's part of it) than I was that this sounds like an extremely boring and irritating task. You make it sound like tax software is pretty easy ("[other] people are writing code way more complex than tax software") but I doubt that's an accurate depiction of the problem, because then they'd just re-implement the IMF from first principles rather than translate the extant assembly, but they've gone the latter route after multiple attempts to use contractors to tackle what is apparently quite a byzantine piece of code.
posted by axiom at 11:23 AM on June 25


My point is that it seems to me that there's a fairly high likelihood that overhauling the black box provides an opportunity for somebody skilled in the art of preparing class actions to identify many, many millions of cases in which the IRS would have lost if it had needed to defend its rulings in court.

My point is that there's a very, very low likelihood of this happening. If there were that many cases in which the IRS was miscalculating people's taxes, it would be noticed by the IRS or by taxpayers.
posted by ultraviolet catastrophe at 11:52 AM on June 25


This seems like an odd comment. I mean, the median salary of 'programmer' is like $50-100k in the US

When you look at programmers with 10+ years of domain expertise and/or who also have a CPA or have practiced tax law, I imagine it's hard to find anyone who can't easily make more than $150,000/year in the private sector. Not to say you can't find someone, but I'd think of it as one of the hardest programming jobs out there, not the easiest. Too much detail, not enough that can be abstracted.
posted by BrotherCaine at 2:01 PM on June 25 [1 favorite]


I can see the case that anything major that caused substantial and frequent overcalculations of taxes / false positives for violations would have been caught (and presumably patched) long since. But is there any corresponding mechanism that would catch undercalculations and false negatives?

Seems like a zero-day exploit of the Individual Master File, one that e.g. allows large underpayments to slip by undetected or generates refunds the taxpayer is not entitled to, could be worth an awful lot on the open market... Is there some process that would eventually catch such errors?
posted by Not A Thing at 8:57 PM on June 25


Is there some process that would eventually catch such errors?

Yes. The IRS is subject to routine audits on pretty much any topic you can think of. Also, people regularly challenge their tax bills, even when they receive larger refunds than expected. In such cases, the IRS rechecks its math, so they would notice if there were some kind of systemic undercalculation of taxes.
posted by ultraviolet catastrophe at 8:14 AM on June 26 [1 favorite]


> The real problem is that Americans don't pay nearly enough in taxes or employ nearly enough civil servants. Private contractors simply do not have what it takes to do the job, when the job is building something like a large national air-traffic control system or a national tax-collection system. I have spent most of a 35-year career in the private sector doing things that in a sane world would have been the job of civil servants. The Reagan/Bush I-era cuts to civil service were bad enough, but the Clinton administration's attempt to "re-invent Government" by having private contractors do the work of Federal employees was the larger disaster. Yes they cut Federal headcount, but it turned out that almost all of those people were doing things that needed doing, and by hiring contractors the price of getting it done was multiplied by a factor of at least two: between paying the salaries of private-sector programmers and paying for the profits of their employers, the Feds have wasted an incredible amount of money on people who can't get the job done.

Thank you so much for this comment. I just submitted my resignation this week after fifteen years of working for a federally-funded R&D center on federal cybersecurity projects, and what you have described is my experience as well. Conservatives have never cared about small government in terms of budgets, rather, they want the federal headcount to be as small as possible, so that the budgets can go straight into the accounts of consultants and system integrators.

I still believe there are short-term, tactical tasks and bleeding-edge research activities for which academic and private sector contractors (working in concert with and at the direction of engaged and well-compensated federal employees) are the best option, but a vast majority of federal contracting goes to things that the federal government, if it weren't reeling from decades of attacks from Republicans, could do with a properly-trained workforce. Any of the government employees I worked with at the GS13-15 level could have found a job with twice the pay and half the aggravation, so turnover was a serious problem for us, and is a major reason why I'm leaving the employer I thought I'd end my career with.
posted by tonycpsu at 11:01 AM on June 26 [3 favorites]

