Thursday, October 18, 2012

Mock Bad Code (Because Fractals)

Everybody encounters bad code sometimes, and wonders what to do.

The answer is simple.

You should mock it, in the sense of making fun of it, because it's often funny.

You should also mock it, in the sense of testing its implementation details, because you will need to rewrite it, and you will need to verify that it works exactly the way you think it does. Because it probably doesn't.

The first thing people do to figure out code is read it. With bad code, that's a mistake.

Bad code never does exactly what it says it does. Bad code is hard to read, or wildly misguided, or both.

You only have three ways to find out what code does: read it, test it, or test it in production. Testing code in production is obviously a bad idea. But when code is bad, reading it is also a bad idea.

The worse code gets, the more likely it becomes that you're better off testing the code than reading it. Bad code uses variable names which are incomprehensible; really bad code uses variable names which are inaccurate; truly awful code uses indecipherable variable names to describe an inaccurate model of the business logic.

When it comes time to test bad code, you have to operate on the assumption that whoever wrote it didn't really understand what they were writing. I have seen better programmers than me implementing call/cc while calling it event bubbling, or totally missing the fact that they implemented a Factory pattern when they thought they were cleverly hacking an inheritance tree. I have also seen much worse programmers than me referring to "length" as "lenght" every single time in a code base, not just in comments and variable names, but in class names as well. A Factory pattern does not resemble OOP inheritance, and "lenght" is not the correct spelling of any word in any language I'm aware of.

One easy method of burning out, as a programmer, is reading bad code when you could be testing it. It pollutes your brain and gives you extra work just figuring out the difference between the bad code's terminology and the actual business logic. It is especially frustrating because the work does not pay off. Filling your brain with the bad code's terminology rarely helps you understand anything, and always constitutes unnecessary mental overhead.

It is a wasteful expenditure. You'll be happier if you start with tests or specs written in extremely simple language. Use these to define exactly what the system does.
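Here is a minimal sketch of what that can look like, assuming Python. Everything in it is hypothetical: `legacy_lenght` is an invented stand-in for inherited code, and `characterize` is just an illustrative helper, not part of any library. The point is that the labels are written in plain language, and the recorded outcomes come from running the code rather than from trusting its names.

```python
def legacy_lenght(items):
    """Hypothetical inherited code: the name says "length", but it
    silently skips falsy entries such as None, 0, and ""."""
    return len([item for item in items if item])


def characterize(func, cases):
    """Record what func actually returns (or raises) for each labeled case."""
    observed = {}
    for label, args in cases.items():
        try:
            observed[label] = ("returned", func(*args))
        except Exception as error:
            observed[label] = ("raised", type(error).__name__)
    return observed


# Plain-language labels on the left, real arguments on the right.
observed = characterize(
    legacy_lenght,
    {
        "three real items": (["a", "b", "c"],),
        "a list containing None": (["a", None, "c"],),
        "an empty list": ([],),
        "not a list at all": (None,),
    },
)

for label, outcome in observed.items():
    print(f"{label}: {outcome}")
```

Once the observed outcomes are written down, they can be pasted into assertions, and the rewrite has something concrete to be checked against.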

Once you have good code which defines the system, it starts making sense to read that code. And once you've read a few solid descriptions of what the code actually does, take aim at the naming and see how it clears things up.

Where you take the specs after this point is a matter of style, but the way you get there is not just with specifications, but also with specifics. It is always better to find out what the system really does, and this goes double if you inherit the code from anybody else.

Code written by other people who understood the logic they were attempting to describe is a rare joy. But bad code written by other people, when it fails to accurately model the business domain, can then only be understood if you have (or write) some kind of glossary which tells you that when the bad code says (for instance) "OOP inheritance," it means "a Factory pattern."

Save yourself the stress.

By the way, that "lenght" thing is a true story. On this project, I persuaded the head of engineering (who was, in my personal opinion, an insane dipshit) to move automated testing from the lowest priority to the highest priority. I wrote integration tests, and I could literally run the test suite ten times and get different results every time. Eventually, as I cleaned the code up, patterns emerged in the test suite's unpredictable counts of which tests passed and which tests failed. It was literally like hacking a Feigenbaum sequence generator and modulating the variable which stands for robustness.

Bit of a tangent: I did that in high school on my graphing calculator.

A Feigenbaum sequence begins as an iterating equation, e.g.:



I don't actually remember what that means, but I copied it from either this book or this one and figured out the code to run it on my graphing calculator.

This is a graph of the Feigenbaum sequences for values of x and i between 0 and 1.



This is what I tried to draw on my graphing calculator, but its hardware had very little RAM, and its implementation of BASIC did not support lambdas, so I had to settle for drawing individual sequences, while modulating by hand the values of x and i.

An individual sequence could look like this:



If it looks like this, repeating the iterating function with those particular values for x and i zeroes in on one consistent return value.

Or this:



If it looks like this, repeating the iterating function with those particular values for x and i produces an oscillation through multiple return values.

Or this:



If it looks like this, repeating the iterating function with those particular values for x and i produces a very noisy and unpredictable range of values. It appears random, but it is more accurately described as chaotic, which in mathematical terms means apparently random output produced by a completely deterministic rule.
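The three behaviors above can be reproduced in a few lines. This is a sketch under a loud assumption: it uses the logistic map, x_next = r * x * (1 - x), which is the textbook equation behind Feigenbaum-style diagrams, and which may or may not be the exact equation I typed into the calculator. The parameter name `r`, its range of 0 to 4, and the function name `logistic_orbit` are all choices made for this illustration.

```python
def logistic_orbit(r, x0=0.5, warmup=1000, keep=8):
    """Iterate x -> r * x * (1 - x) (the assumed equation), discard the
    first `warmup` steps, and return the next `keep` values."""
    x = x0
    for _ in range(warmup):
        x = r * x * (1 - x)
    orbit = []
    for _ in range(keep):
        x = r * x * (1 - x)
        orbit.append(round(x, 6))
    return orbit


# Three parameter values chosen to land in the three regimes described above.
for r in (2.8, 3.2, 3.9):
    orbit = logistic_orbit(r)
    print(f"r={r}: {len(set(orbit))} distinct value(s) -> {orbit}")
```

Counting the distinct values in the tail of each sequence is a crude classifier: one value means it settled down, a handful means it oscillates, and "as many as you kept" suggests chaos.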

When I started, the tests I had for the legacy code I brought under control would, as I said, give me different results every time I ran them. If I had graphed the passing or failure of any given test x against the number of times I ran it i, I would have gotten a graph that looked like this:



Once I got the code under control, a predictable rhythm started to emerge in the mapping of any given test's success or failure x against the number of times I ran it i:



As I refactored the code, I got closer and closer to a situation where the spec's passing or failing x mapped against the number of times I ran it i would converge on a single value -- either passing or failing every time -- like this:
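The same idea can be applied directly to a test suite: run one test many times, record the pass/fail sequence, and see which of the three shapes it has. What follows is a rough sketch, not the tooling I used on that project. The names `run_repeatedly`, `classify`, and `make_leaky_test` are hypothetical, and the "leaky" test is a toy that fails on every third call because of state left over from earlier runs.

```python
import random


def run_repeatedly(test, runs=20):
    """Call a zero-argument test `runs` times; True means it passed."""
    results = []
    for _ in range(runs):
        try:
            test()
            results.append(True)
        except AssertionError:
            results.append(False)
    return results


def classify(results):
    """Label a pass/fail sequence as stable, periodic, or erratic."""
    if len(set(results)) == 1:
        return "stable"
    # Look for the shortest period that explains the whole sequence.
    for period in range(2, len(results) // 2 + 1):
        if all(results[k] == results[k % period] for k in range(len(results))):
            return f"periodic (period {period})"
    return "erratic"


def make_leaky_test():
    """Hypothetical test polluted by state that survives between runs."""
    state = {"calls": 0}

    def leaky_test():
        state["calls"] += 1
        assert state["calls"] % 3 != 0

    return leaky_test


def steady_test():
    assert 1 + 1 == 2


rng = random.Random(0)  # seeded only so the demo is repeatable


def coin_flip_test():
    assert rng.random() < 0.5


for name, test in [
    ("steady", steady_test),
    ("leaky", make_leaky_test()),
    ("coin flip", coin_flip_test),
]:
    print(name, "->", classify(run_repeatedly(test)))
```

A test that comes back "periodic" is usually a hint about leaked state or ordering, which is a far more tractable problem than one that comes back "erratic".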



You might think it impractical to write a Feigenbaum sequence generator in BASIC, but a) I had a lot of free time in my classes, because I never listened to my teachers, and b) it turned out to be a useful model for fixing extremely unpredictable software.

That's right, motherfuckers. When you're this good, you don't have to make sense.