Tuesday, October 14, 2014

How Much Of "Software Engineering" Is Engineering?

When you build some new piece of technology, you're (arguably) doing engineering. But once you release it into the big wide world, its path of adoption is organic, and sometimes full of surprises.

Quoting Kevin Kelly's simultaneously awesome and awful book What Technology Wants, which I reviewed a couple days ago:

Thomas Edison believed his phonograph would be used primarily to record the last-minute bequests of the dying. The radio was funded by early backers who believed it would be the ideal device for delivering sermons to rural farmers. Viagra was clinically tested as a drug for heart disease. The internet was invented as a disaster-proof communications backup...technologies don't know what they want to be when they grow up.

When a new technology migrates from its intended use case, and thrives instead on an unintended use case, you have something like the runaway successes of invasive species.

In programming, whether you say "best tool for the job" or advocate your favorite One True Language™, you have an astounding number of different languages and frameworks available to build any given application, and their distribution is not uniform. Some solutions spread like wildfire, while others occupy smaller niches within smaller ecosystems.

In this way, evaluating the merits of different tools is a bit like being an exobiologist on a strange planet made of code. Why did the Ruby strain of Smalltalk proliferate, while the IBM strain died out? Oh, because the Ruby strain could thrive in the Unix ecosystem, while the IBM strain was isolated and confined within a much smaller habitat.

However, sometimes understanding technology is much more a combination of archaeology and linguistics.

Go into your shell and type man 7 re_format.

Regular expressions (``REs''), as defined in IEEE Std 1003.2 (``POSIX.2''), come in two forms: modern REs (roughly those of egrep(1); 1003.2 calls these ``extended'' REs) and obsolete REs (roughly those of ed(1); 1003.2 ``basic'' REs). Obsolete REs mostly exist for backward compatibility in some old programs; they will be discussed at the end.

This man page, found on every OS X machine, every modern Linux server, and probably every iOS or Android device, describes the "modern" regular expressions format, standardized in 1988 and first introduced in 1979. "Modern" regular expressions are not modern at all. Similarly, "obsolete" regular expressions are not obsolete, either; staggering numbers of people use them every day in the context of the find and grep commands, for instance.

To truly use regular expressions well, you should understand this; understand how these regular expressions formats evolved into sed and awk; understand how Perl was developed to replace sed and awk but instead became a very popular web programming language in the late 1990s; and further understand that because nearly every programming language creator acquired Perl experience during that time, nearly every genuinely modern regular expressions format today is based on the format from Perl 5.

Human languages change over time, adapting to new usages and stylings with comparative grace. Computer languages can only change through formal processes, making their specifics oddly immortal (and that includes their specific mistakes). But the evolution of regular expressions formats looks a great deal like the evolution which starts with Latin and ends with languages like Italian, Romanian, and Spanish - if you have the patience to dig up the evidence.

So far, I have software engineering including the following surprising skills:
  • Exobiology
  • Archaeology
  • Linguistics
There's more. There's so much more. For example, you need to extract so much information from the social graph - who uses what technologies, and what tribes a language's "community" breaks down to - that it would be easy to add anthropology to the list. You can find great insights on this in presentations from Francis Hwang and Sarah Mei.