Sunday, April 10, 2016

The Fallacies Of Distributed Coding

If you only ever write code which runs on one machine, and only ever use apps which have no networked features, then computers are deterministic things. It used to be a given, for all programmers, that computers were fundamentally deterministic, and thanks to the internet, that just isn't true any more. But the rise of the internet, with its implicit mandate that all software must become networked software, isn't the only thing that killed the idea that programming is inherently deterministic. Everybody's code also became a distributed system in a second way.

If you write Ruby, your code is only secure if RubyGems.org is secure. If you write Node.js, your code is only secure if npmjs.com is secure. And for the vast majority of new projects today, your code is only secure if git and GitHub are secure.
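
To make that chain of trust concrete, here is a minimal, hypothetical Gemfile; the gem names and versions are illustrative, not a recommendation. Every line is a standing instruction to Bundler to fetch whatever RubyGems.org serves under that name, and to run it with your application's privileges, along with everything those gems depend on in turn.

    # Gemfile -- hypothetical sketch; gem names and versions are illustrative only
    source 'https://rubygems.org'

    gem 'rails', '4.2.6'    # trust Rails, and everything Rails depends on
    gem 'devise'            # trust Devise's maintainers with your authentication code
    gem 'tiny_helper_gem'   # one obscure maintainer, trusted exactly as much as Rails

Compromise the registry, or any one of those maintainers' accounts, and you've compromised every app that installs from it.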

Today "your" code is a web of libraries and frameworks. All of them change on their own schedules. They have different authors, different philosophies, different background assumptions. And all the fallacies of distributed computing prove equally false when you're building applications out of extremely modular components.
  1. The network is reliable. This is obviously a fallacy with actual networks of computers, but "social coding," as GitHub calls it, requires a social network, with people co-operating with each other and getting stuff done. This network mostly exists, but is prone to random outages.
  2. Latency is zero. The analogy here is with the latency between the time you submit a patch and the moment it gets accepted or rejected. If you've ever worked against a custom, in-house fork of a BDD library whose name.should(remain :unmentioned), because version 1.11 had a bug, which version 1.12 fixed, but version 1.12 simultaneously introduced a new bug, and your patches to fix that new bug were on hold until version 1.13, then you've seen this latency in action, and paid the price. A sketch of that kind of workaround follows this list.
  3. Bandwidth is infinite.
  4. The network is secure. Say you're a law enforcement agency with a paradoxical but consistent history of criminality and espionage against your own citizens. Say you try to get a backdoor installed on a popular open source package through legal means. Say you fail. What's to stop you from obtaining leverage over a well-respected open source programmer by discovering their extramarital affairs? I've already given you simpler examples of the network being insecure, a few paragraphs above. I'm hoping this more speculative one is purely hypothetical, but you never know.
  5. Topology doesn't change.
  6. There is one administrator.
  7. Transport cost is zero. Receiving new code updates, and integrating them, requires developer time.
  8. The network is homogeneous.
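
The in-house-fork scenario from item 2 has a familiar shape in practice. Here's a hypothetical sketch, in Bundler's Gemfile syntax, of what living with that latency looks like; the gem name, fork URL, and ref are invented for illustration.

    # Gemfile -- hypothetical workaround while waiting for an upstream release
    # The gem name, fork URL, and ref below are made up.
    gem 'unmentioned_bdd_library',
        git: 'https://github.com/your-org/unmentioned_bdd_library.git',
        ref: 'a1b2c3d'   # 1.12 plus your fix, until 1.13 ships upstream

You carry the patch yourself, re-point the pin every time upstream moves, and absorb the network's latency as developer time.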
Open source has scaled in ways which its advocates did not foresee. I was a minor open source fan in the late 1990s, when the term first took hold. I used Apache and CPAN. I even tried to publish some Perl code, but I was a newbie, unsure of my own code, and the barriers to entry were much higher at the time. Publishing open source in the late 1990s was a sign of an expert. Today, all you have to do is click a button.

The effect of this was to transform what it meant to write code. It used to be about structuring logic. Today it's about building an abstract distributed system of loosely affiliated libraries, frameworks, and/or modules in order to create a concrete distributed system out of computers sending messages to each other. The concrete distributed system is the easy part, and even so, people get it wrong all the time. The abstract distributed system is an unforeseen consequence of the incredible proliferation of open source, combined with the fact that scaling is fundamentally transformative.