Monday, June 15, 2015

The Future Is Full Of Broken Machines

John McCarthy created Lisp in 1958. My hope is that Clojure has finally mainstreamed it, although it might be too soon to say. If Clojure fades out, and if you look at the history of Scheme, Common Lisp, and companies like Naughty Dog, it might be more accurate to say that Lisp periodically surfaces outside of academia, but never really achieves escape velocity. Only time will tell.

But let's assume for the sake of argument that Clojure is indeed mainstreaming Lisp. Even if it's not true, JavaScript is mainstreaming functional programming, which is pretty close.

If that's the case, it's kind of horrifying: the optimistic interpretation is that Lisp took 49 years to reach the mainstream (because Clojure was released in 2007).

Brendan Eich came to Netscape because they told him he could put Scheme in the web browser. So we could stretch the definition and use JavaScript's 1995 release date instead. But we'd be seriously fudging the numbers, and we'd still have Lisp making a 37-year voyage from new paradigm to mainstream acceptance.

One of the classic books of the tech business is Geoffrey Moore's Crossing The Chasm, which is all about how to transition a product from the early adopters to the mainstream. If you buy the narrative that all programmers are eager explorers of the future, it's pretty ironic that a programming language should have such a hard time making this transition.

But let's be honest here. Most programmers are very conservative and tribal about their toolsets. And with good reason: programming any particular language requires a lot of specialized knowledge and experience. Once you get good enough at something intricate and challenging that you can charge a lot of money for it, you usually want to stick with it for a while. If you dive into a Clojure code base after years of writing C, it might be uncomfortable, awkward, and extremely unprofitable.

There's also something paradoxically both intimate and mechanistic about the way that wrapping your head around a programming language can change the way you think, and thus, to some extent, who you are. Learning your second programming language thoroughly and well is a lot harder than your fifth or your sixth. Programmers risk a phenomenon of paradigm freeze, similar to the phenomenon that psychologists have identified as "taste freeze" in music:
From around the age of 15 years old, music tastes begin to mature and expand as listeners increase the diversity of the music on their playlists.

Tastes appear to change most quickly through the teenage years until the age of about 25 when this sense of discovery slows and people move away from mainstream artists.
I even saw a Douglas Crockford keynote where he said that the only way you can advance a new programming paradigm is by waiting for the entire current generation of programmers to retire or die.

Let's pretend that we have all the cynicism and despair that anyone would get from working at Yahoo for a long time, and agree with Mr. Crockford, for the sake of argument.

It would stand to reason that there must be new programming paradigms today that have not yet crossed the chasm.

I believe that this is probably true, and I have two specific examples. The irony is that both of these paradigms are embedded in technologies we all use every single day. Yet I would not be surprised at all if they remained widely misunderstood for the next 50 years, just like Lisp did.

One of them is git.

You Probably Don't Understand Git (For Large Values Of You)


git's a completely decentralized technology, which requires no central repository and no master anything ("master" is just a branch name, with no special powers).

But people typically treat git as a completely centralized technology which depends absolutely on its center, GitHub.

You've probably heard some variant of this story before:
Panda Strike's CEO, Dan Yoder, told me a story about a startup where he served as CTO. GitHub went down, and the CEO came into his office, saying, "we have to move everything off GitHub!" He was upset because no programmer could do any work that day. Except, of course, they could. You choose one person's repo as the new, temporary canonical repo, fire up sshd, and replace GitHub with a laptop until their servers come back online.
One time, I worked at a company which was nearly all remote, with a small office in one city but plenty of programmers in other cities. Soon after I joined, we all spent a few days hacking and hanging out in a cabin in the woods. Our WiFi was very unreliable, so we were unable to reach GitHub. We just used gitjour, which wraps Bonjour, Apple's Zeroconf networking tech, to host and advertise git servers over small local wireless networks. In other words, one person said "I'm the canonical repo now," and we all connected to their computer.
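
In git terms, that trick is tiny. Here's a minimal sketch, assuming a colleague's laptop is reachable at 10.0.0.5 with sshd running and the repo sitting at ~/project (the names and the address are made up):

    # Everyone adds the laptop as a remote...
    git remote add temp ssh://dan@10.0.0.5/~/project

    # ...then pushes and pulls against it as usual.
    git push temp my-branch
    git pull temp master

    # When GitHub comes back, drop the stand-in.
    git remote remove temp

Nothing about the repo itself changes; "canonical" is purely a social agreement about which remote everyone treats as the center.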

The point is, git doesn't depend on GitHub. GitHub adds value to git. But to most people, the major difference between git and Subversion is that there's a web site attached to git.

On Panda Strike's blog, I went into detail about this:
GitHub is also hierarchical even though git is flat. If GitHub only added a center to git, it would have commits flow to the center of a web of repos. But that's not how it works. On GitHub, pull requests have to flow upwards to attain lasting impact.

It's a tree, with changes flowing up toward the root. It's not a web of repos with commits flowing into a shared center.

Using GitHub adds a center and a hierarchy to git. These are usually beneficial. But, as I explore in that blog post, the added-on center and hierarchy become a problem when you have a project with a thriving community but a disinterested creator.

And the real downside of this tradeoff isn't that edge case. The real downside of treating git as if it were centralized is that lots of people assume that it is centralized. To a lot of people, this entirely new paradigm of distributed version control is basically just Subversion with a web site and a smiling cartoon octopus/cat beast from the island of Dr. Moreau.

You Probably Don't Understand HTTP Either


HTTP has this problem too. Not only that, HTTP's had this problem for more than twenty years. People who don't understand HTTP are constantly reinventing features that the protocol already has, and moving those features from the protocol layer to the application layer in the process.

There are plenty of examples, but the biggest and most egregious would be media types, the POST verb, and, of course, REST.

Media types matter because HTTP has a type system. It's based around the fundamental, important, and seemingly forgotten idea of hypermedia. This idea is so powerful that it basically puts an everything-is-an-object system like Smalltalk or HyperCard around the entire planet; but it's so frequently under-exploited that it's almost just a footnote. (But a footnote which can make your API traffic incredibly fast.)
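
Here's a sketch of what that looks like on the wire, with a made-up api.example.com and a made-up media type. The Content-Type tells the client exactly how to interpret the body, and the links in the body tell it where it can go next:

    $ curl -i https://api.example.com/orders/42 \
        -H 'Accept: application/vnd.example.order+json'
    HTTP/1.1 200 OK
    Content-Type: application/vnd.example.order+json
    Cache-Control: max-age=3600
    ETag: "a1b2c3"

    {"status": "shipped",
     "links": {"customer": "/customers/7", "invoice": "/orders/42/invoice"}}

The speed comes from the headers: because GET is defined as cacheable, any intermediary that sees Cache-Control and ETag can answer repeat requests without ever touching your servers.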

With POST, the situation's improving, but for decades, POST was the go-to HTTP verb for nearly everything a Web app ever did. Ironically, however, POST was intended as a footnote. HTTP's basically a giant, globe-spanning key/value store, but just in case, the spec included POST for any kind of random operations that weren't covered in the primary use cases of GET, PUT, and DELETE.
The core operations for a key-value store are get, put, and delete. As you'd expect, each of these correspond to well-defined HTTP verbs. And by well-defined, I mean that they're more than just window-dressing to indicate intent. For example, a client might cache the response to a GET request exactly because it's defined to allow that.

But HTTP includes a fourth verb, POST, which provides for cases where strict key-value store semantics don't suffice. Rather than take the pedantic tack of insisting that everything fit into a single abstraction, HTTP gives you POST as a fallback.

Unfortunately, for historical reasons, this led developers to misunderstand and overuse POST, which, in turn, contributed heavily to the confusion that surrounds HTTP to this day.
In practice, most web developers have looked at POST as the mechanism which enables RPC on the web. "If I want to prompt the server to perform an action of any kind, I use POST." This meant that a huge number of HTTP requests over the past twenty-plus years could have used HTTP verbs to identify their purposes and intents, but instead had the application layer figure it out.
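
The contrast is easy to see with a couple of made-up requests. In the key/value style, the verb carries the intent, and the protocol layer can act on it; in the RPC-over-POST style, the intent is buried in the body, where only the application can dig it out:

    # Key/value semantics: the verb is the operation.
    curl -X PUT https://api.example.com/users/42 -d '{"name": "Ada"}'
    curl https://api.example.com/users/42
    curl -X DELETE https://api.example.com/users/42

    # RPC-over-POST: every operation looks identical to the protocol.
    curl -X POST https://api.example.com/rpc \
        -d '{"action": "deleteUser", "id": 42}'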

Even systems like Rails, whose developers realized that they could use HTTP verbs for this purpose, lost track of the basic idea that HTTP is a big key/value store. Instead of recognizing that PUT maps exactly to the act of putting a new key in a hashtable, they chose, with no obvious rationale, to consider PUT equivalent to the "update" in CRUD, and POST equivalent to CRUD's "create."
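
For contrast, here's the hashtable reading next to the Rails convention (URLs made up). In Rails-style CRUD you POST to a collection and the server picks the key; in plain HTTP, PUT means "bind this value to this key," whether or not the key existed before:

    # Rails-style CRUD: the server chooses the key.
    curl -X POST https://api.example.com/users -d '{"name": "Ada"}'

    # Hashtable-style HTTP: the client names the key; PUT creates
    # or replaces, just like hash[key] = value, and repeating it
    # leaves the same state behind (which is why PUT is idempotent).
    curl -X PUT https://api.example.com/users/ada -d '{"name": "Ada"}'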

Using the application layer to handle protocol-level information makes web apps slower to run, and more expensive to build and maintain. If we could total up the dollar value of this misplaced effort, it would be quite a lot of money. The same goes for rebuilding Basic Auth by hand on nearly every site and app since day one.
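
Basic Auth is a good miniature of the whole problem. The protocol already specifies the entire challenge/response dance; here it is with a made-up example.com (and the usual caveat that Basic Auth is only sane over TLS, since the credentials are merely encoded, not encrypted):

    # The server challenges...
    $ curl -i https://example.com/admin
    HTTP/1.1 401 Unauthorized
    WWW-Authenticate: Basic realm="admin"

    # ...and the client answers with an Authorization header,
    # which curl constructs from user:password for you.
    $ curl -i -u alice:secret https://example.com/admin
    HTTP/1.1 200 OK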

As for REST, it's a huge topic. For now, just understand that this mountain of errors we're looking at is really just the tip of an iceberg.

Superset The Dominant Paradigm


To paraphrase William Gibson, the future is already here, it's just not widely recognized. People in general find it a lot easier to put a new, unfamiliar thing in a familiar category than to wrap their heads around a new idea, and that's true even when the new idea doesn't really fit in the category they choose for it. Designers even do this on purpose; for instance, it's not an accident that getting on an airplane feels a lot like getting on a train, and the reason isn't that trains are necessarily great models for organizing transit. They're good, but that's not the reason. The reason is that when flight first became a widespread technology, it scared the shit out of people. Designers made it look familiar so it would feel safe.

In 2008, GitHub basically did the same thing. Git's fundamentally a functional data structure, but that sales pitch will only work for a few very unusual people. "Imagine if Subversion could handle many more branches at a time" is a much easier sell. Likewise, treating hypermedia like a bunch of remote procedure calls was just easier for a lot of people.
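
About that functional data structure: you can see it with git's own plumbing. Every object is immutable and named by a hash of its contents, and a branch is just a pointer into the resulting structure (hashes abbreviated here):

    # A commit is an immutable value whose ID is a hash of its
    # contents, including the hash of its parent.
    $ git cat-file -p HEAD
    tree 9bb3...
    parent 4fd2...
    author ...

    # A branch is (usually) nothing but a tiny file holding one
    # such hash.
    $ cat .git/refs/heads/master
    4fd2...

New commits share almost all of their structure with the old ones, which is the same trick Clojure's persistent data structures use.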

But here's where I disagree with Mr. Crockford: I believe that the idea that everybody has to understand a paradigm, for that paradigm to matter, is itself an outdated paradigm. After all, both HTTP and git have been wildly successful despite consistent and incredibly widespread misuse.

Maybe the key is just to superset some existing paradigm, so that the late adopters can use their old paradigms, while early adopters use their new paradigms, all within the same technology. This approach certainly worked for JavaScript, and it might even be the secret sauce behind git and HTTP's success stories too.