Wednesday, May 13, 2015

Strong Parameters Are A Weak Schema

Ruby on Rails went off the rails a long time ago.

I don't work with Rails today. But, like so many other developers, I kept working with Rails for many years after the Merb merge. Because I loved Ruby, and because the Rails developer experience remains a thing of beauty, even today.

I stuck around for Rails 4, and one of the changes it made was silly.
Rails has always had a nice way of sanitizing user input coming from ubiquitous forms. Up until Rails 3, the solution was to list accessible fields right in your models. Then Rails 4 came along and introduced a different solution - strong_parameters, which lets you take greater control over the sanitizing process.
As is often the case with Rails, the real problem here is that the core team failed to recognize a classic computer science problem, because it underestimated the importance of API-centric web development and perceived the problem purely in terms of showing a web page to a user.

What Rails Was Thinking

Before I get into that, I just want to summarize the problem from the Rails perspective: you've got input coming in from users, who are filling out web forms. They might be up to mischief, and they might use your web form to cause trouble. So you have to secure your web forms.

The classic Rails solution for securing a web form: attr_accessible. Since models are the only way Rails puts anything into a database, you can recast "securing a web form" as validating an object. It makes perfect sense to say that code which secures an object's validity belongs in that object. So far, so good.

attr_accessible was a whitelisting mechanism which allowed you to specify which model attributes could be mass-assigned. The default methods for updating or creating an object in Rails, such as update_attributes, would otherwise allow a user to mass-assign any attribute of a model, including (for example) their authorization privileges.

But this whitelisting was disabled by default. You had to kick it into gear by calling attr_accessible at least once, in your model. People forgot to do this, including people at GitHub, a very high-profile company with great developers, which got very visibly hacked as a result. People responded by writing initializers:
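A typical version of that initializer looked something like this (a reconstruction; the filename is arbitrary):

```ruby
# config/initializers/strict_attributes.rb
# Call attr_accessible with nil on the base class, so that every
# model starts with an empty whitelist. Mass assignment is then
# locked down everywhere until a model explicitly opens up
# specific attributes with its own attr_accessible call.
ActiveRecord::Base.send(:attr_accessible, nil)
```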


(Obviously, a better way to do that would be to wrap it in a method called enable_whitelist or something, but that's a moot issue now.)

People also responded by writing plugins, and in Rails 4, one of these plugins moved into Rails core.

So this is what changed:
  • attr_accessible had an inverse, attr_protected, which allowed you to use a blacklist instead of a whitelist. strong_parameters only permits a whitelist.
  • The whitelisting default changed from off to on.
  • The code moved from the model to the controller.
David Heinemeier Hansson wrote up the official rationale. I've added commas for clarity:
The whole point of the controller is to control the flow between user and application, including authentication, authorization, and, as part of that, access control. We should never have put mass-assignment protection into the model, and many people stopped doing so long ago ...

An Alternative Approach

Let's look at this from a different perspective now.

Say you're building a web app with Node.js, and you want to support an API as well as a web site. We can even imagine that your mobile app powers much more of your user base, and your web traffic, than your actual web site does. So you need to protect against malicious actors exploiting your web forms, as web apps always have. But you also need to protect against malicious actors exploiting your API traffic.

At this point, it's very easy to disagree with Mr. Hansson's claim that "we should never have put mass-assignment protection into the model." Both the "protect against malicious actors" problems here are very nearly identical. You might have different controllers for your API and your web site, and putting mass-assignment protection into those controllers could mean implementing the same code twice. Centralizing that code in the relevant models might make more sense.

Rails solves this by quasi-centralizing the strong_parameters whitelist in a private method, typically at the bottom of the controller file. Here's the example from the official announcement:
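Reconstructed from memory of the announcement (names and details may differ slightly):

```ruby
class PeopleController < ActionController::Base
  # This raises an ActiveModel::ForbiddenAttributes exception,
  # because it passes the raw params straight through to the model.
  def create
    Person.create(params[:person])
  end

  # This works, because only the whitelisted attributes survive.
  def update
    person = current_account.people.find(params[:id])
    person.update_attributes!(person_params)
    redirect_to person
  end

  private

  # The whitelist lives here, at the bottom of the controller.
  def person_params
    params.require(:person).permit(:name, :age)
  end
end
```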

But you could also just use JSON Schema. All your web traffic's probably using JSON anyway, all your code's in JavaScript already, and if you write up a schema, you can just stipulate that all incoming data matches a particular format before it gets anywhere near your application code. You can put all that code in one place, just as you could with models, but you move the process of filtering incoming input right up into the process of receiving input in the first place. So when you do receive invalid input, your process wastes fewer resources on it.
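A schema stipulating that format might look something like this (a hypothetical sketch, using the draft-04 JSON Schema vocabulary that was current at the time; the field names are invented):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Person",
  "type": "object",
  "required": ["email"],
  "properties": {
    "email":      { "type": "string" },
    "first_name": { "type": "string" },
    "last_name":  { "type": "string" },
    "shoe_size":  { "type": "integer" }
  },
  "additionalProperties": false
}
```

Note that `required` covers what strong_parameters calls require, and `additionalProperties: false` gives you permit-style whitelisting, with attribute types thrown in for free.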

(This is kind of like what Rails did, except you can put it in the server, which in Rails terms would be more like putting it in a Rack middleware than in a controller.)
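In Ruby terms, that kind of validating middleware is a small thing. Here's a hypothetical sketch; the schema check itself is a toy stand-in for a real JSON Schema library, because the point is where the check lives, not how it's implemented:

```ruby
require 'json'

# A hypothetical Rack middleware that rejects invalid input before
# the request ever reaches a controller.
class SchemaGate
  def initialize(app, schema)
    @app = app
    @schema = schema
  end

  def call(env)
    body = env['rack.input'].read
    env['rack.input'].rewind
    data = JSON.parse(body) rescue nil
    if data.nil? || !valid?(data)
      # Reject with a 422 before any application code runs.
      return [422, { 'Content-Type' => 'application/json' },
              ['{"error":"invalid input"}']]
    end
    @app.call(env)
  end

  private

  # Toy check: every required key is present, and no key falls
  # outside the schema's known properties.
  def valid?(data)
    @schema[:required].all? { |k| data.key?(k) } &&
      data.keys.all? { |k| @schema[:properties].include?(k) }
  end
end
```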

The funny thing is, writing a schema is basically what Rails developers do already, with strong_parameters. They just write their schemas in Ruby, instead of JSON.

Here's a less cluttered example:
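Something like this (reconstructed; the controller and model names are guesses):

```ruby
class UsersController < ApplicationController
  def create
    @user = User.create(user_params)
    redirect_to @user
  end

  private

  def user_params
    params.require(:email).permit(:first_name, :last_name, :shoe_size)
  end
end
```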

Note especially the very schema-like language in this line:

params.require(:email).permit(:first_name, :last_name, :shoe_size)

All you're doing here is permitting some attributes and requiring others. That's a schema. That's literally what a schema is. But, of course, it lacks some of the features that a standard like JSON Schema includes. For instance, specifying the type of an attribute, so mischievous Web gremlins can't fuck up your shit by telling you that the number of widgets they want to purchase is `drop table users`. (Rails has other protections in place for that, of course, but the point is that this is a feature any schema format should provide.)

Rails developers are writing half-assed schemas in Ruby. If/when they choose to farm out parts of their system to microservices written in other languages, they'll have to re-write these schemas in other languages. For instance, they might at that point choose to use a standard, like JSON Schema. But if you're building with the standard from the start, you only have to define that schema once, using one format.

In fact, Rails developers typically re-write their schemas whether they refactor to microservices or not. Many Rails developers prefer to handle API output using active_model_serializers, which gives you a completely different Ruby-based schema format for your JSON output.

Here's an example from the README:
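Reconstructed from memory of the era's README, so treat the details loosely; in particular, the `url` declarations here are my paraphrase of the hypermedia feature, and the exact DSL drifted between active_model_serializers versions:

```ruby
class CommentSerializer < ActiveModel::Serializer
  # Serialize these model attributes, including the foreign key.
  attributes :name, :body, :post_id

  # Add hypermedia-style URLs, inferred from the app's routing.
  url :comment
  url :post
end
```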

This code says "when you build the output JSON, serialize the name and body attributes, include post_id, and add some hypermedia-style URLs as well." It infers a lot from the database, and it's nicer to read than JSON Schema. But you can't infer a lot from a database without some tight coupling, and this syntax loses some of its appeal when you put it side-by-side with your other implicit Ruby schema format, and you have to remember random tiny distinctions between the two. It's kind of absurd to write the same schema two or three different times, especially when you consider that Ruby JSON parsing is so easy that your JSON Schema objects can be pure Ruby too if you want.
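That last point is easy to demonstrate. A JSON Schema document is just a JSON object, so you can define it once as a plain Ruby hash, hand it to an in-process validator, and serialize the very same hash for a JavaScript microservice to consume (a minimal sketch; the schema contents are invented):

```ruby
require 'json'

# One schema, defined once, as plain Ruby data. No DSL, no gem.
PERSON_SCHEMA = {
  'type'       => 'object',
  'required'   => ['email'],
  'properties' => {
    'email'     => { 'type' => 'string' },
    'shoe_size' => { 'type' => 'integer' }
  },
  'additionalProperties' => false
}

# Serialize it on the way out; the JavaScript side parses the exact
# same schema your Ruby side uses.
schema_json = JSON.generate(PERSON_SCHEMA)
```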

strong_parameters really only makes sense if you haven't noticed basic things about the Web, like the fact that HTTP has a type system built in.