Monday, August 1, 2011

ActiveRecord Minus The Record Part

Today an ActiveRecord discussion revisited an issue raised about a year ago. The short version: ActiveRecord models which contain all your application's domain modelling become bloated and a total pain in the ass to test, refactor, and/or read.

A year ago, James Golick said:

The truth is that in a simple application, obese persistence objects might never hurt. It's when things get a little more complicated than CRUD operations that these things start to pile up and become pain points. That's why so many Rails plugins seem to get you 80% of the way there, like immediately, but then wind up taking forever to get that extra 20%.

My favorite comment in today's discussion echoed this thought:

I think this all depends solely on the complexity of the app you are building.

If you're building a small website, even proper MVC might be overkill and things like Sinatra + Sequel might be the best solution.

If you're building a medium sized app, the Rails approach of thin controller / thick model will be just right.

If you're building an enterprise information system, you might need that three-tier architecture with presentation / business logic / persistence layers separation and maybe even other Java world practices.

Don't over-engineer, don't under-engineer. Make it just right for the thing you are building.

Factoring an ActiveRecord persistence strategy (either the design pattern, or the Rails gem) out of the centerpiece of your object model makes a lot of sense once you pass some threshold of scale. In this instance by "scale" I mean code base size but also possibly traffic. The argument for code base size is hopefully obvious, and certainly covered in the blog posts I've already linked to and quoted.
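To make that concrete, here's a minimal sketch of what the split might look like in plain Ruby. The names `Invoice` and `InvoiceRepository` are hypothetical, and an in-memory Hash stands in for the database; in a real app the repository is where an ActiveRecord class (or any other persistence mechanism) would live. This is one possible shape, not the approach from the posts quoted above.

```ruby
# Domain object: plain Ruby, knows nothing about how it is stored.
class Invoice
  attr_reader :id, :line_items

  def initialize(id:, line_items: [])
    @id = id
    @line_items = line_items
  end

  # Business logic lives here and can be tested without a database.
  def total_cents
    line_items.sum { |item| item[:price_cents] * item[:quantity] }
  end
end

# Persistence object (hypothetical): the only place that touches storage.
# A Hash is an assumption standing in for ActiveRecord or another backend.
class InvoiceRepository
  def initialize(store = {})
    @store = store
  end

  # Writes the invoice's data to the store and returns the invoice.
  def save(invoice)
    @store[invoice.id] = { id: invoice.id, line_items: invoice.line_items }
    invoice
  end

  # Rebuilds a domain object from stored data, or returns nil if absent.
  def find(id)
    row = @store[id]
    row && Invoice.new(**row)
  end
end

repo = InvoiceRepository.new
repo.save(Invoice.new(id: 1, line_items: [{ price_cents: 500, quantity: 2 }]))
puts repo.find(1).total_cents # prints 1000
```

The point of the sketch is that `Invoice#total_cents` can be exercised with nothing but `Invoice.new`, while swapping the storage backend only touches the repository.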

With enough traffic, strong arguments mount for slicing your persistence up. You might read from read-optimized databases while writing to a central write-optimized one (since most web apps do a lot more reading than writing) or handle most persistence through SQL while shunting a small subset off to NoSQL alternatives. It gets ridiculous managing multiple persistence solutions within an object which exists to model your business logic. A line or two of ridiculous, I can handle - I've written ActiveRecord models which snuck in calls to Redis pub/sub on the side - but go much further than that and the argument for separate objects becomes rock-solid.
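As a rough illustration of the read/write split, here's a sketch of a small persistence object that sends writes to a primary and spreads reads across replicas. Everything in it is hypothetical: `FakeDatabase` is an in-memory stand-in for a real connection, `SplitStore` is a made-up name, and replication is faked by copying each write to the replicas synchronously, which a real setup would not do.

```ruby
# Hypothetical in-memory stand-in for a database connection.
class FakeDatabase
  attr_reader :name, :rows

  def initialize(name)
    @name = name
    @rows = {}
  end
end

# Hypothetical persistence object that owns the routing decision,
# so the business-logic objects never need to know about it.
class SplitStore
  def initialize(primary:, replicas:)
    @primary = primary
    @replicas = replicas
    @next_replica = 0
  end

  # All writes go to the primary. Replication is simulated here by
  # copying the value to every replica immediately (an assumption).
  def write(key, value)
    @primary.rows[key] = value
    @replicas.each { |replica| replica.rows[key] = value }
    value
  end

  # Reads rotate through the replicas in round-robin order.
  def read(key)
    replica = @replicas[@next_replica % @replicas.size]
    @next_replica += 1
    replica.rows[key]
  end
end

store = SplitStore.new(
  primary: FakeDatabase.new("primary"),
  replicas: [FakeDatabase.new("replica_1"), FakeDatabase.new("replica_2")]
)
store.write(:user_1, "Alice")
puts store.read(:user_1) # prints Alice
```

Even in toy form, this is code you probably don't want sitting in the same class as your business rules.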

In this sense, the traffic-motivated split is really just a code-size-motivated split with a specific reason for the code growing. Where a general code-size-motivated split depends on the overall size of your model code, a traffic-motivated one depends on the size and complexity of your persistence code.

I don't think there's really any debate here at all, except for one crucial question: where do you mark the threshold? How do you decide when your code needs this split? Although you can certainly perform various measurements to guide this decision, I think this is a judgement call, and pretty much impossible to decide ahead of time. I've worked on sites which I knew for a fact would see gazillions of pageviews from the first day of launch, whether their larger business goals succeeded or failed. For a site like that, I would absolutely start by factoring ActiveRecord out into a service like James does. For more typical Rails sites, I'd start with ActiveRecord and move it out of the picture only when necessary. In my opinion this kind of split is crucial if you hope to scale a Rails site, but optional for most smaller projects.