Tuesday, October 16, 2007

Build An ActiveResource Schema Dumper

ActiveResource looks just like ActiveRecord, but a lot of the things you get for free with ActiveRecord aren't there with ActiveResource. One major weakness is the lack of association classes.

It's actually pretty easy to add custom methods on the server side which allow you to use find(:include) to return ActiveResource objects with their association class objects attached - for instance, if you write methods on the server side to support it, and patch ActiveResource's find(), you could easily call User.find(1) and get back a User with that user's Widgets attached. ActiveResource find() can do pretty much anything if you just have new keywords dispatch to new get() methods. It delegates to get() methods on keywords already, so you just follow that pattern and you can more or less send it anywhere. The key is the server-side methods that the get() methods call, but those aren't hard. All you really need on the server side is a reasonable understanding of routes and a #to_xml call on a collection which consists of both the User and their Widgets.

But there's a danger there. You might write code which assumes that ActiveResource models, like ActiveRecord models, will return [] or nil for association classes with no content. If a User has no Widgets, @user.widgets should return []. In fact, an ActiveResource application built on this assumption will raise a lot of NoMethodError exceptions during development. This is especially true if you use ActiveResource models in form handlers. The real danger isn't writing code with that faulty assumption but using it. You're in a controller, using a popular plugin, when suddenly everything blows up because you're missing methods on your models, but you didn't see it coming because they're only missing in some use cases. Even in the context of good TDD, code built on convention over configuration can blow up pretty badly if you're disregarding convention.
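To make the failure mode concrete, here's a minimal sketch in plain Ruby. PayloadBackedModel is a made-up stand-in, not ActiveResource itself; it only imitates the relevant behaviour, which is that a model answers to an attribute name only if that attribute was in the data it was built from.

```ruby
# Hypothetical stand-in for an ActiveResource-style model. It responds only
# to attribute names that were present in the payload it was built from.
class PayloadBackedModel
  def initialize(attributes)
    @attributes = attributes
  end

  # Answer attribute reads from the payload; anything else is a NoMethodError.
  def method_missing(name, *args)
    return @attributes[name] if @attributes.key?(name)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

with_widgets    = PayloadBackedModel.new(:name => "pat", :widgets => ["sprocket"])
without_widgets = PayloadBackedModel.new(:name => "sam")

with_widgets.widgets        # an Array, as the calling code expects

begin
  without_widgets.widgets   # no :widgets key in the payload
rescue NoMethodError => e
  puts "blew up: #{e.class}"
end
```

The same call works for one user and explodes for another, which is why this tends to slip past tests that only exercise users with widgets.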

ActiveRecord attaches those methods automatically, but ActiveResource doesn't. Remembering to add them manually is pretty counter-intuitive. There's a simple way to handle this. ActiveRecord generates these methods when the application loads. ActiveResource can do the same thing. You need two parts - a server-side part and a client-side part. On the server, you need a method which generates a schema dump from the database, and on the client, you need code which attaches methods to model classes based on this schema dump. This is exactly how ActiveRecord generates these methods under normal circumstances; the only new addition is the translation to XML, for the sake of using a common network format.

On the server side, generating the schema is easy:

ActiveRecord::SchemaDumper.dump

This gives you Ruby. There's a schema_format class attribute on ActiveRecord base; you might think the cleanest way to get XML from here would be to define a new option for that attribute:

ActiveRecord::Base.schema_format = :xml

Unfortunately, schema_format doesn't seem to actually do anything any more. In fact, if you want to dump your schema to a SQL file, you don't use SchemaDumper or rake db:schema:dump at all; you use rake db:structure:dump. Check out railties/lib/tasks/databases.rake for more detail.

Obviously, the correct solution here is to change that. There should be a SqlSchemaDumper and a RubySchemaDumper, each a subclass of SchemaDumper, which should become an abstract superclass, and Rails should dynamically determine the specific subclass to use based on the schema_format class attribute. But I'm not going to lecture about it; there's no excuse for lecturing about it instead of just fixing it, and I'm not actually going to fix it, so my lecturing isn't worth much.

However, if I were setting up that fix, I'd then add in a third subclass - XmlSchemaDumper. Obviously, this would be the option to go for when writing a server-side Rails controller action to return an XML schema to a client Rails app using ActiveResource.
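A rough sketch of what that hierarchy could look like, in plain Ruby. Everything here is hypothetical: AbstractSchemaDumper, register and for are names invented for illustration, none of this exists in Rails, and the schema is a plain Hash standing in for whatever a real dumper would read off the database connection. The SQL subclass is left out to keep it short.

```ruby
# Hypothetical design sketch; not Rails code. Subclasses register themselves
# under a format symbol, and the superclass looks the right one up.
class AbstractSchemaDumper
  FORMATS = {}

  def self.register(format)
    FORMATS[format] = self
  end

  def self.for(format)
    FORMATS.fetch(format) { raise ArgumentError, "unknown schema format: #{format.inspect}" }
  end

  # schema is assumed to look like { :table => { :column => :type } }
  def initialize(schema)
    @schema = schema
  end

  def dump
    raise NotImplementedError, "#{self.class} must implement #dump"
  end
end

class RubySchemaDumper < AbstractSchemaDumper
  register :ruby

  def dump
    @schema.map { |table, columns|
      lines = columns.map { |name, type| "  t.column #{name.to_s.inspect}, #{type.inspect}" }
      (["create_table #{table.to_s.inspect} do |t|"] + lines + ["end"]).join("\n")
    }.join("\n\n")
  end
end

class XmlSchemaDumper < AbstractSchemaDumper
  register :xml

  # No XML escaping here; table and column names are assumed to be plain
  # identifiers.
  def dump
    tables = @schema.map { |table, columns|
      cols = columns.map { |name, type| %(    <column name="#{name}" type="#{type}"/>) }
      (["  <table name=\"#{table}\">"] + cols + ["  </table>"]).join("\n")
    }
    (["<schema>"] + tables + ["</schema>"]).join("\n")
  end
end

schema = { :users => { :name => :string, :age => :integer } }
puts AbstractSchemaDumper.for(:xml).new(schema).dump
```

The lookup in for is the part that would hang off schema_format: the attribute picks the symbol, and the symbol picks the subclass.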

If you're doing this, fixing the schema dumping process is probably worthwhile; if not, you could do it all ghetto-style and just churn through the Ruby schema dump with regular expressions. That'd be lame, though. In fact this whole thing has me thinking pretty seriously about refactoring this stuff, just out of a general neat freak vibe, but we'll see what happens.
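For what it's worth, the regular-expression route might look something like the sketch below. parse_schema_dump is a made-up helper, and it assumes the dump uses either the `t.column "name", :string` form or the shorter `t.string "name"` form; indexes, options and anything else are simply ignored.

```ruby
# Parses the text of a Ruby schema dump into { table => { column => type } }.
# A deliberately crude, line-by-line regex pass; not a real Ruby parser.
def parse_schema_dump(text)
  tables = {}
  current = nil
  text.each_line do |line|
    case line
    when /^\s*create_table\s+"([^"]+)"/
      current = tables[$1] = {}
    when /^\s*t\.column\s+"([^"]+)",\s*:(\w+)/
      current[$1] = $2 if current
    when /^\s*t\.(\w+)\s+"([^"]+)"/
      current[$2] = $1 if current
    when /^\s*end\b/
      current = nil
    end
  end
  tables
end

dump = <<-SCHEMA
ActiveRecord::Schema.define(:version => 2) do
  create_table "users", :force => true do |t|
    t.column "name", :string
    t.column "age",  :integer
  end

  create_table "widgets", :force => true do |t|
    t.integer "user_id"
    t.string  "label"
  end
end
SCHEMA

p parse_schema_dump(dump)
```

From a Hash like that, turning it into XML is a few lines of string building, which is roughly why this approach is tempting even though it's lame.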

Anyway, on the client side, just to finish up, you get XML (in one way or another) which gives you tables and columns. Obviously, you can map this directly to resources. If your ActiveResource objects correspond directly to your ActiveRecord objects, you're basically home free. The only thing left to do is write a simple bit of Ruby metaprogramming whatnot that takes a list like this:

:users => [:widgets, :other_widgets]

And adds User#widgets and User#other_widgets association methods accordingly. I'm going to skip the details. It's an interesting subject but the post would get too long. In brief, use String#constantize and attr_accessor, and read the final four chapters of "Ruby For Rails" if you don't understand what's going on.
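For the impatient, a minimal sketch of the idea in plain Ruby. A few assumptions, stated loudly: User here is a bare class standing in for an ActiveResource model, attach_associations is a made-up helper, and Object.const_get plus a naive "strip the trailing s" rule stand in for String#constantize and the Rails inflector, which aren't available outside Rails. It also swaps plain attr_accessor for attr_writer plus a reader that defaults to [], since the whole point is that an empty association should come back as an empty Array.

```ruby
# Stand-in for an ActiveResource model class.
class User
  def initialize(attributes = {})
    @attributes = attributes
  end
end

# map is assumed to look like { :users => [:widgets, :other_widgets] },
# already parsed out of the schema XML.
def attach_associations(map)
  map.each do |table, associations|
    # "users" -> "User"; only handles simple plurals.
    class_name = table.to_s.sub(/s\z/, "").split("_").map(&:capitalize).join
    klass = Object.const_get(class_name)

    associations.each do |association|
      klass.send(:attr_writer, association)
      # Reader falls back to [] when nothing has been assigned.
      klass.send(:define_method, association) do
        instance_variable_get("@#{association}") || []
      end
    end
  end
end

attach_associations(:users => [:widgets, :other_widgets])

user = User.new
user.widgets                 # an empty Array rather than a NoMethodError
user.widgets = ["sprocket"]
user.widgets                 # the assigned Array
```

Run at application load, this gives every model the methods the rest of the code assumes are there, whether or not a given record came back with any associated objects.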

The assumption here is that you're mapping ARec server models directly to ARes client models. This isn't always the way to go. Resources aren't objects, and there are pitfalls involved in treating them as such. Mike Mangino said some pretty interesting things at Rails Edge about how making this distinction, between objects and resources, significantly clarified application design on some of his projects. However, there's a lot of pragmatic benefit in a one-to-one mapping - apart from anything else, you could in theory auto-generate both your ActiveResource and ActiveRecord models/foo.rb files from your schema, eliminating a great deal of repetition. So, if you are doing it this way, model your ActiveResource application load phase on the app load phase from ActiveRecord.



Update: I can't think of any excuse for not implementing this refactoring, except for lack of time, so I may in fact give it a whirl.

6 comments:

  1. Instead of dumping the whole schema, you can publish the schema for each resource type as a resource itself.

    http://example.com/schemas/table

    So now my Video ARes class, which points to example.com, can call the schema URL when it first gets loaded.

    SchemasController is dead simple...load up the AR class specified, loop through the columns and spit out some json/xml/whatever with all the info.

    That just handles the column data types and default vals (an important end in its own right). But then you can have an <associations> element which specifies...give it your best guess.

    The associations element should be quite simple, something like

    <associations>
    <association name="hometown" type="belongs_to"/>
    <association name="lived_in_towns" type="has_many"/>
    </associations>

    You don't need to worry about sorting, conditions, and limits because that's all handled server-side. Also has_many :through is a non-issue because you just specify the association as a regular has_many. For example, if you wanted has_many :lived_in_states, :through => :lived_in_towns, you just create a

    http://example.com/people/pat/lived_in_states

    resource. You don't care how that's implemented on the back end.

    So, yeah, I think it's doable, but I also think it's easily doable without having to hack AR to support another schema dumper. Though as you mentioned it'd be worth doing anyway.

    Keep in mind that I'm full of shit right now because I haven't done any of this, just spent a few minutes thinking about it.

    I'm going to be doing a ton of work with ARes over the next couple weeks most likely, and I wanted to talk to you about some stuff, so be prepared.

  2. It doesn't make sense to invent a new schema format though. Using a standard format such as RelaxNG will get you some interoperability at least.

  3. Here's the thing. It is totally doable within the context of ARes ideology. Ideology turns things into no-brainers - that's basically what it's for. But just because it's a no-brainer ideologically (a tautology if there ever was one) doesn't mean it's a no-brainer in every sense. Differentiating between a resource and a domain object can improve your design by making it more specific or muddy it by making it unnecessarily specific. If you have to distinguish between User and UserFromNetwork, that's just a waste of time. If you distinguish between User and Schema, that's potentially quite useful. I think it's very context-dependent, which is not the standard dogma.

    I've often changed my mind when I thought DHH was wrong and decided, oh, I should have been a DHH follower all along, but I'm not convinced here. I've remained a holdout longer on this question than I have on most, and I've used ARes enough that I can say you can get good use out of it without necessarily using it as expected.

    In the middle of this TDD training so I can't properly concentrate, but that at least I think makes sense. You've got this tradeoff between differentiating or blurring, and I think there are arguments in favor of either one.

  4. Also - getting all the schemas at once feels infinitely cleaner to me. Getting schemas shouldn't be dynamic; it happens at load time and from there on in the knowledge is just there in the app's brain.

  5. Hi Giles, you may want to check the reddit page about your last post:

    http://programming.reddit.com/info/5yle2/comments/

    I would have posted this to the last post, but you disabled the comments.

