Friday, May 11, 2007

Rails Developers, Junior and Senior

Toolman Tim broke my heart. In his tumblelog, he posted some code he was very proud of.



I wish I could say this code represented the minimum standard of competency in Rails. Unfortunately that isn't the case. Overriding dynamic finders requires a knowledge of Rails internals and an understanding of Ruby's OOP. That's true. But first, the code isn't as good as Toolman Tim thinks. It has a number of flaws. (Assaf Arkin pointed out these flaws to me.) Second, Tim can justifiably claim to be an above-average Rails developer. But really, understanding Rails internals and Ruby OOP shouldn't be enough to make something above-average work; it should be the bare minimum.



The flaws in Tim's code are a criticism of Tim's code; but the minimums thing is a criticism of the Rails culture.

Rails has a lot of shortcuts. But as Tolkien wrote in the The Fellowship of the Ring, possibly quoting some old folk proverb, short cuts make long delays. Some hackers see building your own framework as a rite of passage; "now I'm really a programmer," you can say. Others never even bother to read Jamis Buck's Under The Hood posts once. I don't know how they do what they do, but I think it involves a lot of cut and paste, and I do know that bad code results from their impatience.



I heard it said that either Jamis or Rick Olson (technoweenie) had said you should never use a plugin you couldn't have written yourself. That's true of Rails as well. The hype and hero worship around Rails gets exasperating sometimes, because Rails is not a complicated piece of code. It's an elegant one. Rails is the Post-It Note.



It's not the electric light bulb. That's the opposite of the Post-It Note. A difficult invention that most people said was impossible. Thomas Edison spent years struggling to make the light bulb happen. It only happened after he had failed hundreds of times, in hundreds of different ways. The Post-It note guy, on the other hand, just put glue on the back of a notecard and bam! He was rich for life.

Let me tell you - that's the way you do it. Money for nothing and your chicks for free.



(Side note: I permanently destroyed my dad's speakers with that song when I was in junior high. That's right, people: I'm taking it back to the old school with this post. Way back. Back to when Rails should have been invented...)

The thing about the Post-It Note is that anybody could have done it. That's why veteran Web developers freak out when they discover Rails. This should have been the standard by 1998. What's really remarkable is that it's such an obvious idea. Nobody ever did it before, but from the moment you discover it, you wonder how you lived without it.

Sometimes a mastery of the elementary is the true sign of genius.



The other thing about the Post-It Note is that it's made of simple parts. So is Rails! Set aside some time to build your own Web framework (like I did, a tiny one for E*Trade in 1998 that was really just a basic CMS) or to thoroughly examine the Rails source (another thing I did which all you young Rails whippersnappers should do too). You'll find building a Web framework involves a lot of work, and that the Rails source is easy to read. That's because the remarkable thing about Rails is not its complexity but its simplicity. The source is really not that complicated, but the task is. If you develop your own Web framework, you'll probably find Rails' elegance much harder to duplicate than its functionality.

Patching dynamic finders in ActiveRecord models is something a Rails programmer can be proud of; yet it really is so easy that it should be considered one of the baseline tests in a Rails job interview. In fact it shouldn't even go in the interview; it should be in the phone screen. Familiarity with Rails' internals should be normal.

Then again, perhaps I'm just underselling my own skills and history. Let me rephrase: these acts of cleverness should be considered bare minima for Rails programmers, and things to aspire to, for PHP-style page hackers. It's understanding the system inside-out that gives you the ability to use it well.

Anyway, let's get back to the code. Here are Assaf's criticisms, copied from an e-mail:



1. lower(email) runs a full table scan as opposed to using the index. That's bad programming in any language.

2. What happens when you call it with:
find_by_email("john@example.com", :conditions=>"true")

3. with_scope is a Rails technique. A Ruby developer, someone who learned the language, not just the framework, would write:
find(:first, { :conditions=>"..." }.merge(options)
Shorter, simpler and a transferrable skill you can use outside of ActiveRecord.

Brief explanations.

1 - When you access the attribute in the DB, you get an index; when you need to modify that attribute with lower(), the DB has to hit every row. That's what the term "full-table scan" means. It's less efficient than hitting the indexes, and in the case of a large data set, so much less efficient it'll kill your performance entirely. It's a very good idea both to understand and to remember how databases work.

2 - The answer to Assaf's question is that conditions gets overwritten instead of added to.

3 - #merge() is a Hash method. It adds new keys to an existing hash based on another existing hash. Unfortunately, it does so destructively. If you have two values defined for the :conditions key in two distinct hashes and you merge them, only one :conditions value will survive the harrowing ordeal. #merge() is Darwinian - it's survival of the fittest in there.



Assaf gave me a great explanation of the problem here:

*opts is a bug, lacking an understanding of how Ruby handles arguments. *opts refers to all other arguments you pass, the vararg equivalent. But when you call:

find_by_email("...", :limit=>5, :order=>"name")

the last two "arguments" (limit and order) are actually keys in a hash that's passed as a single argument. So anything you pass to find_by_email, the original, that falls inside *opts, would be passed to find after the last argument it expects (the hash of options), which should, if implemented correctly, cause find to fail with an error.

If you do {:conditions }.merge(options), then you can override :conditions by passing a new one in options. If you do options.merge(:conditions) then you can't override :conditions, since the last value takes over. But now you can only query by e-mail address, and not, say by combination of e-mail and name ( e.g to ensure a match).

If you want to merge conditions together, that's a bit more tricky, although for the simple case it would be:

(options || {}).merge( :conditions=> (options[:conditions] || {}).merge(:email=>email) )

So the code is actually very troubled. It looks as if it took an entirely wrong turn with its whole strategy. with_scope() shouldn't be used as a workaround for Hash#merge() - with_scope() was created for a specific reason, and it's very useful in the right context. But you should know the standard library well enough that you only ever use with_scope() to do what it's supposed to do.

But if we want to rewrite the code to work, and to be flexible enough to accomodate additional conditions, we end up with an absurd mess.

So how should we fix this code?

The correct way to handle this situation is not by patching a dynamic finder at all. The problem isn't in the retrieval code in the first place. The correct way to handle it is by putting a #downcase() on the e-mail param before that param ever even gets to the database. The indexing problem, the main issue Assaf raised, is so detrimental to performance that putting the fix in the code is an architectural mistake which no amount of code can fix, whether the code is good or bad. That's what Assaf means when he says it's bad programming in any language. That's also why any solution ends up ugly or bug-ridden.

The real problem here is architectural. It's pretty much impossible to solve an architectual mistake with code. The real lesson here is that even the newbie Rails programmer ought to understand architecture, because if you put your code in the wrong place, it won't matter if it's good or bad.

If you've started acquiring data before this ever even occurs to you, don't worry - it's easy to fix. All you do is write a small script which loads up every User, downcases its email attribute, and saves the User. It only takes a few lines of code - possibly even a one-liner - and you can run it from the console. I like to do it by putting DatabaseUpdater objects in /lib and then starting them off from the console with something like DatabaseUpdater.go_go_gadget_database(), but there's a pretty strong argument to be made that the migrations are a more appropriate place for this sort of thing. The cleanest solution in terms of self-documenting code is to create DatabaseUpdater.go_go_gadget_database() in /lib, test it in the console, and then call it from a migration, first on your development DB, and then on your production one.



This approach saves your clients or co-workers so much time that you can get away with the Inspector Gadget reference. In fact, the last time I did something like this, I was cleaning the database, so I named the object ViktorTheCleaner after the character from La Femme Nikita.



By the way, I actually very highly recommend idiosyncratic, story-esque names for things like this. The reason is that they put a face on your code, and that extra dose of personality makes it very easy to remember why things were done a certain way, and when, and by whom. It taps a very powerful part of your brain in a very effective way. Also, because of its superficial frivolity, masking a deeper seriousness, this approach also has the very useful side effect of exposing people within your organization who value professionalism over effectiveness. You need to watch out for those idiots, as they're often self-saboteurs who are dangerous to any project they work on, plus they always listen to bad music and recommend bad restaurants, so it's good to be able to spot them quickly and easily, preferably from a distance.

Anyway, big thanks to Assaf for the analysis, and apologies to Tim for the criticism.

4 comments:

  1. When I started with ruby, one thing that I noticed soon, was the different, friendly and helpful community.

    This post has nothing to do with that in my opinion.
    Come on, there are better ways to discuss this.

    ReplyDelete
  2. email addresses are case sensitive.

    The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. Mailbox domains are not case sensitive. In particular, for some hosts the user "smith" is different from the user "Smith".

    ReplyDelete
  3. Yes, e-mail addresses are case-sensitive. Learn how to use a functional index so you don't get the db performance hit.

    ReplyDelete
  4. 2. What happens when you call it with:
    find_by_email("john@example.com", :conditions=>"true")


    The two clauses are ANDed together. That's the whole point of with_scope, and what sets it apart from .merge with the options. Coupled with the email comments above it looks like Assaf is zero for three.

    anything you pass to find_by_email, the original, that falls inside *opts, would be passed to find after the last argument it expects (the hash of options), which should, if implemented correctly, cause find to fail with an error

    I fail to see the problem here. Rather than worry about the original find's exact signature, this implementation simply relays all arguments to find. The variable would have been more suitably named as "args" rather than "opts" but the principle is sound.

    ReplyDelete

Note: Only a member of this blog may post a comment.