Wednesday, November 29, 2006

Refactoring Rails Code

Today I had a request for a code sample. I kind of went overboard with it and sent three samples. The first, I won't describe it yet. The second was my JavaScript Matrix falling text effect from 2004. The third was some code from a recent project which had some very cool aspects, but in the process of submitting it, I noticed some flaws, and they've been bugging me for hours, so I'm going to clean them up here, and blog it, and maybe it's going to be interesting.

Before I go any further, tho, Blogger seems to be doing all kinds of things that are making this a bit inconvenient, like going insane whenever it sees a less-than sign. So I had to use ? for less-than.

Anyway, here's the original code:

# http://justinfrench.com/index.php?id=122
class UserAreaController ? ApplicationController

def index
end

def drugs
end

def genes
end

def references
end

def render_drugs
drugs, genes = tokenize(params[:function_call])
drugnames = []
drugs.each do |drug|
if drug.is_a? Array
drugnames.push(drug[0])
end
end
# this creates an instance var for each involvement type; e.g., Inducer --> @inducers
Interaction.find(:all, :include => "drug").group_by(&:involvement_type).each do |involvement, list|
list.reject! {|interaction| not drugnames.include? interaction.drug.name.downcase}
instance_variable_set( "@#{involvement.downcase.pluralize}", create_drug_links(list) )
end
end

def render_genes
# we only want to see genes shown in the Flash display
drugs, genes = tokenize(params[:function_call])
all_genes = Gene.find_all
genes_in_display = []
genes.each do |gene|
genes_in_display.push(all_genes.reject {|x| x.name != gene})
end
# hack! the above returns an array of arrays, this fixes it...
@genes = []
genes_in_display.each {|x| @genes.push(x[0])}
end

def render_references
# whatever the central thing in the Flash display is, we only
# want to see references for that
drugs, genes = tokenize(params[:function_call])
@references = []
case params[:function_call][0..0]
when "p" # patients diagram
when "g" # genes diagram
@references = Gene.find_by_name(genes[0]).referencelinks
when "d" # drugs diagram
@references = Drug.find_by_name(drugs[0]).referencelinks
end
end

private
def tokenize(function_call)
drugs = []
genes = []
case function_call[0..0]
when "p" # patients diagram
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each do |name, interaction|
drugs.push(name)
end
when "g" # genes diagram
matchdata = function_call.match(/geneDrugDiagram\('(.+)',\[/)
genes.push(matchdata[1][3..99])
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each do |name, interaction|
drugs.push([name, interaction])
end
when "d" # drugs diagram
matchdata = function_call.match(/drugGeneDrugDiagram\('(.+)',\[/)
drugs.push(matchdata[1])
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each do |name, interaction|
case name
when /CYP|PGP|UGT/ # use phase to identify genes, then...
genes.push(name[3..99]) # remove phase, irrelevant for our purposes here
else
drugs.push([name, interaction])
end
end
end
return drugs, genes
end

end


The first change I made was kind of a show-off thing, using Procs.


case function_call[0..0]
when "p" # patients diagram
get = Proc.new {|name, interaction| drugs.push(name)}
when "g" # genes diagram
matchdata = function_call.match(/geneDrugDiagram\('(.+)',\[/)
genes.push(matchdata[1][3..99])
get = Proc.new {|name, interaction| drugs.push([name, interaction])}
when "d" # drugs diagram
matchdata = function_call.match(/drugGeneDrugDiagram\('(.+)',\[/)
drugs.push(matchdata[1])
get = Proc.new do |name, interaction|
case name
when /CYP|PGP|UGT/ # use phase to identify genes, then...
genes.push(name[3..99]) # remove phase, irrelevant for our purposes here
else
drugs.push([name, interaction])
end
end
end
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each &get
return drugs, genes
end


This eliminates the repetition of the regular expression. Good thing. But as soon as I did it, I realized it would be nicer if I could do something like:


function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each filter


...where filter would be a method just returning whichever Proc was most appropriate.

I also noticed some really obvious flaws.

The first thing, if you look at the action names, they go like this: index, genes, drugs, references; render_genes, render_drugs, render_references. So the obvious thing is to separate those render_* actions. Make it a RenderController.

The next thing is that right now it uses a system where the UserAreaController inherits from ApplicationController and then controllers in the user area inherit from UserAreaController. That approach is actually very much discouraged by the Rails core team. Additionally, I've seen it work perfectly on one server and then blow up on another server, with no obvious reason for the change in behavior. So, the first question is, how do you preserve that utility, while using a different approach?

I don't have time to test this part, but my theory is you can do it with modules named UserArea and AdminArea, and instead of inheriting from UserAreaController or AdminAreaController, you just mix in the area-specific code with include AdminArea or whatever.

Anyway, let's say for the sake of argument we get that sorted out and we have it nicely refactored down to this.


class RenderController ? ApplicationController
include UserArea

def drugs
drugs, genes = tokenize(params[:function_call])
drugnames = []
drugs.each do |drug|
if drug.is_a? Array
drugnames.push(drug[0])
end
end
# this creates an instance var for each involvement type; e.g., Inducer --> @inducers
Interaction.find(:all, :include => "drug").group_by(&:involvement_type).each do |involvement, list|
list.reject! {|interaction| not drugnames.include? interaction.drug.name.downcase}
instance_variable_set( "@#{involvement.downcase.pluralize}", create_drug_links(list) )
end
end

def genes
# we only want to see genes shown in the Flash display
drugs, genes = tokenize(params[:function_call])
all_genes = Gene.find_all
genes_in_display = []
genes.each do |gene|
genes_in_display.push(all_genes.reject {|x| x.name != gene})
end
# hack! the above returns an array of arrays, this fixes it...
@genes = []
genes_in_display.each {|x| @genes.push(x[0])}
end

def references
# whatever the central thing in the Flash display is, we only
# want to see references for that
drugs, genes = tokenize(params[:function_call])
@references = []
case params[:function_call][0..0]
when "p" # patients diagram
when "g" # genes diagram
@references = Gene.find_by_name(genes[0]).referencelinks
when "d" # drugs diagram
@references = Drug.find_by_name(drugs[0]).referencelinks
end
end

private
def tokenize(function_call)
drugs = []
genes = []
case function_call[0..0]
when "p" # patients diagram
get = Proc.new {|name, interaction| drugs.push(name)}
when "g" # genes diagram
matchdata = function_call.match(/geneDrugDiagram\('(.+)',\[/)
genes.push(matchdata[1][3..99])
get = Proc.new {|name, interaction| drugs.push([name, interaction])}
when "d" # drugs diagram
matchdata = function_call.match(/drugGeneDrugDiagram\('(.+)',\[/)
drugs.push(matchdata[1])
get = Proc.new do |name, interaction|
case name
when /CYP|PGP|UGT/ # use phase to identify genes, then...
genes.push(name[3..99]) # remove phase, irrelevant for our purposes here
else
drugs.push([name, interaction])
end
end
end
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each &get
return drugs, genes
end

end


One thing it's easy to clean up, the tokenize() call happens with every method. So let's just put that in a before filter.


class RenderController ? ApplicationController
include UserArea
before_filter :tokenize

...

private
def tokenize(function_call = params[:function_call])
@drugs = []
@genes = []
case function_call[0..0]
when "p" # patients diagram
get = Proc.new {|name, interaction| drugs.push(name)}
when "g" # genes diagram
matchdata = function_call.match(/geneDrugDiagram\('(.+)',\[/)
@genes.push(matchdata[1][3..99])
get = Proc.new {|name, interaction| @drugs.push([name, interaction])}
when "d" # drugs diagram
matchdata = function_call.match(/drugGeneDrugDiagram\('(.+)',\[/)
@drugs.push(matchdata[1])
get = Proc.new do |name, interaction|
case name
when /CYP|PGP|UGT/ # use phase to identify genes, then...
@genes.push(name[3..99]) # remove phase, irrelevant for our purposes here
else
@drugs.push([name, interaction])
end
end
end
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each &get
end

end


That was pretty easy; the only complications, had to add a default parameter for tokenize(), and the drugs and genes arrays had to be made into instance variables, and that changed some names in the genes action. No big deal, though. The genes action, that's still pretty hacky, but we can fix it later.

The cool part is, now that there's structure to support it, the Lispy approach becomes viable. To use the Lispy style, all you really need is lambda.


private
def tokenize(function_call = params[:function_call])
@drugs = []
@genes = []
function_call.scan(/name:'([^']+)',interaction:'([^']+)/).each &filter
end

def filter
case params[:function_call][0..0]
when "p" # patients diagram
lambda {|name, interaction| @drugs.push(name)}
when "g" # genes diagram
matchdata = function_call.match(/geneDrugDiagram\('(.+)',\[/)
@genes.push(matchdata[1][3..99])
lambda {|name, interaction| @drugs.push([name, interaction])}
when "d" # drugs diagram
matchdata = function_call.match(/drugGeneDrugDiagram\('(.+)',\[/)
@drugs.push(matchdata[1])
lambda do |name, interaction|
case name
when /CYP|PGP|UGT/ # use phase to identify genes, then...
@genes.push(name[3..99]) # remove phase, irrelevant for our purposes here
else
@drugs.push([name, interaction])
end
end
end
end

end


Now it's still kinda messy, and in fact I'm thinking a more OOP approach might be cleaner. The problem there is that this system has diagrams in Flash, and it really needs diagram objects in ActionScript, on the Flash side, and it may possibly need them in both Ruby and JavaScript as well. But all those diagram objects imply a lot of repetition. That's an interesting question, a complex UI can kind of hammer Rails' super-DRY-ness, but it's a totally different topic, so let's put it aside for now.

Getting back to this code, the reason the Lispy style might not be appropriate here is that there's a lot of dependence on side effects. Even though I am returning lambdas, there are initialization-ish steps that take place prior to the lambdas being returned, and those steps might make more sense in an object instead. Might be less loose ends that way. I'm not sure.

On the other hand, I'm definitely happier with it than I was when I started.

I'd do some more, but Blogger's formatting issues are making me nutty.

Going back to the multiple objects in multiple languages thing, I actually did a little presentation on why I think a complex UI can present problems for Rails' otherwise very elegant structure, and I think it's going to be turned into a podcast, so I'll link it here when that happens.

The funny thing is, I fired the clients I wrote this code for, but I'm probably going to continue working on it for months. The stuff in the podcast, I'll blog about it at some point, I think, it's pretty challenging. It raises some really interesting questions and I don't know what the answers will be. I do know it'll involve Object#to_json.

Anyway, I should point out, the stuff with group_by() and instance_variable_set(), I got a lot of help from the Ruby-Talk mailing list on that one, especially from David Black (of Ruby For Rails). So, some gratitude there, a shout-out. Ruby-Talk in the house. It's actually pretty interesting, so I'm going to take a quick look at it.


# this creates an instance var for each involvement type; e.g., Inducer --> @inducers
Interaction.find(:all, :include => "drug").group_by(&:involvement_type).each do |involvement, list|
list.reject! {|interaction| not drugnames.include? interaction.drug.name.downcase}
instance_variable_set( "@#{involvement.downcase.pluralize}", create_drug_links(list) )
end


What this does is very simple. First line, it finds all interactions, groups them by involvement type, and makes sure the drug objects they're linked to are also passed to the block. Second line prunes irrelevant interactions. Third line creates an instance variable for each involvement type, and connects a drug link based on the list of interactions which have that involvement type. So you automatically get instance vars representing the category you want to deal with, and containing all the relevant data for that category.

I re-used this pattern later on, in another piece of code, which I hope to blog about later. It's a very handy way to build variables for Rails views, because it automatically provides you with only the variables that will actually contain data. The original version didn't include the list-pruning line, so it was only three lines of code, one block. It replaced something like 15 lines, and that's counting whitespace.

The only pitfall with this pattern, of course, is that if you use it, you have to watch out for views which contain references to instance vars which don't exist. I do have a solution for this in the other piece of code based on this idea, but it's a whole nother topic.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.