Tuesday, November 28, 2006

Rails Scalability: Real-World Solutions

Ezra "Brainsplat" Zygmuntowicz made an absolutely awesome post to the Ruby-Talk list tonight.

If your boss or your clients have asked you about scaling Rails apps, which practically every boss or client does, even the cool ones, you should get his book, forthcoming from the Pragmatic Programmers, but until then you can just quote him from here:


In something like rails you have the session around for state between requests. But you can also run a drb (distributed ruby) daemon to do longer tasks in an asynchronous way to increase speed. In effect, offload any time-consuming tasks to a background daemon and let the http request return right away through an XMLHttpRequest. Then poll to check the status of jobs. These daemons can be available to all the ruby processes running your application code.
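
(An illustrative sketch, not part of Ezra's post: one way the "offload and poll" pattern could look using the drb library that ships with ruby. The `JobServer` class, its `submit` and `status` methods, and the `sleep` standing in for real work are all hypothetical names chosen for this example. In production the daemon would run as its own process on a fixed port.)

```ruby
require 'drb'
require 'securerandom'

# Hypothetical job daemon: starts slow work in a background thread and
# lets callers poll for the outcome. Names are illustrative only.
class JobServer
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # Kick off a slow task and return a job id right away.
  def submit(seconds)
    id = SecureRandom.hex(8)
    @lock.synchronize { @jobs[id] = { status: :running, result: nil } }
    Thread.new do
      sleep(seconds) # stand-in for the expensive work
      @lock.synchronize { @jobs[id] = { status: :done, result: "slept #{seconds}s" } }
    end
    id
  end

  # Called repeatedly by the web tier, e.g. from an XMLHttpRequest handler.
  def status(id)
    @lock.synchronize { @jobs.fetch(id, { status: :unknown, result: nil }).dup }
  end
end

# Daemon side. Port 0 asks the OS for any free port; a real daemon
# would normally listen on a fixed, known port instead.
server = DRb.start_service('druby://127.0.0.1:0', JobServer.new)

# Application side: any ruby process that knows the URI can use the daemon.
jobs = DRbObject.new_with_uri(server.uri)
job_id = jobs.submit(0.2)

# Poll until the job finishes, giving up after about five seconds.
100.times do
  break if jobs.status(job_id)[:status] == :done
  sleep 0.05
end
puts jobs.status(job_id).inspect
```

The request that calls `submit` can return the job id immediately, and later requests only need the cheap `status` call.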

The best way to obtain high throughput in ruby web applications is to add more processes behind an http or fcgi proxy. This is how rails and other frameworks scale. You add more processes to the cluster and they share state through the database or other means like memcached or drb.

...

There is an erb-compatible alternative that is 3 times faster than ERB and 10-15% faster than the C eruby, and it is written in pure ruby. It's called erubis:

http://www.kuwata-lab.com/erubis/

I also want to mention a project I am working on. It's called Merb (mongrel + erb):

http://merb.devjavu.com/
http://svn.devjavu.com/merb/README

Merb is a faster, lightweight replacement for ActionPack, which is the VC layer of the rails MVC. Merb still uses ActiveRecord for database persistence, but it can also use Og or Mongoose (a pure ruby db). It is integrated into mongrel for http serving and has its own controller and view abstraction with sessions, filters and erb. It is just a lot smaller and closer to the metal than ActionPack. I wrote it mainly to use in conjunction with rails applications, so that a small merb app can stand in for performance-sensitive portions of an application.

ActionPack is not thread safe and requires a mutex around the entire dispatch to rails. This can cause problems with file uploads, because each file upload blocks an entire rails app server for the duration of the upload. This means that if you have numerous users uploading large files all at once, you will need an app server instance for each concurrent upload(!). This was one of the original reasons I made merb. It has its own mime parser and does not use cgi.rb or anything else that makes ActionPack non-thread-safe. So it can process many concurrent file uploads or requests at one time in one multi-threaded mongrel app server process. Merb does use a mutex for parts of the request that can be calling out to ActiveRecord code because, although ActiveRecord is thread safe, it does not perform better than in single-threaded mode and does cause some other problems. So all of the header and mime parsing is handled in thread-safe sections of the code, and a mutex is used only for sections of code that call the database. ActionPack has a mutex around all mime body parsing as well as everything else ActionPack does to serve one request.
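
(An illustrative sketch, not merb's actual code: the difference between locking the whole dispatch and locking only the database call. `DB_LOCK`, `handle_request`, and the array standing in for the database are hypothetical names for this example.)

```ruby
# One lock shared by all request threads, held only around database work.
DB_LOCK = Mutex.new

# Hypothetical handler. `store` stands in for the database.
def handle_request(raw_body, store)
  # Unlocked section: parsing touches only this thread's local data,
  # so other threads are free to parse their own requests meanwhile.
  fields = raw_body.split('&').map { |pair| pair.split('=', 2) }.to_h

  # Locked section: only the shared "database" write is serialized.
  DB_LOCK.synchronize { store << fields }
  fields
end

store = []
threads = 10.times.map do |i|
  Thread.new { handle_request("user=u#{i}&file=f#{i}.bin", store) }
end
threads.each(&:join)
puts store.size
```

Wrapping the whole body of `handle_request` in `DB_LOCK.synchronize` instead would model the lock-everything approach, where a slow upload in one thread holds up every other request.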

You mention you would rather build most of your own framework to be closer to the metal. But you may want to look at merb and see if you want to work on it with me. I plan on continuing its development, and it is already being used in heavy production, augmenting rails applications for faster response times and file uploads.

...

I also find that Xen virtualization works very well for scaling ruby applications. Scaling ruby apps usually means adding more application servers and maxing out your database servers. Caching plays an important role as well. Anything that can be cached to static files, or even partial caching or using memcached for expensive sections of code, can yield big performance gains. Using a number of Xen virtual machines with a shared filesystem like gfs can make it easy to scale your ruby applications pretty much horizontally. You just end up pushing the persistence into the database, memcached or drb and trying to use the "shared nothing" approach for as many portions of the system as you can.
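
(An illustrative sketch, not part of Ezra's post: caching an expensive section of code so it runs once and is served from the cache afterwards. `TinyCache` is a hypothetical in-process stand-in; a real deployment would put a memcached client behind the same kind of `fetch` call so that every app server shares the cached value.)

```ruby
# Hypothetical stand-in for a memcached client: a Hash guarded by a Mutex.
class TinyCache
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  # Return the cached value for key; on a miss, run the block,
  # store its result, and return it.
  def fetch(key)
    @lock.synchronize do
      return @data[key] if @data.key?(key)
      @data[key] = yield
    end
  end
end

cache = TinyCache.new
runs = 0

2.times do
  report = cache.fetch('monthly_report') do
    runs += 1   # counts how often the expensive section really runs
    sleep 0.1   # stand-in for slow queries or rendering
    'rendered report'
  end
  puts report
end
puts "expensive section ran #{runs} time(s)"
```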

In an application stack like this, adding nodes to the app server cluster is easy and gives you very good scalability up or down. Ruby is really a small part of a technology stack like this. There are lots of other places to optimize performance. We have built a custom Gentoo distribution that is tailored to running ruby applications at optimal performance in Xen instances. I hope to release this distro as soon as I get some free time to package it up.

Cheers-

-- Ezra Zygmuntowicz
-- Lead Rails Evangelist
-- Engine Yard, Serious Rails Hosting

(Reposted here with Ezra's permission.)
