What's happening is actually weirder than that. The DB lives on a separate machine. The intermittent errors all fail on the absence of a Unix socket. But since the DB lives on a separate machine, the Rails app should never even be looking for a Unix socket in the first place. It should be looking for a TCP socket.
Here's the code in Rails where the error originates:
    if (host == nil or host == "localhost") and defined? UNIXSocket then
      unix_socket = socket || ENV["MYSQL_UNIX_PORT"] || MYSQL_UNIX_ADDR
      sock = UNIXSocket::new(unix_socket) # stack trace points here
      @host_info = Error::err(Error::CR_LOCALHOST_CONNECTION)
      @unix_socket = unix_socket
    else
      sock = TCPSocket::new(host, port||ENV["MYSQL_TCP_PORT"]||(Socket::getservbyname("mysql","tcp") rescue MYSQL_PORT))
      @host_info = sprintf Error::err(Error::CR_TCP_CONNECTION), host
    end
(activerecord-1.15.3/lib/active_record/vendor/mysql.rb, lines 105-113)
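To make the branch easier to reason about, here is a minimal sketch that copies only the condition from the excerpt above and reports which transport would be chosen, without opening a socket. The method name transport_for and the host db.example.com are made up for illustration; they are not part of Rails or the MySQL driver.

```ruby
require "socket" # defines UNIXSocket on platforms that support it

# Hypothetical helper, not part of Rails or the vendored mysql.rb.
# It mirrors the branch condition only and never opens a connection.
def transport_for(host)
  if (host.nil? || host == "localhost") && defined?(UNIXSocket)
    :unix_socket # the branch the stack trace points into
  else
    :tcp         # the branch a remote DB host should take
  end
end
```

On a Unix-like system, transport_for("db.example.com") gives :tcp, while transport_for(nil) and transport_for("localhost") give :unix_socket. So the errors I'm seeing are only possible if host arrives here as nil or "localhost".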
The value of host, on entering the above if, should be the same as the value specified (for this particular environment) in database.yml. This code pulls from the config file, and it's invoked during the initialize phase of the app, so the only explanation which makes sense to me is that the Rails app is somehow losing the data in the config file. Except I've never seen that happen, nor heard of it happening, and I can't imagine why it would happen. In fact it's difficult to imagine it happening at all.
It's possible there's a bug in the config-file-reading code, but I doubt it. MySQL is the most popular DB for Rails, and if this were a Rails problem I think I would have heard of it before. I think it's a deployment problem, but even then, what kind of deployment problem could leave Rails confused about the contents of its own config files? If the YAML-reading code somehow assigned nil to host after reading the value, that would explain what's happening, but that also seems highly unlikely.
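One way a nil host could arise without any bug in the YAML reader is a lookup that simply finds nothing: a missing environment section or a missing host key. The sketch below illustrates that with Ruby's standard YAML library. The sample file contents, the constant SAMPLE_DATABASE_YML, and the method host_for are all assumptions for illustration, and this is not how Rails itself loads database.yml.

```ruby
require "yaml"

# Hypothetical stand-in for config/database.yml; every value is a placeholder.
SAMPLE_DATABASE_YML = <<~YAML
  production:
    adapter: mysql
    host: db.example.com
    database: app_production
  development:
    adapter: mysql
    database: app_development
YAML

# Hypothetical helper: returns the configured host for an environment,
# or nil when the environment or its host key is absent.
def host_for(env, yaml_text = SAMPLE_DATABASE_YML)
  config = YAML.load(yaml_text)
  settings = config[env] || {}
  settings["host"]
end
```

Here host_for("production") returns "db.example.com", but host_for("development") and host_for("staging") both return nil, which is exactly the value that sends the driver down the Unix socket branch. A process running with a different environment or a different copy of the file than I expect would look the same from the outside.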
Update: I thought this might be due to the app running in development mode, but a direct check on ENV['RAILS_ENV'] returns 'production'. However, if I change controller code, I see the results of that change immediately, without a Mongrel restart, which I don't think should happen in production mode.
Update 2: the problem is inconsistent via the Web but consistent via script/console.
Update 3: it seems very probable that I've fixed this. It appears to be a dumb load-balancing thing: one stale process with bad data loaded in memory.