I recently wrote a small compiler patch for Rubinius, with massive help from Defiler in #rubinius. My compiler patch seems flawed - I've never done compiler work before - but I've learned just enough to be dangerous. Here's what I know.
The core file I looked at was compiler/bytecode/compiler.rb. The method I was looking at was process_alias(x). If you click the link, you'll see what's going on in there is very simple. The code pops two vars off a stack, and then "adds" a series of small text strings using a method called add. The small strings are instructions.
add appends instructions to a newline-separated string called @output. In the code example linked above, add adds things from the stack and also adds a series of custom instructions. This is a very common idiom in the code for Rubinius' compiler. Very often a process_* method will add two things from its stack, along with a couple custom instructions. I thought the stack might be a call stack, or a context stack - a term I'd glimpsed in a PDF on Smalltalk virtual machines, but not fully understood - but in fact it's a list of s-expressions.
The two things added to @output from the stack in this case are the first two things on the stack, identified as cur and nw. I think that stands for "current" and "next word." The alternate idiom in the code is to name these variables start and fin, and in one place they're name and val. It's a repeating pattern in this code to have a process_* method start by popping two vars off the stack, maybe enough so to justify a refactoring. However, there are plenty of process_* methods that work differently as well.
For a compiler n00b like me, this is all a bit confusing. The good news is that although I still don't know exactly where all this stuff goes further down the line, it is the core information you need to write compiler speces. Rubinius uses a custom RSpec clone. The Rubinius project is essentially on the RSpec bandwagon, but RSpec uses dynamic features unavailable to Rubinius at this time. Obviously when a language implementation is still approaching version 1.0, you're going to be using a subset of the entire language's functionality.
Anyway, if you've played with RSpec, the code will look familiar. Here's a typical sample from spec/compiler/rubinius/compiler_spec.rb. This code really just says that if you compile a given list of s-expressions, you'll get back a given string of instructions. The instructions include some symbolic stuff, but are mostly literal enough that encoder will then translate them into integers (bytecodes) and store them in .rbc files, to be run by Rubinius' bootstrap, the small VM in C which provides the foundation for the more fully-featured VM in Ruby.
This means a compiler change is very easy to spec, because all you have to do is assert that a given list of s-expressions compiles to a given sequence of instructions. (Obviously, it would be a lot harder if you had to debug actual bytecodes.)
Update: thanks to rue and Defiler for corrections.