Monday, April 9, 2007

In-Memory DBs

This post is more about things that I wonder than things that I know.

The whole idea behind the Google Filesystem is that hard drive access was Google's major architectural bottleneck when they were building search and optimizing it for speed. So they said, well, we can make it a lot faster if we put the whole thing in RAM. And then they did.

The GFS is actually a much more powerful competitive advantage than either Google search or Google ads. That's why Google spends so much on research. The GFS is the next operating system, and the applications that will make the best use of it haven't even been written yet.

One simplistic way to look at the Google Filesystem is as a gigantic in-memory database. This prompts an obvious question: will databases adopt this architecture?

Performance is key for databases, and hard drive access is a bottleneck. Database-backed apps are much, much more widespread than they were in the days when the current, conventional database architecture was first developed.

I think the answer is almost guaranteed to be yes.
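
To make the idea concrete, here's a minimal sketch of what such a database could look like: every read is served straight from RAM, and the disk is reduced to an append-only log that exists only so the in-memory state can be rebuilt after a crash. This is a Python illustration, not anything GFS actually does; the `InMemoryKV` class and its log format are made up for the example.

```python
import json
import os

class InMemoryKV:
    """Key-value store held entirely in RAM; disk is only a recovery log."""

    def __init__(self, log_path):
        self.data = {}            # the whole dataset lives in memory
        self.log_path = log_path
        self._replay()            # rebuild in-memory state from the durable log

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value

    def put(self, key, value):
        # write-ahead: persist the change before applying it in memory
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def get(self, key):
        return self.data[key]     # pure RAM access; reads never touch disk
```

Recovery is just replaying the log back into the dictionary, so the disk is there purely for fault tolerance, never for serving queries.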

5 comments:

  1. I was under the impression that GFS was stored on disk at the chunk-server level, while the metadata lived in memory on the master.

    So it's not entirely in memory, correct? I may be missing something...

  2. I'm not sure. This post's just industry sci-fi. Any doubts are definitely worth considering. That being said, I had a big Google phase a year ago where I was reading all about it, and hard drive access is definitely mentioned as a bottleneck in a relevant white paper, but I'm not sure which one.

    The one I linked to describes two types of reads: small random reads and large streaming reads. It also mentions that the design priority is high bandwidth rather than low latency. An individual slow read is acceptable; however, the system has to be fast overall. I take this to mean that the permanent store is on hard drives, but a lot more of the filesystem is in memory at any given time than would generally be considered normal.

    Probably an in-memory DB would look very similar to that; the permanent store's on the hard drive, but that's essentially just for fault tolerance, and all the data that would be considered meaningfully active would be loaded into memory. Data at the border of active and inactive would probably load into memory and then lapse out of it, depending on the demand for it at any given time (there's a rough sketch of this after the comments).

  3. That makes plenty of sense. With cheap access to 64-bit systems, an enormous amount of data can now be resident in memory.

    That's a pretty amazing thing to think about.

    By the way, I like the new color scheme. Getting better all the time (anything beats the black! ^-^).

  4. I think solid state drives will push everyone in this direction (reportedly 276 times faster than regular hard drives).

    http://www.dbazine.com/oracle/or-articles/ault6

  5. Heh. Prevayler already does this, as does HAppS. No database required in either of them.

    ZFS does crazy-big in-memory caching, which effectively means the drives are limited by bandwidth and memory rather than read time.

    J

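Here is a rough sketch of the design described in comment 2: the permanent store stays on disk, essentially just for fault tolerance; meaningfully active data is served from an in-memory cache; and records at the border of active and inactive load in and lapse out with demand. The `DiskStore` interface (`read()`/`write()`) and the capacity figure are assumptions made for illustration, not anything from an actual system.

```python
from collections import OrderedDict

class CachedStore:
    """Disk-backed store with an in-memory LRU cache for active data."""

    def __init__(self, disk_store, capacity=100_000):
        self.disk = disk_store        # durable store; assumed to expose read()/write()
        self.cache = OrderedDict()    # in-memory working set, kept in LRU order
        self.capacity = capacity

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as recently used
            return self.cache[key]
        value = self.disk.read(key)          # inactive data loads in on demand
        self._admit(key, value)
        return value

    def write(self, key, value):
        self.disk.write(key, value)          # disk stays authoritative (fault tolerance)
        self._admit(key, value)

    def _admit(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # least-recently-used record lapses out
```

With a big enough capacity, the whole working set stays memory-resident and the disk only sees writes, which is essentially the behavior comments 2 and 5 describe.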
