MySQL Scaling 101
Several years ago, back when I was a consultant, I prepared this cool demo of Oracle's replication features for a project we were bidding on. Through the magic of redo logs, writes to one server are propagated to the other server. Oracle even had this cool little GUI that showed pending changes and other stats. I was reminded of this episode while reading Cal Henderson's book on building scalable web sites (his slides were interesting so I figured I'd check his book out). He does a nice job of outlining the various replication strategies in chapter 9.
The most basic is the master-slave setup. Writes occur on the master and those changes are replicated to the slaves. Reads can occur on either master or slaves.

master-slave
This scales great for reads, but does nothing for writes. Fortunately, we tend to do more reads than writes.
At some point the overhead of having many slaves becomes problematic for the master. You can then do a tree replication setup, to limit the number of slaves each master has.

tree replication
These setups scale for reads but are not redundant. Failure of the master brings the entire application down. For redundancy, we can have a master-master setup, which each master potentially having slaves.

master-master
Cal also mentions a ring of masters setup, although I'm not sure how it can be used, because a break in the ring cripples the system.

ring of masters
It's important to remember that in any of these replication setups you are scaling only reads. Writes are not scaled, as each node is still required to write.
Another technique used for scaling is partitioning. With vertical partitioning you partition your tables so no cross partition joins occurs (otherwise you'll have to rewrite your SQL).

vertical partitioning
So you went from one big database into many smaller databases, each with a different schema. Eventually, you will hit a limit, as you can't partition a single table further.
With horizontal partitioning, also known as federation, you partition your rows instead of your tables. You go from one big database into many smaller databases, each with identical schemas but different subsets of the data (shards).

federation
There are a hosts of challenges when you horizontally partition your database: inconsistencies, moving data between shards, duplication of data (as you'll probably need to denormalize your data quite a bit). Note that in both cases you scale both reads and writes.
The setups Cal outlined is a good starting point, although some of the information is a little dated (this was written for MySQL 4.x). MySQL 5 has a new NDB engine and some other tricks for scaling, like natively supporting partitioning your data onto different disks. And obviously, other databases have different ways of scaling (see Why Oracle likes to Share), although in the end the basic concepts of replication and partitioning will still be involved.
In any event, his book is not a bad read, although the title is a little misleading. "Scalable" is the key word in the title, but much of the book is not about scalability per se, but just good tips, strategies, pointers.
Comments