Ruby is often the language of choice for startups of smaller applications. It allows you to get an application going quickly. While there has been a lot of talk about NoSQL databases for many applications instead of a SQL database, if your application does well, its functionality will grow, the amount of data you are storing will grow, and you will begin to get more traffic. At some point your monolithic database will begin to have performance problems and you will need to take steps to allow your system to scale. What are your options?
Throw more hardware at it
The first thing that many groups will do is to move their database to a larger machine with a larger processor, more memory etc. If you are using smaller VMs for hosting your database this might be the right thing to do.
Optimize your queries and add indexes
If you are using ActiveRecord to assemble your queries without doing any manual assembly of your SQL, your queries are probably not that efficient. You may also have tables without indexes with queries that are starting to run slowly. This step along with beefing up the hardware you are running your database will buy some places years of scaling before they hit a wall.
Is the architecture preventing us from scaling?
It could be. There are a lot of books on the subject. Some things you might want to look at would include splitting components into their own services, using a different data store for non-transactional data, or restructuring your data to reduce the number of joins.
Replication
You may have tried everything else but still can’t get the performance you need. Or, you may want to re-architect the system but need to buy time while you are breaking things out. If a significant amount of your database traffic is read-centric, read replication can help you handle more traffic.
The diagram below is an example of a read replication setup. One server acts as the master and handles all write traffic. The other three servers are slaves that handle all of the read traffic. You then can put the database cluster behind a load balancer or have certain servers point to certain databases to spread out the traffic. Depending on your system’s architecture you might be able to set this up without needing to use any separate gems to enable this. We have one client that currently is doing this.
Sharding
If you have an application where the majority of your database traffic is transactional, then at some point your only option will be to shard your database. Sharding is a technique where you split the data into different databases to spread out the load. This also reduces the amount of data that you have in each table resulting in smaller indexes. The diagram below shows a typical sharding scenario.
Let’s say that we have a database for an alarm system that has a table for users and another one for actions that gets a new record for each action that the user performs on the system (arm, disarm etc). The user table has about 50 million records, but the action table has grown to 5 billion records. We decide to shard based on the user so that actions for a particular user all live on one shard. We write a script to create three new shards, add a table to our main database to indicate which shard the users data is on, and then we migrate the data for all of our existing users by moving them to various shards and persisting a value to tell the system where to find the data. There are other strategies for doing this including automatic ones with crazy directory structures etc. If you are using Oracle products there are also offerings that take care of this in the database so that you don’t have to worry about this in your application.
How do I configure sharding with ActiveRecord and MySQL?
This is where the DBCharmer gem comes in. The DBCharmer gem provides extra configuration options and directives to make sharding and replication easy to integrate into your application.
First you install the gem:
gem install db-charmer
Then you add an extra level or nesting to your database.yml file:
production:
adapter: mysql
username: admin
password: password
database: my-db-master
shard1:
adapter: mysql
username: admin
password: password
database: my-db-shard-1
shard2:
adapter: mysql
username: admin
password: password
database: my-db-shard-2
Now let’s say that we have an Action model like:
class Action < ActiveRecord::model
end
When we first start to run queries against this model they will go against the master database. But by using the switch_connection_to method we can use one of our shards. This is on a per thread basis, so a new thread will default back to the master database.
Action.switch_connection_to :shard1
Action.find 1
You can also switch the connection in the scope of a block:
Action.on_slave1 do
Action.where(:name => 'ARM')
end
You can even switch to a connection for an individual command:
Action.on_shard1.where(:name => "ARM")
For migrations it provides directives to allow you to run migrations on one or all of your shards. Below is an example using db_magic to write a migration for the shards:
class MultiDbTest < ActiveRecord::Migration db_magic :connection => [:shard1, :shard2]
def self.up
create_table :actions, :force => true do |t|
t.string :type
t.timestamps
end
end
def self.down
drop_table :actions
end
end
You can also use the on_db method to have fine grained control on various databases in one migration:
class MultiDbTest > ActiveRecord::Migration
def self.up
on_db :shard1, :shard2 do
create_table :actions, :force => true do |t|
t.string :type
t.timestamps
end
end
on_db :master do
create_table :users, :force => true do |t|
t.string :user_id
t.timestamps
end
end
end
def self.down
on_db :shard1, :shard2 {drop_table :actions}
on_db :master {drop_table :users}
end
end
There are also directives to automatically switch to slaves for reads and the master for writes. For example:
class Action < ActiveRecord::Base db_magic :connection => :master, :slaves => [:slave1, :slave2]
end
DBCharmer has built in support for range, hash_map, db_block_map and db_block_group_map strategies along with the ability to create custom sharding methods.
DBCharmer tips
While DBCharmer is designed to be used on large systems, it is not fully thread safe. So if you are using it with Goliath or in JRuby you may have some issues. Also, if you switch connections during unit tests you will lose the ability to do a transactional rollback for that test. The takeaway is to clean up after yourself with those unit tests. Also, DBCharmer only supports MySQL.
What about PostgreSQL?
If you are using PostgreSQL or a modern version of Ruby with MySQL, the Octopus gem is worth a look. It offers many of the features that DBCharmer does. The last project we sharded was still on Ruby 1.8.7 which is not supported by Octopus but was by DBCharmer. While your mileage may vary, we had no problems with our system when we sharded it using DBCharmer. The glue code to switch shards was easy to implement and things “just worked” in production.
Leave a Reply