Backing up MySQL using EBS snapshots

28 March 2011 - Dallas

For applications that use MySQL on AWS I highly recommend taking a look at Amazon’s Relational Database Service (RDS). The smallest RDS instance, however, costs around $80 per month, which is prohibitively expensive for small side projects. RDS might also not offer all the control necessary for more complicated database requirements. Not using RDS, however, comes with the overhead of having to administer a MySQL installation yourself. Specifically, we are going to take a look at backing up a MySQL server on AWS.

In addition to MySQL’s built-in database dump commands, we can also leverage Elastic Block Store (EBS) snapshots to create a copy of the entire data volume. Amazon takes care of making snapshots incremental by only copying modified blocks. This guide focuses on how best to back up a MySQL database using EBS snapshots.

Consistent Snapshots

Before we take a snapshot of an EBS volume we need to ensure that we are getting a consistent view of the data at that point in time. To achieve this I recommend using the XFS filesystem, which allows us to freeze writes to the filesystem. We also need to lock the MySQL database and flush all of its data to disk. Eric Hammond has written an excellent tool called ec2-consistent-snapshot that flushes MySQL to disk and locks it, freezes XFS, takes an EBS snapshot, and then restores writes to XFS and MySQL.
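For illustration, here is a minimal Python sketch of that sequence using boto, the MySQLdb package and the xfs_freeze command, run as root. The MySQL credentials, mount point and volume id are placeholders, and this is only a rough approximation of what ec2-consistent-snapshot does for you, not a replacement for it.

import subprocess

import boto
import MySQLdb

MOUNT_POINT = '/vol'        # where the XFS data volume is mounted
VOLUME_ID = '<VOLUME-ID>'   # the EBS volume holding MySQL's data

# Flush MySQL's data to disk and block writes. The lock is held for as
# long as this connection stays open.
conn = MySQLdb.connect(user='root', passwd='<MYSQL-PASSWORD>')
cursor = conn.cursor()
cursor.execute('FLUSH TABLES WITH READ LOCK')

# Freeze the filesystem so no further writes reach the volume.
subprocess.check_call(['xfs_freeze', '-f', MOUNT_POINT])
try:
    ec2 = boto.connect_ec2('<access key>', '<secret key>')
    ec2.create_snapshot(VOLUME_ID, description='mysql-data')
finally:
    # Thaw the filesystem and release the MySQL lock.
    subprocess.check_call(['xfs_freeze', '-u', MOUNT_POINT])
    cursor.execute('UNLOCK TABLES')
    conn.close()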

Security considerations

We also need to provide ec2-consistent-snapshot with the AWS credentials needed to create the snapshot. Because this tool will be running on the database machine, and because it should run automatically in an unattended fashion, we need to store AWS access keys on the database host. This poses a security risk that we can mitigate by placing adequate access control on the file containing the credentials. We can also take advantage of Amazon’s Identity and Access Management (IAM) to create a sub-account with much more limited permissions. Specifically, we can create a sub-account that is only allowed to make the CreateSnapshot API call and place its credentials on the database host. This way, if the credentials are compromised, the attacker is only able to create snapshots rather than having full access to our account.

Boto

My favorite tool for interacting with Amazon’s API programmatically is boto, a Python interface to AWS started by Mitch Garnaat. Boto can be installed using pip, easy_install or the operating system’s package manager.

Using IAM policies

I recommend creating groups with very narrow permissions and then creating users that can be added to one or more groups based on the capabilities needed for each user. We’ll start by creating a group for the create-snapshot capability.

import boto

# Connect using credentials that are allowed to administer IAM
# (not the restricted keys we are about to create).
iam = boto.connect_iam(<access key>, <secret key>)
iam.create_group('snapshoters')

We then need to create a group policy granting access to the CreateSnapshot API call. Policies are represented as JSON documents, and custom policies can be generated using Amazon’s AWS Policy Generator. Here is what a policy that allows only CreateSnapshot looks like.

{
  "Statement": [
    {
      "Sid": "Stmt3121317131060",
      "Action": [
        "ec2:CreateSnapshot"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

In the past I have had difficulty copying and pasting JSON fragments like this into a Python interpreter due to formatting issues. I was able to work around this by leveraging the fact that JSON is, for the most part, also valid Python syntax. This means that we can do the following.

create_snapshot_policy = {
  "Statement": [
    {
      "Sid": "Stmt3121317131060",
      "Action": [
        "ec2:CreateSnapshot"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

We can then use Python’s built-in JSON library to dump this policy back to JSON when making the API call. Here is how to add this new policy to the ‘snapshoters’ group.

import json

policy_txt = json.dumps(create_snapshot_policy)

iam.put_group_policy('snapshoters', 'CreateSnapshot', policy_txt)

We can then create a new user and add it to the group.

iam.create_user('dbbackup')
iam.add_user_to_group('snapshoters', 'dbbackup')

Finally we need to generate access keys for this new user that we can feed to ec2-consistent-snapshot. The response to this call contains the new access key id and secret access key, so take note of them.

# The generated access key id and secret access key appear in this response.
response = iam.create_access_key('dbbackup')
print response

Putting it all together

We can now create the following script at /sbin/backup-database-volume. Replace the values in angle brackets with the access key and secret key generated in the previous step, and with the id of the EBS volume containing MySQL’s data. The script assumes that this volume is mounted at /vol.

#!/bin/sh

description="mysql-data-`date +%Y-%m-%d-%H-%M-%S`"

ec2-consistent-snapshot \
  --aws-access-key-id=<ACCESS-KEY> \
  --aws-secret-access-key=<SECRET-KEY> \
  --description="$description" \
  --mysql --freeze-filesystem='/vol' <VOLUME-ID>

Because this script contains AWS credentials, we also need to restrict it so that it can only be viewed, modified or executed by root. Run the following:

$ sudo chown root /sbin/backup-database-volume
$ sudo chmod 700 /sbin/backup-database-volume
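
Since the whole point is for backups to run unattended, the script can then be scheduled however you like. Purely as an illustration, a root crontab entry that takes a snapshot every night at 3am might look like this (the schedule is just an example):

# Hypothetical schedule: snapshot the MySQL volume nightly at 3am.
0 3 * * * /sbin/backup-database-volume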

Credentials for connecting to MySQL can either be specified in a .my.cnf file or by adding --mysql-username and --mysql-password to the ec2-consistent-snapshot command. These and other ec2-consistent-snapshot options can be found by running

$ man ec2-consistent-snapshot
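
If you go the .my.cnf route, a minimal credentials file for the user running the backup might look like the following (the user name and password are placeholders):

[client]
user=root
password=<MYSQL-PASSWORD>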