Simple Database Patching Strategy

One problem that seems to need solving too often is how to keep databases in sync across different platforms, different developers working on a project, and deploying database changes along with code changes. There are lots of ways of approaching this, none of them are really excellent however and personally I tend to err on the side of simple being better. Fewer dependencies means a solution more likely to work on every platform (and no additional complications for the live platform). Usually this means patch files of some kind - here's an outline of my usual approach. For the impatient:

  • add a table for meta data to the database, set a database version parameter to 0
  • export structure and any essential data (lookup tables, etc) into an sql file
  • for any changes, create a numbered patch file e.g. patch-1.sql, including the change and an update statement to bring the version meta data to match the patch file
  • store all of the above in source control
  • for bonus points, create another sql file with some nice sample data

Give the Database Some Version Awareness

Naming is completely changeable but mine looks something like this:

CREATE TABLE `meta_data` (entry VARCHAR(255) PRIMARY KEY, VALUE VARCHAR(255));
 
INSERT INTO `meta_data` SET entry="version", VALUE="0";

This new row will hold information about what patch version the database is up to. Every patch that is created will update this number so that it is possible to tell which patches have and have not been applied.

Create an Initial Data Structure

Firstly create a database directory for all these files to live in. This should be outside your web root but inside your source control project.

Take your database and dump just the structure of the tables using the --no-data switch to mysqldump like this:

mysqldump -u <username> -p <database> --no-data > create.sql

You will also want to populate tables which hold things like lookup values, country lists, that sort of thing, so that these are set up. People starting to develop with this project, or if the application needs to be deployed to somewhere new, can use this file as a starting point.

Create Patches for Changes

This is where the discipline element comes in - no direct edits on the database are allowed! Instead, write the SQL for your change and place it in the patch file, then run it against the database. If that sounds too much like hard work then copy/paste the SQL you use to make changes, or the SQL generated by whatever SQL tool you use, and place it in the file.

Every file must have its own incrementing version number in its file name, e.g. patch-1.sql, patch-2.sql etc. Within the file the version must also be updated to match, with a statement like:

UPDATE `meta_data` SET VALUE="1" WHERE entry = "version";

Recommended Practice

Here are a few pointers on getting the most out of something like this:

  • Under no circumstances is it acceptable to edit a patch file that has been committed to source control. Someone might have run it already and you'll confuse everyone completely.
  • Having matching undo scripts alongside the patches can be really useful in case a change gets deployed and needs to be undone.
  • Make a script to look up the database settings in the config file of your application, query the database for what version it is at, and run any outstanding scripts. This makes life much easier especially if you have large numbers of scripts (I've seen the patch numbers hit the hundreds)
  • A possible adaptation of this approach is to create patch files for use for updating a database, but to also update the install.sql file to make it correct at any point in time, this means a much shorter and easier setup time for new deployements/developers. The version awareness works in the same way regardless
  • Creating a sample database which creates a few records in each table can really help for development purposes - its quicker for people to get set up and attain a working system that they can make progress with.

I'm sure there are many improvements and variations on this theme of simple database patching, leave a comment and let me know what works for you and why!

16 thoughts on “Simple Database Patching Strategy

  1. I would suggest adding a patch history instead of one row for your version.

    Just do a 'filename', 'version', 'timestamp' meta_data table.
    That way you also know which file was executed when and what version they upgraded the database to.

    The current database version is then the MAX(version)

  2. Nice post. Like it much, really. My projects are usually so rushed that no one takes the time to set up any versioning whatsoever.

    One exception: CuddleFish CMS. It is Zend Framework based and has a dedicated DbController that has an action for every db update ever applied.

    The advantage? You can use the regular CMS GUI to update the database by clicking a friendly name in a list of actions. From the same interface you are one click away from an install script for the database.

    But then, this is highly dedicated. I like your approach very, very much. Anyone who is not using a database versioning system yet should give it a try.

  3. This seems a good approach when you own all the code and the database. However, there are many times when you don't completely own one or both of these - eg. linking into a corporate database that the webdevs can't change or using a CMS that is mostly written by other people. In those situations it becomes really difficult to do decent versioning, and I've yet to see a good way of handling Drupal databases in version control.

    • Alice: so true, and I've seen several scary ways of managing database changes on drupal, mostly involving mysql proxy! All the dynamic apps have the same issues and none seem to solve them in a good way.

  4. I use something similar, although I bundle it all into an 'upgrade.php' script which contains all the sql changes and updates the meta_data like table for you; as the 'admin' you just need to hit 'upgrade.php' every once in a while and off it goes.

    As an example, see http://postfixadmin.svn.sourceforge.net/viewvc/postfixadmin/trunk/upgrade.php?revision=794&view=markup

    the only downside to my approach is that it doesn't (yet) support reverting to an old configuration - something I've never found to be necessary so far.

    I think the postfixadmin one takes into account the svn revision number; while on other projects I just use a sequential number.

  5. Take a look at
    http://sourceforge.net/projects/liquibase/

    I used this in the way Lorna Jane mentions, for DDL changes, only. This also runs nicely inside a Phing script.

    This does not allow rollbacks, but does allow roll-forwards from the beginning.

    Perhaps it is not unreasonable to dump data as XML or CSV, especially if all tables have timestamps.

    I like the idea of example data, which can be used in PhpUnit testing

  6. Thanks to everyone who is leaving comments, I *knew* there would be some great suggestions!

    Harro: That's a nice extension to this method, thanks for adding it.

    Bart: I haven't used CuddleFish but after your mention I will look out for it in future.

    David: Thanks for the example links, those are great exaamples of this type of approach in use :)

    Bruce and Mark: some great additional resources listed there - cheers!

  7. Hi,

    Good overview of a very simple scheme. I like it :).

    In my day job, we use a particular ORM a lot, which does the same kind of thing wrapped in a little DSL - and from it, I'd suggest the following:

    First, especially for /big/ projects in a distributed VCS, use timestamps instead of sequential numbers. It really reduces the potential pain at the cost of looking slightly uglier.

    Second, if you want to get a bit fancy, have a 'schema_versions' (or similarly named) table with a 'version' column. Every time you apply a patch, insert a new row into that table with the version ID/timestamp. This lets you have a stable deployment with migrations 1,2,3 and a staging one with 1,2,3,4,5,then backport a bugfix that relies on migration 5 but not 4 all by itself, then do a migration with 4,6,7,... when the time comes to upgrade stable by a major version.

    It also lets you have 'up' and 'down' migrations - so you can apply and unapply each one individually if you want to - but I've never found that to be particularly useful. But then, I have a bad habit of doing data migrations at the same time as the schema migrations... ;)

  8. I've written before about a simple way of patching database versions and there's a much more comprehensive article from Harrie on TechPortal as well. I often find though that projects with patching strategies are missing the scripts to apply these automa

  9. I personally don't think multiple people should change the database schema (if you have multiple devs, only one should be in charge of data modelling / DBA), and my way of handling database changes is to do whatever you want on the dev database, and auto-patch the prod database based on schema changes.

    That being said, if you're going to do something like your approach, I find it's important to make your patches idempotent (i.e. running them twice results in exactly the same thing as running them once), and use the information.schema as much as you can to avoid duplication and misinformation.

Leave a Reply

Please use [code] and [/code] around any source code you wish to share.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>