SQL JOINing a Table to Itself

Getting two sets of information from one table in a select statement often leads people to write subselects, but it really doesn't matter that this is the same table twice, we can just give it a new alias and treat it as if it were a different table. This is one of those techniques where, once you've seen it, it's really obvious, but until that point it can be very confusing. I explained this to someone else recently, so I thought I'd capture it here in case it's helpful to anyone else.

Consider that tried-and-tested example: employees and managers. Here's the staff table from the database (today's imaginary data isn't particularly imaginative, sorry):

mysql> select * from staff;
+----+------------+-----------+------------+
| id | first_name | last_name | manager_id |
+----+------------+-----------+------------+
|  1 | Hattie     | Hopkins   |          4 |
|  2 | Henry      | Hopkins   |          4 |
|  3 | Harry      | Hopkins   |          5 |
|  4 | Helen      | Hopkins   |       NULL |
|  5 | Heidi      | Hopkins   |          4 |
|  6 | Hazel      | Hopkins   |          1 |
+----+------------+-----------+------------+
6 rows in set (0.00 sec)

Continue reading

Inner vs Outer Joins on a Many-To-Many Relationship

Someone will probably tell me that this is an elementary-level topic, but I got some good questions regarding joins from my most recent ZCE class students, so I thought I'd put down the examples that I used to explain this to them. Being able to join with confidence is a key skill, because it means that you can refactor and normalise your data, without worrying about how hard something will be to retrieve.
Continue reading

Importing and Exporting MongoDB Databases

I'm enjoying working with MongoDB but as with any new technology, it can take a little while to find your way around all the tools related to that stack. In particular, I found myself wondering how do I mysqldump for mongodb?

It should have come as no surprise that the command I wanted was called mongodump, really! Continue reading

Explaining MySQL's EXPLAIN

The MySQL explain plan is a great tool to help developers and database administrators to improve the performance of specific queries happening against a database. It is very easy to use, but its output can be confusing, so I thought I'd show a very simple example. Continue reading

Simple CRUD with MongoDB

When I meet a new technology, I like to experience it "just as it comes". I'm happy at the command line and I like to type actual commands and see man pages before I use any wrappers or helper tools. So when I met MongoDB for the first time, I did exactly that. This post shows those first steps of creating a database, and inserting, reading, deleting and updating data.
Continue reading

Script for Database Patching at Deploy Time

I've written before about a simple way of patching database versions and there's a much more comprehensive article from Harrie on TechPortal as well. I often find though that projects with patching strategies are missing the scripts to apply these automatically when the code is deployed, so I thought I'd share mine.

My current project (BiteStats, a simple report of your google analytics data) uses a basic system where there are numbered patches, and a patch_history table with a row for every patch that was run, showing the version number and a timestamp. When I deploy the code to production, I have a script that runs automatically to apply the patches.

Continue reading

Is Enum Evil?

When I work on database designs, either on my own projects or as advisor to others, I often find people very reluctant to use an enum type for any columns. Now, I'm not about to advocate the gratuitous use of an enum column, they definitely have some pitfalls, but I think it is important to understand these rather than just shouting "enum evil!" if anyone should mention them.

There are cases where an enum is the correct choice for a particular type of data, so let's look at what an enum type actually is and does.

Continue reading

Indexes on Tables

Increasingly I find a very binary split between the professionals I come into contact with. One group of people are very database-aware and take the design of their storage quite seriously - with good results. The other group are more concerned with the functionality of their application, and have little regard for how it is stored other than considering it a keeping-place and making useful table and column names.

Too often though, they don't think about how that data will be retrieved or what the implications are when it gets beyond the thousand records that were used for testing. This is where having an idea of how the data will be retrieved can really help application performance. (note: this article is aimed at users of traditional relational databases, and ignores all other possibilities). This post takes a look at the various index types and when to use them. Continue reading