SQL JOINing a Table to Itself

Getting two sets of information from one table in a select statement often leads people to write subselects, but it really doesn't matter that this is the same table twice, we can just give it a new alias and treat it as if it were a different table. This is one of those techniques where, once you've seen it, it's really obvious, but until that point it can be very confusing. I explained this to someone else recently, so I thought I'd capture it here in case it's helpful to anyone else.

Consider that tried-and-tested example: employees and managers. Here's the staff table from the database (today's imaginary data isn't particularly imaginative, sorry):

mysql> select * from staff;
+----+------------+-----------+------------+
| id | first_name | last_name | manager_id |
+----+------------+-----------+------------+
|  1 | Hattie     | Hopkins   |          4 |
|  2 | Henry      | Hopkins   |          4 |
|  3 | Harry      | Hopkins   |          5 |
|  4 | Helen      | Hopkins   |       NULL |
|  5 | Heidi      | Hopkins   |          4 |
|  6 | Hazel      | Hopkins   |          1 |
+----+------------+-----------+------------+
6 rows in set (0.00 sec)


In order to see who reports to whom, we need to query the table to get the employees, and again to get the managers' names. MySQL will get horribly confused if we use the same table twice, so we alias it each time to indicate which one we mean. I do this by drawing the picture of which tables I need and labelling them - here I've used "e" for employees and "m" for managers:

mysql> select e.first_name as employee, m.first_name as reports_to
    -> from staff e
    -> inner join staff m on e.manager_id = m.id;
+----------+------------+
| employee | reports_to |
+----------+------------+
| Hattie   | Helen      |
| Henry    | Helen      |
| Harry    | Heidi      |
| Heidi    | Helen      |
| Hazel    | Hattie     |
+----------+------------+
5 rows in set (0.02 sec)

Did you spot that there's one row less in this output? That's because we did an inner join, but Helen has no manager since she owns the company, so she doesn't appear in the second set of results. We could easily have left joined instead to include her if we wanted to.

You can use this technique regardless of how many other tables are needed for the query and whether you need to join the table to itself or onto other places. You might have a table of users, and a table of user_friends, linking one user to another. You can start from the users table, join to user_friends, then join to user again with a different alias to pull back the information you need - the possiblities are almost endless.

7 thoughts on “SQL JOINing a Table to Itself

  1. This particular example is really a hierarchy, and does not lend itself well to unknown depths. What if the company grows, and you need to add another layer of management? You'd have to go in and change all your queries that access the hierarchy.

    I've been struggling with this problem for goat breeding. In this case, you have two hierarchies -- one for each animal's sire (father) and one for the dam (mother). I have found no practical solution in SQL. Although my dataset is small enough that performing multiple queries using recursion in a procedural language like PHP is possible, it grates on my aesthetic sense.

    One possible solution I've been playing with is the OQGRAPH storage engine, which can be used as a plugin for MariaDB or MySQL 5.5. It isn't really a storage engine per se, but rather, a computational engine, specialized for graph tracing.

    In operation, OQGRAPH has exactly one allowable schema, which can be used any number of times (with different table names, of course):

    [code]
    CREATE TABLE db.tblname (
        latch   SMALLINT  UNSIGNED NULL,
        origid  BIGINT    UNSIGNED NULL,
        destid  BIGINT    UNSIGNED NULL,
        weight  DOUBLE    NULL,
        seq     BIGINT    UNSIGNED NULL,
        linkid  BIGINT    UNSIGNED NULL,
        KEY (latch, origid, destid) USING HASH,
        KEY (latch, destid, origid) USING HASH
      ) ENGINE=OQGRAPH;
    [/code]

    In graph theory terms, each OQGRAPH row describes an edge in a directed graph, with "origid" and "destid" being the primary keys of connected nodes. You populate this schema using INSERT… SELECT on the table with the self-join relationship, then query the OQGRAPH table by joining it back to your original table.

    They have an example based on the Tree Of Life, where you can (for example), show the taxonomic path between humans and a banana. It seems to be lightning-fast, with no arbitrary depth-of-query limit. But I have not been able to bend OQGraph to my will in solving my rather simple goat breeding problem.

    LornaJane, have you any experience with OQGRAPH for such things?

Leave a Reply

Please use [code] and [/code] around any source code you wish to share.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>