When I work on database designs, either on my own projects or as advisor to others, I often find people very reluctant to use an enum type for any columns. Now, I'm not about to advocate the gratuitous use of an enum column, they definitely have some pitfalls, but I think it is important to understand these rather than just shouting "enum evil!" if anyone should mention them.
There are cases where an enum is the correct choice for a particular type of data, so let's look at what an enum type actually is and does.
Enum Type Intentions
An enum column type is a column which allows only the specified data values. This means that if a particular column can only take certain values, for example 'art', 'music' or 'sport' then you can specify this when you create the table:
CREATE TABLE `teachers` ( `id` INT(11) PRIMARY KEY AUTO_INCREMENT, `name` VARCHAR(50) DEFAULT NULL, `specialism` enum('art','music','sport') DEFAULT NULL )
This means we can only add data into the specialism column which is exactly equal to one of those values. This makes these columns ideal for things that never change. However in my experience, never is relative!
Changing Possible Values of Enum
In order to change the allowed values of an enum column, we need to issue an alter table statement - and this is where it gets messy. Alter table actually creates a new table matching the new structure, copies all the data across, and then renames the new table to the right name. During this process, which takes longer in proportion to the size of the data, the table is locked. Completely. So your whole application is going to have to wait for that to finish before anything else can even read from the table! This is why alter table statements are always a bad idea on production.
So, if your enum column's allowed values could ever change, or if your data set is going to be any kind of large, or you're going to experience traffic on any scale, then the enum isn't the right data type. As you can see, this eliminates most use cases and is the reason why enum is usually avoided.
Alternative to Enum: Lookup Tables
In situations like the teacher table shown above, it is much better to move the data into a simple lookup table; now we need two tables, teachers and specialisms, and they look like this:
CREATE TABLE `teachers` ( `id` INT(11) PRIMARY KEY AUTO_INCREMENT, `name` VARCHAR(50) DEFAULT NULL, `specialism_id` INT(11) NOT NULL ); CREATE TABLE `specialisms` ( `specialism_id` INT(11) PRIMARY KEY AUTO_INCREMENT, `name` VARCHAR(255) DEFAULT NULL );
With this setup, we are much more able to handle the case where "never" actually turns out to happen. In the event that the data does need to change, we alter the content of the database but not the structure, which is much better practice and doesn't lead to potentially long table-level locks on our production database.
Does anyone have any further approaches for avoiding enums? Or situations where they are a great choice? Leave me a comment, I'm interested to hear the experiences of others on this topic.