How Normal(ized) Should You Be?

Normalization is the process of removing repeating values from your data design, and ensuring that the values depend on the Primary Key in the table. OK, it’s a bit more complicated than that, but this definition will do for this post.

So how far DO you normalize? Consider the following:

Name, Address, City, State, Zip.

You might normalize it to the following:

FirstName, MiddleName, LastName, AddressLine1, AddressLine2, CityOrMunicipality, StateOrProvidence, PostalCode.

That seems pretty well normalized, and ready for a Primary Key. But let’s drill in a bit – the StateOrProvidence column is going to repeat, here in the U.S., quite a bit. Perhaps a Foreign Key is needed there, and another table created with WA, FL, GA, etc?

You could do that. But in most cases you’re too normalized if you take that route. In the case of two-letter state names, you’re not saving much space and there are other ways to enforce the data integrity of that column.

You must consider the reason you’re normalizing the table, and ensure your design fits that requirement. In most cases, it’s enough to tell the developers to code a drop-down list of the acceptable values, and/or put a constraint on the column to enforce what you want.

So apparently you can be too normal(ized). At least in this DBA’s experience.

Comments (2)

  1. amosfivesix says:

    We use a two letter code for the StateOrProvince, but it does link to another table that has the full name, which we do need to use now and then. Seems like it’s not a bad idea.

    Where I find it a pain is that we’ve broken down phone numbers into country code, area code, number, extension. Lots of countries don’t really have "area codes" plus dialing prefixes get all confusing, and US layouts (parans around area code, or use dashes) differs from most European layouts (just spaces), so I need lots of logic in my queries to build a nice display-able phone number.

  2. Hello,

    Say we have a table with all the states of USA, and this table has a primary key which type is TINYINT (1 byte). It’s shorter than the two letters identifying each state (2 bytes), and if I have thousands to millions of clients, the cost of these two letters will be greater than a byte to store in the clients table, added to the fact that the database engine goes faster when comparing integer values instead of character values (performance can also be hurted by the collation) …