In the last lesson we learned about degree and cardinality and all kinds of vocabulary coolness. We will continue that trend with this article focusing on one word: domain. This word is loaded with all kinds of confusion. The problem lies not in the word itself but its (over)usage in certain areas. Simply put a domain is a boundary that defines something (or contains it if that makes you feel better). So why should you care? Well, the ability to adequately KNOW what a domain is and actually KNOW how to apply its principles is fundamental to your success in the database world.
Okay so what is exactly is a domain then? In DATABASE TERMS a domain can be defined as all the possible values for a given attribute. Do what? Sounds like I’m talking in another language doesn’t it? If you recall from our last lesson, an attribute is just a column. Yep that’s it, just a column. So when I say “all the possible values for a given attribute” what the little voice in your head should say is “oh, he means all the crap that I could put into a column”.
This is where things get funky. Let’s say we have a column called “LastName” (sorry you will never, ever see me use spaces for column names) that will be used to hold all the names of the employees at Bubba Jones’ House of BBQ in Little Rock, Arkansas. So, okay, given the LastName column what do you think the “domain” would be? I know it seems repetitive but let’s go over it again: a domain is all the possible values for a given attribute. Got it? Really? You’re sure? Okay, so what is your answer? (insert Jeopardy theme here). Time’s up. The domain for the LastName column would be any last name on the planet (and any other planets we happen to discover).
Now you may be saying to yourself, “I really would like some pie.” Well go get some then and then let’s talk about this domain business. You see, the THEORETICAL domain is all the possible values for a given attribute. That is where you need to start to truly understand what you are after in terms of information in your columns. Once you have established this then you are ready to start adding CONSTRAINTS to the column definition.
Constraints are exactly what they sound like: things that restrict. In this case, we are specifically talking about the restriction of a column to more precisely define what we want. I think it is safe to say that in most cases you don’t really want every last name ever created. Am I right? (say “yes” or we’ll be here all day). Great! So in order to more accurately reflect the information we are after we slap on a few constraints to the original domain definition. Remember none of this is possible if you haven’t first gone to the level of using the official definition of a domain that we defined already. You MUST start with the theoretical domain then move on from there.
What constraints do you think we should apply to our LastName column? Believe it or not you have applied constraints already in a manner of speaking. When you created the database that holds the tables that holds the columns, you made a decision about the Collation that you wanted. For example, when I installed SQL Server on my machine I chose the “SQL_Latin1_General_CP1_CI_AS” collation. So…what exactly does that mean? Well it means that I chose to use Latin characters for my language. In real terms this means that I will store letters (and numbers, etc…) using “A” through “Z” which is the alphabet many of you use every day. However, you need to understand that I could have just as easily chosen to store Chinese characters or Russian (aka Cyrillic) characters as well.
By virtue of the fact that Latin was chosen as the language that automatically means that you will be restricted to the names that can be written this way. That is to say that I will have no problem with “Smith” but if you want to put in a name where Latin isn’t the base then you will have to translate it into a Latin equivalent. You have seen this many times with people living in the United States who may be from Asia, for example, where the names are typically written out phonetically.
Once we have a handle on the language constraint then we can move on to other key decisions about constraints. For example, how many characters do we want to allow for the LastName? Length is vital as it helps to know how much space is going to be taken up by the database. If I allow LastName to be, say, 20 characters then each row could take up that space. More on this subject in a future post.
In addition to length we also have to decide if we will allow spaces, hyphens, etc… in LastName? Each one of those comes with good and bad points. Take hyphens for example. If I decide to allow hyphenated names then I need to seriously reconsider the length as names with hyphens can be quite long. So just deciding to add something isn’t enough you have to consider the impact of what you are allowing as well. I’m sure you can think of other things that may come into play with other types of columns. Take an “Age” column for example. What about Max and Min age? Should you have a column for age or should you calculate it on the fly? There are many interesting questions that come into play when we consider the domain of a column.
This brings us to “SSN” (Social Security Number). For those that aren’t familiar with this item it is used in the United States to uniquely identify each citizen. What would the domain be for SSN? Really take a second to think this one out…
Okay, ready for the answer? The domain for SSN would be all the possible Social Security Numbers IN THE UNITED STATES. Since SSN only applies to US citizens it is, by its very nature, automatically constrained even at the theoretical level. It may seem like I’m being a little anal retentive here but the distinction is important. Some domains are virtually limitless by nature and some clearly are not. Understanding the domain and then applying critical thinking to the constraints you want to put on that domain is a very, very important fundamental tool for working with databases.
Well that’s all I have for now. In our next installment we will look at the various datatypes and the proper usage of the main ones. See you next time 🙂