A first look at SQL Server 2008 (part 3 of 3)

In this last part of the “SQL Server 2008 first look” series I will set the focus on the new SQL Server 2008 data types. If you missed the first 2 parts you can find them here: part1 & part2.

Let’s start by a really interesting feature, which is something developers would have been having since a long time. This new feature is known as SPARSE Column. But what are SPARSE columns? When can they be used?

SPARSE column are really useful in all those scenarios where you have to work with semi-structured data. Let’s take as an example a product. A product can have different attributes/properties depending on the type of the product itself. If our product is an mp3 player, the typical attributes are probably “capacity” or “battery duration”. If our product is a pair of pant, then probably you have others attributes like “waist size”, or “length”. Now the problem is how you can model a product entity inside a database. One possible solution would be to have one table for each single product’s typology; another approach would be to create a single table for all types of products. The first approach is probably not really manageable, while with the second one each time we are inserting a record into our big table, there would be a big amount of NULL values for all those columns representing a property that is not part of the specific inserted product. Of course this second approach is not really efficient, because NULL values are also using space.

With the SPARSE columns we can archive the data in a more efficient way. How does that work? A SPARSE column is just a column with the new SPARSE attribute (it’s not a new type!). You can therefore declare all columns, which are specific to a product’s type, with the new SPARSE attribute. The particularity of such a declaration is that NULL values stored inside columns decorated with the SPARSE attribute are not going to use any space. That’s the reason why SPARSE columns allow us to archive data in a more efficient way.

CREATE TABLE Product(

Id int,
Type nvarchar(16),
Price decimal(10),
Capacity nvarchar(8) SPARSE,
BatteryDurationnvarchar(8) SPARSE,
WaistSize int SPARSE

)

What’s we still need to solve are all the typical open schema problems. What do I mean with that? Each time you need to add a new product’s typology we need to add a certain number of columns. From a SQL point of view this is absolutely not a problem; you can just execute the following command: ALTER TABLE ADD COLUMN_NAME …. Problems arise at application level; in fact, each time we add a column, we then have to consider and manage this new column inside our application. Of course it is possible to build some kind of dynamic application, but this is not always so easy. How does SQL Server 2008 help us in that area? In SQL Server 2005 the typical approach to manage these “dynamic” properties was to put all these columns inside a single column of type XML. With SQL Server 2008 Microsoft did something very similar and introduced a new attribute, the SPARSE COLUMN_SET attribute, which can be added to the declaration of an XML type’s column. What are the benefits? As you can see from the following sample, by adding this new attribute, you can define SPARSE columns (gaining in storage space) and at the same time retrieve all “dynamic” information (in form of an XML) by querying the column of type XML. What’s interesting is that this column is a calculated column and therefore is not persisted on the database. Of course the same flexibility can be found in case you are inserting information.

-- Create table with sparse properties and column_set

CREATE TABLE Product(

Id int,
Type nvarchar(16),
Price decimal(10),
Capacity nvarchar(8) SPARSE,
BatteryDuration nvarchar(8) SPARSE,
WaistSize int SPARSE,
Length int SPARSE,
ProductProperties XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

)

-- Generic XML access to properties through sparse_column_set
SELECT Id, Type, ProductProperties FROM Products

-- Returns an aggregated XML fragment for all non-null sparse columns
101, Playstation, <Capacity>20 GB</Capacity><BatteryDuration>3x</BatteryDuration>
5001, Pant, <WaistSize>32</WaistSize><Length>32</Length>

-- Insert values
INSERT INTO Product(id, Type, Price, ProductProperties)
VALUES (5003, 'Zune', 200,'<Capacity>40 GB</Capacity><BatteryDuration>12 h</BatteryDuration>')

It’s now time to have a look to some of the new data types have been introduced in SQL Server 2008. The first good news is related to the native support for date and time. There are in fact 2 new data types which allow us to store separately date and time. Think about how you write today queries which contain WHERE conditions on a field of type datetime; you always need to consider that DATETIME values contains also time information and therefore you have to be careful in writing the query, otherwise your selection could also forget some records. With this new available option everything should be really simple to manage.

Here some more information on these new data types:

· DATE

o Date onlyLarge date range from 0001-01-01 to 9999-12-31

o SQL standard type

o 3 bytes fixed storage size

· TIME(n)

o Time only

o Optional user specifiable fractional precision

§ up to 100 nanoseconds (default)

o Format

§ HH:MI:SS[.nnnnnnn]

o 3 to 5 bytes variable storage size

There is also a new DATETIME named DATETIME2, which is a wider DATETIME because of the fact that can store datetime until 100 nanoseconds. DATETIMEOFFSET is another new type which at the end is just a DATETIME2 plus information related to the TimeZone. As you can see from the following additional information the time zone ranges between -14:00 and 14:00. Why are minutes included? Because there are time zone that differ also from just 30 minutes (I also just discovered it quite a couple of years ago when I was for holiday in Australia J):

· DATETIME2(n)

o Large date range from 0001-01-01 to 9999-12-31

o Optional user specifiable fractional precision

§ up to 100 nanoseconds (default)

o 3 to 8 bytes variable storage size

o Time-zone not aware

· DATETIMEOFFSET(n)

o Contains date, time and time zone offset

o Large date range from 0001-01-01 to 9999-12-31

o Optional user specifiable fractional precision

§ up to 100 nanoseconds (default)

o Format

§ YYYY-MM-DD HH:MI:SS.[.nnnnnnn][+|-]HH:MI

§ time zone offset ranges from -14:00 to 14:00

o SQL standard type equivalent

§ TIMESTAMP WITH TIME ZONE

o 3 to 8 bytes variable storage size

Another new and really interesting attribute is the FileStream attribute. Think about an application that stores documents. Today to do that you can either create a BLOB field and store your document in binary format or create a simple VARCHAR field and save just the path to the file. The advantage of the first solution is that when you do the backup of your database you are also doing the backup of your documents. Things are a little bit more complicated if you need to access those documents inside your application (you are working with binary values and you don’t have a simple path to files). The ideal solution should be something that takes the best of both mentioned today solutions. The Filestream attribute does exactly that: by applying this attribute to a VARBINARY field, data or better files are not stored in SQL but on the file system. What does it mean? That you don’t have the typical limitation of a VARBINARY(MAX) field of 2GB, the only limitation you have is given by your hard disk capacity. Of course when you are using the filestream attribute, in case of backup you are also doing the backup of your files and you also have a bunch of API that helps you a lot to manage files stored in this new way.

Another interesting feature is filtered index, which allow you to define a where condition (of course with some limitation) on an index. Let’s take as example an archive table with millions of records and let’s suppose that this table is just used to generate reports that include information about the last 12 months. Wouldn’t be nice if you could define a filtered index on the datetime field touching only records which are not older than 12 months and therefore not having to create the index to the whole table? Of course yes.

Last feature that I just want to mention is Performance studio, which is an out-of-the box solution to collect and analyze performance information of all SQL Server instances in a kind of mini datawarehouse.

We are now to the end of this 3 part blog post. I hope I gave you an idea of what is included in the new SQL Server 2008. Of course there is much more, like the new spatial data types or the new HierarchyID data type. So, if you need more information or you just want to go deeper, then navigate to the following SQL server portal https://www.microsoft.com/sql/2008.

Hope it helps,

Ken