SQL Server 2008 Top 10 List for Developers

Well now that SQL Server 2008 has shipped, Zach and I are starting to put the finishing touches on all of our SQL Server 2008 early adopter initiatives and move on to planning for future releases.  Before we do that I have some unfinished business to take care of.

My boss, Mauricio, has been on me for months asking me to list the top 10 features of SQL Server 2008.  I have been stalling.  I think every single feature in the product is compelling, and inevitably I am going to leave something out and will hear about it later from the team that built it.  But with PDC '08 just around the corner, it's now or never.

So let me qualify my post by saying that this list represents the 10 things I talk to developers about when I discuss SQL Server 2008.  Often I don't have the luxury of spending all day (which I could) having this discussion, so I have to pick a few things that I think will stick.    

  1. Location Awareness via Spatial Technologies: The introduction of the .NET CLR types for Geography and Geometry enable a whole new class of location-aware database applications.  These types are first-class citizens in SQL2K8, and offer direct manipulation in Transact-SQL, programmatic manipulation in .NET, as well as a host of flexible indexing options.  See my post here. Note that this post is a bit dated, points are now expressed in lat/long order in the final release.  While this is a version 1.0 implementation, it feels much more mature, and we already have several customers and partners implementing production solutions.
  2. Improved BLOB Support via FILESTREAM: Finally, a high performance streaming option for BLOBs (binary large objects) in SQL Server. This is exposed as an attribute on varbinary(max), and uses NTFS to persist the BLOBs in chunks rather than cluttering up your precious data and index pages, SQL Server takes care of the garbage collection.  It's fully integrated from a transactional and management perspective and supports backup/restore with your database data.  Getting at the data programmatically requires some rethinking.  If you are familiar with Win32 file streaming API's (which are the highest performance way to access BLOB data in Windows) you will have a head start.  Native code developers will find new FILESTREAM API's in SNAC (SQL Native Access Client), and managed code developers will find a new SqlFileStream managed class in Visual Studio 2008 SP1. See my post here. Note that this post is a bit dated as well, the managed code api's were not yet available. 
  3. Improved Date and Time Types: Quentin Clark told me this was the single largest engineering effort for SQL Server 2008.  I started thinking about it and realized how pervasive a change to a data type is, from the storage engine, to the query processor, to data programmability API's, to backward compatibility.  I have to say I'm glad the SQL team decided to take the hit, these new types have already come to the rescue in a couple of projects I've worked on.  DateTime2 is the "new and improved" DateTime, with up to seven digits of accuracy (100 nanoseconds) and improved range with no goofy start date in the 1700's.  Alas the last of the Sybase holdovers appears to be slipping away!!  DateTimeOffset is the new "timezone aware" version of DateTime2, allowing you to save a timezone offset value along with your DateTime2 value.  Now datetime calculations can include values from different timezones.  Tired or storing that extra baggage if you just want to store a date or a time?  Enter the Date and Time types.  There are definitely some nuances to using these new types and converting systems from the old types to the new types.  You should manipulate them programmatically with care, remember that these new types have increased precision and you don't want to lose data during a conversation operation.  Also, from a .NET data programmability perspective, you'll want to use the standard System.DateTime and System.DateTimeOffset (introduced in the .NET Framework 3.0)  types to host these values, as the old .NET System.Data.SqlDateTime type only handles old DateTime values.  If you need to support null values, you'll need to implement them as .NET Nullable types.
  4. CLR Everywhere: While not exactly new for 2008, support for managed code in all of the core services has improved significantly.  So much so that the SQL team itself is starting to build core capabilities using managed code, Spatial Types are just one example of this.  On a recent project, I was able to take some C# functions I wrote and use them in the database engine, in Reporting Services and in Integration Services.  The ability to break out of the core server programmability paradigms such as Transact-SQL and SSIS expressions and leverage the power of the .NET Framework is an absolute life saver.
  5. Transact-SQL Improvements:   There are many, including Table-Value Parameters (TVPs), row constructors, rich assignment operators, and the ability to do variable assignment right in the DECLARE statement.  There is an important new DML command in MERGE, a.k.a. "UPSERT" that allows you to either insert a new row or update an existing row in the same statement.  And for those relational die hards who refuse to be drug into the world of Analysis Services, a new aggregation capability called GROUPING SETS will make it easier to roll up those aggregate functions.
  6. Change Data Capture: CDC is the closest thing to a log reader that you will ever see from Microsoft.  We open up some virtual tables against the transaction log and surface any changes to affected tables, which you can then grab in your application and do interesting things with.  CDC is a low-level thing that you would use if you want to build your own sync capabilities.  SQL Server Replication is what you would use if you don't want to have to build all that plumbing. 
  7. Change Tracking:   Applications that support occasionally connected sync capabilities have to track a lot of metadata to make sure that offline changes get applied correctly and to handle conflicts.  In SQL2K8 you can enable change tracking at the table level, and SQL Server will manage all that for you.  When used in conjunction with the Microsoft Sync Framework, developers now have fine-grained sync support all the way from the database engine out to the API level to build rich occasionally connected solutions.  For SQL Server-only installations that don't need a lot of fine-grained control, SQL Server Merge Replication may still be the solution of choice. 
  8. LINQ for SQL:   Application developers who are not database developers have long wished for a more tightly bound way to express SQL in their source code.  Dynamic SQL can't be checked at compile time, and stored procedures aren't visible in application source code.  Enter LINQ for SQL, or "Language Integrated Query".  LINQ for SQL allows developers to code their queries against local pre-generated classes, not remote database objects.  The classes isolate the developer from the specifics of where the data is stored and in what format, and offer a higher degree of type fidelity and developer productivity.  With that isolation comes certain tradeoffs, such as fine grained control over the SQL that gets generated, update semantics etc. Most solutions will be a blend of LINQ and dynamic SQL, but one thing is for sure, LINQ is here to stay.
  9. Entity Framework: The Entity Framework offers an even higher level of abstraction to collections of "entities".  These entities are modeled prior to developing against them, and highly productive programming paradigms (such as LINQ for Entities) are enabled on top of them which further isolate the developer from ugly data access plumbing and physical database goo.  While this is definitely version 1.0 technology, it is clear that the data programmability team is very serious about moving this technology into the future, so give it a good look now so you can understand how it works.  The same tradeoffs apply here as with LINQ to SQL, but at a larger degree of abstraction.  You can see various incarnations of the Entity Framework showing up in other projects, such as ADO.NET Data Services.
  10. BI Programmability Enhancements: This category needs another top 10 list, but in my world every developer should know how to build integrated reporting and analytics solutions, and therefore should know about what's new for developers in SQL Server 2008 Integration Services, Reporting Services and Analysis Services.  Here's my top picks for each of these three important components
    1. Integration Services: Support for C# scripts, especially in the Transformation Script Components, is an absolute life saver. Lookup Transformations have been completely re-worked for the better, with much improved caching options and the ability to share cache's across data flows and packages.  I also really like the new Data Profiling Task which can help you get a handle on a new data source very quickly.
    2. Reporting Services: Big investments have been made in providing a more productive report design experience for both the Visual Studio based "BI Development Studio" and the new stand-alone report designer known as "Report Builder 2.0".  The new RDL Tablix element make's it easier to build complex reports that may have previously required multiple data regions and or reports.
    3. Analysis Services: Our flagship OLAP server, and its integrated Data Mining functionality have seen a lot of improvements.  I particularly like the design-time guidance when building cubes, reminding you on the right way to build attribute hierarchies and the like before you shoot yourself in the foot.  From a data mining perspective, the time series algorithm has been updated and should see a lot of use, and the data mining add-ins for Excel 2007 have also been improved.

Whew, I'm glad I have that done!  Best of luck to all you eager database developers out there, I hope your experience with SQL Server 2008 will be as exciting and interesting as mine has.