Still Crazy (About Typed DataSets) After All These Years


First off, my apologies to Paul Simon regarding the title of this blog post — but I simply couldn’t resist 😉

When architecting and building solutions for customers, I tend to make heavy use of typed DataSets.

I believe I used them on my very first .NET project almost ten years ago, because I still clearly remember Mike Pizzo (one of the original architects on the ADO.NET team) explaining to us how we should think of DataSets as a “scratch pad” for disconnected data (as opposed to the old DAO, RDO, and ADO objects — which thankfully are a thing of the past for most organizations). [Okay, I suppose that, technically, you could disconnect an ADO Recordset from the underlying database, but that’s not how it was frequently used.]

The tooling for typed DataSets has always been very good (at least in my opinion). You can create a typed DataSet very quickly using the designer in Visual Studio and restructure an existing typed DataSet with ease. In my opinion, this is almost always faster than writing the equivalent C# code directly.

I also love the way typed DataSets can be somewhat complex — in terms of the business rules that are encapsulated by them — yet at the same time very easy to understand for developers that are not familiar with your solution (e.g. new team members or developers responsible for maintaining the solution).

For example, consider the typed DataSet shown below.

Figure 1: Typed DataSet example (ScorecardData)

What can you tell just by looking at the diagram? What are the business rules that are enforced simply by the structure of the tables, relationships, and columns?

  • First, you can see that ScorecardData (the name of the typed DataSet) contains information about scorecards for client sites (i.e. locations). Note that the term “scorecard” in this scenario refers to a collection of key performance indicators (KPIs) displayed on a dashboard.
  • Each scorecard is simply a collection of items (essentially the KPIs) — with each item corresponding to a particular client site.
  • Each scorecard item may have one or more instances of a KPI status (i.e. the measure of the KPI for a particular period). The status — i.e. “Exceeds”, “Meets”, or “Does Not Meet” (a.k.a. “Green”, “Yellow”, or “Red”) is determined by a corresponding set of thresholds.

Why about the primary keys on the various tables?

  • The primary key on the Scorecard table is obvioulsy ScorecardId.
  • The primary key on the ScorecardItem table is ScorecardItemId — although not quite as obvious as the Scorecard table, since there are multiple key icons shown on that table in the designer. However, you can infer this from the relationship between ScorecardItem and KpiStatus — and the fact that only the ScorecardItemId column appears in the KpiStatus table. [Wouldn’t  it be nice if the DataSet designer in Visual Studio showed a different icon for the primary key from for other unique keys?]
  • The primary key on the ClientSite table is ClientSiteId — which again, can be inferred from the relationship between the ClientSite and ScorecardItem tables.
  • The primary key on the KpiStatus table is (ScorecardItemId, Period) — thus allowing each scorecard item to specify one or more KPI status values (for different time periods).

If you were to right-click on the ScorecardId column in the ScorecardItem table in Visual Studio and then click Edit key…, you would see that there is a unique constraint on (ScorecardId, ClientSiteId, KpiName). In other words, each scorecard can only specify one scorecard item for given site and KPI (e.g. we don’t want to allow “Site1” to have two scorecard items that refer to “KeyPerformanceIndicator1”).

Also note that since a primary key on a table must be unique, the constraint on the KpiStatus table ensures that a scorecard item (i.e. a KPI) is only allowed to specify one KPI status for a particular time period. It just doesn’t make sense that “KeyPerformanceIndicator1” could be both “Green” and “Red” for, say, the “2010 Q1” time period — it has to be one or other (or “Yellow”, I suppose).

We can also see that there is a unique constraint on the ClientSiteName column in the ClientSite table. Here’s where things get slightly more complicated. Suppose that we wanted to retrieve data from this DataSet by filtering on the ClientSiteName. In other words, we want to only show KPI information for a particular site. Seems reasonable, right?

What if two completely different clients each had a site named “Headquarters”? That could definitely be a problem, because we certainly wouldn’t want to show one client’s data to a different client.

However, in order for that to actually occur, we would first have to populate the DataSet with information from two different clients. Since that wasn’t how I intended this DataSet to be used, I chose to add a unique constraint on the ClientSiteName column. If, at some later point in time, new scenarios were added and I needed to use this typed DataSet to store data from multiple clients, then I would probably add a Client table and adjust the relationships and constraints accordingly.

Okay, so where exactly are we — after all this verbiage?

The point I was hoping to make so far is that — without looking at a single line of code or any documentation (which many developers don’t like to read anyway) — we know quite a bit about the solution, including some of the business rules, just by looking at the typed DataSet.

In my next post, I’ll discuss some smart ways we can interact with a typed DataSet when actually building a solution.

[I know the content of this post is very elementary, but bear with me, I promise things will get more interesting in subsequent posts 😉   ]

Comments (5)

  1. Mark says:

    Thank you for the article. I am working on a credit/financial management application, and i am using typed datasets.

    I was googling to find out if anyone still believes  and uses datasets, and here is my answer.

    Just want to ask if you dont mind: is there any realworld financial management application (accounting) you know of that was architected using Datasets? Just curious to know that i'm not alone in this boat.

    Thank you for this very wonderful and encouraging blog.

    Mark

  2. jjameson says:

    @Mark,

    I am not aware of any financial management or accounting application that is architected using DataSets.

    While I am aware of at least one real world application (namely SharePoint) that leverages DataSets in its public API, I want to make sure you understand a couple of key points:

      – First, while some SharePoint API methods — such as QueryService.QueryEx (msdn.microsoft.com/…/search.queryservice.queryex(office.12).aspx) — return DataSets, these are not typed DataSets. In my opinion, there's a big difference between using generic (i.e. untyped) DataSets and typed DataSets. While I do occasionally use generic DataSets (typically as a temporary storage structure for manipulating — e.g. sorting and filtering — data), if you pass a generic DataSet to a method (or return one from a method) then the "contract" that you establish as a result is very loose. In other words, you really don't know what the structure of the DataSet is supposed to be and you end up having to do lots of run-time checking to make sure you have the expected tables, columns, etc. You also lose all of the "compile-time goodness" that you get for free when using typed DataSets. In other words, this is a lot like passing XML data as a string (which tends to make me shudder).

      – Second, my blog post regarding typed DataSets was not intended to suggest that they be exposed externally. My opinion (which apparently is contradictory to others) is that you shouldn't expose a DataSet — either typed or untyped — outside of your application (in other words, through your API or, similarly, via a Web service). The reasons for this are plentiful — e.g. cross-platform issues, "versioning" schema changes, etc.

    Now, with that said, I also want to point out that many years ago I worked on a large profile management application for a travel company that heavily leveraged typed DataSets. We even leveraged these DataSets when exposing the application through Web services by "shaping" the serialized format of the data (i.e. the XML) in order to match the specifications established by the OpenTravel Alliance. It was something that I had to keep explaining whenever I described the solution to other people because the first words out of their mouths would often be "you shouldn't be using DataSets in Web services", to which I would reply, "the contract is defined using XSD, we are leveraging the power of typed DataSets to generate the XML that validates against the schema."

    Also note that technology is continually evolving (often at a faster pace than we would like) and you won't find as many proponents for typed DataSets these days as, say, just a few years ago. For example, I seem to recall a somewhat recent blog post from Scott Hanselman in which he (sort of) slams DataSets. At least that was my take on it when reading it. He may have been referring to "bleeding DataSets through your public API or Web service" (in which case, the slam could certainly be justified), but I don't recall for sure.

    The new version of the Entity Framework is definitely worth taking a look at (if that is an option for your solution). You get many of the same benefits with EF that you get with typed DataSets, as well as some added value (e.g. the mapping functionality to more easily decouple your underlying database schema). The problem — which, in fact, may not be a problem at all depending on your organization — is that you need to be using Visual Studio 2010.

    So, after all this ranting, what is the key takeaway?

    Definitely consider using typed DataSets internally within your enterprise solutions, but also be sure not to expose these "outside" the solution (e.g. through a Web service) — instead opting for XSD, or perhaps newer alternatives like OData, to define the contract with other parties. Also, consider evaluating newer alternatives like Entity Framework for data access when the time is right for you.

    I hope this didn't "muddy the waters" 🙂

  3. Tod Flak says:

    Hi Jeremy,

    Thanks very much for this blog post.  Just like you, I have loved typed datasets for many years.  But I have the sensation from multiple sources that they are quite out of vogue these days.  For example, I was really suprised that in VS2010, the Dataset Designer interface hasn't changed hardly at all… still no zoom, unlike the new  "Xml Schema Designer" thing.

    I am about to embark on a new WCF-based project, and I was really wondering if exposing a Dataset-generated XSD through the web services interface was a bad idea… I am pleased that you commented on that , and I will look more into the Entity Framework stuff.

    thanks again,

    Tod

  4. Jon says:

    We just finished developing a distribution/accounting system for one of our clients that included modules for GL, AP, AR, Sales, Purchasing, Inventory, Banking, and a few others.  We used typed datasets exclusively with asp.net web services.  We can open a sales order in a windows application across a T1 connection in less than 3 seconds full loaded.  Typed Datasets are one of the most widely used but misunderstood technologies around.  A lot of the problems EF is trying to solve have already been addressed by the typed dataset if extended properly.  That's what we have done with our application framework.  www.3n1solutions.com

  5. jjameson says:

    @Tod:

    I recommend that you decouple the typed DataSet XSD from the Web service message schemas. I know this may seem like a superfluous effort, but I believe this gives you the best of both worlds in the long run. In other words, by definining the Web service contract using "pure" XSD, you can easily partition your schema into multiple files and incorporate other best practices for defining XML schema (in my opinion, the schemas defined by the OpenTravel Alliance (http://www.opentravel.org) provide a great reference when creating your own XML schemas for Web services). Then you can create typed DataSets that, when serialized, conform to the XSD.

    Think of this as a "contract-first" approach — which is generally recommended when developing Web services.