Data Access API of the Day - Part IV (Programming to the Conceptual Model…)

Welcome to Part IV of Data Access API of the Day, a brief history of the evolution of Microsoft’s Data Access APIs.

In Part I we looked at ODBC as Microsoft’s C-based Relational API, and the DAO and RDO automation interfaces that made relational databases exposed through ODBC available to languages like VB.  In Part II we looked at OLE DB as Microsoft’s first-class Data Access API for componentized data access within Microsoft’s Component Object Model (COM) environment.  In Part III we looked at the introduction of ADO.NET, a managed API for the .NET Framework that revolutionized the relationship between connected data access and working with disconnected sets of data, and started to rationalize the relationship between relational data and XML.

ADO.NET Entities builds upon our mutual investment in ADO.NET by adding the ability to write applications against a rich conceptual "Entity Data Model" schema, rather than a flat relational database schema. The Entity Data Model (EDM) extends the relational data model with Entity-Relationship (ER) constructs for modeling real-world concepts such as Inheritance (cars and trucks are vehicles), Relationships (customers have orders), and complex members (street, city, region, and postal code composed as a single "address" property within a customer).  An extended-SQL grammar called "Entity SQL" allows you to directly query your conceptual schema, leveraging inheritance, accessing complex members, and navigating relationships. In many cases, building these concepts into the conceptual schema and query language removes the need for complex joins, unions, and subqueries to do conceptually simple operations.
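As a rough illustration only, the three modeling constructs named above can be pictured with ordinary classes. The sketch below is plain Python, not EDM schema syntax or Entity SQL, and every class, field, and helper name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    """Complex member: several scalar values composed into one property."""
    street: str
    city: str
    region: str
    postal_code: str


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    name: str
    address: Address                                    # complex member
    orders: List[Order] = field(default_factory=list)   # relationship


@dataclass
class Vehicle:
    vin: str


@dataclass
class Car(Vehicle):        # inheritance: a car is a vehicle
    doors: int = 4


@dataclass
class Truck(Vehicle):      # inheritance: a truck is a vehicle
    payload_kg: int = 0


def order_total(customer):
    """Navigate the customer-to-orders relationship without writing a join."""
    return sum(order.total for order in customer.orders)


def only(vehicles, kind):
    """Filter a mixed collection of vehicles down to one type in the hierarchy."""
    return [v for v in vehicles if isinstance(v, kind)]


alice = Customer(
    "Alice",
    Address("1 Main St", "Redmond", "WA", "98052"),
    [Order(1, 20.0), Order(2, 5.5)],
)
fleet = [Car("C1"), Truck("T1", payload_kg=9000), Car("C2", doors=2)]
```

Here `alice.address.city` reads a complex member, `order_total(alice)` follows a relationship, and `only(fleet, Car)` relies on inheritance. In a flat relational schema each of these would typically take extra columns, a join, or a discriminator predicate.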

These rich conceptual schemas are exposed and queried through an "Entity Client".  The Entity Client is an ADO.NET Data Provider that builds queries against storage-specific providers using client-side read/write "views". Queries and updates written against these conceptual views are expanded by the Entity Client and executed as queries against underlying storage-specific providers.  All the actual query execution is done in the store (not on the client), and the results are assembled into possibly hierarchical, polymorphic results with nesting and composite members. This separation between the conceptual model that the application targets and the storage schema of the database is an extremely powerful concept that we believe will greatly simplify the authoring and maintenance of database applications.
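To make the idea of expanding a conceptual query through a client-side view a little more concrete, here is a minimal analogy in Python using SQLite. It is not the Entity Client API or its mapping format. The table, the `QUERY_VIEWS` dictionary, and the `query` and `materialize` helpers are all assumptions made up for this sketch. It shows only the general shape: the application names a conceptual entity set, the client composes its filter over a stored view definition, the store runs the resulting SQL, and the client assembles polymorphic results.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Car:
    vin: str
    doors: int


@dataclass
class Truck:
    vin: str
    payload_kg: int


# Hypothetical storage schema: one flat table holds every kind of vehicle.
STORE_SCHEMA = """
CREATE TABLE vehicle_rows (
    vin        TEXT PRIMARY KEY,
    kind       TEXT NOT NULL,   -- discriminator: 'car' or 'truck'
    doors      INTEGER,
    payload_kg INTEGER
);
"""

# Hypothetical client-side "query view": conceptual entity set -> store query.
QUERY_VIEWS = {
    "Vehicles": "SELECT vin, kind, doors, payload_kg FROM vehicle_rows",
}


def materialize(row):
    """Assemble a polymorphic result from one flat store row."""
    vin, kind, doors, payload_kg = row
    if kind == "car":
        return Car(vin, doors)
    return Truck(vin, payload_kg)


def query(conn, entity_set, where="1 = 1", params=()):
    """Expand a conceptual query over the view; the store does the filtering."""
    sql = (
        f"SELECT * FROM ({QUERY_VIEWS[entity_set]}) AS v "
        f"WHERE {where} ORDER BY vin"
    )
    return [materialize(row) for row in conn.execute(sql, params)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(STORE_SCHEMA)
    conn.executemany(
        "INSERT INTO vehicle_rows VALUES (?, ?, ?, ?)",
        [("C1", "car", 4, None), ("T1", "truck", None, 9000)],
    )
    return query(conn, "Vehicles"), query(conn, "Vehicles", "kind = ?", ("truck",))
```

Calling `demo()` returns all vehicles as a mixed list of `Car` and `Truck` objects, followed by the trucks alone. The caller never mentions `vehicle_rows`, so in this toy version the storage table could be reshaped by changing only the view definition and `materialize`.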

Exposing client views through an ADO.NET Data Provider allows us to retain the familiar ADO.NET programming model, leveraging investments in code, tools, and knowledge built around ADO.NET.  The fact that the Entity Client consumes existing ADO.NET Data Providers (extended to support a new canonical query tree representation) builds on the growing community of popular, as well as custom, ADO.NET Data Providers.

Many of the constructs we added to the Entity Data Model also exist in popular object-oriented programming environments, including the .NET Framework.  This is not by accident.

Since version 1.0, customers have sought an Object/Relational solution within the .NET Framework. Microsoft has made several attempts at an O/R solution for .NET, most notably "ObjectSpaces", which, though never released, was premiered as a "technical preview" at PDC when we launched the .NET Framework, and at pretty much every conference thereafter.

Like most O/R solutions today, ObjectSpaces attempted to support a rich set of mappings and scenarios through custom query generation. Adding support for a new type of inheritance mapping, for example, meant adding code to a query generator to insert the necessary join conditions in all the right places.  Understanding how this new construct composed with other joins, projections, unions, and predicates added throughout the query to model other object-like concepts made the code complex and somewhat brittle.  Trying to understand how to generate updates against such complex queries, or if such updates were even possible, was even more difficult. Worst of all was trying to verify that all possible combinations of the constructs composed into a query were handled correctly.

ADO.NET Entities takes a different approach.  By modeling a rich conceptual schema through client-side query and update views, the Entity Client leverages the significant investment and research that has gone into relational database view theory. Updating, for example, is done by applying well-defined view maintenance techniques to the update views in order to produce a set of delta expressions that are combined with the query views to produce update expressions. The resulting query views and update processing are both composable and verifiable.

For those who prefer to work with data as strongly typed CLR objects rather than untyped result records, ADO.NET Entities includes "Object Services", which builds on top of the Entity Client and allows you to query and retrieve results in terms of application "Data Classes" whose identity and changes are managed for you.

So what about LINQ?

"LINQ" stands for "Language INtegrated Query". As the name implies, LINQ integrates query concepts directly into the programming languages, enabling data access code to be type-checked by the language complier, and developer tools like Intellisense to make it easier for developers to write queries. This, along with higher level conceptual models (like Entities), contributes to reducing the impedance mismatch between applications and data.

LINQ is supported as a first-class citizen within the ADO.NET Entity Framework through "LINQ to Entities".  LINQ to Entities is part of the Object Services layer which enables you to build queries through strongly typed language expressions and built-in query comprehensions (in C# and VB) as well as textual Entity SQL statements.  This means that the same conceptual client views are available through existing ADO.NET provider programming patterns or through consuming Object Services using either ad-hoc textual queries or integrated language queries.

ADO.NET Entities, along with LINQ, will be featured in the upcoming February Orcas CTP.  In the meantime, more on the ADO.NET Entity Framework can be found here, as well as in this Channel 9 video.

It’s been interesting for me to see our data access APIs evolve over the years from ODBC, a C-level interface for accessing data in a SQL database that is still popular today, to a broad Entity Framework that supports modeling data as rich conceptual objects, querying through a common extended SQL grammar or query constructs embedded within the language, and interacting with the data as business objects with identity management and change tracking.

What’s next for Microsoft in Database APIs?  Time will tell, but the bet on the Entity Framework, and the Entity Data Model in particular, is big. You can expect to see more and more services within SQL Server, as well as technologies throughout the company, embrace and leverage the Entity Data Model as the natural way to describe data in terms of real-world concepts.  Although I don’t admit it to many people, after almost 20 years at Microsoft I still find working with data interesting, and I look forward to continuing the journey with you, wherever it may take us.

Mike Pizzo
Architect, Data Programmability