Conceptual Schema

When I first started working on commercial relational databases in the early 1980’s the questions of the day were as basic as “Can the relational database be used for commercial software?”

In those days the breakthrough from a programmer perspective was providing an abstraction from the data storage that allowed a programmer to leave behind references to physical on-disk structure, and instead code to a logical schema. This logical schema was represented in now familiar tables and columns of relational databases and allowed queries to be expressed without direct reference to specific indexes, rather we built “really smart™” query optimizers that determined the optimal execution strategy allowing the programmer to write their program using the relational abstraction and resulted in the now familiar Customers, Orders and Order-lines patterns. In fact, at the time, there was a revolution in schema design that led to the “Three Schema” model:

  • Conceptual – a model that captured the essential data that needed to be stored and the relationships between elements
  • Logical – a normalized representation of the conceptual model that minimized redundancy in the data and leveraged relational concepts
  • Physical – the on disk representation of data that accounts for layout, partitioning, index, space management, etc.

The conceptual model was generally written down on paper or built using a design tool and the results were hung on a wall for all to see, but was rarely represented directly in the program. Often the result was that important implicit knowledge of the software designers was lost when the logical schema was created and each new application ended up creating a new conceptual schema often having to reverse engineer from the existing logical schema.

Today, we have evolved to where programming languages present richer constructs that lend themselves to programming in a model that more directly represents the domain model of the application, the result is that conceptual models are directly represented in applications and many modern applications have a data access layer that in code provides the transformation and mapping services between the conceptual and logical schemas. This layer has been abstracted into a whole class of middleware products often call object-relational mapping or ORM tools.

At Microsoft, we’ve been working on these technologies and understanding how we can help programmers write less code by more directly incorporating a conceptual schema not only into individual applications, but into our data platform. We see a world where application services like replication and reporting are presented to the programmer, administrator, or knowledge worker in a manner that increases accessibility of data significantly. I’m interested in your comments on conceptual schema. Do you have a formal layer in your applications that transforms from logical to conceptual schema? If you’re interested in more on this see www.microsoft.com/data and the paper on Next-Generation Data Access: Making the Conceptual Level Real.

Britt…

Director of Program Management
Data Programmability
Microsoft Corporation