The best technical book of 2009

01/13/2009

...was actually published in 2008, but it's really long and I've finally finished it! It's called 'Information Modeling and Relational Databases' (Second Edition) by Terry Halpin and Tony Morgan. Picture below, linked to Amazon for easy purchase :)

So, what's this book about and why is it so great? As the title indicates, the primary focus of the book is modeling for relational databases. However, this edition contains much more than that. Before I elaborate, I'll say that the core modeling techniques and explanations alone would help every single one of us to model and subsequently build relational databases that more accurately reflect the business domains in which we work -- now, tell me, who doesn't need that?

The core method used is ORM (Object Role Modeling). If you've never heard of this and are intrigued, start at the main ORM site, where there is a series of tutorials. In a nutshell, ORM is a visual modeling technique designed to capture business domain information at the conceptual level (rather than the logical or physical level). The central argument is that if you capture the information at this level, you are more apt to get it right (because it is closer to the source, i.e. natural language) and it is easier to have domain experts validate what you have done. The tool that you use will produce both diagrams and verbalizations (text) that you can use to check that you have captured the domain information accurately. I'll show a simple ORM diagram below. Another key idea in ORM is that there are only entities and roles (no attributes are specified at this level of modeling). Think of nouns and verbs and you've got it - it's also called 'fact-oriented' modeling. After the facts are captured, various types of constraints are added. These constraints, in combination with the roles, capture the domain information accurately enough that lower-level diagrams or code can be generated (for example ER diagrams or T-SQL code).

ORM Picture
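To make the 'facts first, constraints second' idea a little more concrete, here is a minimal Python sketch. It is my own illustration, not NORMA's API and not the book's mapping procedure: the `FactType` class, its field names, and the `map_to_tables` helper are all hypothetical, and the mapping rule (a fact where each subject plays the role at most once becomes a column on the subject's table, anything else gets its own table) is a deliberately simplified assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary fact type such as 'Employee works for Department' (hypothetical model)."""

    subject: str                  # entity type playing the first role (a noun)
    predicate: str                # verb phrase connecting the two roles
    obj: str                      # entity type playing the second role (a noun)
    unique_subject: bool = False  # each subject plays this role at most once
    mandatory: bool = False       # each subject must play this role

    def verbalize(self) -> list[str]:
        """Return plain-English sentences a domain expert could review."""
        sentences = [f"{self.subject} {self.predicate} {self.obj}."]
        if self.unique_subject:
            sentences.append(
                f"Each {self.subject} {self.predicate} at most one {self.obj}."
            )
        if self.mandatory:
            sentences.append(f"Each {self.subject} {self.predicate} some {self.obj}.")
        return sentences


def map_to_tables(facts: list[FactType]) -> dict[str, list[tuple[str, bool]]]:
    """Map fact types to tables; each column is a (name, nullable) pair.

    Simplifying assumption: a fact with unique_subject becomes a column on the
    subject's table, and every other fact becomes its own two-column table.
    """
    tables: dict[str, list[tuple[str, bool]]] = {}
    for fact in facts:
        key_column = (f"{fact.subject}Id", False)
        if fact.unique_subject:
            columns = tables.setdefault(fact.subject, [key_column])
            # A mandatory role maps to a NOT NULL column, an optional one to NULL.
            columns.append((fact.obj, not fact.mandatory))
        else:
            tables[f"{fact.subject}{fact.obj}"] = [key_column, (fact.obj, False)]
    return tables


facts = [
    FactType("Employee", "works for", "Department", unique_subject=True, mandatory=True),
    FactType("Employee", "speaks", "Language"),
]

for fact in facts:
    for sentence in fact.verbalize():
        print(sentence)

print(map_to_tables(facts))
```

The point of the sketch is only that the constraints, not the facts alone, decide the shape of the tables: drop the uniqueness constraint on 'works for' and that fact would move out of the Employee table into a table of its own.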

While reading this book, I discovered that Terry's team has recently released an add-in for Visual Studio 2008, called NORMA, that allows you to create ORM diagrams from within VS. I recently blogged about how to get NORMA and how to use it. So why am I writing about this book again?

Because as I read on, refreshing my knowledge of ORM, I noticed that there were three entirely new chapters in this (second) edition - these new chapters particularly piqued my interest. The first covers 'Advanced Modeling Issues', such as join constraints, deontic rules (obligations that ought to hold but can be violated, as opposed to constraints that are absolutely required), temporality support and other advanced features that have been added to ORM 2. These features seem to me to make ORM much more useful, as you can use ORM to document subtleties of intent that are normally only expressed in natural language.
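If the difference between a rule that is 'absolutely required' and one that is only an obligation seems abstract, the following small Python sketch may help. It is my own illustration under stated assumptions, not anything from the book or from NORMA: the `Rule` class, the `validate` function and the example rules are all hypothetical. A violated required rule rejects the data, while a violated deontic rule lets the data through but reports the violation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A business rule over a record (hypothetical model, for illustration only)."""

    description: str
    check: Callable[[dict], bool]  # returns True when the record satisfies the rule
    deontic: bool = False          # True: an obligation that may be violated


def validate(record: dict, rules: list[Rule]) -> list[str]:
    """Raise on a violated required rule; collect warnings for deontic ones."""
    warnings = []
    for rule in rules:
        if rule.check(record):
            continue
        if rule.deontic:
            # The data is still accepted, but the violation is recorded.
            warnings.append(rule.description)
        else:
            raise ValueError(f"Constraint violated: {rule.description}")
    return warnings


rules = [
    Rule(
        "Each Person was born on at most one Date.",
        lambda r: len(r["birth_dates"]) <= 1,
    ),
    Rule(
        "It is obligatory that each Person has at most one current Spouse.",
        lambda r: len(r["spouses"]) <= 1,
        deontic=True,
    ),
]

person = {"birth_dates": ["1970-01-01"], "spouses": ["Pat", "Sam"]}
print(validate(person, rules))
```

Here the record is stored even though it breaks the spouse rule; only a second birth date would cause it to be rejected outright.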

I read on to refresh my memory of the exact mapping between ORM modeling and various aspects of T-SQL, i.e. data types, joins, subqueries, etc. Next I was pleased to find another new chapter on advanced SQL mapping, which covers views, triggers, transactions and more. This really completed my understanding of how ORM models map to relational output, either in ER diagrams or in code (such as DDL and DML).

The other thing I was reminded of was how elegant and clear the authors' explanations of core relational database concepts are. I have been a Microsoft Certified Trainer for 10 years and have written two books on SQL Server myself, so I have some understanding of the challenge of presenting this information. Even with that experience, I learned some fine points about relational databases when reading these sections.

Probably the most interesting chapter, though, was NOT on relational databases - rather, it is the new chapter on Process and State Modeling. In it the authors compare standard modeling techniques, such as UML, in great detail with ORM for the same challenges. They point out the limits of modeling with the various types of UML diagrams.

Of course, being the futurist that I am, my favorite chapter is still the last one. Here the authors give an overview of OLAP and data warehousing and briefly discuss some possibilities for using ORM as a core modeling tool in this area. Next they include a section on conceptual query languages. This area is of particular interest to me given my background in linguistics. The authors include a bit of information about an active project, ConQuer, and some work-in-progress extensions to NORMA. I show a picture of a conceptual query below.

Conceptual Query Picture

Next the authors take a look at ORM and semantic web ontologies, including information about OWL and more. They end the chapter and the book with a look at what they call 'post-relational' databases. This seems particularly timely, as I've been digging deep into the new storage mechanisms in Microsoft's own Azure native storage and Azure (SQL) Services storage, and these implementations (at least currently) seem to me to be a type of post-relational storage mechanism. Based on your feedback to me so far (via email, during geekSpeak, etc.) it seems that many of you are struggling to understand this paradigm shift. At the very least, our current cloud storage implementations are NOT what we developers have known and worked with, i.e. pure relational storage. Reading this book helped me to gain perspective on the new possibilities, and I think it will do the same for you.

So, go buy it, enjoy and learn!

