Futility of Object-Relational technologies

 OR (Object-Relational) technologies attempt to abstract OO developer from knowing intricacies of relational database. It does so by de-coupling the object definition from its persistence. Persistence layer will do the 'right thing' to persist information in the persistent store. It also knows how to fetch information from the persistent store. In general, most technologies use OO language types such as classes or in some case structs to map to tables or views with properties of the type mapped to columns or fields. This post discusses some of the short falls of such technologies.

 

One of the main issues with attempting to map relational tables to objects is that of flexibility. For example, think of application like Outlook which lets you choose whatever the fields you want to see (like Size, received time, subject etc). In fact this feature is called Field Chooser. I am sure most of the applications require end-users to select a view of data. Relational database lets you create views of a table through select statement. What is the equivalent in OO world? As far as I know there is no way one can create a view of an object statically, leave alone dynamic definition. OR guys may object to this, possibly saying that the application can ignore the fields that it doesn't want to see. This is a lousy argument at best. For example, think of an Employee object that has hundreds of attributes. Let's say that I want to search for an employee and all I want to see is the name, title and department. Let's say that the search returned with 1000 possible matches. If you use OR technology, you probably have 1000 employee object instances out there that contains all the attributes (even though the attributes may not be set) that the user doesn't want to see, but still occupying memory. This is not the optimal use of working-set. Again, OO guys might object to this by saying that in today's systems memory doesn't matter. In my opinion, it always matters and world class applications always use memory intelligently and spartan use of memory is always preferable.

 

Another related issue is that binding the result sets to presentation layer widgets such as grid or list controls. In the above said case, where the view is dynamic based on user choice, application developer has to write some code to map the object or object set to controls. Instead, if you use datasets, then you don't have to do any extra tricks to map to user controls.

 

Yet another issue is that of security. For example, certain users may have access to only certain fields. In this case again, application has to write additional amount of code to enforce the authorization.

 

There are other cases, where OR has no answers at all. For example, continuing the flexibility argument, there are cases, where an application have to send a free-form SQL query to RDBMS. I can think of situations where the users are provided with a flexible interface where users can choose what they want to see and they can define search predicate. One fine example is Query by example (QBE). The implementation might require the application to generate a dynamic sql on the fly that may involve one or more tables (again determined on the fly). I would like to call this dynamic  'aggregation' of data from many different tables. There is no way by which one can achieve this functionality using current OR technology. Objects/classes, by definition are inflexible data-structures. As of today, type system supported by the OO languages or the runtime they are based on, do not support this semantics.

 

Predominant implementations of OR technologies uses type system provided by the host language. For example, if you take Java or CLR or C++, classes are used to represent an object. Most of these languages or the runtime based on which the language compilers are developed, doesn't support dynamic modification of the types. The type system of OO languages is not flexible where as RDBMS is very flexible by its very nature. One can define a type in RDBMS system, define a view on it, define an  aggregated view using the semantics of relational calculus. Sure enough, RDBMS requires huge amount of meta-data that one has to supply in the form of DDL. But, that is the power of RDBMS.

 

So, the question would then be, what are the desirable characteristics of the data-structure that represents relational information in-memory. The main characteristic, I would think of, is that the data-structure should be flexible - one should be able to make modifications to the structure on the fly. I can think of three possible data structures that fit this criteria. They are

 

  • Datasets
  • XML
  • Roll your own which will ultimately look like datasets.