Nix the DataSet??????


Some interesting comments from my entry on new DataSet features…

 

Please nix the DataSet from the framework entirely and re-focus on the domain-oriented data access work you were doing prior to Tech-Ed 04. If that’s not possible, please have the Visual Studio team desist from distracting .NET culture from Domain-Driven Design with that nasty, nasty typed DataSet RADule 🙂

I hope this helps,  Scott Bellware

http://www.geekswithblogs.com/sbellware/

 

I don’t advocate that for multiple reasons.

      Sahil Malik

http://codebetter.com/blogs/sahil.malik/

 

Scott is an old friend from the ObjectSpaces days, so I think he is partially kidding. And I can’t say I totally disagree with him WRT the typed DataSet, because I think it has been shown that O/R solutions can do better than code generation. But that is a different topic for a different time.

 

What I do want to talk about is projections in terms of an O/R solution – which is an interesting topic when discussing the DataSet. 

 

Object instances in an O/R solution can basically have five states:

 

  1. Persisted in the data store (shredded or not).
  2. In memory, but not materialized.
  3. In memory, materialized as an object instance.
  4. Partially in memory, materialized as an object instance.
  5. Partially in memory, not materialized.

#1 is the traditional way of storing data in relational databases. Shredded means the individual properties, including references, have been set as column values and that the object identity is defined by a single row of storage (in fancier mappings, this row itself might be a projection of base tables and/or views). With relational database advances, there is also the ability to store the object instance whole (e.g. Sql Server 2005 UDTs) or even to partially shred it and store the rest in an untyped bag (e.g. the Xml data type).

 

#2 When O/R frameworks pull data out of the data store, there is a transition between the rowsets returned from the data store and the object graph returned to the consumer. Basically, this means that the object instances live in an un-materialized state. Since materialization is not cheap, many O/R solutions, particularly those with client-side caches, store the data in an “un-materialized” state until the specific object instances are required by the consumer. In fact, with some O/R frameworks, users can access this data.
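
To make state #2 concrete, here is a minimal sketch of a result set that keeps raw rows in memory and only materializes an object when the consumer asks for it. Python is used purely for illustration, and `Customer`, `LazyResultSet`, `raw` and `materialized_count` are hypothetical names, not taken from any particular O/R framework.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    first_name: str
    last_name: str


class LazyResultSet:
    """Holds raw rows and materializes Customer objects only on demand."""

    def __init__(self, rows):
        self._rows = list(rows)   # state #2: in memory, not materialized
        self._objects = {}        # state #3: materialized instances, by index

    def __len__(self):
        return len(self._rows)

    def raw(self, index):
        """Expose the un-materialized row, as some frameworks allow."""
        return self._rows[index]

    def __getitem__(self, index):
        # Materialize at most once per row, then reuse the same instance.
        if index not in self._objects:
            self._objects[index] = Customer(*self._rows[index])
        return self._objects[index]

    @property
    def materialized_count(self):
        return len(self._objects)


results = LazyResultSet([(1, "Ann", "Smith"), (2, "Bob", "Smith")])
first = results[0]  # only this row is turned into an object
```

The second row stays a plain tuple until something indexes it, which is where the savings come from when a query returns many rows and the consumer touches only a few.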

 

#3 This is the in-memory materialized domain model, basically the cornerstone feature of all O/R frameworks.

 

#4 Same as #3, but value properties and/or references are delay-loaded. For the domain model consumer, and for this discussion, #3 and #4 are the same. (I know there are some issues here; see my past comments on this.)

#5 covers projections of the domain model which have not been materialized as object instances, and may never be materialized. (Actually, #2 is probably a sub-case of this.) In fact, depending on what the projection is, it might not be possible, or even desirable, to materialize the objects.

Take, for example, a scenario where a domain model consumer wants to search for all the customers with a given last name, display the results to the application user, and let them select the specific customer. Now, let’s assume that the Customer type is fairly complex and has 30 or so properties, of which only a handful are useful for the user to visually select the “right customer”. Performance-wise, the initial query should select only those few properties (a projection of the domain model). This design issue is quite familiar to anyone who has ever written an app that binds query results to a grid. For O/R solutions this presents an interesting issue, in that the properties in the projection may not allow for materialization of object instances of the retrieved type. For example, the type may have several properties which either don’t have meaningful defaults or, if not initialized to persisted values, leave an object instance in an illegal state.
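
The customer search scenario can be sketched roughly as follows, using Python and an in-memory SQLite table purely for illustration. The table layout, the `credit_limit` property and its invariant, and the helper names `find_for_picker` and `can_materialize` are all assumptions made up for this example.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    customer_id: int
    first_name: str
    last_name: str
    city: str
    credit_limit: Optional[float]  # hypothetical property with no meaningful default

    def __post_init__(self):
        # The invariant that makes a partially loaded instance illegal.
        if self.credit_limit is None or self.credit_limit < 0:
            raise ValueError("credit_limit must be loaded from the store")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT,"
    " last_name TEXT, city TEXT, credit_limit REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Smith", "Austin", 500.0),
        (2, "Bob", "Jones", "Boise", 250.0),
        (3, "Cy", "Smith", "Boston", 900.0),
    ],
)


def find_for_picker(connection, last_name):
    """Projection: only the columns a user needs to pick the right customer."""
    cursor = connection.execute(
        "SELECT id, first_name, city FROM customer"
        " WHERE last_name = ? ORDER BY id",
        (last_name,),
    )
    return cursor.fetchall()


def can_materialize(row, last_name):
    """Try to build a Customer from a projected row; credit_limit is missing."""
    customer_id, first_name, city = row
    try:
        Customer(customer_id, first_name, last_name, city, credit_limit=None)
    except ValueError:
        return False
    return True


rows = find_for_picker(conn, "Smith")
```

Here `rows` is enough to fill a selection grid, but `can_materialize` reports that none of those rows can become a legal `Customer`, because the projection never fetched the value the invariant depends on.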

 

Further, for a common last name, it might not even be desirable performance-wise to materialize all the customers as object instances, even if one could. So the question is: if the objects are not materialized, how is the data stored and made accessible in memory? Obviously, “rowsets” are a very good idea, since that is generally how relational data is exposed through data access APIs. So a relational, client-side cache (like the DataSet) would seem to be really useful. Interestingly, for this sole purpose, the DataSet is probably overkill. Really, all that is needed is some sort of client-side collection object; even arrays could work. However, for the scenario discussed above, one is going to need other features like binding and sort/filter/find capabilities, which the DataSet is very good at. Ironically, these are features that are required by most applications, independent of the data access model.
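
As a rough illustration of how little is needed for "some sort of client-side collection object", the sketch below wraps an array of projected rows and adds sort, filter and find. It is written in Python for brevity, `RowSet` and its methods are hypothetical, and it deliberately leaves out binding, change tracking and everything else the DataSet provides.

```python
class RowSet:
    """A minimal client-side collection of projected rows."""

    def __init__(self, columns, rows):
        self.columns = tuple(columns)
        self.rows = [tuple(row) for row in rows]

    def _index(self, column):
        return self.columns.index(column)

    def sort(self, column, descending=False):
        i = self._index(column)
        ordered = sorted(self.rows, key=lambda row: row[i], reverse=descending)
        return RowSet(self.columns, ordered)

    def filter(self, column, predicate):
        i = self._index(column)
        kept = [row for row in self.rows if predicate(row[i])]
        return RowSet(self.columns, kept)

    def find(self, column, value):
        # Return the first row whose column equals value, or None.
        i = self._index(column)
        for row in self.rows:
            if row[i] == value:
                return row
        return None


picker = RowSet(
    ("id", "first_name", "city"),
    [(3, "Cy", "Boston"), (1, "Ann", "Austin"), (7, "Dee", "Boston")],
)
in_boston = picker.filter("city", lambda city: city == "Boston").sort("first_name")
```

Each operation returns a new `RowSet`, so a grid could hold the original result and derive sorted or filtered views from it without going back to the database.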

 

My belief has always been that, on average, roughly two-thirds of all queries executed by applications requiring database persistence are projections with respect to the domain model. Unless an O/R framework has a solution for this, it forces the domain model designer to do one of three things: include awkward “partial types” (e.g. PartialCustomer), use some sort of weak-typing hack in the domain model (which kind of goes against O/R in the first place), or translate projection queries into queries whose results can always be materialized (while accepting the performance overhead).
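
The three workarounds can be put side by side in a small sketch. Python is used only for illustration; the shape of `Customer`, the fields chosen for `PartialCustomer`, and the helper function names are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass
class Customer:            # full domain type (only a few properties shown)
    customer_id: int
    first_name: str
    last_name: str
    city: str
    credit_limit: float


@dataclass
class PartialCustomer:     # option 1: an awkward "partial type"
    customer_id: int
    first_name: str
    city: str


def as_partial_type(projected_row):
    return PartialCustomer(*projected_row)


def as_weakly_typed(projected_row):
    # Option 2: a weakly typed property bag keyed by column name.
    return dict(zip(("customer_id", "first_name", "city"), projected_row))


def as_full_type(full_row):
    # Option 3: widen the query so every property is loaded and the
    # real type can be materialized, at the cost of fetching more data.
    return Customer(*full_row)


projected = (1, "Ann", "Austin")
full = (1, "Ann", "Smith", "Austin", 500.0)
```

Option 1 doubles up the type definitions, option 2 gives up compile-time knowledge of the shape, and option 3 pays for columns the screen never shows.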

 

 

Comments (16)

  1. Frans Bouma says:

    What I miss in this whole story is the theoretical approach to the problem. I find it a little (just a little) disappointing that it is explained in technical terms, which is completely unnecessary.

    O/R mapping is the key technique to use if you want to work with single entity instances and sets of entity instances, but always the entity is the core building block.

    Datasets are the key technique to use if you want to use relational model sets of data, i.e.: the set is the core building block.

    It’s just set theory applied in different forms onto a fixed (DDL defined) schema.

  2. Relational client-side caches of unmaterialized objects, as you describe them, are useless, IMHO. You would still need to flush the state to the database before doing the query that does the projection. I like a projection to return arrays of objects, as you have already mentioned. An ObjectView class can wrap the collection and provide the databinding support. Having said all that, I do currently use DataSets to reshape my domain model to the presentation model, but only because the tools are not here yet to databind to a domain model in a sophisticated way. I would much prefer to have an updatable projection.

  3. kai says:

    You should humble yourself, learn, then learn again, and learn again. Be honest. Do not pretend you know. Learn CSLA (Rocky’s framework), learn NHibernate. Use them to create some real applications. Better, learn Java. Java does not have a dataset, so it will give you some idea of how things work in a real-world, enterprise-level system. You are supposed to know those things when you post; not some "theories", an undergraduate knows those stupid theories. Do not waste everybody’s time!!!!!!!!!!!!!!!!!!!!

  4. kai again says:

    Scott is an old friend from the ObjectSpaces days, so I think he is partially kidding. And I can’t say I totally disagree with him WRT the typed DataSet because I think it has been shown that O/R solutions can do better than code generation. But that is a different topic for a different time.

    ——-thanks for providing the link; however, I’m puzzled, why you do not learn from him?????

  5. Blade7 says:

    Pingback from: blog.csdn.net

  6. Greg Finzer says:

    I recently wrote an article on the advantages of business objects over datasets.  Please visit and let me know what you think.

    http://www.kellermansoftware.com/t-articlebusinessobjects.aspx

  7. Kris - TECH says:

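    Summary: In some cases, an untyped DataSet may not be the best solution for data manipulation. The purpose of this guide is to explore an alternative to the DataSet: custom entities and collections. (This article contains some links to English-language sites.) On this page

  8. My last blog post was written at the end of June; then came my first-year performance review, followed by development work on the beta, so I have not updated since. As of today the Windows Live Data beta has basically settled (in fact, two weeks ago it was already code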

  9. Mainz says:

    Both DataSets and custom classes don’t limit what you can do in any way, and both can be used to accomplish the same aims. That said, DataSets are fantastic tools for prototyping applications and represent excellent solutions for building systems in a