ObjectSpaces: The Matrix is not Enough


In response to my earlier post, “Spanning the Matrix”, Ralfs Sudelbücher has something to say about building O-R mapping engines in “Spans in ObjectSpaces are not Enough.”


His point is that it is not enough to describe which associated objects and collections should be retrieved along with the primary objects of the query; you should also be able to specify exactly which properties of an object are retrieved, to avoid pulling back too much data.


I have to agree that this would be desirable in many particular scenarios. However, as long as the objects and schema remain static throughout your application, it has a variety of drawbacks.


1) Selectively populating data for an object has a big downside if you ever intend to pass this information off to another part of your application or to someone else’s component.  The object itself does not encapsulate the semantics you implied by restricting certain fields, so another piece of code that sees instances of the same object may very well assume all of its data is accurate.  Even if the omitted properties are encoded using something equivalent to NULL, it is not evident, given an object, whether the data is really NULL in the database or was just omitted from the query.


2) Behaviors built into the object may depend on data being available.  For example, a read-only property that calculates its value from other fields could not work reliably if a query were allowed to omit some of those fields.
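
To make both drawbacks concrete, here is a minimal sketch in Python. The Customer class, its fields, and the idea of a nullable fax column are all made up for illustration; none of this is the ObjectSpaces API. It only assumes that a partial query leaves the omitted fields at a NULL-like default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Hypothetical entity mapped from a customers table."""
    id: int
    first_name: Optional[str] = None
    last_name: Optional[str] = None
    fax: Optional[str] = None  # assumed to be a nullable column

    @property
    def full_name(self) -> str:
        # Behavior that silently depends on both name fields being loaded.
        return f"{self.first_name} {self.last_name}"


# Fully loaded row where the fax column really is NULL in the database.
loaded = Customer(id=1, first_name="Ada", last_name="Lovelace", fax=None)

# Partially loaded row: the query only asked for id and first_name.
partial = Customer(id=2, first_name="Grace")

# Drawback 1: a consumer cannot tell "NULL in the database" from "not fetched".
print(loaded.fax is None, partial.fax is None)  # prints "True True"

# Drawback 2: the computed property still runs, but its value is wrong.
print(partial.full_name)  # prints "Grace None"
```

Nothing about the second instance tells the receiving code that its last name and fax were never asked for.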


The only rational thing to do would be not to omit properties during a query, but to project your data into a new strongly typed object definition that correctly describes the result set you want.  You would basically be re-encapsulating your data, and the result would likely not have any behaviors associated with it at all.
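
A rough sketch of what such a projection could look like, again in Python and again with hypothetical names (CustomerListItem, project_list_items, and the column layout are assumptions for illustration, not part of any real mapping engine):

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class CustomerListItem:
    """Projection type: carries exactly the columns the query returned."""
    id: int
    last_name: str


def project_list_items(rows: Iterable[Tuple[int, str]]) -> List[CustomerListItem]:
    # The rows stand in for the result of something like
    #   SELECT id, last_name FROM customers
    return [CustomerListItem(id=row[0], last_name=row[1]) for row in rows]


items = project_list_items([(1, "Lovelace"), (2, "Hopper")])
print(items[1].last_name)       # prints "Hopper"
print(hasattr(items[0], "fax"))  # prints "False"
```

The projection type has no fax field to be mistaken for NULL and no computed properties that depend on missing data, so code that receives it cannot assume more than the query actually returned.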


You really have to decide.  Either you want objects to be bastions of behavior that describe fixed semantics over your entire database schema, possibly encoding strong relationships between properties as well as stricter criteria such as cardinality.  This puts you in a world where your objects are truly mirrors of the database state, in the truest Object-Persistence world view.  Or you decide that your object data is really just a projection of this meta model, that the results you obtain in your application code are merely the resulting data, and that there is no strong tie back to the semantics of the database.  You can only have consistency at one of these two extremes.  There is no middle ground.


So are your objects data or are your data objects?


You decide.


Matt

Comments (5)

  1. Hi Matt,

    VERY interesting! I see your points, but we have still tried an approach where we lazy load properties as well (optionally, of course). The instance has a "way" (a ref to a workspace instance) to expand itself, and that will be used implicitly or explicitly when it’s time to load the unloaded properties.

    The downside is that there is a big risk that each and every instance has to be expanded after it is initially loaded, and that increases the chattiness dramatically. This is especially the case when it comes to validation. Then it’s probably common that most properties are needed, so for update scenarios, most often all properties should be eager loaded. But for list scenarios, it is from time to time nice to not load more than a handful of properties eagerly. And it’s in the list scenarios that you load many objects and would benefit most from not moving too many properties over the network.

    I think this solution is useful in some scenarios. There is no silver bullet…

    🙂

    Best Regards,

    Jimmy

    http://www.jnsk.se/weblog/

  2. Matt says:

    I guess I forgot to say, "unless the object’s properties are designed to be dynamically loaded." This works fine for a while, but has additional drawbacks, and as you said there is no silver bullet. However, you can choose an option that at least keeps your data and semantics as consistent as possible.

  3. As I understand it, the situation described above matches the issue of reporting quite well. IMO, reporting is not where OO shines, and so to use it as an example of the drawbacks of ObjectSpaces would be like saying a screwdriver isn’t good at hammering in nails, even though the need to hammer nails is commonly agreed.

    This is not meant to slam Ralfs in any way, just that I think that too often developers look at a new tool/technique and try to do everything with it. "But does it do the dishes ?" – Objectspaces is good for what it was meant to do.

  4. Saul Bloom says:

    I think spans are fine if you (the app developer) know that you’ll want to pull data into, for example, customers and orders. If you don’t, then lazy loading is the way to go. I’m hoping that the way Ospaces are built, you can specify lazy loading in your schema & class definition, and use it or spans depending on how you structure your query. I haven’t quite figured out what magic is hap’nin under the hood for lazy loading. Let’s say we have an order table and an order catalog table. Our order table has a column, order_catalog_id, which is a foreign key into the catalog table. When I get an order, I’d like to be able to lazy load the row from the catalog. My first query runs, and pulls in the order. Does my catalog ID column get placed into a property of my order class? I then try to access order.catalog.name. Does it next use a query that runs based on the catalog_id field that I have stashed? Or does it have to join back to the order table? In WebObjects EOF, this is handled properly by storing a "fault" in the order class-catalog relationship, and resolving that at late load time.

    Of course, I’d like to be able to intervene during this process to allow a couple of improvements. For instance, I’d like to cache all or parts of my relatively static catalog table, and use that instead of the query. I’d also like to be able to specify an additional flavor of lazy loading and/or span: one that runs in a background thread, where I have information that I want to present to the user immediately, and anticipate the next big click.

  5. David Goldstein says:

    Lazy loading…

    If your authority is the database and the objects model it, then the sort of collection (array, array list, etc) that you choose for related rows (e.g. children) is arbitrary, isn’t it?

    So then once we have generics why not have a DelayedLoadList<MyChildClass> which is loaded on demand?

    Life is easier when you can dictate some of the data structures that the implementor can use… or at least what they should use to get the best functionality.

    Of course if you load a list of X where condition (A) is true, then it might be ideal to load all children of X where (parent key in condition(A))

    I always imagined that the best way is to allow casual queries, but to focus the skilled/disciplined workers on designing _access patterns_ that can define how a network of objects is loaded.