What's An Entity, Anyway?

These days, I seem to be encountering a lot of entities. Not in the sense of non-corporeal beings as usually depicted in certain science fiction TV shows, but in the sense of data structures. Sometimes, they are called business entities.

Although the concept of an entity differs from project to project, I think I have identified at least one trait common to all the entities I come across: They contain (structured) data, but no behavior. Usually, these entities are consumed and manipulated by something called the business logic. In some cases, entities are even used to transfer data from one layer of an application to the next (some people then call them data transfer objects). Since architecture diagrams with vertical columns adjacent to layers appear to be much in vogue these days, I'll use one as an example: picture a data access layer, a business logic layer, and a UI layer stacked on top of each other, with a vertical Entities column running alongside all three.

The idea here is to have a single definition for data that spans multiple layers so that you only have to write the data structure implementation once. The code in the different layers interacts with the entities: The data access layer creates and stores the entities, the business logic layer modifies the data, and the UI layer presents the data. Pretty clean architecture, right?
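A minimal sketch of that architecture, assuming hypothetical layer classes (ProductRepository, PricingService, ProductView) and a deliberately small Product entity. The entity is defined once and referenced by all three layers, while every bit of behavior lives in the layers themselves.

```csharp
using System.Globalization;

// Hypothetical shared entity: one definition used by every layer.
// It holds data only; there is no behavior.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
}

// Data access layer: creates entities (a real one would also store them).
public class ProductRepository
{
    public Product Load(int productId)
    {
        // Stand-in for a database read; the values are placeholders.
        return new Product { ProductId = productId, Name = "Widget", ListPrice = 10m };
    }
}

// Business logic layer: modifies the entity's data.
public class PricingService
{
    public void ApplyMarkup(Product p, decimal markup)
    {
        p.ListPrice += markup;
    }
}

// UI layer: presents the entity's data.
public class ProductView
{
    public string Render(Product p)
    {
        return p.Name + ": " + p.ListPrice.ToString(CultureInfo.InvariantCulture);
    }
}
```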

No.

So what's wrong with it? First of all, what does the name entity tell us? Nothing, really. Entity is a synonym for object, but surely, the term business objects is so last year that any self-respecting architect would never use such a term. On the other hand, an object with structure but no behavior sounds awfully familiar.

Your code takes one or more data structures as input, operates on them, and outputs other structures. Fowler calls this pattern a Transaction Script; I call it procedural programming, and since I had my share of experience with this programming style early in my career, I never want to go back. Domain Model is where it's at.
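To make the distinction concrete, here is a deliberately tiny sketch contrasting the two styles on a single rule, a product's discounted price (the same rule used in the catalog example later in this post). The type names ProductData, PricingScript, and Product are illustrative assumptions; a real Transaction Script typically handles an entire request rather than one calculation.

```csharp
// Transaction Script style: a behaviorless data structure...
public class ProductData
{
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
}

// ...plus a procedure that takes the structure as input and computes a result.
public static class PricingScript
{
    public static decimal CalculateDiscountedPrice(ProductData data)
    {
        return data.ListPrice - data.Discount;
    }
}

// Domain Model style: the object owns its data and the behavior that uses it.
public class Product
{
    private readonly decimal listPrice_;
    private readonly decimal discount_;

    public Product(decimal listPrice, decimal discount)
    {
        this.listPrice_ = listPrice;
        this.discount_ = discount;
    }

    public decimal DiscountedPrice
    {
        get { return this.listPrice_ - this.discount_; }
    }
}
```

Both compute the same value; the difference is where the rule lives and who can reach the data it depends on.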

In Patterns of Enterprise Application Architecture, Fowler wrote that "a Data Transfer Object is one of those objects our mothers told us never to write." While the pattern itself is valid, it's only supposed to be used for communication across process boundaries, not across layers in the same process.

If you are still not convinced by my arguments, let's take a look at an example. Imagine that you want to model a product catalog. Since we are modeling with entities, we create Product and Category classes. Both are just dumb classes with default constructors, read/write properties, and no behavior. To decouple data access, we also define a data access interface:

 public interface ICatalogDataAccess
 {
     Category ReadCategory(int categoryId);
  
     Product ReadProduct(int productId);
 }
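For reference, here is a minimal sketch of the two entities and the interface, plus a hypothetical in-memory implementation (InMemoryCatalogDataAccess, not part of the original design) showing what the decoupling buys: code that depends only on ICatalogDataAccess can run without a database. The Product properties mirror the columns read by the data access code; the Category members (CategoryId, Name) are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Entity: default constructor, read/write properties, no behavior.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }
}

// Hypothetical Category entity; its members are assumptions.
public class Category
{
    public int CategoryId { get; set; }
    public string Name { get; set; }
}

public interface ICatalogDataAccess
{
    Category ReadCategory(int categoryId);

    Product ReadProduct(int productId);
}

// Hypothetical in-memory implementation, e.g. for unit tests.
public class InMemoryCatalogDataAccess : ICatalogDataAccess
{
    private readonly Dictionary<int, Category> categories_ = new Dictionary<int, Category>();
    private readonly Dictionary<int, Product> products_ = new Dictionary<int, Product>();

    public void Add(Category c) { this.categories_[c.CategoryId] = c; }

    public void Add(Product p) { this.products_[p.ProductId] = p; }

    public Category ReadCategory(int categoryId)
    {
        Category c;
        if (!this.categories_.TryGetValue(categoryId, out c))
        {
            throw new ArgumentException("No such category.", "categoryId");
        }
        return c;
    }

    public Product ReadProduct(int productId)
    {
        Product p;
        if (!this.products_.TryGetValue(productId, out p))
        {
            throw new ArgumentException("No such product.", "productId");
        }
        return p;
    }
}
```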

Implementing this interface is fairly straightforward. ReadProduct, for example, goes something like this:

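 public Product ReadProduct(int productId)
 {
     // GetProductReader (not shown) is assumed to execute a query for the
     // given product ID and return an open IDataReader (System.Data).
     using (IDataReader r = this.GetProductReader(productId))
     {
         if (!r.Read())
         {
             throw new ArgumentException("No such product.", "productId");
         }
  
         Product p = new Product();
         p.ProductId = (int)r["ProductId"];
         p.Name = (string)r["Name"];
         p.ListPrice = (decimal)r["ListPrice"];
         p.Discount = (decimal)r["Discount"];
         p.InventoryCount = (int)r["InventoryCount"];
  
         return p;
     }
 }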

This code is actually fairly benign; trouble only starts to appear in the business logic layer. Imagine that we need the business logic to calculate the discounted price and to determine whether the product is in stock (yes, rather inane business logic, I know). Since the Product entity is just a structure without behavior, we have to create another class to implement this business logic:

 public class ProductOperator
 {
     private Product product_;
  
     public ProductOperator(Product p)
     {
         this.product_ = p;
     }
  
     public decimal DiscountedPrice
     {
         get { return this.product_.ListPrice - this.product_.Discount; }
     }
  
     public bool InStock
     {
         get { return this.product_.InventoryCount > 0; }
     }
 }

Now you are left with the problem of how to pass this information on to the next layer.

One alternative is to create an abstraction of ProductOperator (say, IProductOperator) and pass that to the next layer together with the Product entity. That approach can quickly grow quite unpleasant, since each layer that adds content to the entity needs to define yet another auxiliary class to be passed along with the ProductOperator and the Product entity.
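A minimal sketch of this first alternative, assuming a hypothetical IProductOperator that exposes the two computed values and a hypothetical UI-layer ProductPresenter. The interesting part is the signature of Describe: the UI needs both the entity and its operator, and the compiler cannot check that the two belong to the same product.

```csharp
using System.Globalization;

public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }
}

// Hypothetical abstraction of ProductOperator.
public interface IProductOperator
{
    decimal DiscountedPrice { get; }

    bool InStock { get; }
}

public class ProductOperator : IProductOperator
{
    private readonly Product product_;

    public ProductOperator(Product p)
    {
        this.product_ = p;
    }

    public decimal DiscountedPrice
    {
        get { return this.product_.ListPrice - this.product_.Discount; }
    }

    public bool InStock
    {
        get { return this.product_.InventoryCount > 0; }
    }
}

// Hypothetical UI-layer class: it needs the entity *and* the operator,
// and nothing stops a caller from passing an operator for another product.
public class ProductPresenter
{
    public string Describe(Product p, IProductOperator op)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}: {1} ({2})",
            p.Name,
            op.DiscountedPrice,
            op.InStock ? "in stock" : "sold out");
    }
}
```

Each further layer that contributes derived data would add yet another parameter, and another class, to signatures like Describe.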

Another alternative is to model the Product entity to include properties for this information from the start. That would mean that the data access component would fill in only the properties of the Product entity that come from the database, and a variant of ProductOperator would then fill in the DiscountedPrice and InStock properties in the business logic layer:

 public class ProductOperator
 {
     public ProductOperator()
     {
     }
  
     public void UpdateDiscountedPrice(Product p)
     {
         p.DiscountedPrice = p.ListPrice - p.Discount;
     }
  
     public void UpdateInStock(Product p)
     {
         p.InStock = p.InventoryCount > 0;
     }
 }

Beware: Here be dragons.

One problem with this approach is that you'd end up with a lot of properties whose values may or may not be null (DiscountedPrice and InStock, in this case), so you always need to check for null before reading and using a property value.
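In C#, decimal and bool are value types, so properties that may be unset have to be declared as nullable (decimal? and bool?). A minimal sketch, assuming such a Product and a hypothetical UI-layer ProductFormatter, shows the checks every reader of those properties then needs:

```csharp
using System.Globalization;

public class Product
{
    public string Name { get; set; }
    public decimal ListPrice { get; set; }
    public decimal Discount { get; set; }
    public int InventoryCount { get; set; }

    // Filled in later by the business logic layer; null until then.
    public decimal? DiscountedPrice { get; set; }
    public bool? InStock { get; set; }
}

public class ProductOperator
{
    public void UpdateDiscountedPrice(Product p)
    {
        p.DiscountedPrice = p.ListPrice - p.Discount;
    }

    public void UpdateInStock(Product p)
    {
        p.InStock = p.InventoryCount > 0;
    }
}

// Hypothetical UI-layer helper: it cannot know whether the business logic
// has run, so it must check every computed property before using it.
public static class ProductFormatter
{
    public static string FormatPrice(Product p)
    {
        if (!p.DiscountedPrice.HasValue)
        {
            return "price unavailable";
        }
        return p.DiscountedPrice.Value.ToString(CultureInfo.InvariantCulture);
    }

    public static string FormatStock(Product p)
    {
        if (!p.InStock.HasValue)
        {
            return "stock unknown";
        }
        return p.InStock.Value ? "in stock" : "sold out";
    }
}
```

Forgetting one of these checks means reading .Value on an empty nullable, which throws an InvalidOperationException at run time.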

The other problem with this design is that it railroads your components into a particular usage scenario. Ultimately, you model the entity in order to communicate it across your process boundary (via a UI, service interface, etc.). This boundary has a particular usage scenario; e.g. you need to show product information in a UI. Such a usage scenario then becomes the driver for the entity structure: You need to show the discounted price, so you need a property for that, etc. If you need to display product information on another screen, you include properties for that screen as well. The result is a data structure that carries around a lot of data that may or may not be used in any particular scenario.

There are many nicer, more extensible ways to pass data between layers, and in a future post, I'll describe one such approach.