Just delete it already!

Today a question came up in the forum that illuminates some non-obvious but pretty important aspects of the Entity Framework, so I wanted to copy some of my response here and expand on it a little to give it more exposure.

The problem statement is essentially this: I have the entity key for something I want to delete, and I want to minimize round trips to the database, so I fake up an entity object, set the key, attach it to the context, call DeleteObject and then call SaveChanges.  Sounds great, right?  Unfortunately, in a number of cases SaveChanges will throw an exception at this point without even attempting to make a call to the database.  To make things more confusing, the answer is to give the context more info first (either by retrieving it from the DB through a query or by supplying it explicitly when you attach).  You might say, "So in order to delete something successfully I have to load more data?  Huh?  Why can't you just delete the record with the key I gave you and be done?"

The background is a bit interesting.  What we are encountering here is the fact that the EF both allows you to describe some fairly high level semantics about your model and abstracts away the underlying physical representation of that model in the database.  The place where this bites us is when an entity has required relationships.  For example, I might have a model with customers and salespeople where the association is modeled such that every customer must have a salesperson.  With this kind of model, I might map the association to the database as a foreign key column in the customer table which contains the id of the salesperson -- or -- I might map it to a separate link table which has customer ids and salesperson ids, with a constraint that the customer id is unique (that way the same salesperson can appear many times, meaning that the salesperson has multiple customers, but each customer can only appear once, and its row specifies which salesperson goes with that customer).
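To make the two mappings concrete, here is a rough sketch using Python's built-in sqlite3 module rather than the EF itself.  The table and column names are my own invention for illustration (the model above doesn't prescribe any), and SQLite is only standing in for whatever database you actually use.  It shows that deleting just the customer row works with the foreign key column schema but is rejected by the database with the link table schema until the link row is gone too.

```python
import sqlite3

# Hypothetical schemas: all table and column names are assumptions for illustration.
FOREIGN_KEY_SCHEMA = """
CREATE TABLE salesperson (id INTEGER PRIMARY KEY);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(id)
);
"""

LINK_TABLE_SCHEMA = """
CREATE TABLE salesperson (id INTEGER PRIMARY KEY);
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE customer_salesperson (
    customer_id INTEGER NOT NULL UNIQUE REFERENCES customer(id),
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(id)
);
"""


def open_db(schema):
    """Create an in-memory database with foreign key enforcement turned on."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(schema)
    return db


def seed(db, uses_link_table):
    """Insert salesperson 1 and customer 10, related through whichever schema is in use."""
    db.execute("INSERT INTO salesperson (id) VALUES (1)")
    if uses_link_table:
        db.execute("INSERT INTO customer (id) VALUES (10)")
        db.execute(
            "INSERT INTO customer_salesperson (customer_id, salesperson_id) VALUES (10, 1)"
        )
    else:
        db.execute("INSERT INTO customer (id, salesperson_id) VALUES (10, 1)")


def delete_customer_row_only(db, customer_id):
    """Delete just the customer row; return the database error text, or None on success."""
    try:
        db.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
```

With the foreign key column schema the relationship lives in the customer row, so one DELETE removes both.  With the link table schema the relationship is a row of its own, and whoever issues the DELETE has to know about it.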

The way the EF deals with the fact that either database schema is valid with this model (just with different mappings) is that it reasons about entities and relationships largely independently of one another.  When it comes time to save changes, the EF looks over the ObjectStateManager to find the operations it must map to and carry out on the database, and then it goes through a validation phase to make sure that the operations make sense and will result in a coherent final database state.  So if you have an entry in the state manager saying that an entity should be deleted, then the EF will look at your model to determine if there are any required relationships.  If the model says that there are (because, for instance, deleting a customer means you must also delete the relationship between that customer and their salesperson), then the EF will make sure that the state manager also has an entry indicating that the relationship should be deleted.  If that entry doesn't exist, then the SaveChanges operation will fail.  Without the relationship info and this validation step, the EF could try to delete the row in the customer table without deleting the corresponding row in the link table (if your database schema worked that way), which would cause even less clear exceptions to flow up from the database.  With the relationship info, the EF can delete things from both tables automatically if necessary.
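If it helps to see the shape of that validation, here is a toy model of it in Python.  To be clear, this is not the EF's implementation or API: the function names, the "CustomerSalesPerson" relationship name and the tuples standing in for state manager entries are all made up for illustration.  The only point is the rule itself: a deleted entity with a required relationship must be accompanied by a deleted relationship entry, and the check happens before anything is sent to the database.

```python
# Toy model only: these names are invented for illustration and are not EF APIs.
REQUIRED_RELATIONSHIPS = {"Customer": ["CustomerSalesPerson"]}


def find_missing_relationship_entries(deleted_entities, deleted_relationships):
    """Return (entity_type, key, relationship) triples that would fail validation.

    deleted_entities: list of (entity_type, key) pairs marked for deletion.
    deleted_relationships: set of (relationship_name, key) pairs marked for deletion.
    """
    missing = []
    for entity_type, key in deleted_entities:
        for relationship in REQUIRED_RELATIONSHIPS.get(entity_type, []):
            if (relationship, key) not in deleted_relationships:
                missing.append((entity_type, key, relationship))
    return missing


def save_changes(deleted_entities, deleted_relationships):
    """Validate first; only a coherent set of deletes is turned into operations."""
    missing = find_missing_relationship_entries(deleted_entities, deleted_relationships)
    if missing:
        # Fails here, before any database call would be made.
        raise ValueError(f"no relationship entry for: {missing}")
    operations = [
        ("delete relationship", name, key) for name, key in sorted(deleted_relationships)
    ]
    operations += [
        ("delete entity", entity_type, key) for entity_type, key in deleted_entities
    ]
    return operations
```

Calling save_changes([("Customer", 10)], set()) raises, while passing {("CustomerSalesPerson", 10)} as the second argument yields both delete operations.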

Given this relatively non-intuitive requirement, the EF goes to some lengths to make most scenarios handle this automatically.  In particular, when you retrieve entities via a query, the query is automatically rewritten to bring along relationship info (a feature we internally call "relationship span").  That way the state manager will be aware of those relationships, and if you indicate that an entity should be deleted, then the relationships are marked deleted as well, and the EF has all the info it needs to delete things. 

In the event that you attach things yourself, though, you must supply all the needed info.  You can sidestep the problem by querying for the entity you want to delete using its key (at the cost of a round trip), or you can set the key of each required relationship on the corresponding EntityReference's EntityKey property before attaching.  In our example this would be the key of the salesperson.  If you set that property and then attach the customer, the attach operation will automatically create the relationship entry as well as the entry for the customer entity.
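Here is the same idea as a toy Python sketch, again not real EF code.  ToyContext and its methods are hypothetical stand-ins I made up to mimic the behavior described above: attaching a customer with the salesperson key set creates a relationship entry alongside the entity entry, deleting the customer marks both as deleted, and saving only succeeds when that relationship entry exists.

```python
class ToyContext:
    """Hypothetical stand-in for a context and its state manager (not an EF type)."""

    def __init__(self):
        self.entities = {}       # customer key -> "Unchanged" or "Deleted"
        self.relationships = {}  # (customer key, salesperson key) -> state

    def attach_customer(self, customer_key, salesperson_key=None):
        """Attach a stub customer; a relationship entry is created only if the key is set."""
        self.entities[customer_key] = "Unchanged"
        if salesperson_key is not None:
            self.relationships[(customer_key, salesperson_key)] = "Unchanged"

    def delete_object(self, customer_key):
        """Mark the customer and every relationship entry we know about as deleted."""
        self.entities[customer_key] = "Deleted"
        for pair in list(self.relationships):
            if pair[0] == customer_key:
                self.relationships[pair] = "Deleted"

    def save_changes(self):
        """Validate, then return how many entries would be deleted in the database."""
        for customer_key, state in self.entities.items():
            if state != "Deleted":
                continue
            has_relationship = any(
                pair[0] == customer_key and rel_state == "Deleted"
                for pair, rel_state in self.relationships.items()
            )
            if not has_relationship:
                raise ValueError(
                    f"customer {customer_key} has a required relationship "
                    "but no relationship entry was found"
                )
        deleted = [s for s in self.entities.values() if s == "Deleted"]
        deleted += [s for s in self.relationships.values() if s == "Deleted"]
        return len(deleted)
```

Attaching customer 10 with no salesperson key and deleting it makes save_changes raise.  Attaching it with salesperson_key=1 first gives the context enough info to delete both the entity and the relationship.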

For more information about relationship span, you might want to check out this previous blog post: blogs.msdn.com/dsimmons/archive/2007/12/21/filtered-association-loading-and-re-creating-an-entity-graph-across-a-web-service-boundary.aspx

- Danny