Auditing Data Changes in the Entity Framework: Part 1

If you’ve read my last post on Types of Auditing, you should be primed for this one; I’m looking at a way to do Data Operation Auditing in the Entity Framework. That is, I want to track who changes which tables & columns, when, and in what way.

Why does this specifically apply to the Entity Framework? Well there are three main reasons;

1. A lot of DBAs don’t like triggers in their database. In fact, I don’t like triggers in my database! I find them to be more difficult to manage, and often overlooked during defect investigation. I also don’t like the cascading updates that can occur as a result of triggers.

2. In a Trusted Subsystem model, which is often my preferred security configuration, the database is accessed as a service account. This means that triggers and SQL statements have no automatic concept of who is performing the data changes, as the user’s security context has not been delegated through to the data tier.

3. People are starting to use the Entity Framework, so I’m keen to maximise the benefits of using it. It also places some constraints on an auditing approach – for example, out of the box, the Entity Framework cannot pass values that are not contained in an entity to the database during DML operations... so username, for example, can’t be passed down to a Stored Procedure when saving entity changes.

In this post I’ll cover the basic solution concept I’ve used, and some of the problems I encountered. In the next post I’ll detail my final solution. This isn’t some poor attempt at leaving a post on a cliff-hanger to keep you coming back – I just want to keep each post short enough to digest easily!

Basic Design

The Entity Framework extensibility point that seems to fit my requirements best is the SavingChanges event on an ObjectContext. This fires when changes are just about to be saved (i.e. before the SQL is actually generated). Therefore this is a great opportunity for us to check what has been changed, and generate our audit records. I can even add more entities (containing audit information) at this point to the ObjectContext and they’ll get saved too.

To find what has changed we can use ObjectStateManager.GetObjectStateEntries – this is accessible via the ObjectContext’s ObjectStateManager property, and gives us access to information about the original and current values of each property on an entity, whether it has been Added, Modified, or Deleted, and some other useful metadata.

To store data audit records I created a copy of the table I’m auditing, and added some metadata fields. For example, if I have a Product table, I would add a ProductHistory table – something like the following;

Note that the Primary Key in Product is the Id field – but in ProductHistory a basic counter field is the Primary Key (AuditId), and Id is a reference to the Id field in the Product table. This is because we’re expecting multiple history records per product.

I’ve also added metadata about who changed the record, when, and in what way (i.e. Modified, Deleted, etc).

None of this is particularly surprising – it is a pretty standard way to record audit information, in the form of data “deltas”.

Gotchas

I did try a number of different approaches before I settled on one, and this was due to some interesting gotchas...

1: Don’t Create a Foreign Key

You may have noticed there’s no relationship in my data model. That’s intentional; if I delete a Product, I still want to keep my ProductHistory records for that Product, and that means I can’t have ProductHistory.Id related to Product.Id – as referential integrity would break.

2: Can’t Save Added Records

The SavingChanges event fires before changes are saved to the database. In my data model I’m using database-generated keys (i.e. IDENTITY(1,1) fields). This means only after the changes have been saved to the database is the value of the key available in our in-memory representation of the database – our ObjectContext.

This means that if we detect an entity has been added to the ObjectContext, I might want to add a record to the history table... but I cannot get the value to put in the Id field, because it hasn’t been created yet. To get around this, I could allow SaveChanges to complete, and then create my audit records, and call SaveChanges again... but that feels like a hack and would lead to other compromises, so I avoided it.

3: Don’t Create a Conceptual Association

To get around point (2), I thought I had a solution. The Entity Framework automatically manages this key-relationship problem for us using navigation properties. If I just add an association in my CSDL linking Product and ProductHistory, this would mean ProductHistory records would be inserted after Product records, and the framework will fix up the Id values for me.

But of course, as soon as we come to deleting records we come across the same issue as point (1), but within the ObjectContext rather than the database – it will not permit us to add a ProductHistory record with an Id that refers to a Product record that we’re about to delete in the same transaction. In fact, a bit of hacking and I managed to create some pretty serious bugs in my code trying to work around this!

There are possibly some solutions here that involve Conditional Mappings, QueryViews, DefiningQuery, database-level updateable Views, or more. But each one felt wrong to me – they had maintenance implications, could create quite brittle code, would be overwritten when using the “update from database” menu option, or worse. If you have a good solution, let me know – but I avoided them.

4: Avoid Passing Strings Around

One of the things I love about the Entity Framework is that a lot of code becomes strongly typed, and is therefore checked at compile time. In a few places this isn’t necessarily the case – such as accessing DbDataRecord field values, or calling ObjectContext.AddObject.

To address this I took a tip from Stuart Leeks – check out this great post on how lambda expressions can be used to bring compile-time checking to your application. I added similar code to my solution to address slightly different areas.

5: Avoid Auditing by Convention

Convention over configuration can be a valid approach in some circumstances. For a general blog post and flexible solution, though, it felt wrong – I wanted to explicitly select which entities are audited to which tables, through compile-time checked, strongly typed, clear code.

You might not like my approach to this – but I chose it to ensure it was clear what I was doing. Therefore it’s easy for you to put your own flavour on it if needed; I hope you agree!

Solution

So how did I implement it in the end? Watch this space and I’ll post my code shortly.