Auditing Data Changes in the Entity Framework: Part 2

Article
06/15/2009

In my previous post I described the basic concept behind my auditing approach using the Entity Framework, and covered some of the problems I encountered. This post focuses on the solution I went with; do feel free to comment if you have any thoughts.

Audit Lifecycle

To get to the root of my requirements I wrote down the lifecycle of a piece of data, and what needed to be tracked against it.

Action	Audit Requirements
Record Added	The date it was created, by whom, and what the original values are.
Record Modified * repeated many times	The date it was changed, by whom, and which values were changed.
Record Deleted	The date it was deleted, and by whom.

The problem is that creating an audit record is tricky based on point (2) in my previous post. Therefore, I chose to record the original values when each change is made. This means I actually track the following;

Action	Audit Requirements
Record Added	The date it was created, and by whom.
Record Modified * repeated many times	The date it was changed, by who, and what the pre-change values were.
Record Deleted	The date it was deleted, by who, and what the pre-delete values were.

This has effectively flipped the approach on its head, but I can still get a point-in-time view of the data whenever I need it. Therefore my data model becomes something like this;

When a new Product is added, the CreatedBy and CreatedDate fields are recorded on the Product entity. When changes are made, the previous unchanged version of the entity is copied to the ProductHistory table, and then the new Product entity is saved. When a Product is deleted, it is removed from the Product table and a copy of its last values is saved to ProductHistory.

This means my ChangedDate field on ProductHistory could arguably be described better as “ExpiredDate”, as it is the date and time that the data on that record became out of date.

Enabling Auditing on the ObjectContext

I described in my previous post how I would use the SavingChanges event to create audit records. To wire this up, I’ve created an extension method that can be called on an ObjectContext;

public static void Audit<fromType, toType>(

this ObjectContext context,

Func<IDataRecord, EntityState, toType> mapping,

Action<toType> addToContext)

{

context.SavingChanges +=

new EventHandler((o, e) =>

CreateAuditRecord<fromType, toType>(

context, mapping, addToContext));

}

Calling this method sets up auditing for one specific type of entity by adding an event handler to the SavingChanges event. The handler invokes a method called CreateAuditRecord.

You might think it would be better to set all audit types up at once, or you might want to configure auditing in the ObjectContext’s OnContextCreated partial method... and that should be easy enough. The point of this code is not to be final, but that it should be easy for you to understand and adapt.

So why have I used two type parameters, a Func<T,U,V> and an Action<T> on the Audit method then? It comes down to points (4) and (5) in my previous post; I wanted auditing to be very explicit about how it behaved, and I wanted to avoid passing string values around to identify field names or entity sets.

To demonstrate this, consider the following example of how I would enable auditing on the Product entity, saving changes to ProductHistory.

using (MyEntities db = new MyEntities())

{

db.Audit<Product, ProductHistory>(

(record, action) => new ProductHistory() {

Id = record.Field<Product, int>(f => f.Id),

Description = record.Field<Product, string>(

f => f.Description),

Price = record.Field<Product, double>(f => f.Price),

CreatedDate = record.Field<Product, DateTime>(

f => f.CreatedDate),

CreatedBy = record.Field<Product, string>(

f => f.CreatedBy),

ChangedBy = "Simon",

ChangedDate = DateTime.Now,

ChangeType = action.ToString() },

(ph) => db.AddToProductHistory(ph));

// TODO: make some changes to Product here...

db.SaveChanges();

}

In my Audit method I specify the source entity (Product) and target audit store entity (ProductHistory) as type parameters. This then enables me to easily create a strongly typed lambda for the first parameter that I know accepts an IDataRecord (“record”) and an EntityState (“action”), returning a ProductHistory record;

(record, action) => new ProductHistory() {

Id = record.Field<Product, int>(f => f.Id), ... <snip>

The purpose of this method is to map a Product data record to a ProductHistory entity. It has an IDataRecord input as I pass the OriginalValues to it, not the full Product entity (remember I’m saving the previous values in my audit table, not the new ones). Therefore this code creates a ProductHistory record, and initialises its properties using object initialiser syntax. If this code became too unwieldy I could easily factor it out into a helper method.

Next, you’ll notice that I’m using another extension method named “Field” that applies to the IDataRecord type. I use this to fetch a property value in a strongly typed way, and to make it easy to get at fields that represent an entity property.

In this case, I’m specifying that I expect the IDataRecord to contain fields that represent Product’s members. I’m also saying the particular property I’m after is an integer, and then I use a lambda to identify the property itself (Id in this case).

The second parameter to my Audit call looks like this;

(ph) => db.AddToProductHistory(ph)

This is because I need to know how to add audit records that are created to the ObjectContext, ensuring they are saved as part of the same SaveChanges transaction as the actual changes we’re tracking. The lambda receives a ProductHistory entity (which is known because of the “toType” type parameter in my call to Audit) which I choose to add to the ObjectContext by calling AddToProductHistory.

For this task I could have used the more general AddObject method available on an ObjectContext – but this needs a string value to identify the entity set, and I want as much compile time checking as possible.

The Mechanics of Auditing

Now you see how we setup auditing, let’s dive into the CreateAuditRecord method to see how it actually creates the audit trail.

First, the method retrieves a list of ObjectStateEntry objects that describe entities that have either been modified or deleted. Remember that my approach doesn’t record anything in the audit table when they are added;

IEnumerable<ObjectStateEntry> entities =

from e in context.ObjectStateManager.GetObjectStateEntries(

EntityState.Modified | EntityState.Deleted)

where

e.IsRelationship == false &&

typeof(fromType).IsAssignableFrom(e.Entity.GetType())

select e;

This also ensures that we only get results for entity types that are of type “fromType”, or inherit from it. Arguably this would be more efficient if it looped through all changes looking for the audit configuration for each type as it went... or there may be other desired behaviours for inheritance heirarchies – but that is out of scope for this post, and it should be easy for you to see how you might change it.

Next, we loop through all of our results;

foreach (ObjectStateEntry item in entities)

{

toType auditRecord =

mapping(item.OriginalValues, item.State);

addToContext(auditRecord);

}

... and it is just a case of using the supplied helper functions to map the OriginalValues IDataRecord to a ProductHistory (in this example), and then add it to the ObjectContext. Of course these helper functions are in the form of the lambdas we passed to our call to the Audit extension method.

Retrieving Point-in-Time Products

This design is based on the assumption that I will rarely need to retrieve audited data, as it requires some calculations and slightly complex SQL to complete. This is fine, but if you were retrieving audit data very frequently I would consider doing some further testing, with a view to optimising the approach.

So how do I get a Product record as it looked at 3pm on the 20^th April 2009, for example?

Simply put, to get the correct data fields for a given time I need to find the first ProductHistory record after the specified date & time. If there are none, I need the values on the Product record (as this means no changes had been made to the record at that point in time). Let’s break that down.

First I need to get the first record from ProductHistory for products that were created before the specified date & time, but were changed after it...

SELECT

Id, [Description], Price, CreatedBy, CreatedDate

FROM

(

SELECT

Id, [Description], Price, CreatedBy, CreatedDate,

ROW_NUMBER() OVER(ORDER BY ChangedDate) AS DateOrder

FROM

ProductHistory

WHERE

CreatedDate <= @PointInTime

AND ChangedDate > @PointInTime

AND Id = @Id

) A

WHERE

A.DateOrder = 1

Next, I need all Products that have no ProductHistory records that fall into the above category, but were created before the specified date & time;

SELECT

Id, [Description], Price, CreatedBy, CreatedDate

FROM

Product

WHERE

NOT EXISTS (

SELECT * FROM ProductHistory ph

WHERE ph.Id = Product.Id AND

CreatedDate <= @PointInTime

AND ChangedDate > @PointInTime)

AND CreatedDate <= @PointInTime

AND Id = @Id

Put these two query sections together using a UNION statement and we have the content of my [GetPointInTimeProduct] Stored Procedure (see the code download), which allows us to find the exact state of a Product given its identifier and a specific date and time. I’ve brought it into my model using a Function Import.

The Good and the Bad

I hope all that has made sense – as usual I’ve included the code as a download (standard disclaimers apply). There are of course some pros and cons to my approach – some that spring to mind are below. If you have comments on better ways to do this, or thoughts about my approach, feel free to chip in!

· We can easily record the username of whoever made the changes as we can deduce it in our C# code; even in a trusted subsystem model.

· The approach is easy to configure in a compile-time checked fashion. It doesn’t rely on possibly flawed convention (e.g. always store audit records in tables ending “Audit”). It should also therefore be easy to refactor to suit your needs.

· Mappings between entity and audit records are explicit, so it does not dictate too much about how you would do this. You could store completely different audit data to me if you’d like.

· The audit records participate in the ObjectContext’s Unit of Work; that is, they are saved in a batch of SQL statements with the data changes. It also means that due to the ordering and batching of updates that the likelihood of locks and deadlocks is arguably reduced when compared to some other approaches.

· The configuration of auditing for each type independently keeps the code clear, but introduces some inefficiency (such as many subscribers to the SavingChanges event, and many loops through the ObjectStateEntry collection).

· The data model, although simple, needs explanation. The history records may not work quite how a newcomer would assume.

· The complexity of the point-in-time SQL is slightly higher than I would like.

· [Edit] It is important to use some kind of concurrency checking to ensure that multiple audit records are not written by different users... I tend to use a timestamp with “Fixed” concurrency.

· [Edit] The biggest drawback of the code so far is that it doesn’t record changes to relationships... so it is suited well to resource data tables, but not so well to those involved in a complex model.

On balance, overall I like the approach. What do you think?

Note: Regarding the edits, I just reread this and realised I’d oversimplified for this post and missed two important points from my notes... that’ll teach me to type up blog posts late at night! Sorry.

Originally posted by Simon Ince on April 20th 2009 here.

Auditing Data Changes in the Entity Framework: Part 2

Audit Lifecycle

Enabling Auditing on the ObjectContext

The Mechanics of Auditing

Retrieving Point-in-Time Products

The Good and the Bad

Additional resources