Unit Testing DbContexts and queries – the status quo

So here’s the scene. I’ve been working on unit testing for a solid day, my percent coverage is up, my code is better factored, and now I do my ‘what is the most untested class I have’ analysis one more time, and discover that it is… ‘BillingEntitiesContext’.

BillingEntitiesContext is my subclassed DbContext for describing my EF code first data model. (I am using EF 6.) This class logically has three sections:

  1. The constructor that calls the base class with a connection string (or connection string name). We can hope this is actually easy to unit tests, since it won’t talk to a database – database connection is lazy by default.
  2. The OnModelCreating() override. Which is where I declare the entities that are my data model. We can hope this is also actually easy to unit test, since it also doesn’t need to talk to a database!
  3. The kitchen sink! I.e. it is a bunch of prebaked LINQ queries exposed as helper methods for my application to use.

Here’s an example of a kitchen sink helper, just so we start on the same page:

internal Operation GetLastSuccessfulOperation()
{
return Set<Operation>()
.Where(op => op.State == OperationStates.NotificationDone)
.OrderByDescending(op => op.EndTime)
.FirstOrDefault();
}

OK. Anyway, the first half of the puzzle is how to test our Constructor and OnModelCreating(). We can do it easily, in one fell swoop:

[Fact]
public void TheBillingEntityDataModelCanBeInitialized()
{
Database.SetInitializer<BillingEntitiesContext>(null);
var context = new BillingEntitiesContext("database;doesnt;exist");
context.Database.Initialize(false);
}

Code coverage tools verify that this does indeed call our desired constructor and OnModelCreating() override.
We can guess it never tried to touch a database, since a totally bogus connection string is accepted. But who really knows.

So, anyway, that leaves me with 25 code blocks covered, and a hundred to go. An impressively easy 3% coverage gain in this tiny (almost toy sized) project. And I feel happy that it’s not just a pure numeric enhancement, it might theoretically even find a silly bug in model creation, should we ever accidentally introduce one… Of course F5 would have found it instantly also. OK.

Well that was the easy half of the problem, what about the queries? The subject of unit testing such queries is something I’ve looked at before, and I’ve previously seen, and tried, following approaches:

  • Integration Testing Optimist - Don’t unit test queries at all. In fact you don’t even need to unit test the code that calls the queries, as long as you have integration tests.
  • Query Blind – a.k.a. Repository pattern - Encapsulate all your queries (or create an entire abstract data access layer) in an interface, and stub out the queries. You end up unit testing everything except the queries. I have found that this approach does enable you to write tests of the code calling the queries fairly easily. But Ayende has written very nice lambasting of repository wrappers in general, which my personal experience somewhat backs up. Maintaining repository interfaces and wrapper classes for every DbSet in your repository is no fun.
  • Query pattern - Create a query class for every query, and treat each query as an injectable dependency which can be mocked out. You’re still testing everything except the queries.
  • Database Faker - Move the queries into to business logic and fake out the DBContext/DBSets . This is becoming a reasonably well-known approach, however, this comes with many possible leaky abstraction issues... In specific this works for testing simple queries but general it's actually quite hard to be sure the behavior of your fake matches the real DB.

Now I have 2 gut feelings about what is the right way to test this query code. I feel 100% code coverage of all this EF code should be possible in a meaningful way. Testing queries makes sense as a thing to do because you can think of test cases to validate that they find the right subset of data. However I have another gut feeling that queries are so expressive, in order to really verify a moderately complex query it can take a lot of tests, or more obviously a lot of test data to do some querying on.

Also, I received some advice about Database Faker. So far is the only approach of above that really comes close to testing the queries, unless you actually get to integration testing. And you can probably get it to work for the first few queries you try. Unfortunately the idea of testing with a fake database has some serious downsides. As my manager once put it, you’re walking down a slippery slope of implementing an ever more fake database to emulate more and more behaviors of the real database that you discover you need – like foreign key references, transactions, ooh lah lah. Not all your code is select queries after all!

So… who has found a better way?

Some interesting proposals I have not yet truly tried:

1) SQLite – as an in-memory database
2) Effort – an in-memory database: https://www.nuget.org/packages/Effort/

Unfortunately, I so far cannot get Effort to work. And SQLite sounds like it will be perhaps a little bit too full-SQL in performance cost, requiring applying the database schema etc… So that’s where I am right now. Still trying to think of an effective way that takes those gut feelings into account.