Repository Nightmares


The Repository pattern is a famous (or infamous?) pattern that we can find in Martin Fowler’s Patterns of Enterprise Application Architecture.

It was meant to be used as an interface to a collection, but what I have seen more often is that it becomes an abstraction to the data layer or ORM framework.

Not so long ago I did a presentation on Who killed object oriented programming, and I mentioned the Repository implementation as one of the culprits.

So what’s so bad about it? What can we do to improve it?

I have a counter-proposal: What if we don’t need it at all?

Repository overpopulation

Have you ever felt that the Repositories in your code replicate at night? You remember that when you started the project you had one or two, but now the number has grown to almost one per data model.

I remember starting with something like this:

public interface ICustomerRepository { IEnumerable<Customer> GetCustomers(); }

And then I needed to add queries such as find by Address, find by Id and find which customers have pending invoices:

public interface ICustomerRepository { IEnumerable<Customer> FindByAddress(Address address); IEnumerable<Customer> FindById(CustomerId id); IEnumerable<Customer> FindWithPendingInvoices(); }

Once one was completed, I replicated the recipe for each model: Invoice, Car, Chair, etc.

Eliminating the repetition

Maybe we don’t need one per model. What if we could use a generic class?

public interface IRepository<T> { IEnumerable<T> GetAll(); }

Now that’s slightly better, but what happened to the queries?

Extracting custom queries

Instead of adding queries following the business rules and modifying the Repository each time, why not use IQueryable instead? That way we can combine the results with further queries.

public interface IRepository<out T> { IQueryable<T> GetAll(); }

Then we just need to combine the filtering and ordering after:

var customers = ... // instantiate a concrete IRepository<Customer> return customers .Where(c => c.Address != null) .OrderBy(c => c.Name) .Take(100);

Common queries

We can easily put this logic (if it’s really commonly used) into an extension method (or another class) to avoid repeating the same query and to capture complexity.

public static class CustomerQueries { public static IQueryable<Customer> WithAddress(this IQueryable<Customer> customers) { return customers .Where(c => c.Address != null) .OrderBy(c => c.Name) .Take(100); } }

This would allow us to reuse the query when needed:

var customers = ... // instantiate a concrete IRepository<Customer> return customers.WithAddress();

Abstracted abstraction

We’ve done a great job (pat on the back) so far! Now it’s much simpler…so simple that when we look at the implementation, we see we are just using the ORM (NHibernate, Entity Framework, etc.) to return the main collection and nothing else.

The result in this case is that the Repository pattern is abstracting us from another abstraction! What are we gaining from this?

Why not just use the ORM? Are we planning on changing ORMs in the middle on the project?

We should take advantage of all the power that the ORM is giving us. We don’t want to have to copy the ability to do joins, sorting, etc.

Weeping…is cathartic…

What about testing?

Another reason to introduce the Repository could be testing.

But what are we going to test? Should we be testing the method GetAll?

Are we planning to mock IQueryable? Or, even further, mock all the methods that come with the ORM session?

The logic is mostly in the queries or, for example, in controllers doing queries in a web application.

Unit tests won’t work in this case. We don’t want to depend on which methods are used or in which order they’re used, so we need to make sure the inputs are defined and write assertions on the outputs enforcing the pre- and post-condition.

This case calls for integration tests (small tests setting up the database before and cleaning it up after) to help us ensure it works as expected.

Summary

Make your life simpler and your work more enjoyable. Abstractions should make the code easier to read. If that is not the case, then it is time to reflect and review.

Perhaps it is a bit late to change the project we are working on right now, but next time let’s make sure we give quite a bit more thought to using the Repository pattern.

Comments (6)

  1. Steve Smith says:

    The repository pattern (deviq.com/repository-pattern) isn't meant to "be used as an interface to a collection", it's mean to provide a collection-like interface to any kind of data. You were on the right track with this post as you eliminated duplication with generic implementations and support for passing queries into the repository, but after that I have to disagree with you. Exposing IQueryable is generally asking for trouble, since it allows your data access layer concerns to leak all over your application. At compile time, some expressions will work just fine, but then will fail if EF tries to convert them into SQL, for instance. Query logic should be encapsulated, not scattered throughout your UI layer. In web applications, you almost never want lazy loading, but IQueryable often works best with lazy loading.

    What about testing? It's not about trying to test GetAll, it's about trying to test any method that uses it. And these shouldn't require integration tests - you should be able to simply have a list and use that for your test, rather than having to spin up a database (or use an in memory database) using EF.

    How often should we plan on updating our ORM? About every 18 months, even if we just use the current recommendation coming out of Microsoft. Let's see, DataReaders, DataSets, SqlHelper, Typed DataSets, LINQ-to-SQL, EF 1, EF 4, EF 5-6, soon EF 7. That doesn't even account for the rise of NoSQL options and cloud stores. Of *course* you're going to want to avoid having references to a particular ORM or data access technology scattered throughout your application. To do otherwise is irresponsible and demonstrates a lack of understanding of the history of this space.

    Used properly, the Repository pattern is a great way to keep infrastructure concerns out of your core business logic and your UI project. Used poorly, like anything else, it can do more harm than good. But please don't promote hard-coding references to the ORM-of-the-week throughout the application, with query logic scattered everywhere to boot.

  2. Amir Barylko says:

    Thanks Steve for your comment, though I agree with "used poorly ... can do more harm than good" here are a few clarifications:

    * The fact that some expressions can't be converted to SQL is not related to using a repository or not.

    * My suggestion is to encapsulate the queries, I never said that you have to scatter the logic in the UI layer.

    * Not sure why you say that web apps don't want lazy loading... though I don't agree I respect your experience and point of view,

    * Database queries at some point require integration tests... you can postpone and mock interfaces etc, but you need an integration test to make sure that the query is getting the right data. You can have the db in memory if you wish ... but it is still an integration test.

    *I never changed ORMs every 18 months in any of the projects I worked on... once the project is implemented with a particular technology is hard to sell the case for such a big refactoring... unless of course is causing bugs/pain. I understand implementing new technology in parallel if the need arises but going for a full refactor... not so much.

  3. Steve Smith says:

    Hi Amir,

    Lazy loading is a separate discussion, but ideally you want as few out-of-process calls per web request as possible, and lazy loading makes more calls than if you just got the data you needed in one call.

    And I'm not against integration tests. Yes, of course, have integration tests. Just don't use them when a faster test (i.e. a unit test) can do the same thing. Forcing yourself to only be able to test using integration tests because of tight coupling to data access concerns is not good.

    Upgrading a data access strategy is a HUGE refactor...in a system that references it from all over the place. Doing it for a system that has it encapsulated away in its own project and/or implementations is often not a big deal. Coding in such a way that future changes in data access technology can be accommodated as no big deal is something I've learned to do. You can argue it goes against YAGNI (http://deviq.com/yagni/), but in my experience it's worth the small amount of effort it takes once you know how to use dependency injection and how to architect the solution properly to support this domain-driven style.

  4. Kyle Baley says:

    A couple of things jump out at me reading the comments:

    - I don't think lazy loading has anything to do with this. Making calls to IQueryable will get back the data you want regardless of if this is a web app or not. Many ORMs will even optimize this for you if I'm not mistaken depending on how you have it configured. I don't see anything in the article that screams out "this is going to the database more than it should".

    - I think you _have to_ have the integration test to make sure you get back the proper data. Yes, you can have an IRepository that you can mock for the purpose of some test but at some point, you need to make sure that the query is returning the exact data that you want from your data store regardless of what you do with that data afterward. In my experience, this test is often ignored.

    - "Why not just use the ORM?" sounds snappy but probably needed some clarification. I don't think we need to reference the ORM directly everywhere in the code. There's always room for abstraction where it makes sense. In this case, I think the underlying point is "Why abstract something that's already an abstraction (i.e. IQueryable)"

    - Who swaps ORMs in any but the tiniest of projects? Just because there's something new? I can see maybe updating to the newest one if you really want to put it on your resume but even then, there has to be some real business value there.

  5. Mathematics says:

    I just couldn't get your argument first and then the solution you are presenting either...

  6. amirci says:

    Hi Mathematics,

    Please can you clarify what you mean?

Skip to main content