Monday, June 8, 2020

Is the repository pattern useful with Entity Framework Core?


  1. Original: Analysing whether Repository pattern useful with Entity Framework (May 2014).
  2. First solution: Four months on – my solution to replacing the Repository pattern (Sept 2014).
  3. THIS ARTICLE: Is the repository pattern useful with Entity Framework Core?
  4. Architecture of Business Layer working with Entity Framework (Core and v6).
  5. Creating Domain-Driven Design entity classes with Entity Framework Core.
  6. GenericServices: A library to provide CRUD front-end services from a EF Core database.
  7. Wrapping your business logic with anti-corruption layers – NET Core.

TL;DR – summary

No, the repository/unit-of-work pattern (shortened to Rep/UoW) isn’t useful with EF Core. EF Core already implements a Rep/UoW pattern, so layering another Rep/UoW pattern on top of EF Core isn’t helpful.

A better solution is to use EF Core directly, which allows you to use all of EF Core’s features to produce high-performing database accesses.

The aims of this article

This article looks at

  • What people are saying about the Rep/UoW pattern with EF.
  • The pros and cons of using a Rep/UoW pattern with EF.
  • Three ways to replace the Rep/UoW pattern with EF Core code.
  • How to make your EF Core database access code easy to find and refactor.
  • A discussion on unit testing EF Core code.

I’m going to assume you are familiar with C# code and either the Entity Framework 6 (EF6.x) or the Entity Framework Core library. I do talk specifically about EF Core, but most of the article is also relevant to EF6.x.

Setting the scene

In 2013 I started work on a large web application specifically for healthcare modelling. I used ASP.NET MVC4 and EF 5, which had just come out and supported SQL Spatial types, which handle geographic data. At that time the prevalent database access pattern was a Rep/UoW pattern – see this article written by Microsoft in 2013 on database access using Entity Framework and the Rep/UoW pattern.

I built my application using Rep/UoW, but found it a real pain point during development. I was constantly having to ‘tweak’ the repository code to fix little problems, and each ‘tweak’ could break something else! It was this that made me research into how to better implement my database access code.

Coming more up to date, I was contracted by a start-up company at the end of 2017 to help with a performance issue with their EF6.x application. The main part of the performance issue turned out to be due to lazy loading, which was needed because the application used the Rep/UoW pattern.

It turns out that a programmer who helped start the project had used the Rep/UoW pattern. On talking to the founder of the company, who is very tech savvy, he said that he found the Rep/UoW part of the application quite opaque and hard to work with.

What people are saying against the repository pattern

In researching as part of my review of the current Spatial Modeller™ design I found some blog posts that make a compelling case for ditching the repository. The most cogent and well thought-out post of this kind is Rob Conery’s ‘Repositories On Top UnitOfWork Are Not a Good Idea’. Rob’s main point is that the Rep/UoW just duplicates what the Entity Framework (EF) DbContext gives you anyway, so why hide a perfectly good framework behind a façade that adds no value? This is what Rob calls ‘this over-abstraction silliness’.

Another blog is ‘Why Entity Framework renders the Repository pattern obsolete’. In this Isaac Abraham adds that the repository doesn’t make testing any easier, which is one thing it was supposed to do. This is even truer with EF Core, as you will see later.

So, are they right?

My views on the pros and cons of repository/unit-of-work pattern

Let me try and review the pros/cons of the Rep/UoW pattern in as even-handed a way as I can. Here are my views.

The good parts of the Rep/UoW pattern (best first)

  1. Isolate your database code: The big plus of a repository pattern is that you know where all your database access code is. Also, you normally split your repository into sections, like the Catalogue Repository, the Order Processing Repository, etc., which makes it easy to find the code for a specific query that has a bug or needs performance tuning. That is definitely a big plus.
  2. Aggregation: Domain-Driven Design (DDD) is a way to design systems, and it suggests that you have a root entity, with other associated entities grouped to it. The example I use in my book “Entity Framework Core in Action” is a Book entity with a collection of Review entities. The reviews only make sense when linked to a book, so DDD says you should only alter the Reviews via the Book entity. The Rep/UoW pattern does this by providing a method to add/remove reviews to the Book Repository.
  3. Hiding complex T-SQL commands: Sometimes you need to bypass EF Core’s cleverness and use T-SQL. This type of access should be hidden from higher layers, yet easy to find to help with maintenance/refactoring. I should point out that the Command/Query Objects described in Rob Conery’s post can also handle this.
  4. Easy to mock/test: It is easy to mock an individual repository, which makes unit testing code that accesses the database easier. This was true some years ago, but nowadays there are other ways around this problem, which I will describe later.

You will note that I haven’t listed “replacement of EF Core with another database access library”. This is one of the ideas behind the Rep/UoW, but in my view it’s a misconception, because a) it’s very hard to replace a database access library, and b) are you really going to swap such a key library in your application? You wouldn’t put up a facade around ASP.NET or React.js, so why do that to your database access library?

The bad parts of the Rep/UoW pattern (worst first)

The first three items are all around performance. I’m not saying you can’t write an efficient Rep/UoW, but it’s hard work and I see many implementations that have built-in performance issues (including Microsoft’s old Rep/UoW implementation). Here is my list of the bad issues I find with the Rep/UoW pattern:

    1. Performance – handling relationships: A repository normally returns an IEnumerable/IQueryable result of one type, for instance in the Microsoft example, a Student entity class. Say you want to show information from a relationship that the Student has, such as their address? In that case the easiest way in a repository is to use lazy loading to read the student’s address entity in, and I see people doing this a lot. The problem is lazy loading causes a separate round-trip to the database for every relationship that it loads, which is slower than combining all your database accesses into one database round-trip. (The alternative is to have multiple query methods with different returns, but that makes your repository very large and cumbersome – see point 4).
    2. Data not in the required format: Because the repository assembly is normally created near to the database assembly, the data returned might not be in the exact format the service or user needs. You might be able to adapt the repository output, but it’s a second stage you have to write. I think it is much better to form your query closer to the front-end and include any adaptation of the data you need (see more on this in the section “Service Layer” in one of my articles).
    3. Performance – update: Many Rep/UoW implementations try to hide EF Core, and in doing so don’t make use of all its features. For instance, a Rep/UoW would update an entity using EF Core’s Update method, which saves every property in the entity. By contrast, EF Core’s built-in change tracking only updates the properties that have changed.
    4. Too generic: “The more reusable the code is, the less usable it is.” (Neal Ford, from the book Building Evolutionary Architectures.) The allure of the Rep/UoW comes from the view that you can write one generic repository and then use that to build all your sub-repositories, for instance the Catalogue Repository, the Order Processing Repository, etc. That should minimise the code you need to write, but my experience is that a generic repository works at the beginning, and then, as things get more complex, you end up having to add more and more code to each individual repository.

To sum up the bad parts – a Rep/UoW hides EF Core, which means you can’t use EF Core’s features to produce simple, but efficient database access code.
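To make the “Performance – update” point more concrete, here is a minimal sketch that compares which properties EF Core flags as modified in the two cases. The Book entity, DemoContext and UpdateComparison names are hypothetical, and using the in-memory provider (the Microsoft.EntityFrameworkCore.InMemory NuGet package) is my assumption to keep the sketch small; it is not taken from any of the articles mentioned.

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, for illustration only.
public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public class DemoContext : DbContext
{
    public DemoContext(DbContextOptions<DemoContext> options) : base(options) { }
    public DbSet<Book> Books { get; set; }
}

public static class UpdateComparison
{
    // Names of the properties that EF Core has flagged as modified for this entity.
    private static string[] ModifiedProperties(DbContext context, object entity) =>
        context.Entry(entity).Properties
            .Where(p => p.IsModified)
            .Select(p => p.Metadata.Name)
            .OrderBy(name => name)
            .ToArray();

    public static (string[] tracked, string[] viaUpdate) Run()
    {
        // Assumption: in-memory provider, with a unique database name per run.
        var options = new DbContextOptionsBuilder<DemoContext>()
            .UseInMemoryDatabase(Guid.NewGuid().ToString())
            .Options;

        int bookId;
        using (var context = new DemoContext(options))
        {
            var book = new Book { Title = "Old title", Price = 10 };
            context.Add(book);
            context.SaveChanges();
            bookId = book.BookId;
        }

        string[] tracked;
        using (var context = new DemoContext(options))
        {
            // Direct EF Core: load a tracked entity and change one property.
            var book = context.Books.Single(b => b.BookId == bookId);
            book.Title = "New title";
            tracked = ModifiedProperties(context, book);
        }

        string[] viaUpdate;
        using (var context = new DemoContext(options))
        {
            // Repository-style: hand a detached entity to Update.
            var detached = new Book { BookId = bookId, Title = "New title", Price = 10 };
            context.Update(detached);
            viaUpdate = ModifiedProperties(context, detached);
        }

        return (tracked, viaUpdate);
    }
}
```

In the tracked case only Title should be flagged, while the Update call should flag every non-key property (here Price and Title), which is why a relational provider would write all the columns.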

How to use EF Core, but still benefit from the good parts of the Rep/UoW pattern

In the previous “good parts” section I listed isolation, aggregation, hiding, and unit testing, which a Rep/UoW did well. In this section I’m going to talk about a number of different software patterns and practices which, when combined with a good architectural design, provide the same isolation, aggregation, etc. features when you are using EF Core directly.

I will explain each one and then pull them together in a layered software architecture.

1. Query objects: a way to isolate and hide database read code.

Database accesses can be broken down into four types: Create, Read, Update and Delete – known as CRUD. For me the read part, known as a query in EF Core, is often the hardest to build and performance tune. Many applications rely on good, fast queries, such as a list of products to buy, a list of things to do, and so on. The answer that people have come up with is query objects.

I first came across them in 2013 in Rob Conery’s article (mentioned earlier), where he refers to Command/Query Objects. Also, Jimmy Bogard produced a post in 2012 called ‘Favor query objects over repositories’. Using .NET’s IQueryable type and extension methods we can improve the query object pattern over Rob and Jimmy’s examples.

The listing below gives a simple example of a query object that can select the order in which a list of integers is sorted.

public static class MyLinqExtension
{
    public static IQueryable<int> MyOrder
        (this IQueryable<int> queryable, bool ascending)
    {
        return ascending
            ? queryable.OrderBy(num => num)
            : queryable.OrderByDescending(num => num);
    }
}

And here is an example of how the MyOrder query object is called.

var numsQ = new[] { 1, 5, 4, 2, 3 }.AsQueryable();
 
var result = numsQ
    .MyOrder(true)  
    .Where(x => x > 3) 
    .ToArray();

The MyOrder query object works because the IQueryable type holds a list of commands, which are executed when I apply the ToArray method. In my simple example I’m not using a database, but if we replaced the numsQ variable with a DbSet<T> property from the application’s DbContext, then the commands in the IQueryable<T> type would be converted to database commands.
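One way to see this deferred execution for yourself is the small sketch below. It uses only LINQ to Objects (no database), and the DeferredExecutionDemo class name is made up for the example. The source array is changed after the query has been built, and the change still shows up in the result, because nothing runs until ToArray is called.

```csharp
using System;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static int[] Run()
    {
        var source = new[] { 1, 5, 4, 2, 3 };

        // Building the query only records the OrderBy and Where steps.
        IQueryable<int> query = source.AsQueryable()
            .OrderBy(num => num)
            .Where(x => x > 3);

        // Change the data after the query has been built...
        source[0] = 10;

        // ...and the change is visible, because the query only runs here.
        return query.ToArray();   // 4, 5, 10
    }
}
```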

Because the IQueryable<T> type isn’t executed until the end, you can chain multiple query objects together. Let me give you a more complex example of a database query from my book “Entity Framework Core in Action”. The code below uses four query objects chained together to select, sort, filter and page the data on some books. You can see this in action on the live site http://efcoreinaction.com/.

public IQueryable<BookListDto> SortFilterPage
    (SortFilterPageOptions options)
{
    var booksQuery = _context.Books         
        .AsNoTracking()                     
        .MapBookToDto()                     
        .OrderBooksBy(options.OrderByOptions)
        .FilterBooksBy(options.FilterBy,    
                       options.FilterValue);
 
    options.SetupRestOfDto(booksQuery);     
 
    return booksQuery.Page(options.PageNum-1,
                           options.PageSize);
}

Query objects provide even better isolation than the Rep/UoW pattern because you can split up complex queries into a series of query objects that you can chain together. This makes the queries easier to write/understand, refactor and test. Also, if you have a query that needs raw SQL you can use EF Core’s FromSql method, which also returns IQueryable<T>.
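To give a feel for what the individual query objects in a chain like SortFilterPage might look like, here is a minimal sketch of a sorting and a paging query object. This is not the code from the book: the BookListDto properties, the OrderByOptions enum values and the method bodies are my assumptions, and the sketch runs against an in-memory array rather than a database.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the DTO and sort options used in SortFilterPage.
public class BookListDto
{
    public string Title { get; set; }
    public decimal Price { get; set; }
}

public enum OrderByOptions { ByTitle, ByPriceLowestFirst, ByPriceHighestFirst }

public static class BookQueryObjects
{
    // Query object that applies the chosen sort order.
    public static IQueryable<BookListDto> OrderBooksBy(
        this IQueryable<BookListDto> books, OrderByOptions orderBy)
    {
        switch (orderBy)
        {
            case OrderByOptions.ByTitle:
                return books.OrderBy(b => b.Title);
            case OrderByOptions.ByPriceLowestFirst:
                return books.OrderBy(b => b.Price);
            case OrderByOptions.ByPriceHighestFirst:
                return books.OrderByDescending(b => b.Price);
            default:
                throw new ArgumentOutOfRangeException(nameof(orderBy), orderBy, null);
        }
    }

    // Generic query object that returns one page; the page number starts at zero.
    public static IQueryable<T> Page<T>(
        this IQueryable<T> query, int pageNumZeroStart, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize), "pageSize must be greater than zero.");
        if (pageNumZeroStart < 0)
            throw new ArgumentOutOfRangeException(nameof(pageNumZeroStart), "page number cannot be negative.");

        return query.Skip(pageNumZeroStart * pageSize).Take(pageSize);
    }
}

public static class QueryObjectDemo
{
    public static string[] Run()
    {
        var books = new[]
        {
            new BookListDto { Title = "A", Price = 30 },
            new BookListDto { Title = "B", Price = 10 },
            new BookListDto { Title = "C", Price = 20 },
            new BookListDto { Title = "D", Price = 40 },
            new BookListDto { Title = "E", Price = 50 },
        }.AsQueryable();

        // Second page (zero-based page 1) of two books, cheapest first.
        return books
            .OrderBooksBy(OrderByOptions.ByPriceLowestFirst)
            .Page(1, 2)
            .Select(b => b.Title)
            .ToArray();   // "A", "D"
    }
}
```

Because each method takes and returns an IQueryable<T>, each one can be tested on its own against a simple in-memory list, which is part of what makes this approach easy to refactor.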

2. Approaches to handling Create, Update and Delete database accesses

The query objects handle the read part of the CRUD, but what about the Create, Update and Delete parts, where you write to the database? I’m going to show you two approaches to running a CUD action: direct use of EF Core commands, and using DDD methods in the entity class. Let’s look at a very simple example of an Update: adding a review in my book app (see http://efcoreinaction.com/).

Note: If you want to try adding a review you can do that. There is a GitHub repo that goes with my book at https://github.com/JonPSmith/EfCoreInAction. To run the ASP.NET Core application: a) clone the repo, b) select branch Chapter05 (every chapter has a branch), and c) run the application locally. You will see an Admin button appear next to each book, with a few CUD commands.

Option 1 – direct use of EF Core commands

The most obvious approach is to use EF Core methods to do the update of the database. Here is a method that would add a new review to a book, with the review information provided by the user. Note: the ReviewDto is a class that holds the information returned by the user after they have filled in the review information.

public Book AddReviewToBook(ReviewDto dto)
{
    var book = _context.Books
        .Include(r => r.Reviews)
        .Single(k => k.BookId == dto.BookId);
    var newReview = new Review(dto.numStars, dto.comment, dto.voterName);
    book.Reviews.Add(newReview);
    _context.SaveChanges();
    return book;
}

The steps are:

  • Lines 3 to 5: load specific book, defined by the BookId in the review input, with its list of reviews
  • Lines 6 to 7: Create a new review and add it to the book’s list of reviews
  • Line 8: The SaveChanges method is called, which updates the database.

NOTE: The AddReviewToBook method is in a class called AddReviewService, which lives in my ServiceLayer. This class is registered as a service and has a constructor that takes the application’s DbContext, which is injected by dependency injection (DI). The injected value is stored in the private field _context, which the AddReviewToBook method can use to access the database.

This will add the new review to the database. It works, but there is another way to build this using a more DDD approach.

Option 2 – DDD-styled entity classes

EF Core offers us a new place to put our update code: inside the entity class. EF Core has a feature called backing fields that makes building DDD entities possible. Backing fields allow you to control access to any relationship. This wasn’t really possible in EF6.x.

DDD talks about aggregation (mentioned earlier), and that all aggregates should only be altered via a method in the root entity, which I refer to as access methods. In DDD terms the reviews are an aggregate of the book entity, so we should add a review via an access method called AddReview in the Book entity class. This changes the code above so that it calls a method in the Book entity, as shown here:

public Book AddReviewToBook(ReviewDto dto)
{
    var book = _context.Find<Book>(dto.BookId);
    book.AddReview(dto.numStars, dto.comment,
         dto.voterName, _context);
    _context.SaveChanges();
    return book;
}

The AddReview access method in the Book entity class would look like this:

public class Book
{
    private HashSet<Review> _reviews;
    public IEnumerable<Review> Reviews => _reviews?.ToList();
    //...other properties left out
 
    //...constructors left out
 
    public void AddReview(int numStars, string comment,
        string voterName, DbContext context = null)
    {
        if (_reviews != null)   
        {
            _reviews.Add(new Review(numStars, comment, voterName));  
        }
        else if (context == null)
        {
            throw new ArgumentNullException(nameof(context),
                "You must provide a context if the Reviews collection isn't valid.");
        }
        else if (context.Entry(this).IsKeySet) 
        {
            context.Add(new Review(numStars, comment, voterName, BookId));
        }
        else                                    
        {                                       
            throw new InvalidOperationException("Could not add a new review."); 
        }
    }
    //... other access methods left out
}

This method is more sophisticated because it can handle two different cases: one where the Reviews have been loaded and one where they haven’t. But it is faster than the original case, as it uses a “create relationship via foreign keys” approach if the Reviews are not already loaded.

Because the access method code is inside the entity class it can be more complex if need be, because it’s going to be the ONLY version of that code you need to write (DRY). In option 1 you could have the same code repeated in different places wherever you need to update the Book’s review collection.
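To show why the access method ends up as the single route for changing the reviews, here is a cut-down, EF-free sketch. It is a simplification of the Book class above (the collection is always initialised and there is no DbContext parameter), and the Review properties are my assumptions. The point is that Reviews hands out a copy, so outside code cannot change the real collection even by casting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical versions of the entity classes (no EF Core involved).
public class Review
{
    public Review(int numStars, string comment, string voterName)
    {
        NumStars = numStars;
        Comment = comment;
        VoterName = voterName;
    }

    public int NumStars { get; }
    public string Comment { get; }
    public string VoterName { get; }
}

public class Book
{
    private readonly HashSet<Review> _reviews = new HashSet<Review>();

    // Callers receive a copy of the collection, never the backing field itself.
    public IEnumerable<Review> Reviews => _reviews.ToList();

    // The access method: the only code that can add to the backing field.
    public void AddReview(int numStars, string comment, string voterName)
    {
        _reviews.Add(new Review(numStars, comment, voterName));
    }
}

public static class AccessMethodDemo
{
    public static int Run()
    {
        var book = new Book();
        book.AddReview(5, "Great book", "Reader 1");

        // Casting the returned copy and adding to it does not touch the book.
        ((List<Review>)book.Reviews).Add(new Review(1, "Sneaky", "Reader 2"));

        return book.Reviews.Count();   // 1
    }
}
```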

NOTE: I have written an article called “Creating Domain-Driven Design entity classes with Entity Framework Core” all about DDD-styled entity classes. That has a much more detailed look at this topic. I have also updated my article on how to write business logic with EF Core to use the same DDD-styled entity classes.

Why doesn’t the method in the entity class call SaveChanges? In option 1 a single method contained all the parts: a) load entity, b) update entity, c) call SaveChanges to update the database. I could do that because I knew it was being called by a web action, and that was all I wanted to do.
With DDD entity methods you can’t call SaveChanges in the entity method because you can’t be sure the operation has finished. For instance, if you were loading a book from a backup you might want to create the book, add the authors, add any reviews, and then call SaveChanges so that everything is saved together.

Option 3: the GenericServices library

There is a third way. I noticed there was a standard pattern when using CRUD commands in the ASP.NET applications I was building, and back in 2014 I built a library called GenericServices, which works with EF6.x. In 2018 I built a more comprehensive version called EfCore.GenericServices for EF Core (see this article on EfCore.GenericServices).

These libraries don’t really implement a repository pattern, but act as an adapter pattern between the entity classes and the actual data that the front-end needs. I have used the original, EF6.x, GenericServices and it has saved me months of writing boring front-end code. The new EfCore.GenericServices is even better, as it can work with both standard styled entity classes and DDD-styled entity classes.

Which option is best?

Option 1 (direct EF Core code) has the least code to write, but there is a possibility of duplication, because different parts of the application may want to apply CUD commands to an entity. For instance, you might have an update via the ServiceLayer when the user changes things, but an external API might not go through the ServiceLayer, so you have to repeat the CUD code.

Option 2 (DDD-styled entity classes) places the crucial update part inside the entity class, so the code is going to be available to anyone who can get an entity instance. In fact, because the DDD-styled entity class “locks down” access to properties and collections, everybody HAS to use the Book entity’s AddReview access method if they want to update the Reviews collection. For many reasons this is the approach I want to use in future applications (see my article for a discussion on the pros and cons). The (slight) downside is that it needs a separate load/save part, which means more code.

Option 3 (the EF6.x or EF Core GenericServices library) is my preferred approach, especially now I have built the EfCore.GenericServices version that handles DDD-styled entity classes. As you will see in the article about EfCore.GenericServices, this library drastically reduces the code you need to write in your web/mobile/desktop application. Of course, you still need to access the database in your business logic, but that is another story.

Organising your CRUD code

One good thing about the Rep/UoW pattern is it keeps all your data access code in one place. When swapping to using EF Core directly, then you could put your data access code anywhere, but that makes it hard for you or other team members to find it. Therefore, I recommend having a clear plan for where you put your code, and stick to it.

The following figure shows a Layered or Hexagonal architecture, with only three assemblies shown (I have left out the business logic, and in a hexagonal architecture you will have more assemblies). The three assemblies shown are:

  • ASP.NET Core: This is the presentation layer, providing HTML pages and/or a web API. It has no database access code but relies on the various methods in the ServiceLayer and BusinessLayer.
  • ServiceLayer: This contains the database access code, both the query objects and the Create, Update and Delete methods. The service layer uses an adapter pattern and command pattern to link the data layer and the ASP.NET Core (presentation) layer (see this section from one of my articles about the service layer).
  • DataLayer: This contains the application’s DbContext and the entity classes. The DDD-styled entity classes then contain access methods to allow the root entity and its aggregates to be changed.

NOTE: The libraries GenericServices (EF6.x) and EfCore.GenericServices (EF Core) mentioned earlier are, in effect, libraries that provide ServiceLayer features, i.e. they act as an adapter pattern and command pattern between the DataLayer and your web/mobile/desktop application.

The point I want to make from this figure is that, by using different assemblies, a simple naming standard (see the word Book in bold in the figure) and folders, you can build an application where your database code is isolated and easy to find. As your application grows this can be critical.

Unit testing methods that use EF Core

The final part to look at is unit testing applications that use EF Core. One of the pluses of a repository pattern is you can replace it with a mock when testing. So, using EF Core directly removes the option of mocking (technically you could mock EF Core, but it’s very hard to do well).

Thankfully things have moved on with EF Core and you can simulate the database with an in-memory database. In-memory databases are quicker to create and have a default start point (i.e. empty), so it’s much easier to write tests against them. See my article, Using in-memory databases for unit testing EF Core applications, for a detailed look at how you can do that, plus a NuGet package called EfCore.TestSupport that provides methods to make EF Core unit tests quicker to write.
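As a rough illustration of the idea, here is a minimal sketch of a test-style method that writes to and reads from an in-memory database. It assumes EF Core’s in-memory provider (the Microsoft.EntityFrameworkCore.InMemory NuGet package) and uses hypothetical Book, Review and TestContext classes; it does not use EfCore.TestSupport, and the linked article covers the options in more depth. In a real project this would sit inside a test method of whatever unit test framework you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities and context, for illustration only.
public class Review
{
    public int ReviewId { get; set; }
    public int NumStars { get; set; }
    public int BookId { get; set; }
}

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public List<Review> Reviews { get; set; } = new List<Review>();
}

public class TestContext : DbContext
{
    public TestContext(DbContextOptions<TestContext> options) : base(options) { }
    public DbSet<Book> Books { get; set; }
}

public static class InMemoryTestSketch
{
    public static int Run()
    {
        // A unique name gives every test its own, empty database.
        var options = new DbContextOptionsBuilder<TestContext>()
            .UseInMemoryDatabase(Guid.NewGuid().ToString())
            .Options;

        // SETUP: seed the database using one context instance.
        using (var context = new TestContext(options))
        {
            context.Books.Add(new Book
            {
                Title = "Test book",
                Reviews = { new Review { NumStars = 5 } }
            });
            context.SaveChanges();
        }

        // ATTEMPT/VERIFY: read back with a fresh context, so the result
        // comes from the database rather than from tracked instances.
        using (var context = new TestContext(options))
        {
            var book = context.Books.Include(b => b.Reviews).Single();
            return book.Reviews.Count;   // 1
        }
    }
}
```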

Conclusions

My last project that used the Rep/UoW pattern was back in 2013, and I have never used a Rep/UoW pattern again since then. I have tried a few approaches: a custom library called GenericServices with EF6.x, and now more standard query objects and DDD entity methods with EF Core. They are easy to write and normally perform well, but if they are slow it’s easy to find and performance tune individual database accesses.

In the book I wrote for Manning Publications I have a chapter where I performance tune an ASP.NET Core application that “sells” books. That process used query objects and DDD entity methods and shows that it can produce great performing database accesses (see my article Entity Framework Core performance tuning – a worked example for a summary).

My own work follows the query object for reads and DDD-styled entity classes with their access methods for CUD and business logic. I do need to use these in a proper application to really know if they work, but it’s promising. Watch this space for more on DDD-styled entity classes, architectures that benefit from them, and maybe a new library :).

Happy coding!
