You Suck at TDD #7 - Improvements Phase 2

All changes from this post are in the Improvements Phase 2 repo...

Commits that start with "R:" are done through Reshaper. I'll cluster them together this time.

R: Start of Phase 2

Last time, I created the EmployeeFilter abstraction and populated it with code from the GetEmployees() method. That was a pretty typical refactoring to deal with primitive obsession. But, I don't think I'm quite done with that particular thought, as there's another bit of primitive obsession hanging around in the code. This kind is arguably more common than the first type, especially in the Linq/IEnumerable world...

List<Employee> result = new List<Employee>();

Employee is a domain type, but List<T> is one of those computer-sciency types that I don't like. We end up with little bits of code that do things to the collection/enumerable scattered all over the codebase, and it makes our testing and cohesion a lot harder.

If your code has helper methods that take a List<T> or IEnumerable<T>, you probably have this. If you have extension methods that take these types, then you very likely have this. Or, if you have code that does foreach or Linq operations on the list in our code, you also have this. Fixing this can have a big impact on testability and maintainability.

So, let's see if we can morph this particular instance into something that's a little better.

I'll start by extracting the creation of the list into a separate method named CreateEmployeeCollection(), and extract the new method into a new EmployeeCollection class. This new class will be used instead of the raw list.

Now, I'm going to convert this method to an instance method. This takes two separate operations; first I need to add a parameter that will become the "this" parameter, and then I can convert it to a non-static method. When I add the parameter, I can specify that I want it to create a new instance of the class. Then I just use the "make it non-static" refactoring.

R: Pull result creation into a separate method
R: Extract class EmployeeCollection
R: Changed CreateEmployeeCollection signature to take an EmployeeCollection parameter
R: Make it non-static

I should probably note that it took me a couple of times to get this next series to work the way that I wanted it to work, so there are a couple of reverts that I did before starting forward again.

The first thing I need to do is to take the local variable and make it a field. When I do this, I'll initialize it in the constructor.

R: Introduce field to hold the collection

 And now I'm going to encapsulate it into a field. When I do this, I choose to update all of the usages to use the property. This choice is important.

R: Encapsulate the list of employees into a property

I'm almost done with this set of transformations. I now inline the CreateEmployeeCollection() method, which gives us a more normal usage in the caller:

var result = new EmployeeCollection().Items;

R: Inline create method back into the caller...

Still a bit weird, but there's a way to fix that. I'll start by extracting out the collection to a variable,which gives me this:

EmployeeCollection employeeCollection = new EmployeeCollection();
var result = employeeCollection.Items

and then I can inline result. That leaves us using the employee collection instead of the result.

R: Introduce a variable for the new type
R: Inline the result variable

All that I have left of this transformation is changing the return type of the method so that it returns the collection rather than the items. I used to just do this by hand, but I figured out a better way to do it. It's another variant of the "extract method / inline method" approach. Basically, I'm going to sequester the ".Items" part of the return statement and shove it back to the caller.

I'll extract the collection from the return statement into a new variable. Then I'll extract a new method that does everything except the last part of the return type, inline the old method, and rename the method to get back to where we started. Look at the commits if you want the details:

R: Create a variable for the new return type
R: Extract method except for the last return statement
R: Inline the original method
R: Rename GetEmployees2() to GetEmployees
R: One inline to clean things up

At this point, the EmployeeCollection is fully-formed. You may not be impressed by the simplification in this sample code - and that's a fair criticism - but this is a very simple bit of code. In most cases there are a lot of places where callers are using this data that I can fold into the EmployeeCollection class.

And... I don't think I'm quite done with the class yet. But that will have to wait, as it's time to write some tests. There really isn't a lot going on in the collection right now, so I only added a single test.

Onto the meat of the problem...

Here's a bit of code that I've been wrestling with since I wrote the original code. The problem is that the filtering code is intertwined into the database code. That's the thing I'm going to address next:

while (reader.Read())

                 {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    if (employeeFilter.Matches(name, age, isSalaried))
                    {
                        employeeCollection.Items.Add(new Employee
                        {
                            Name = name,
                            Id = id,
                            Age = age,
                            IsSalaried = isSalaried
                        });
                    }
                }

I started by pulling the creation of the employee instance out into a separate variable. And then, I decided that was the wrong thing, so I reverted it. Instead, I target the "match and add" code, and did the usual set of transformations to move it to the collection class:

R: Extract filter and add to a new method
R: Move the method to the EmployeeCollection class
R: Convert to instance method

This is looking better, but the signature for the function is like this:

public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id)

It doesn't make a lot of sense to pass in the individual items to the filter; we should pass in an employee. So *now* I want to pull the creation of the employee out of the method.

R: Extract the Employee creation out of the method.

And extract it out into a variable...

R: Extract employee creation into a variable.

That leaves me with the following code;

         public void AddEmployeeIfMatch(EmployeeFilter employeeFilter, string name, int age, bool isSalaried, int id, Employee employee)
        {
            if (employeeFilter.Matches(name, age, isSalaried))
            {
                Items.Add(employee);
            }
        }

Now, I'd like to get rid of those individual variables and replace them with the employee instance. I want to do that at the filter level, and I'm going to use a little trick to get most of the way there. Basically, I'm going to create another class that is like that is the existing Employee class, and then merge them together.

On EmployeeFilter.Match, I do the "create class from parameters" refactoring, and create a nested Employee class. This looks pretty much like the employee class I've already created, except it's missing one of the fields.

R: Extract new Employee class

I'm left with this line:

            if (employeeFilter.Matches(new EmployeeFilter.Employee(name, age, isSalaried)))

and I'm just going to hand-edit it to say what I want:

            if (employeeFilter.Matches(employee))

and then delete the new extracted Employee class, and clean up the tests.

I'm not really happy with the mechanics of this refactoring, but I'm not smart enough to figure out something better.

Merge together usages of both Employee classes

Now I can delete the parameters I don't need any more

R: Delete no-longer-used parameters

Now that I've written some code, I want to backfill with tests. I pushed code into EmployeeCollection, so I'll start there. Here's an example of one of the tests that I wrote:

         [TestMethod]
        public void When_I_create_an_EmployeeCollection_and_add_an_employee_that_does_match__its_in_the_items_collection()
        {
            EmployeeCollection employeeCollection = new EmployeeCollection();

            EmployeeFilter employeeFilter = new EmployeeFilter(EmployeeFilterType.ByName, "X");

            employeeCollection.AddEmployeeIfMatch(employeeFilter, new Employee() { Name = "Xavier" });

            Assert.AreEqual(1, employeeCollection.Items.Count);
            Assert.AreEqual("Xavier", employeeCollection.Items.First().Name);
        }

Since these posts are supposedly about TDD, let's look at the feedback that the tests are giving us.

My preference is for tests that are very simple. This test isn't bad, but I don't like the fact that we require an EmployeeFilter instance to test whether the EmployeeCollection class works correctly. They really shouldn't be coupled together.

My traditional way of dealing with this is to define an interface - let's call it IEmployeeFilter - and then define a few test versions. Something like EmployeeFilterTrue and EmployeeFilterFalse. That's not a bad choice, but I had a different thought.

What if I went more functional? Instead of describing the matching function as implementing a specific interface, what if I described it as having a specific shape?

public void AddEmployeeIfMatch(Func<Employee, bool> employeeFilter, Employee employee)

That decouples the two classes, and now the tests do not need to create an EmployeeFilter instance. This design takes a little bit of getting used to, but the benefits of coupling through something other than a class or interface are significant.

Let me say that again because I think it's really important:

Taking a more abstract/functional approach to how classes coordinate with each other can have significant impact on how testable and maintainable your code is because it eliminates dependencies rather than just abstracting them.  

And, we also gained a new functionality; you can write a custom filter inline without having to modify the EmployeeFilter class, since you can just write an inline lambda filter.

Use Func<Employee, bool> as parameter

The code is getting better, but I think the filtering part is still a little bit weird. You can only filter as part of adding an item to a collection, which means that filtering must be coupled to fetching. This has been bothering me since I first started on the code. I've been thinking about how to do it with automatic refactoring, but haven't figured out how, so I'm just going to do it by hand.

I do it by introducing a new collection, so the fetching into the collection and the filtering act as two separate operations.

Here's the new code:

                 while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    var employee = new Employee
                    {
                        Name = name,
                        Id = id,
                        Age = age,
                        IsSalaried = isSalaried
                    };
                    employeeCollection.Items.Add(employee);
                }
            }

            EmployeeCollection filteredEmployees = new EmployeeCollection();
            foreach (Employee employee in employeeCollection.Items)
            {
                filteredEmployees.AddEmployeeIfMatch(employeeFilter.Matches, employee);
            }

            return filteredEmployees;

Now we have a two separate chunks of code. Let's see what we can do with it. First, let's move the Items.Add() into the collection:

R: Extract AddEmployee method
R: Move to EmployeeCollection class
R: Make it an instance method
R: Rename to Add()

Hmm. There's no reason that the GetEmployees() method needs to iterate over the collection and call the filter itself. Let's encapsulate and move it.

R: Extract to Filter method
R: Move to EmployeeCollection class
R: Make it an instance method
R: inline filtered collection

That is nicer, though I need to hoist the function out.

R: Inline add method
R: Make filter a parameter
R: Remove unused parameter

At this point, our approach is very Linq-like. We do need to do some cleanup on the tests. I changed them to use the new Filter() method and removed the old method.

Refactor tests, remove AddWithFilter()

That was a lot of individual steps, and there's probably about 30 minutes of work there real-time. I'm reasonably happy with the result, though there are some further opportunities for simplification in the collection classes.