You Suck at TDD #6 - Improvements Phase 1

Welcome to the first post on improving the yucky code. I have a few points I'd like to cover briefly, and then we'll dive into the code.

First off, I'm going to end up refactoring to a specific endpoint, but that is not necessarily the only endpoint or even the best endpoint. Not only are there different opinions on whether one design is better or not, I have found that my own opinions have evolved over time.

Second, I'd like to talk about a meta question. Unless we are working with clean code, there are generally a number of things "wrong" with a given piece of code, which means a decision needs to be made of which issue to address first. I've noticed that many developers try to address the biggest issue first. If you go down this road, you will often find that what looked like a simple change balloons into a change involving multiple changes in 30 different files. You are fighting against the complexity, and the big problems often have solutions that are harder to find. This increases the  cost of the whole refactoring - it takes lots of time, it's disruptive to other people working in the code - and this can give refactoring a bad name on your team.

Instead, start with the smallest problem that you find. This should be easy and quick to fix, which means you will make good progress. Do this a few times, and sometimes the big problem becomes more tractable.

I also am a firm believer that refactoring has to be done in small, incremental steps. To make it easier to follow, I have checked in after every atomic change, and in reality I will probably bunch a few changes together and commit chunks that are a bit bigger, but I still prefer to work in the small, and I find that my commit size has gotten smaller over the years. Some engineering environments do not permit small checkins, and if you're in one of those, I suggest that you still try to work in very small tests.

Pairing is a wonderful way to do refactoring; pairs attack more issues and solve them better than individuals. You may also be able to use the continuous code review part of pairing to meet your code review requirement so that you can check in often.

My final bit of advice is that if you are lucky to work in languages with good refactoring tools, learn the refactorings that your tool offers, including the keyboard shortcuts. This makes a significant difference in the speed at which you can operate.

I hope you've spent a bit of time studying the code here. I've created a separate branch, and I'm committing along the way if you'd like to follow on. If the change in a commit was done automatically - in this case with Resharper - it is prefixed with a "R:".

I'd like to start making this code better. To refresh your memory, here's the method that we are dealing with. Take a quick look at it:

         public IEnumerable<Employee> GetEmployees(EmployeeFilterType employeeFilterType, string filter, FakeSqlConnection connection)
        {
            if (employeeFilterType == EmployeeFilterType.ByName && filter == null)
            {
                throw new ArgumentNullException("filter");
            }

            string query = "select * from employee, employee_role inner join employee.Id == employee_role.EmployeeId";

            List<Employee> result = new List<Employee>();
            using (FakeSqlCommand sqlCommand = new FakeSqlCommand(query, connection))
            {
                FakeSqlDataReader reader;
                int retryCount = 5;

                while (true)
                {
                    try
                    {
                        reader = sqlCommand.ExecuteReader();
                        break;
                    }
                    catch (Exception)
                    {
                        if (retryCount-- == 0) throw;
                    }
                }

                while (reader.Read())
                {
                    int id = reader.GetInt32(EmployeeIdColumnIndex);
                    string name = reader.GetString(EmployeeNameColumnIndex);
                    int age = reader.GetInt32(EmployeeAgeColumnIndex);
                    bool isSalaried = reader.GetBoolean(EmployeeIsSalariedColumnIndex);

                    switch (employeeFilterType)
                    {
                        case EmployeeFilterType.ByName:
                            if (!name.StartsWith(filter)) continue;
                            break;
                        case EmployeeFilterType.ExemptOnly:
                            if (age < 40 || !isSalaried) continue;
                            break;
                    }

                    result.Add(new Employee {Name = name, Id = id, Age = age, IsSalaried = isSalaried});
                }
            }

            return result;
        }

The simple thing that I'm going to attack is something that I see all over the place in code, a little code smell that is called "Primitive Obsession".

Simply speaking, primitive obsession is using a computer-sciency construct - such as a integer, a string, or a generic List - instead of a domain construct. When we do this, we generally end up with code specific to the construct scattered all over the codebase because there is no place for such code to cluster.

Are there any examples of primitive obsession in our code? Anyone? Bueller?

The employee filtering approach jumped out at me; we have both validity checking and the filter implementation in our method. After each description, I'll list the commit label so you can see what changed. The first thing I need to do is to create an abstraction for filter. Resharper has a cool refactoring that will take a set of parameters, and wrap them in a class, so that's what I used.

R: Extract filter class from parameters

The next thing I want to do is start moving the code related to the filter class from the yucky class to the new EmployeeFilter class. I'll start with the validation code; first I'll extract it into a method, and then I'll move the method into the EmployeeFilter class.

R: Extract Validate Method
R: Move Validate() to filter class

I'll do the same thing with the filtering code in the middle of the routine

R: Extract DoFilter() method
R: Move DoFilter() to EmployeeFilter

Now that I've done that, it's time to push the new class into a separate file.

R: Move type to another file

The two methods that I used are both static methods which take the EmployeeFilter class as a parameter. I'll make them instance methods.

Incidentally, the pattern of pulling out primitive types and creating domain types is known as Whole Value.

R: Make methods non-static

Looking at the code, I have the following statement:

                     if (employeeFilter.DoFilter(name, age, isSalaried)) continue;

That's a little weird; in that the true return means the filter didn't match. I couldn't find a way to do this with a refactoring, so I did it by hand.

Invert filter boolean

Let's look a bit more at the EmployeeFilter class. I note that it has properties for the Filter and EmployeeFilterType, but with the refactoring neither of those are used outside of the class any more. Resharper will let me inline them.

R: Inline Filter and EmployeeFilterType properties

The class is looking fairly good now, but there's still one thing that is bothering me - that call that I make to Validate() seems extraneous. Why should the GetEmployees() be responsible for that? Why can't EmployeeFilter guarantee that it is valid?

I'll start by extracting out the part of the method that I want to keep:

R: Extract the part of GetEmployees() that doesn't have the validate.

Now, I can inline the original method and rename the new one back to GetEmployees().

R: Inline method and rename

That has pulled the code out of the GetEmployees() class, but I want it in the EmployeeFilter class. I'll extract it into a factory method, and then move that to the EmployeeFilter class.

R: Extract creation into factor method
R: Move factory to EmployeeFilter class
R: Extract parameters out of the factory

At this point, what I would really like is a "Convert factory to constructor" refactoring, but it does not exist. If anybody has an idea how to do this automatically, let me know.

Move validate method to constructor

And now I can inline the factory to get rid of it...

R: Inline factory method

I don't really need a separate validation method, so I'll inline that as well.

R: Inline validate method

At this point, there's a brief little side issue I'd like to discuss... When I moved the validation to the EmployeeFilter class, I lost the null check in the GetEmployees() method. I'm not a huge fan of adding null checks just to guard against developer errors if the code isn't a library being used outside the team; I don't think the cost is worth the benefit.

But if you want that functionality, feel free to add the check in.

Now that we have a nice type, it's missing something. Tests. So, I went off and wrote a few. They are the next series of commits:

Validation test
Added test for failed name filter
Added test for passing named filter
Added exempt test for too young
Added test for old enough but not salaried
Added passing test for exempt

So, what did I accomplish? Well, the filter code now has a separate abstraction, and the GetEmployees() method does look a little bit better. The refactoring took me about 30 minutes, though the writing took a fair bit more. The vast majority of the work was done with automatic refactorings.

There is still a lot to be done, but that is all for this time.