You Suck at TDD #8 – Doing fewer things


Welcome back to You Suck at TDD. Today's code will show up in the Improvements-Phase-3 branch if you would like to follow along.

In our last episode, we concentrated mostly on the Employee fetching and filtering. Things are better, but we still have a problem...

Well, actually, we have a number of problems, but we'll start with the first one that I see, which is that Yucky.GetEmployees() is trying to do too many things - it both fetches data and filters it.

This is one of the most common problems I see when I look at code - methods that try to do too much, which generally makes them quite challenging to test.

EmployeeSource class

So, we'll start by working on that. I pulled the fetching code out into a new EmployeeSource class and then I pulled the creation of the EmployeeSource instance out of GetEmployees().
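In sketch form, the extracted class looks something like this. The stand-in types and the canned data here are my assumptions for the purpose of a self-contained example; the real code lives in the Improvements-Phase-3 branch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins so the sketch compiles on its own; the real Employee
// and EmployeeCollection types live in the Improvements-Phase-3 branch.
public class Employee
{
    public string Name { get; set; }
}

public class EmployeeCollection
{
    private readonly List<Employee> _employees;

    public EmployeeCollection(IEnumerable<Employee> employees) =>
        _employees = employees.ToList();

    public int Count => _employees.Count;

    // Filter() returns the employees matching the predicate.
    public EmployeeCollection Filter(Func<Employee, bool> predicate) =>
        new EmployeeCollection(_employees.Where(predicate));
}

// The extracted class: fetching now lives in exactly one place.
public class EmployeeSource
{
    public EmployeeCollection FetchEmployees()
    {
        // The real implementation talks to the data store (and contains
        // the retry logic mentioned later); canned data stands in here.
        return new EmployeeCollection(
            new[] { new Employee { Name = "Ada" }, new Employee { Name = "Bob" } });
    }
}
```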

At this point, GetEmployees() only has a couple of lines in it:

public static EmployeeCollection GetEmployees(EmployeeFilter employeeFilter, EmployeeSource employeeSource)
{
    var employeeCollection = employeeSource.FetchEmployees();

    return employeeCollection.Filter(employeeFilter.Matches);
}

We can easily simplify it with an inline of employeeCollection:

return employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

At this point, there is no reason to have a separate method, so I inlined FetchEmployees() and did some cleanup in Main(), and deleted class Yucky.

This does make Main() more complex than I would like, but I did the inline with a purpose; I find that my thoughts are often constrained by the abstractions that are present, and getting rid of a bad abstraction sometimes makes it easier to find the right one.

Spending a little time with Main(), I see that it isn't testable code, and I'd like to get some tests in there...

If we abstract out writing the output to the console, we get something like this:

var collection = employeeSource.FetchEmployees().Filter(employeeFilter.Matches);

WriteToConsole(collection);

That's decent code; since it doesn't have any conditionals, it's likely code that either works all the time or never works at all. Not my first target when I'm looking to make things more testable, but if we wanted to cover it, how can we do it?

Well, the most obvious thing to do is to extract an IEmployeeSource interface, create a method that takes that interface, create a mock (or a simulator using P/A/S), and then write a test or two. That adds some complexity; you have another interface hanging around, and you need to create and maintain the mock or simulator. It will definitely work, but it is a fairly heavyweight approach.
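For illustration only - the interface and stub names here are my invention, not code from the branch - that heavyweight route looks roughly like this:

```csharp
using System.Collections.Generic;

// Stand-in types so the sketch compiles on its own.
public class Employee { public string Name { get; set; } }
public class EmployeeCollection
{
    public List<Employee> Employees { get; } = new List<Employee>();
}

// Step 1: extract an interface from EmployeeSource.
public interface IEmployeeSource
{
    EmployeeCollection FetchEmployees();
}

// Step 2: a hand-rolled stub (a mocking framework or a P/A/S
// simulator would serve the same purpose).
public class StubEmployeeSource : IEmployeeSource
{
    private readonly EmployeeCollection _canned;
    public StubEmployeeSource(EmployeeCollection canned) => _canned = canned;
    public EmployeeCollection FetchEmployees() => _canned;
}

// Step 3: the code under test depends on the interface, so a test
// can hand it the stub instead of the real source.
public static class EmployeeFetching
{
    public static EmployeeCollection GetEmployees(IEmployeeSource source) =>
        source.FetchEmployees();
}
```

Every piece of this - the interface, the stub, the extra parameter - is maintenance surface, which is what makes the approach heavyweight.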

I'm instead going to go with a different option. One of the things I've noticed with developers is that while they may pick up on the opportunity to create a data abstraction - or not, given the title of this series - they rarely pick up on the opportunity to create an algorithmic abstraction.

So, I'm going to go in that direction.

From an abstract perspective, this code does the following:

  • Creates a chunk of data
  • Performs an operation on that chunk of data to create a second chunk of data
  • Consumes that chunk of data

Our current implementation implements this through procedural statements, but we could consider doing this differently. Consider the following code:

public class Pipeline
{
    public static void Process<T1, T2>(Func<T1> source, Func<T1, T2> processor, Action<T2> sink)
    {
        sink(processor(source()));
    }
}

This is an abstraction of the pattern that we were using; it's general-purpose code that takes a source, processor, and sink, and knows how to wire them up to pass data between them. The version of Process() that I wrote handles this specific case, but you can obviously write versions that have more than one processor.
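For example, a two-processor version is only a few lines more. This overload is my addition for illustration, not code from the branch:

```csharp
using System;

public class Pipeline
{
    // The single-processor version from the article.
    public static void Process<T1, T2>(
        Func<T1> source, Func<T1, T2> processor, Action<T2> sink)
    {
        sink(processor(source()));
    }

    // A hypothetical two-processor overload: data flows
    // source -> processor1 -> processor2 -> sink.
    public static void Process<T1, T2, T3>(
        Func<T1> source,
        Func<T1, T2> processor1,
        Func<T2, T3> processor2,
        Action<T3> sink)
    {
        sink(processor2(processor1(source())));
    }
}
```

Each additional stage costs one more type parameter and one more function argument; the wiring pattern stays the same.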

Doing that with our code results in the following:

Pipeline.Process(
    employeeSource.FetchEmployees,
    (employees) => employees.Filter(employeeFilter.Matches),
    WriteToConsole);

I like the first and third arguments, as they are just method names; I would like the second argument to be a method name as well.

We can recast this by adding a Filter() method to EmployeeFilter, which then yields:

Pipeline.Process(
    employeeSource.FetchEmployees,
    employeeFilter.Filter,
    WriteToConsole);
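The new Filter() method is just a one-line adapter from the filter's predicate to the shape the pipeline's processor argument expects. A sketch - the stand-in types and the name-prefix matching rule are my inventions; the real matching logic is in the branch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins so the sketch compiles on its own.
public class Employee { public string Name { get; set; } }
public class EmployeeCollection
{
    private readonly List<Employee> _employees;
    public EmployeeCollection(IEnumerable<Employee> employees) =>
        _employees = employees.ToList();
    public int Count => _employees.Count;
    public EmployeeCollection Filter(Func<Employee, bool> predicate) =>
        new EmployeeCollection(_employees.Where(predicate));
}

public class EmployeeFilter
{
    private readonly string _namePrefix;
    public EmployeeFilter(string namePrefix) => _namePrefix = namePrefix;

    // The existing predicate (a name-prefix check stands in for the
    // real matching rule).
    public bool Matches(Employee employee) =>
        employee.Name.StartsWith(_namePrefix);

    // The new adapter: it has the same shape as the pipeline's
    // processor argument, so the call site can pass a method name.
    public EmployeeCollection Filter(EmployeeCollection employees) =>
        employees.Filter(Matches);
}
```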

This makes me fairly happy; Pipeline.Process() is tested, and the code that uses the pipeline is almost declarative.
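Because Pipeline.Process() is pure plumbing over delegates, testing it needs no mocks at all - lambdas are the test doubles. A minimal test might look like this (my sketch, not the branch's actual test):

```csharp
using System;

public class Pipeline
{
    public static void Process<T1, T2>(
        Func<T1> source, Func<T1, T2> processor, Action<T2> sink)
    {
        sink(processor(source()));
    }
}

public static class PipelineTests
{
    // Verifies that data flows source -> processor -> sink, with the
    // processor's transformation applied exactly once.
    public static void ProcessPassesDataThroughStages()
    {
        string observed = null;

        Pipeline.Process(
            () => 21,                 // source
            n => (n * 2).ToString(),  // processor
            s => observed = s);       // sink

        if (observed != "42")
            throw new Exception($"expected 42, got {observed}");
    }
}
```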

Take a few minutes and look through the current implementation, and write down what you like and what you don't like.

Looking at the code, it is improved, but there are still a number of things that I don't really like:

  • I don't have a way to test my EmployeeSource class, so there is probably a trip to P/A/S coming up.
  • I'm not really in love with Pipeline; I have an alternate approach in mind that I think will be better.
  • The implementation of EmployeeSource has that ugly retry logic in it; that is something algorithmic that I hope can be teased out of the fetching.

Those will come up next time.
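As a taste of the retry point: one way to tease the retry policy out of the fetching is the same trick again - an algorithmic abstraction that takes the operation as a delegate. This sketch is my assumption about the direction, not next episode's actual code, and the names and policy are invented:

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Runs the operation, retrying on failure up to maxAttempts times.
    // The retry *policy* lives here; the fetching code just supplies
    // the operation to run.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Transient failure and attempts remain: wait and retry.
                Thread.Sleep(delay);
            }
        }
    }
}
```

With something like this, FetchEmployees() shrinks to a single fetch wrapped in a `Retry.Execute(...)` call, and the retry behavior becomes testable on its own.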


Comments (1)

  1. Bjorn says:

Although testability-wise the pipeline is probably a win, readability-wise I think it’s a step in the wrong direction:
    WriteToConsole( employeeSource.FetchEmployees().Filter(employeeFilter.Matches) );
    is a little easier to understand what’s happening than
    Pipeline.Process( employeeSource.FetchEmployees, employeeFilter.Filter, WriteToConsole);

I think it’s because of parameters vs chaining. In the first form, I can easily tell that the filter operation is being applied to the fetched employees (because that’s what the dot operator does), and that the filter is applying the Matches criterion, because it’s the only parameter - which makes that, I think, an obvious and easy assumption to make. And then we’re passing the result as the only parameter to WriteToConsole, which we can assume will do the obvious thing.

    But in the second form, I’ve got some pipeline process that takes three things, but at the call site, just reading that line I have to think carefully about what each of the things being passed to the pipeline is, and then make an educated guess about what the process does with those three things. Because it’s not as obvious what each of those things represents in the context of the pipeline, I have to think harder about it. Maybe this can be helped by explicit parameter naming like:
    Pipeline.Process( source: employeeSource.FetchEmployees, processor: employeeFilter.Filter, sink: WriteToConsole);
