Does test-driven development harm clarity?

In a recent keynote at RailsConf called Writing Software*, David Heinemeier Hansson argues that test-driven development (TDD) can harm the clarity of our code, and that clarity is the key thing we should be aiming for when writing software.

(*contains swearing)

It’s an excellent talk, and I would highly recommend watching it, especially if you are convinced (like me) that TDD is a good thing.

I was inspired by watching the video. Clarity certainly is extremely important, and the name he coins, Software Writer, sits better with me than Software Engineer or Software Developer. I have often felt Programmer was the best name for what I am, but maybe I will adopt Software Writer.

The real goal

I would argue that clarity is not our goal in writing software. I think our goal is:

Working, modifiable software

Clarity helps us feel confident that our software works because we can read the code and understand what it does.

Clarity helps us modify our software because we can understand what needs to be changed and are less likely to make mistakes when we change it.

A good set of full-system-level tests helps us feel confident that our software works, because those tests prove it works in certain well-defined scenarios. A good set of component-level and unit tests gives us confidence that the various parts work but, as David points out, confidence in these separate parts does not give us much meaningful confidence that the whole system works.

Good sets of tests at all levels help us modify our software because we are free to refactor (or re-draft as David puts it). Unit and component tests give us confidence that structural changes we are making don’t modify the external behaviour of the part we are re-structuring. Once we have made enabling structural changes, the changes we make that actually modify the system’s behaviour are smaller and easier to deal with. The tests that break or must be written when we modify the system’s behaviour help us understand and explain the behaviour changes we are making.

So both clarity and tests at all levels can contribute to our goal of writing working, modifiable software.

But David wasn’t talking about tests – he was talking about TDD – driving the design of software by writing tests.

How TDD encourages clarity

Before I talk about how we should accept some of what David is saying, let’s first remind ourselves of some counter-points. TDD is explicitly intended to improve our code.

I agree with David when he defines good code as clear code, so how does TDD encourage clarity?

TDD encourages us to break code into smaller parts, with names. Smaller, named classes and functions are clearer to me than large blocks of code containing sub-blocks that do specific jobs but are not named. When I write in a TDD style I find it encourages me to break code into smaller parts.

TDD encourages us to write code that works at a single level of abstraction. Code that mixes different levels is less clear than code at a single level. I find that using TDD helps me resist the temptation to mix levels because it encourages me to compose two pieces that deal separately with each level, rather than linking them together.
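
As a minimal sketch of both of these points (the example is mine, not David's): instead of one function that mixes text handling with policy, two small named pieces each work at one level, and a top-level function merely composes them.

#include <string>

// Text level: extract the severity prefix from a log line.
std::string severity_of( const std::string& log_line )
{
    return log_line.substr( 0, log_line.find( ':' ) );
}

// Policy level: decide which severities matter.
bool is_alarming( const std::string& severity )
{
    return severity == "ERROR" || severity == "FATAL";
}

// Composition only: no parsing, no policy, just the two named parts.
bool should_page_someone( const std::string& log_line )
{
    return is_alarming( severity_of( log_line ) );
}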

It is vital to point out here that TDD doesn’t push you towards clarity unless you already wanted to go there. I have seen TDD code that is unclear, stuffed full of boilerplate, formed by copy-paste repetition, and generally awful. As a minimal counter-example, TDD doesn’t stop you using terrible names for things.

But, when you care about clarity, and have an eye for it, I feel TDD can help you achieve it.

How TDD hurts clarity

David’s argument against TDD is that it makes us write code that is less clear. His main argument, as I understand it, is:

TDD forces us to use unnecessary layers of abstraction. Because we must never depend on “the world,” TDD forces us to inject dependencies at every level. This makes our code more complex and less clear.

At its core, we must acknowledge that this argument is true. Where TDD causes us to inject dependencies that we otherwise would not inject, we are making our code more complex.

However, there are elements of a straw man here too. Whenever I can, I allow TDD to drive me towards systems with fewer dependencies, not injected dependencies. When I see a system with fewer dependencies, I almost always find it clearer.

Test against the real database?

David frequently repeats his example of testing without hitting the database. He points out that allowing this increases complexity, and that the resulting tests do not have anything like as much value as tests that do use the database.

This hurts, because I think his point is highly valid. I have seen lots of bugs throughout systems (not just in code close to the database) that came from wrong assumptions about how the database would behave. Testing a wide range of functionality against the real database seems to be the only answer to this problem. Even testing against a real system that is faster (e.g. an in-memory database) will not help you discover all of these bugs, because the faster database will behave differently from the production one.

On the other hand, tests that run against a real database will be too slow to run frequently during development, slowing everything down and reducing the positive effects of TDD.

I don’t know what the answer is, but part of it has got to be to write tests at component level – testing all the behaviour that depends on the database, but not requiring huge amounts of other systems to be spun up (e.g. the web server, LDAP, other HTTP endpoints) – and to run these against the real database as often as possible. If they take only about 5 minutes, maybe it’s reasonable to ask developers to run them before they commit code.

But my gut tells me that running tests at this level should not absolve us from abstracting our code from the production database. It just feels Right to write code that works with different storage back ends. We are very likely to have to change the specific database we use several times in the history of our code, and we may well need to change the paradigm, e.g. from NoSQL to SQL.

In a component that is based on the database, I think you should unit test the logic in the standard TDD way, unit test the database code (e.g. code that generates SQL statements) against a fast fake database, AND comprehensively test the behaviour of the component as a whole against a real database. I admit this looks like a lot of tests, but if you avoid “dumb” unit tests that e.g. check getters and setters, I think these 3 levels have 3 different purposes, and all have value*.

*Writing the logic in a TDD style encourages smaller units of logic and gives confidence that it is correct. Testing the database code against a fake database gives confidence that our syntax is right, and testing the whole component against the real database gives us confidence that our behaviour really works.
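
To make the three levels concrete, here is a minimal sketch using a hypothetical “orders” component. The names and the Catch-style tests are my own invention, and the third level is only a placeholder outline.

#include <sstream>
#include <string>

// 1. Pure logic, written and unit tested in the standard TDD way.
bool is_large_order( double total )
{
    return total > 100.0;
}

TEST_CASE( "Logic: large orders are detected" )
{
    REQUIRE(  is_large_order( 150.0 ) );
    REQUIRE( !is_large_order(  99.0 ) );
}

// 2. Database code (e.g. SQL generation) tested against something fast.
std::string select_large_orders_sql( int threshold )
{
    std::ostringstream ss;
    ss << "SELECT id FROM orders WHERE total > " << threshold;
    return ss.str();
}

TEST_CASE( "SQL: the generated statement is well-formed" )
{
    REQUIRE( select_large_orders_sql( 100 )
        == "SELECT id FROM orders WHERE total > 100" );
}

// 3. The whole component against the real database - slower, run less
// often, but the only level that proves the behaviour really works.
TEST_CASE( "Component: large orders round-trip through the real database" )
{
    // Connect to a disposable test database, insert some orders, run
    // the component, and REQUIRE the expected results come back.
}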

Injecting the database as a dependency also gives our code two consumers, and having more than one consumer is one of the strongest arguments put forward by proponents of TDD for how it leads to better code. All programmers know that there are only three numbers: 0, 1 and more. By having “more” users of our database code, we (hopefully) end up with code that speaks at a single level, e.g. with no SQL statements peppered around code that has no business talking directly to the database.
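
As a sketch of what that single level might look like (again with names of my own invention): the production code and the tests are the two consumers of one narrow interface, and neither of them ever sees SQL.

#include <string>
#include <vector>

class OrderStore
{
public:
    virtual std::vector<std::string> orders_for( const std::string& user ) = 0;
};

// Consumer 1: production code, which speaks only in terms of orders.
bool has_ordered_anything( OrderStore& store, const std::string& user )
{
    return !store.orders_for( user ).empty();
}

// Consumer 2: the tests, via a trivial in-memory implementation.
class FakeOrderStore : public OrderStore
{
public:
    virtual std::vector<std::string> orders_for( const std::string& )
    {
        return { "order-1" };
    }
};

TEST_CASE( "Orders: users with orders are detected" )
{
    FakeOrderStore store;
    REQUIRE( has_ordered_anything( store, "alice" ) );
}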

Clarity of tests

In order for tests to drive good APIs in our production code, and for them to serve as documentation, they must be clear. I see a lot of test code that is full of repetition and long methods, and for me this makes it much less useful.

If our tests are complex and poorly factored, they won’t drive good practice in our production code. If we view unclear tests as a smell, the fixes we make will often encourage us to improve the clarity of our production code.

If our tests resemble (or actually are) automatically-generated dumb callers of each method, we will have high coverage but get almost no value from them. If we try to change code that is tested in this way, we will be thwarted at every turn.

If, on the other hand, we write tests that are clear and simple expressions of behaviour we expect, we will find them easy to understand and maintain, they will drive clear code in production, and sometimes we may realise we are writing them at a higher level than unit tests. When this happens, we should “float freely” with David and embrace that. They are testing more. That is good.

Higher-level tests

David (with the support of Jim Coplien) encourages us to test at a coarser grain than the unit. I strongly agree that we need more emphasis on testing at the component and system levels, and sometimes less unit testing, since we don’t want to write two tests that test the same thing.

However, there are some problems with larger tests.

First, larger tests make it difficult to identify what caused a problem. When a good unit test fails, the problem we have introduced (or the legitimate change in behaviour) is obvious. When a component or system test fails, often all we know is that something is wrong, and the dreaded debugging process must begin. In my experience, this is no myth. One of the pieces of code that I am most proud of in my career was a testing framework that allowed you to write concise and clear tests of a horrible tangled mess of a component. The tests ran fairly quickly, with no external dependencies, but if one of them failed your heart sank.

Second, large tests can be fragile. Sometimes they fail because the database or network is down. Sometimes they fail because your threads did stuff in an unexpected order. Threading is a really good example: when I want to debug a weird threading issue I want to write an unthreaded test that throws stuff at a component in a well-defined but unusual order. When I can get that test to fail, I know I’ve found a real problem, and I can fix it. If the code is not already unit tested (sometimes with dependencies injected) then writing that test can be really difficult.
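
For example (a single-threaded sketch with a made-up component): in production the reply arrives on another thread, but the test simply calls the methods directly, in the suspicious order, with no threads at all.

class Connection
{
public:
    Connection() : open_( true ) {}
    void close()    { open_ = false; }
    bool on_reply() { return open_; }  // A late reply must be dropped
private:
    bool open_;
};

TEST_CASE( "Threading: a reply that arrives after close is dropped" )
{
    Connection conn;
    conn.close();                  // The unusual order: close first...
    REQUIRE( !conn.on_reply() );   // ...then the reply arrives late
}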

TDD makes our code testable, and while testability (like clarity) is not an end in itself, it can be darn useful.

Conclusion

Good things

Clarity is good because it supports working, modifiable code.

Tests are good because they support working, modifiable code.

Testability is good because it supports tests, especially debugging tests.

TDD is good when it supports clarity, testability and tests.

TDD is bad when it hurts clarity.

If you throw out TDD, try not to throw out tests or testability. You will regret it.

What to do

Write tests at the right level. Don’t worry if the clearest level is not the unit level.

Use tests to improve clarity.

If your tests are unclear, there is something wrong. Fix it.

Steps to success

1. Get addicted to TDD.

2. Wean yourself off TDD and start to look at the minimal set of tests you can write to feel that sweet, sweet drug of confidence.

3. Do not skip step 1.

Avoid mocks by refactoring to functional

At this week’s ACCU Conference I went to an excellent talk by Sven Rosvall entitled “Unit Testing Beyond Mock Objects”.

The talk covered the newer Java and C# unit testing frameworks that allow inserting mock objects even where legacy code is using some undesired dependency directly, meaning there is no seam where you can insert a different implementation.

These tools solve a real problem, and could be useful.

However, I want to move the discussion in a different direction: can we avoid mocking altogether, and end up with better code?

Sven gave us an example of some legacy code a little bit like this. (I translated into C++ just for fun.)

#include <chrono>
#include <ctime>
using namespace std::chrono;

// Return true if it's between 12:00 and 14:00 now.
bool is_it_lunch_time_now()
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    tm local_tm = ( *localtime( &tt ) );
    int hour = local_tm.tm_hour;

    return (
        12 <= hour      &&
              hour < 14
    );
}

To test this code, we would have something like this:

TEST_CASE( "Now: It's lunchtime now (hopefully)" )
{
    // REQUIRE( is_it_lunch_time_now() ); // NOTE: only run at lunch time!
    REQUIRE( !is_it_lunch_time_now() );   // NOTE: never run at lunch time!
}

Which is agony. But the production code is fine:

if ( is_it_lunch_time_now() )
{
    eat( sandwich );
}

So, the normal way to allow mocking out a dependency like this would be to add an interface, making our lunch-related code take in a TimeProvider. To avoid coupling the choice of which TimeProvider is used to the calling code, we pass it into the constructor of a LunchCalculator that we create early in the program:

class TimeProvider
{
public:
    virtual tm now_local() const = 0;
};

class RealTimeProvider : public TimeProvider
{
public:
    RealTimeProvider()
    {
    }
    virtual tm now_local() const
    {
        system_clock::time_point now = system_clock::now();
        time_t tt = system_clock::to_time_t( now );
        return ( *localtime( &tt ) );
    }
};

class HardCodedHourTimeProvider : public TimeProvider
{
public:
    HardCodedHourTimeProvider( int hour )
    {
        tm_.tm_hour = hour;
    }
    virtual tm now_local() const
    {
        return tm_;
    }
private:
    tm tm_;
};

class LunchCalc
{
public:
    LunchCalc( const TimeProvider& prov )
    : prov_( prov )
    {
    }
    bool is_it_lunch_time()
    {
        int hour = prov_.now_local().tm_hour;
        return (
            12 <= hour      &&
                  hour < 14
        );
    }
private:
    const TimeProvider& prov_;
};

and now we can write tests like this:

TEST_CASE( "TimeProvider: Calculate lunch time when it is" )
{
    HardCodedHourTimeProvider tp( 13 ); // 1pm (lunch time!)
    REQUIRE( LunchCalc( tp ).is_it_lunch_time() );
}

TEST_CASE( "TimeProvider: Calculate lunch time when it isn't" )
{
    HardCodedHourTimeProvider tp( 10 ); // 10am (not lunch :-( )
    REQUIRE( ! LunchCalc( tp ).is_it_lunch_time() );
}

Innovatively, these tests will pass at all times of day.

However, look at the price we’ve had to pay: 4 new classes, inheritance, and class names ending with unKevlinic words like “Provider” and “Calculator”. (Even worse, in a vain attempt to hide my embarrassment I abbreviated to “Calc”.)

We’ve paid a bit of a price in our production code too:

    // Near the start of the program:
    config.time_provider = RealTimeProvider();

    // Later, with config passed via PfA (Parameterise from Above)
    if ( LunchCalc( config.time_provider ).is_it_lunch_time() )
    {
        eat( sandwich );
    }

The code above where we create the RealTimeProvider is probably not usefully testable, and the class RealTimeProvider is also probably not usefully testable (or safely testable in the case of some more dangerous dependency). The rest of this code is testable, but there is a lot of it, isn’t there?

The advantage of this approach is that we have been driven to a better structure. We can now switch providers on startup by providing config, and even code way above this stuff can be fully exercised safe in the knowledge that no real clocks were poked at any time.

But. Sometimes don’t you ever think this might be better?

tm tm_local_now()  // Not testable
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    return ( *localtime( &tt ) );
}

bool is_lunch_time( const tm& time )  // Pure functional - eminently testable
{
    int hour = time.tm_hour;
    return (
        12 <= hour      &&
              hour < 14
    );
}

The tests look like this:

TEST_CASE( "Functional: Calculate lunch time when it is" )
{
    tm my_tm;
    my_tm.tm_hour = 13; // 1pm (lunch time!)
    REQUIRE( is_lunch_time( my_tm ) );
}

TEST_CASE( "Functional: Calculate lunch time when it isn't" )
{
    tm my_tm;
    my_tm.tm_hour = 10; // 10am (not lunch time :-( )
    REQUIRE( ! is_lunch_time( my_tm ) );
}

and the production code looks very similar to what we had before:

if ( is_lunch_time( tm_local_now() ) )
{
    eat( sandwich );
}

Where possible, let’s not mock stuff we can just test as pure, functional code.

Of course, the production code shown here is now untestable, so there may well be work needed to avoid calling it in a test. That work may involve a mock. Or we may find a nicer way to avoid it.

Using the final keyword in interface method parameters does nothing

Consider the following Java code:

class FinalInInterface
{
    private static interface WithFinal
    {
        public void run( final int x );
    }

    private static class WithoutFinal implements WithFinal
    {
        public void run( int x )
        {
            x = 4;
            System.out.println( x );
        }
    }

    public static void main( String[] args )
    {
        new WithoutFinal().run( 3 );
    }
}

This code compiles, and when it runs, it prints “4”.

So adding “final” to the x parameter of the run() method in the interface WithFinal has no effect – the implementor of this interface, WithoutFinal, is allowed to declare its own run() method without “final”, and modify x as much as they want.

This makes more sense if you realise that in Java every method argument is passed by value. When you pass an object reference, the method and the calling code are talking about the same object, but the reference is passed by value, so if you reassign it inside the method, the original object is unaffected, and the reference to it in the calling code is also unaffected. The semantics are basically identical to passing a pointer in C or C++ – the pointer is copied, but the pointed-to object is the same.
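
That analogy can be shown directly in C++ (a sketch of my own, not from the original post):

#include <iostream>

struct Thing { int value; };

void reassign_and_mutate( Thing* t )
{
    t->value = 42;  // Visible to the caller: both pointers refer to the same object
    Thing local;
    t = &local;     // Invisible to the caller: only the copied pointer changes
    t->value = 99;  // Mutates 'local', not the caller's object
}

int main()
{
    Thing thing = { 0 };
    reassign_and_mutate( &thing );
    std::cout << thing.value << "\n";  // Prints 42, not 99
}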

If you declare method parameters final in your interface, people implementing that interface don’t need to declare them final, and can modify them in their implementations. It’s possible that by declaring them final you are trying to communicate to implementors of the interface that they should also declare them final, but as the designer of the interface, it’s really none of your business how the implementor implements it.

There is no “const” in Java, so you have no way of preventing the implementor of your interface from modifying an object that is passed in (by copying a reference to it).

If that makes you sad, join the club.