Does test-driven development harm clarity?

In a recent keynote at RailsConf called Writing Software*, David Heinemeier Hansson argues that test-driven development (TDD) can harm the clarity of our code, and that clarity is the key thing we should be aiming for when writing software.

(*contains swearing)

It’s an excellent talk, and I would highly recommend watching it, especially if you are convinced (like me) that TDD is a good thing.

I was inspired by watching the video. Clarity certainly is extremely important, and the name he coins, Software Writer, sits better with me than Software Engineer or Software Developer. I have often felt Programmer was the best name for what I am, but maybe I will adopt Software Writer.

The real goal

I would argue that clarity is not our goal in writing software. I think our goal is:

Working, modifiable software

Clarity helps us feel confident that our software works because we can read the code and understand what it does.

Clarity helps us modify our software because we can understand what needs to be changed and are less likely to make mistakes when we change it.

A good set of full-system-level tests helps us feel confident that our software works because they prove it works in certain well-defined scenarios. A good set of component-level and unit tests gives us confidence that various parts work, but as David points out, confidence in these separate parts does not give us much meaningful confidence that the whole system works.

Good sets of tests at all levels help us modify our software because we are free to refactor (or re-draft as David puts it). Unit and component tests give us confidence that structural changes we are making don’t modify the external behaviour of the part we are re-structuring. Once we have made enabling structural changes, the changes we make that actually modify the system’s behaviour are smaller and easier to deal with. The tests that break or must be written when we modify the system’s behaviour help us understand and explain the behaviour changes we are making.

So both clarity and tests at all levels can contribute to our goal of writing working, modifiable software.

But David wasn’t talking about tests – he was talking about TDD – driving the design of software by writing tests.

How TDD encourages clarity

Before I talk about how we should accept some of what David is saying, let’s first remind ourselves of some counter-points. TDD is explicitly intended to improve our code.

I agree with David when he defines good code as clear code, so how does TDD encourage clarity?

TDD encourages us to break code into smaller parts, with names. Smaller, named classes and functions are clearer to me than large blocks of code containing sub-blocks that do specific jobs but are not named, and I find that writing in a TDD style pushes me towards that kind of structure.

TDD encourages us to write code that works at a single level of abstraction. Code that mixes different levels is less clear than code at a single level. I find that using TDD helps me resist the temptation to mix levels because it encourages me to compose two pieces that deal separately with each level, rather than linking them together.
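As a minimal sketch of both points (the names and the domain are invented for illustration, not taken from the talk), compare a function that mixes levels of abstraction with the same behaviour composed from smaller, named parts, each of which is easy to test-drive on its own:

#include <string>
#include <vector>

// Mixing levels: low-level arithmetic sits next to presentation concerns.
std::string report_mixed( const std::vector<int>& scores )
{
    int total = 0;
    for ( int s : scores )
    {
        total += s;
    }
    return "Total: " + std::to_string( total );
}

// The same behaviour, composed from smaller, named parts,
// each working at a single level and each testable in isolation.
int total_of( const std::vector<int>& scores )
{
    int total = 0;
    for ( int s : scores )
    {
        total += s;
    }
    return total;
}

std::string report( const std::vector<int>& scores )
{
    return "Total: " + std::to_string( total_of( scores ) );
}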

It is vital to point out here that TDD doesn’t push you towards clarity unless you already wanted to go there. I have seen TDD code that is unclear, stuffed full of boilerplate, formed by copy-paste repetition, and generally awful. As a minimal counter-example, TDD doesn’t stop you using terrible names for things.

But, when you care about clarity, and have an eye for it, I feel TDD can help you achieve it.

How TDD hurts clarity

David’s argument against TDD is that it makes us write code that is less clear. His main argument, as I understand it, is:

TDD forces us to use unnecessary layers of abstraction. Because we must never depend on “the world,” TDD forces us to inject dependencies at every level. This makes our code more complex and less clear.

At its core, we must acknowledge that this argument is true. Where TDD causes us to inject dependencies that we otherwise would not inject, we are making our code more complex.

However, there are elements of a straw man here too. Whenever I can, I allow TDD to drive me towards systems with fewer dependencies, not injected dependencies. When I see a system with fewer dependencies, I almost always find it clearer.
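As an illustrative sketch of the difference (the names are invented, not taken from the talk or any real codebase): instead of injecting a settings-reading dependency so that it can be mocked, we can often push the dependency out entirely and pass in plain data.

// With an injected dependency: every test of this function needs a fake SettingsReader.
class SettingsReader
{
public:
    virtual ~SettingsReader() {}
    virtual int max_retries() const = 0;
};

bool should_retry_injected( const SettingsReader& settings, int attempts )
{
    return attempts < settings.max_retries();
}

// With fewer dependencies: the caller reads the setting and passes a value.
// This is trivially testable, with no interface, no mock and no injection.
bool should_retry( int max_retries, int attempts )
{
    return attempts < max_retries;
}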

Test against the real database?

David frequently repeats his example of testing without hitting the database. He points out that allowing this increases complexity, and that the resulting tests do not have anything like as much value as tests that do use the database.

This hurts, because I think his point is highly valid. I have seen lots of bugs throughout systems (not just in code close to the database) that came from wrong assumptions about how the database would behave. Testing a wide range of functionality against the real database seems to be the only answer to this problem. Even testing against a real system that is faster (e.g. an in-memory database) will not help you discover all of these bugs, because the faster database will have different behaviour from the production one.

On the other hand, tests that run against a real database will be too slow to run frequently during development, slowing everything down and reducing the positive effects of TDD.

I don’t know what the answer is, but part of it has got to be to write tests at component level, testing all the behaviour that is driven by database behaviour but not requiring huge amounts of other systems to be spun up (e.g. the web server, LDAP, other HTTP endpoints), and to run these against the real database as often as possible. If they take only about 5 minutes, maybe it’s reasonable to ask developers to run them before they commit code.

But my gut tells me that running tests at this level should not absolve us from abstracting our code from the production database. It just feels Right to write code that works with different storage back ends. We are very likely to have to change the specific database we use several times in the history of our code, and we may well need to change the paradigm e.g. NoSQL->SQL.

In a component that is based on the database, I think you should unit test the logic in the standard TDD way, unit test the database code (e.g. code that generates SQL statements) against a fast fake database, AND comprehensively test the behaviour of the component as a whole against a real database. I admit this looks like a lot of tests, but if you avoid “dumb” unit tests that e.g. check getters and setters, I think these 3 levels have 3 different purposes, and all have value*.

*Writing the logic test-first encourages smaller units of logic, and gives confidence that it is correct. Testing the database code against a fake database gives confidence that our syntax is right, and testing the whole component against the real database gives us confidence that our behaviour really works.
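One concrete way this can fall out (an invented sketch, not the author's code, using the same TEST_CASE/REQUIRE style as the C++ examples later in this collection): the query-building logic becomes a pure function that ordinary unit tests can pin down, while the fake-database tests check that the generated SQL is accepted, and the real-database tests check the component's behaviour end to end.

#include <string>
#include <vector>

// Invented example: pure query-building logic, unit tested in the normal TDD way.
std::string select_users_by_min_age( int min_age, const std::vector<std::string>& columns )
{
    std::string cols;
    for ( const std::string& col : columns )
    {
        if ( !cols.empty() )
        {
            cols += ", ";
        }
        cols += col;
    }
    return "SELECT " + cols + " FROM users WHERE age >= " + std::to_string( min_age );
}

TEST_CASE( "Query builder: selects the requested columns" )
{
    REQUIRE(
        select_users_by_min_age( 18, { "id", "name" } )
        == "SELECT id, name FROM users WHERE age >= 18"
    );
}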

Injecting the database as a dependency gives us the advantage of our code having two consumers, which is one of the strongest arguments put forward by proponents of TDD that it gives us better code. All programmers know that there are only three numbers: 0, 1 and more. By having “more” users of our database code, we (hopefully) end up with code that speaks at a single level, e.g. there are no SQL statements peppered around code which has no business talking directly to the database.

Clarity of tests

In order for tests to drive good APIs in our production code, and for them to serve as documentation, they must be clear. I see a lot of test code that is full of repetition and long methods, and for me this makes it much less useful.

If our tests are complex and poorly factored, they won’t drive good practice in our production code. If we view unclear tests as a smell, the fixes we make will often encourage us to improve the clarity of our production code.

If our tests resemble (or actually are) automatically-generated dumb callers of each method, we will have high coverage and almost no value from them. If we try to change code that is tested in this way, we will be thwarted at every turn.

If, on the other hand, we write tests that are clear and simple expressions of behaviour we expect, we will find them easy to understand and maintain, they will drive clear code in production, and sometimes we may realise we are writing them at a higher level than unit tests. When this happens, we should “float freely” with David and embrace that. They are testing more. That is good.

Higher-level tests

David (with the support of Jim Coplien) encourages us to test at a coarser grain than the unit. I strongly agree that we need more emphasis on testing at the component and system levels, and sometimes less unit testing, since we don’t want to write two tests that test the same thing.

However, there are some problems with larger tests.

First, larger tests make it difficult to identify what caused a problem. When a good unit test fails, the problem we have introduced (or the legitimate change in behaviour) is obvious. When a component or system test fails, often all we know is that something is wrong, and the dreaded debugging process must begin. In my experience, this is not just a myth. One of the pieces of code that I am most proud of in my career was a testing framework that allowed you to write concise and clear tests of a horrible tangled mess of a component. The tests ran fairly quickly, with no external dependencies being used, but if one of them failed your heart sank.

Second, large tests can be fragile. Sometimes they fail because the database or network is down. Sometimes they fail because your threads did stuff in an unexpected order. Threading is a really good example: when I want to debug a weird threading issue I want to write an unthreaded test that throws stuff at a component in a well-defined but unusual order. When I can get that test to fail I know I’ve found a real problem, and I can fix it. If the code is not already unit tested (with dependencies being injected, sometimes) then writing that test can be really difficult.
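As a hedged sketch of what such an unthreaded test might look like (the component and its methods are invented for illustration), the test makes the same calls that racing threads might make, but in one fixed, unusual order, so a failure is reproducible:

#include <stdexcept>

// Invented component: a connection that must be open before it is used.
class Connection
{
public:
    void open()  { open_ = true; }
    void close() { open_ = false; }
    void send()
    {
        if ( !open_ )
        {
            throw std::logic_error( "send() called on a closed connection" );
        }
    }
private:
    bool open_ = false;
};

TEST_CASE( "Unthreaded: send after close is rejected, not ignored" )
{
    // Two racing threads could produce this order; here we force it
    // deterministically on a single thread.
    Connection conn;
    conn.open();
    conn.close();
    REQUIRE_THROWS_AS( conn.send(), std::logic_error );
}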

TDD makes our code testable, and while testability (like clarity) is not an end in itself, it can be darn useful.

Conclusion

Good things

Clarity is good because it supports working, modifiable code.

Tests are good because they support working, modifiable code.

Testability is good because it supports tests, especially debugging tests.

TDD is good when it supports clarity, testability and tests.

TDD is bad when it hurts clarity.

If you throw out TDD, try not to throw out tests or testability. You will regret it.

What to do

Write tests at the right level. Don’t worry if the clearest level is not the unit level.

Use tests to improve clarity.

If your tests are unclear, there is something wrong. Fix it.

Steps to success

1. Get addicted to TDD.

2. Wean yourself off TDD and start to look at the minimal set of tests you can write to feel that sweet, sweet drug of confidence.

3. Do not skip step 1.

Avoid mocks by refactoring to functional

At this week’s ACCU Conference I went to an excellent talk by Sven Rosvall entitled “Unit Testing Beyond Mock Objects”.

The talk covered the newer Java and C# unit testing frameworks that allow inserting mock objects even where legacy code is using some undesired dependency directly, meaning there is no seam where you can insert a different implementation.

These tools solve a real problem, and could be useful.

However, I want to move the discussion in a different direction: can we avoid mocking altogether, and end up with better code?

Sven gave us an example of some legacy code a little bit like this. (I translated into C++ just for fun.)

#include <chrono>
#include <ctime>

using namespace std::chrono;

// Return true if it's between 12:00 and 14:00 now.
bool is_it_lunch_time_now()
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    tm local_tm = ( *localtime( &tt ) );
    int hour = local_tm.tm_hour;

    return (
        12 <= hour      &&
              hour < 14
    );
}

To test this code, we would have something like this:

TEST_CASE( "Now: It's lunchtime now (hopefully)" )
{
    // REQUIRE( is_it_lunch_time_now() ); // NOTE: only  run at lunch time!
    REQUIRE( !is_it_lunch_time_now() );   // NOTE: never run at lunch time!
}

Which is agony. But the production code is fine:

if ( is_it_lunch_time_now() )
{
    eat( sandwich );
}

So, the normal way to allow mocking out a dependency like this would be to add an interface, making our lunch-related code take in a TimeProvider. To avoid coupling the choice of which TimeProvider is used to the calling code, we pass it into the constructor of a LunchCalculator that we plan to make early on:

class TimeProvider
{
public:
    virtual tm now_local() const = 0;
};

class RealTimeProvider : public TimeProvider
{
public:
    RealTimeProvider()
    {
    }
    virtual tm now_local() const
    {
        system_clock::time_point now = system_clock::now();
        time_t tt = system_clock::to_time_t( now );
        return ( *localtime( &tt ) );
    }
};

class HardCodedHourTimeProvider : public TimeProvider
{
public:
    HardCodedHourTimeProvider( int hour )
    {
        tm_.tm_hour = hour;
    }
    virtual tm now_local() const
    {
        return tm_;
    }
private:
    tm tm_;
};

class LunchCalc
{
public:
    LunchCalc( const TimeProvider& prov )
    : prov_( prov )
    {
    }
    bool is_it_lunch_time()
    {
        int hour = prov_.now_local().tm_hour;
        return (
            12 <= hour      &&
                  hour < 14
        );
    }
private:
    const TimeProvider& prov_;
};

and now we can write tests like this:

TEST_CASE( "TimeProvider: Calculate lunch time when it is" )
{
    HardCodedHourTimeProvider tp( 13 ); // 1pm (lunch time!)
    REQUIRE( LunchCalc( tp ).is_it_lunch_time() );
}

TEST_CASE( "TimeProvider: Calculate lunch time when it isn't" )
{
    HardCodedHourTimeProvider tp( 10 ); // 10am (not lunch :-( )
    REQUIRE( ! LunchCalc( tp ).is_it_lunch_time() );
}

Innovatively, these tests will pass at all times of day.

However, look at the price we’ve had to pay: 4 new classes, inheritance, and class names ending with unKevlinic words like “Provider” and “Calculator”. (Even worse, in a vain attempt to hide my embarrassment I abbreviated to “Calc”.)

We’ve paid a bit of a price in our production code too:

    // Near the start of the program:
    config.time_provider = RealTimeProvider();

    // Later, with config passed via PfA
    if ( LunchCalc( config.time_provider ).is_it_lunch_time() )
    {
        eat( sandwich );
    }

The code above where we create the RealTimeProvider is probably not usefully testable, and the class RealTimeProvider is also probably not usefully testable (or safely testable in the case of some more dangerous dependency). The rest of this code is testable, but there is a lot of it, isn’t there?

The advantage of this approach is that we have been driven to a better structure. We can now switch providers on startup by providing config, and even code way above this stuff can be fully exercised safe in the knowledge that no real clocks were poked at any time.

But. Sometimes don’t you ever think this might be better?

tm tm_local_now()  // Not testable
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    return ( *localtime( &tt ) );
}

bool is_lunch_time( const tm& time )  // Pure functional - eminently testable
{
    int hour = time.tm_hour;
    return (
        12 <= hour      &&
              hour < 14
    );
}

The tests look like this:

TEST_CASE( "Functional: Calculate lunch time when it is" )
{
    tm my_tm;
    my_tm.tm_hour = 13; // 1pm (lunch time!)
    REQUIRE( is_lunch_time( my_tm ) );
}

TEST_CASE( "Functional: Calculate lunch time when it isn't" )
{
    tm my_tm;
    my_tm.tm_hour = 10; // 10am (not lunch time :-( )
    REQUIRE( ! is_lunch_time( my_tm ) );
}

and the production code looks very similar to what we had before:

if ( is_lunch_time( tm_local_now() ) )
{
    eat( sandwich );
}

Where possible, let’s not mock stuff we can just test as pure, functional code.

Of course, the production code shown here is now untestable, so there may well be work needed to avoid calling it in a test. That work may involve a mock. Or we may find a nicer way to avoid it.

setUp and tearDown considered harmful

Some unit test frameworks provide methods (often called setUp and tearDown, or annotated with @Before and @After) that are called automatically before a unit test executes, and afterwards.

This structure is presumably intended to avoid repetition of code that is identical in all the tests within one test file.

I have always instinctively avoided using these methods, and when a colleague used them recently I thought I should try to write up why I feel negative about them. Here it is:

Update: note I am talking about unit tests here. I know these frameworks can be used for other types of test, and maybe in that context these methods could be useful.

1. It’s action at a distance

setUp and tearDown are called automatically, with no indication in your code that you use them, or don’t use them. They are “magic”, and everyone hates magic.

If someone is reading your test (probably because it broke), they don’t know whether some setUp will be called without scanning the rest of the file to find out whether one exists. Do you hate them?

2. setUp contains useless stuff

How many tests do you have in one file? When you first write it, maybe, just maybe, all the tests need the exact same setup. Later, you’ll write new tests that only use part of it.

Very soon, you grow an uber-setUp that does all the setup for various different tests, creating objects you don’t need. This adds complexity for everyone who has to read your tests – they don’t know which bits of setUp are used in this test, and which are cruft for something else.

3. They require member variables

The only useful work you can do inside setUp and tearDown is creating and modifying member variables.

Now your tests aren’t self-contained – they use these member variables, and you must make absolutely sure that your test works no matter what state they are in. These member variables are not useful for anything else – they are purely an artifact of the choice to use setUp and tearDown.

4. A named function is better

When you have setup code to share, write a function or method. Give it a name, make it return the thing it creates. By giving it a name you make your test easier to read. By returning what it creates, you avoid the use of member variables. By avoiding the magic setUp method, you give yourself the option of calling more than one setup function, making code re-use more granular (if you want).
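A minimal sketch of the idea (the example is invented, written in the same C++ TEST_CASE style as earlier in this collection; the same shape works just as well with JUnit and friends): the shared setup is a named function that returns what it creates, and each test calls only the helpers it actually needs.

// Invented example: a named setup helper instead of a magic setUp.
struct Account
{
    int balance = 0;
};

Account account_with_balance( int balance )
{
    Account account;
    account.balance = balance;
    return account;
}

TEST_CASE( "Withdrawal reduces the balance" )
{
    Account account = account_with_balance( 100 );  // setup is explicit and named
    account.balance -= 30;
    REQUIRE( account.balance == 70 );
}

TEST_CASE( "A new account starts empty" )
{
    // This test doesn't need the helper at all, and no unused setUp runs here.
    Account account;
    REQUIRE( account.balance == 0 );
}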

5. What goes in tearDown?

If you’re using tearDown, what are you doing?

Are you tearing down some global state? I thought this was a unit test?

Are you ensuring nothing is left in an unpredictable state for future tests? Surely those tests guarantee their state at the start?

What possible use is there for this function?

Conclusion

A unit test should be a self-contained pure function, with no dependencies on other state. setUp and tearDown force you to depend on member variables of your test class, for no benefits over named functions, except that you don’t have to type their names. I consider typing the name of a properly-named function to be a benefit.

Everybody loves build.xml (test-driven Ant)

In the previous post we looked at how it is possible to write reasonable code in Ant, by writing small re-usable blocks of code.

Of course, if you’re going to have any confidence in your build file you’re going to need to test it. Now we’ve learnt some basic Ant techniques, we’re ready to do the necessary magic that allows us to write tests.

Slides: Everybody loves build.xml slides.

First, let me clear up what we’re testing:

What do we want to test?

We’re not testing our Java code. We know how to do that, and to run tests, if we’ve written them using JUnit, just needs a <junit> tag in our build.xml. (Other testing frameworks are available and some people say they’re better).

The things we want to test are:

  • build artifacts – the “output” of our builds i.e. JAR files, zips and things created when we run the build,
  • build logic – such as whether dependencies are correct, whether the build succeeds or fails under certain conditions, and
  • units of code – checking whether individual macros or code snippets are correct.

Note, if you’re familiar with the terminology, that testing build artifacts can never be a “unit test”, since it involves creating real files on the disk and running the real build.

Below we’ll see how I found ways to test build artifacts, and some ideas I had to do the other two, but certainly not a comprehensive solution. Your contributions are welcome.

Before we start, let’s see how I’m laying out my code:

Code layout

build.xml      - real build file
asserts.xml    - support code for tests
test-build.xml - actual tests

I have a normal build file called build.xml, a file containing support code for the tests (mostly macros allowing us to make assertions) called asserts.xml, and a file containing the actual tests called test-build.xml.

To run the tests I invoke Ant like this:

ant -f test-build.xml test-name

test-build.xml uses an include to get the assertions:

<include file="asserts.xml"/>

Tests call a target inside build.xml using subant, then use the code in asserts.xml to make assertions about what happened.

Simple example: code got compiled

If we want to check that a <javac …> task worked, we can just check that a .class file was created:

Here’s the test, in test-build.xml:

<target name="test-class-file-created">
    <assert-target-creates-file
        target="build"
        file="bin/my/package/ExampleFile.class"
    />
</target>

We run it like this:

ant -f test-build.xml test-class-file-created

The assert-target-creates-file assertion is a macrodef in asserts.xml like this:

<macrodef name="assert-target-creates-file">
    <attribute name="target"/>
    <attribute name="file"/>
    <sequential>
        <delete file="@{file}" quiet="true"/>
        <subant antfile="build.xml" buildpath="." target="@{target}"/>
        <assert-file-exists file="@{file}"/>
    </sequential>
</macrodef>

It just deletes the file, runs the target using subant, then asserts that the file exists using assert-file-exists, which is this macrodef:

<macrodef name="assert-file-exists">
    <attribute name="file"/>
    <sequential>
        <echo message="Checking existence of file: @{file}"/>
        <fail message="File '@{file}' does not exist.">
            <condition>
                <not><available file="@{file}"/></not>
            </condition>
        </fail>
    </sequential>
</macrodef>

This uses a trick I’ve used a lot: the fail task with a condition inside it, meaning that we only fail if the condition is satisfied. Here we use not around available, which means we fail if the file doesn’t exist.

Harder example: JAR file

Now let’s check that a JAR file was created, and has the right contents. Here’s the test:

<target name="test-jar-created-with-manifest">

    <assert-target-creates-file
        target="build"
        file="dist/MyProduct.jar"
    />
    <assert-file-in-jar-contains
        jarfile="dist/MyProduct.jar"
        filename="MANIFEST.MF"
        find="Main-Class: my.package.MyMain"
    />

</target>

This just says that after we’ve run the target, the file MyProduct.jar exists, and it contains a file called MANIFEST.MF that has the right Main-Class information in it.

assert-file-in-jar-contains looks like this:

<macrodef name="assert-file-in-jar-contains">
    <attribute name="jarfile"/>
    <attribute name="filename"/>
    <attribute name="find"/>

    <sequential>
        <!-- ... insert checks that jar exists, and contains file -->

        <delete dir="${tmpdir}/unzip"/>
        <unzip src="@{jarfile}" dest="${tmpdir}/unzip"/>

        <fail message="@{jarfile}:@{filename} should contain @{find}">
            <condition>
                <resourcecount when="equal" count="0">
                    <fileset dir="${tmpdir}/unzip">
                        <and>
                            <filename name="**/@{filename}"/>
                            <contains text="@{find}"/>
                        </and>
                    </fileset>
                </resourcecount>
            </condition>
        </fail>

        <delete dir="${tmpdir}/unzip"/>

    </sequential>
</macrodef>

Which basically unzips the JAR into a directory, then searches the directory using fileset for a file with the right name and contents, and fails if it’s not found (i.e. if the resourcecount of the fileset is zero). These are the kinds of backflips you need to do to bend Ant to your will.

Or, you can choose the Nuclear Option.

The Nuclear Option

If Ant tasks just won’t do, since Ant 1.7 and Java 1.6 we can drop into a <script> tag.

You ain’t gonna like it:

<script language="javascript"><![CDATA[
system.launchMissiles(); // Muhahahaha
]]></script>

The script tag allows us to use a scripting language as provided through the JSR 223 Java feature directly within our Ant file, meaning we can do anything.

In all the JVMs I’ve tried, the only scripting language actually available is JavaScript, provided by the Rhino engine, which is now part of standard Java.

When using the script tag, expect bad error messages. Rhino produces unhelpful stack traces, and Ant doesn’t really tell you what went wrong.

So now we know how to test the artifacts our build produces, but what about directly testing the logic in build.xml?

Testing build logic

We want to:

  • Confirm that targets succeed or fail under certain conditions

  • Check indirect dependencies are as expected

  • Test a unit of Ant logic (e.g. a macrodef)

Success and failure

Here’s a little macro I cooked up to assert that something is going to fail:

<macrodef name="expect-failure">
    <attribute name="target"/>
    <sequential>
        <local name="ex.caught"/>
        <script language="javascript"><![CDATA[
        try {
            project.executeTarget( "@{target}" );
        } catch( e ) {
            project.setProperty( "ex.caught", "yes" )
        }
        ]]></script>
        <fail message="@{target} succeeded!!!" unless="ex.caught"/>
    </sequential>
</macrodef>

I resorted to the Nuclear Option of a script tag, and used Ant’s Java API (through JavaScript) to execute the target, and catch any exceptions that are thrown. If no exception is thrown, we fail.

Testing dependencies

To check that the dependencies are as we expect, we really want to run Ant’s dependency resolution without doing anything. Remarkably, Ant has no support for this.

But we can hack it in:

<target name="printCdeps">
    <script language="javascript"><![CDATA[

        var targs = project.getTargets().elements();
        while( targs.hasMoreElements() )
        {
            var targ = targs.nextElement();
            targ.setUnless( "DRY.RUN" );
        }
        project.setProperty( "DRY.RUN", "1" );
        project.executeTarget( "targetC" );

    ]]></script>
</target>

(See Dry run mode for Ant for more.)

Now we need to be able to run a build and capture the output. We can do that like this:

<target name="test-C-depends-on-A">
    <delete file="${tmpdir}/cdeps.txt"/>
    <ant
        target="printCdeps"
        output="${tmpdir}/cdeps.txt"
    />
    <fail message="Target A did not execute when we ran C!">
        <condition>
            <resourcecount when="equal" count="0">
                <fileset file="${tmpdir}/cdeps.txt">
                    <contains text="targetA:"/>
                </fileset>
            </resourcecount>
        </condition>
    </fail>
    <delete file="${tmpdir}/cdeps.txt"/>
</target>

We use the ant task to run the build, telling it to write its output to a file cdeps.txt. Then, to assert that C depends on A, we just fail if cdeps.txt doesn’t contain a line indicating we ran A. (To assert that a file contains a certain line we use a load of fail, condition, resourcecount and fileset machinery as before.)

So, we can check that targets depend on each other, directly or indirectly. Can we write proper unit tests for our macrodefs?

Testing ant units

To test a macrodef or target as a piece of logic, without touching the file system or really running it, we will need fake versions of all the tasks, including <jar>, <copy>, <javac> and many more.

If we replace the real versions with fakes and then run our tasks, we can set up our fakes to track what happened, and then make assertions about it.

If we create a file called real-fake-tasks.xml, we can put things like this inside:

<macrodef name="jar">
    <attribute name="destfile"/>
    <sequential>
        <property name="jar.was.run" value="yes"/>
    </sequential>
</macrodef>

and, in build.xml we include something called fake-tasks.xml, with the optional attribute set to true:

<include file="fake-tasks.xml" optional="true"/>

If the target we want to test looks like this (in build.xml):

<target name="targetA">
    <jar destfile="foo.jar"/>
</target>

Then we can write a test like this in test-build.xml:

<target name="test-A-runs-jar" depends="build.targetA">
    <fail message="Didn't jar!" unless="jar.was.run"/>
</target>

and run the tests like this:

cp real-fake-tasks.xml fake-tasks.xml
ant -f test-build.xml test-A-runs-jar
rm fake-tasks.xml

If fake-tasks.xml doesn’t exist, the real tasks will be used, so running your build normally should still work.

This trick relies on the fact that our fake tasks replace the real ones, which appears to be an undocumented behaviour of my version of Ant. Ant complains about us doing this, with an error message that sounds like it didn’t work, but actually it did (on my machine).

If we wanted to avoid relying on this undocumented behaviour, we’d need to write our real targets based on special macrodefs called things like do-jar and provide a version of do-jar that hands off to the real jar, and a version that is a fake. This would be a lot of work, and pollutes our production code with machinery needed for testing, but it could work with Ant’s documented behaviour, making it unlikely to fail unexpectedly in the future.

Summary

You can write Ant code in a test-driven way, and there are even structures that allow you to write things that might be described as unit tests.

At the moment, I am using mostly the “testing artifacts” way. The tests run slowly, but they give real confidence that your build file is really working.

Since I introduced this form of testing into our build, I enjoy working with build.xml a lot more, because I know when I’ve messed it up.

But I do spend more time waiting around for the tests to run.

Everybody hates build.xml (code reuse in Ant)

If you’re starting a new Java project, I’d suggest considering the many alternatives to Ant, including Gant, Gradle, SCons and, of course, Make. This post is about how to bend Ant to work like a programming language, so you can write good code in it. It’s seriously worth considering a build tool that actually is a programming language.

If you’ve chosen Ant, or you’re stuck with Ant, read on.

Slides: Everybody hates build.xml slides.

Most projects I’ve been involved with that use Ant have a hateful build.xml surrounded by fear. The most important reason for this is that the functionality of the build file is not properly tested, so you never know whether you’ve broken it, meaning you never make “non-essential” changes i.e. changes that make it easier to use or read. A later post and video will cover how to test your build files, but first we must address a pre-requisite:

Can you write good code in Ant, even if you aren’t paralysed by fear?

One of the most important aspects of good code is that you only need to express each concept once. Or, to put it another way, you can re-use code.

I want to share with you some of the things I have discovered recently about Ant, and how you should (and should not) re-use code.

But first:

What is Ant?

Ant is 2 languages:

  • A declarative language to describe dependencies
  • A procedural language to prescribe actions

In fact, it’s just like a Makefile (ignore this if Makefiles aren’t familiar). A Makefile rule consists of a target (the name before the colon) with its dependencies (the names after the colon), which make up a declarative description of the dependencies, and the commands (the things indented by tabs) which are a normal procedural description of what to do to build that target.

# Ignore this if you don't care about Makefiles!
target: dep1 dep2   # Declarative
    action1         # Procedural
    action2

The declarative language

In Ant, the declarative language is a directed graph of targets and dependencies:

<target name="A"/>
<target name="B" depends="A"/>
<target name="C" depends="B"/>
<target name="D" depends="A"/>

This language describes a directed graph of dependencies: the targets say what depends on what, or what must be built before you can build something else. Targets and dependencies are completely separate from what lives inside them, which are tasks.

The procedural language

The procedural language is a list of tasks:

<target ...>
    <javac ...>
    <copy ...>
    <zip ...>
    <junit ...>
</target>

When the dependency mechanism has decided a target will be executed, its tasks are executed one by one in order, just like in a programming language. Although tasks live inside targets, they are completely separate from them. Essentially each target has a little program inside it consisting of tasks, and these tasks are a conventional programming language, nothing special (except for the lack of basic looping and branching constructs).

I’m sorry if the above is glaringly obvious to you, but it only recently became clear to me, and it helped me a lot to think about how to improve my Ant files.

Avoiding repeated code

Imagine you have two similar Ant targets:

<target name="A">
    <javac
        srcdir="a/src" destdir="a/bin"
        classpath="myutil.jar" debug="false"
    />
</target>

<target name="B">
    <javac
        srcdir="b/code" destdir="b/int"
        classpath="myutil.jar" debug="false"
    />
</target>

The classpath and debug information are the same in both targets, and we would like to write this information in one single place. Imagine with me that the code we want to share is too complex to store as property values in a properties file.

How do we share this code?

The Wrong Way: antcall

Here’s the solution we were using in my project until I discovered the right way:

<target name="compile">
    <javac
        srcdir="${srcdir}" destdir="${destdir}"
        classpath="myutil.jar" debug="false"
    />
</target>

<target name="A">
    <antcall target="compile">
        <param name="srcdir" value="a/src"/>
        <param name="destdir" value="a/bin"/>
    </antcall>
</target>

<target name="B">
    <antcall target="compile">
    ...

Here we put the shared code into a target called compile, which makes use of properties to access the varying information (or the parameters, if we think of this as a function). The targets A and B use the <antcall> task to launch the compile target, setting the values of the relevant properties.

This works, so why is it Wrong?

Why not antcall?

antcall runs the supplied target inside a whole new Ant project. This is wrong because it subverts the way Ant is supposed to work. The new project will re-calculate all the dependencies (even if our target doesn’t depend on anything), which could be slow. Any dependencies of the compile target will be run even if they’ve already been run, meaning some of your assumptions about order of running could be incorrect, and the assumption that each target will only run once will be violated. What’s more, it subverts the Ant concept that properties are immutable, and remain set once you’ve set them: in the example above, srcdir and destdir will have different values at different times (because they exist inside different Ant projects).

Basically what we’re doing here is breaking all of Ant’s paradigms to force it to do what we want. Before Ant 1.6 brought us macrodef, you could have considered it a necessary evil. Now, it’s just Evil.

The Horrific Way: custom tasks

Ant allows you to write your own tasks (not targets) in Java. So our example would look something like this:

Java:

import java.io.File;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Project;
import org.apache.tools.ant.Task;
import org.apache.tools.ant.taskdefs.Javac;
import org.apache.tools.ant.types.Path;

public class MyCompile extends Task {
    public void execute() throws BuildException
    {
        Project p = getProject();

        Javac javac = new Javac();
        javac.setProject( p ); // the task needs a project before it can run

        javac.setSrcdir(  new Path( p, p.getProperty( "srcdir" ) ) );
        javac.setDestdir( new File( p.getProperty( "destdir" ) ) );
        javac.setClasspath( new Path( p, "myutil.jar" ) );
        javac.setDebug( false );
        javac.execute();
    }
}

Ant:

<target name="first">
    <javac srcdir="mycompile"/>
    <taskdef name="mycompile" classname="MyCompile"
        classpath="mycompile"/>
</target>

<target name="A" depends="first">
    <mycompile/>
</target>

<target name="B" depends="first">
    <mycompile/>
</target>

Here we write the shared code as a Java task, then call that task from inside targets A and B. The only word to describe this approach is “cumbersome”. Not only do we need to ensure our code gets compiled before we try to use it, and add a taskdef to allow Ant to see our new task (meaning every target gets a new dependency on the “first” target), but much worse, our re-used code has to be written in Java, rather than the Ant syntax we’re using for everything else. At this point you might start asking yourself why you’re using Ant at all – my thoughts start drifting towards writing my own build scripts in Java … anyway, I’m sure that would be a very bad idea.

The Relatively OK Way: macrodef

So, enough teasing. Here’s the Right Way:

<macrodef name="mycompile">
    <attribute name="srcdir"/>
    <attribute name="destdir"/>
    <sequential>
        <javac
            srcdir="@{srcdir}" destdir="@{destdir}"
            classpath="myutil.jar" debug="false"
        />
    </sequential>
</macrodef>

<target name="A">
    <mycompile srcdir="a/src" destdir="a/bin"/>
</target>

<target name="B">
    <mycompile srcdir="b/code" destdir="b/int"/>
</target>

Since Ant 1.6, we have the macrodef task, which allows us to write our own tasks in Ant syntax. In any other language these would be called functions, with arguments, which Ant calls attributes. You use these attributes by giving their name wrapped in @{} rather than the normal ${} for properties. The body of the function lives inside a sequential tag.

This allows us to write re-usable tasks within Ant. But what about re-using parts from the other language – the declarative targets and dependencies?

Avoiding repeated dependencies?

Imagine we have a build file containing targets like this:

<target name="everyoneneedsme"...

<target name="A" depends="everyoneneedsme"...
<target name="B" depends="everyoneneedsme"...
<target name="C" depends="everyoneneedsme"...
<target name="D" depends="everyoneneedsme"...

In Ant, I don’t know how to share this. The best I can do is make a single target that is re-used whenever I want the same long list of dependencies, but in a situation like this where everything needs to depend on something, I don’t know what to do. (Except, of course, drop to the Nuclear Option of the <script> tag, which we’ll see next time.)

I haven’t used it in anger, but this kind of thing seems pretty straightforward with Gradle. I believe the following is roughly equivalent to my example above, but I hope someone will correct me if I get it wrong:

task everyoneneedsme

tasks.whenTaskAdded { task ->
    task.dependsOn everyoneneedsme
}
    
task A
task B
task C
task D

(Disclaimer: I haven’t run this.)

So, if you want nice features in your build tool, like code-reuse and testability, you should consider a build tool that is integrated into a grown-up programming language where all this stuff comes for free. But, if you’re stuck with Ant, you should not despair: basic good practice is possible if you make the effort.