In a recent keynote at RailsConf called Writing Software*, David Heinemeier Hansson argues that test-driven development (TDD) can harm the clarity of our code, and that clarity is the key thing we should be aiming for when writing software.
(*contains swearing)
It’s an excellent talk, and I would highly recommend watching it, especially if you are convinced (like me) that TDD is a good thing.
I was inspired by watching the video. Clarity certainly is extremely important, and the name he coins, Software Writer, sits better with me than Software Engineer or Software Developer. I have often felt Programmer was the best name for what I am, but maybe I will adopt Software Writer.
The real goal
I would argue that clarity is not our goal in writing software. I think our goal is:
Working, modifiable software
Clarity helps us feel confident that our software works because we can read the code and understand what it does.
Clarity helps us modify our software because we can understand what needs to be changed and are less likely to make mistakes when we change it.
A good set of full-system-level tests helps us feel confident that our software works because they prove it works in certain well-defined scenarios. A good set of component-level and unit tests gives us confidence that various parts work, but as David points out, confidence in these separate parts does not give us much meaningful confidence that the whole system works.
Good sets of tests at all levels help us modify our software because we are free to refactor (or re-draft as David puts it). Unit and component tests give us confidence that structural changes we are making don’t modify the external behaviour of the part we are re-structuring. Once we have made enabling structural changes, the changes we make that actually modify the system’s behaviour are smaller and easier to deal with. The tests that break or must be written when we modify the system’s behaviour help us understand and explain the behaviour changes we are making.
So both clarity and tests at all levels can contribute to our goal of writing working, modifiable software.
But David wasn’t talking about tests – he was talking about TDD – driving the design of software by writing tests.
How TDD encourages clarity
Before I talk about how we should accept some of what David is saying, let’s first remind ourselves of some counter-points. TDD is explicitly intended to improve our code.
I agree with David when he defines good code as clear code, so how does TDD encourage clarity?
TDD encourages us to break code into smaller parts, with names. Smaller, named classes and functions are clearer to me than large blocks of code containing sub-blocks that do specific jobs but are not named. When I write in a TDD style I find it encourages me to break code into smaller parts.
TDD encourages us to write code that works at a single level of abstraction. Code that mixes different levels is less clear than code at a single level. I find that using TDD helps me resist the temptation to mix levels because it encourages me to compose two pieces that deal separately with each level, rather than linking them together.
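To make those two points concrete, here is a minimal sketch (in Python, with hypothetical names like parse_order_line and discounted_price – none of this comes from David’s talk) of the shape TDD tends to push me towards: small, named functions, each testable on its own, composed at a single level of abstraction.

```python
PRICES = {"apple": 30, "pear": 45}

def total_price(order_lines):
    """Sum the discounted price of each parsed order line (one level: composition)."""
    return sum(discounted_price(parse_order_line(line)) for line in order_lines)

def parse_order_line(line):
    """Split 'sku,quantity' into a (sku, quantity) pair (one level: parsing)."""
    sku, quantity = line.split(",")
    return sku, int(quantity)

def discounted_price(parsed_line):
    """Price one line, applying a bulk discount above ten items (one level: pricing)."""
    sku, quantity = parsed_line
    price = PRICES[sku] * quantity
    return price * 0.9 if quantity > 10 else price

def test_total_price_applies_bulk_discount():
    assert total_price(["apple,11"]) == 30 * 11 * 0.9
```

Each small function earns a name and a test of its own, and the top-level function reads as a sentence rather than as a block of mixed parsing and arithmetic.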
It is vital to point out here that TDD doesn’t push you towards clarity unless you already wanted to go there. I have seen TDD code that is unclear, stuffed full of boilerplate, formed by copy-paste repetition, and generally awful. As a minimal counter-example, TDD doesn’t stop you using terrible names for things.
But, when you care about clarity, and have an eye for it, I feel TDD can help you achieve it.
How TDD hurts clarity
David’s argument against TDD is that it makes us write code that is less clear. His main argument, as I understand it, is:
TDD forces us to use unnecessary layers of abstraction. Because we must never depend on “the world,” TDD forces us to inject dependencies at every level. This makes our code more complex and less clear.
At its core, we must acknowledge that this argument is true. Where TDD causes us to inject dependencies that we otherwise would not inject, we are making our code more complex.
However, there are elements of a straw man here too. Whenever I can, I allow TDD to drive me towards systems with fewer dependencies, not injected dependencies. When I see a system with fewer dependencies, I almost always find it clearer.
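As a sketch of what “fewer dependencies” can look like (hypothetical names, not an example from the talk): instead of injecting a clock so a function can be tested, pass it the value it actually needs, and there is nothing left to inject or fake.

```python
from datetime import date

# The shape injection-style TDD can push you towards: a clock collaborator
# that exists mainly so the function can be tested.
def days_until_renewal_with_clock(subscription_end, clock):
    return (subscription_end - clock.today()).days

# The shape with fewer dependencies: a pure function of the values it needs.
# Nothing to inject, nothing to fake, and (to my eye) clearer.
def days_until_renewal(subscription_end, today):
    return (subscription_end - today).days

def test_days_until_renewal():
    assert days_until_renewal(date(2014, 6, 1), date(2014, 5, 30)) == 2

# The real date is supplied once, at the edge of the system.
def remaining_days(subscription_end):
    return days_until_renewal(subscription_end, date.today())
```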
Test against the real database?
David frequently repeats his example of testing without hitting the database. He points out that allowing this increases complexity, and that the resulting tests do not have anything like as much value as tests that do use the database.
This hurts, because I think his point is highly valid. I have seen lots of bugs throughout systems (not just in code close to the database) that came from wrong assumptions about how the database would behave. Testing a wide range of functionality against the real database seems to be the only answer to this problem. Even testing against a real system that is faster (e.g. an in-memory database) will not help you discover all of these bugs, because the faster database will behave differently from the production one.
On the other hand, tests that run against a real database will be too slow to run frequently during development, slowing everything down and reducing the positive effects of TDD.
I don’t know what the answer is, but part of it has got to be to write tests at the component level – testing all the behaviour that is driven by database behaviour, but not requiring huge amounts of other systems (e.g. the web server, LDAP, other HTTP endpoints) to be spun up – and to run these against the real database as often as possible. If they take only about 5 minutes, maybe it’s reasonable to ask developers to run them before they commit code.
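One way to arrange that, sketched here with pytest (my assumption – any test runner with tagging or skipping would do, and REAL_DB_URL is a made-up name), is to make the component tests that need a real database skip themselves unless one is configured, so the default run stays fast and the full run happens before commit:

```python
import os
import pytest

# Skip (rather than fail) when no real database is configured, so the
# default, fast developer run is unaffected.
requires_real_db = pytest.mark.skipif(
    "REAL_DB_URL" not in os.environ,
    reason="set REAL_DB_URL to run component tests against the real database",
)

@requires_real_db
def test_component_behaviour_against_real_database():
    # Placeholder body: a real test would drive the component end to end
    # against the database named in REAL_DB_URL.
    database_url = os.environ["REAL_DB_URL"]
    assert database_url
```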
But my gut tells me that running tests at this level should not absolve us from abstracting our code from the production database. It just feels Right to write code that works with different storage back ends. We are very likely to have to change the specific database we use several times in the history of our code, and we may well need to change the paradigm, e.g. NoSQL->SQL.
In a component that is based on the database, I think you should unit test the logic in the standard TDD way, unit test the database code (e.g. code that generates SQL statements) against a fast fake database, AND comprehensively test the behaviour of the component as a whole against a real database. I admit this looks like a lot of tests, but if you avoid “dumb” unit tests that e.g. check getters and setters, I think these 3 levels have 3 different purposes, and all have value*.
*Writing the logic TDD-style encourages smaller units of logic, and gives confidence that it is correct. Testing the database code against a fake database gives confidence that our syntax is right, and testing the whole component against the real database gives us confidence that our behaviour really works.
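Here is a minimal sketch of those three levels in Python, with hypothetical names, using in-memory SQLite to stand in for the “fast fake database” (the third level is only indicated, since it needs the real database from the previous sketch):

```python
import sqlite3

# Production-ish code under test (hypothetical names).
def overdue_invoices(invoices, today):
    """Pure logic: which invoices are past their due date?"""
    return [inv for inv in invoices if inv["due"] < today]

def overdue_query(table):
    """Database code: the SQL that asks the same question."""
    return f"SELECT id FROM {table} WHERE due < ?"

# 1. Unit test of the logic, written in the standard TDD way.
def test_overdue_invoices_picks_only_past_due():
    invoices = [{"id": 1, "due": "2014-01-01"}, {"id": 2, "due": "2014-12-01"}]
    assert overdue_invoices(invoices, "2014-06-01") == [{"id": 1, "due": "2014-01-01"}]

# 2. The generated SQL against a fast fake database (in-memory SQLite here).
def test_overdue_query_runs_and_finds_the_right_rows():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (id INTEGER, due TEXT)")
    db.execute("INSERT INTO invoices VALUES (1, '2014-01-01'), (2, '2014-12-01')")
    assert db.execute(overdue_query("invoices"), ("2014-06-01",)).fetchall() == [(1,)]

# 3. The component as a whole against the real database, run less often
#    (see the skip-marker sketch above for one way to select those tests).
```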
Injecting the database as a dependency gives us the advantage of our code having two consumers, which is one of the strongest arguments put forward by proponents of TDD that it gives us better code. All programmers know that there are only three numbers: 0, 1 and more. By having “more” users of our database code, we (hopefully) end up with code that speaks at a single level, e.g. there are no SQL statements peppered around code which has no business talking directly to the database.
Clarity of tests
In order for tests to drive good APIs in our production code, and for them to serve as documentation, they must be clear. I see a lot of test code that is full of repetition and long methods, and for me this makes it much less useful.
If our tests are complex and poorly factored, they won’t drive good practice in our production code. If we view unclear tests as a smell, the fixes we make will often encourage us to improve the clarity of our production code.
If our tests resemble (or actually are) automatically-generated dumb callers of each method, we will have high coverage and almost no value from them. If we try to change code that is tested in this way, we will be thwarted at every turn.
If, on the other hand, we write tests that are clear and simple expressions of behaviour we expect, we will find them easy to understand and maintain, they will drive clear code in production, and sometimes we may realise we are writing them at a higher level than unit tests. When this happens, we should “float freely” with David and embrace that. They are testing more. That is good.
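As a small illustration (a deliberately tiny, hypothetical Basket class, not anything from the talk), the first test below is the kind of “dumb caller” I mean, and the second is a clear expression of behaviour that also happens to test at a slightly higher level:

```python
class Basket:
    """Tiny hypothetical class, just enough for the tests below."""
    PRICE = 100

    def __init__(self, discount=0.0):
        self.discount = discount

    def total(self):
        return self.PRICE * (1 - self.discount)

# A "dumb caller" test: it restates the constructor and documents nothing.
def test_init_sets_discount():
    assert Basket(discount=0.1).discount == 0.1

# A clear expression of behaviour we expect: it reads as documentation,
# and it exercises more than a single method.
def test_a_discounted_basket_costs_less_than_a_full_price_one():
    assert Basket(discount=0.1).total() < Basket().total()
```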
Higher-level tests
David (with the support of Jim Coplien) encourages us to test at a coarser grain than the unit. I strongly agree that we need more emphasis on testing at the component and system levels, and sometimes less unit testing, since we don’t want to write two tests that test the same thing.
However, there are some problems with larger tests.
First, larger tests make it difficult to identify what caused a problem. When a good unit test fails, the problem we have introduced (or the legitimate change in behaviour) is obvious. When a component or system test fails, often all we know is that something is wrong, and the dreaded debugging process must begin. In my experience, this is not just a myth. One of the pieces of code that I am most proud of in my career was a testing framework that allowed you to write concise and clear tests of a horrible tangled mess of a component. The tests ran fairly quickly, with no external dependencies being used, but if one of them failed your heart sank.
Second, large tests can be fragile. Sometimes they fail because the database or network is down. Sometimes they fail because your threads did stuff in an unexpected order. Threading is a really good example: when I want to debug a weird threading issue I want to write an unthreaded test that throws stuff at a component in a well-defined but unusual order. When I can get that test to fail I know I’ve found a real problem, and I can fix it. If the code is not already unit tested (with dependencies being injected, sometimes) then writing that test can be really difficult.
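To show what I mean by a well-defined but unusual order, here is a sketch (a hypothetical MessageBuffer, with no real threads involved): the test reproduces the race deterministically by calling the component’s handlers itself, in the troublesome order.

```python
class MessageBuffer:
    """Simplified stand-in: buffers messages until a consumer attaches."""
    def __init__(self):
        self.pending = []
        self.consumer = None

    def on_message(self, msg):
        # Called from one thread in production; here we call it directly.
        if self.consumer is None:
            self.pending.append(msg)
        else:
            self.consumer(msg)

    def on_consumer_attached(self, consumer):
        # Called from another thread in production.
        self.consumer = consumer
        for msg in self.pending:
            consumer(msg)
        self.pending = []

def test_message_arriving_before_consumer_attaches_is_not_lost():
    received = []
    buffer = MessageBuffer()
    buffer.on_message("hello")                   # the "unusual" ordering
    buffer.on_consumer_attached(received.append)
    assert received == ["hello"]
```

When a test like this fails, it points straight at the ordering problem, with no flakiness and no sleeps.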
TDD makes our code testable, and while testability (like clarity) is not an end in itself, it can be darn useful.
Conclusion
Good things
Clarity is good because it supports working, modifiable code.
Tests are good because they support working, modifiable code.
Testability is good because it supports tests, especially debugging tests.
TDD is good when it supports clarity, testability and tests.
TDD is bad when it hurts clarity.
If you throw out TDD, try not to throw out tests or testability. You will regret it.
What to do
Write tests at the right level. Don’t worry if the clearest level is not the unit level.
Use tests to improve clarity.
If your tests are unclear, there is something wrong. Fix it.
Steps to success
1. Get addicted to TDD.
2. Wean yourself off TDD and start to look at the minimal set of tests you can write to feel that sweet, sweet drug of confidence.
3. Do not skip step 1.
As I was reading this this morning, I had just got to the part about testing against “real” databases when I finished my breakfast and got back to work.
The task in progress was investigating a bug that only happens when we connect to the real database. Ha!
The trouble is all our dev databases are currently being refreshed (with data from production) so I can’t work on the issue for an hour or so at this point – and when I can there’s no guarantee that the data that caused the issue will be the same!
D’oh!
In our case I believe it’s the difference in the data that caused the issue, rather than the database itself. This is why we run higher-level integration and system regression test suites against the dev DBs at least nightly (and where this showed up). But once it does show up I’ll try and capture it in a deterministic unit test (or low-level integration test – by capturing a subset of the DB data).
Clearly both are needed – for all sorts of reasons. But it’s primarily the unit tests that drive the low-level design. System tests drive the high-level design (as they show how our code behaves when it hits “the real world”). There’s a dance between these levels as we try to capture things we learn at the system test level back at the unit test level (where appropriate).
In short, TDD is driven by tests. Which level those tests are pitched at varies as appropriate to the project, time etc. – but you typically need multiple levels that all influence the design in some respect. (I’d also say that a lot of what I label “unit tests” in our code are really low-level integration tests – but they largely meet Michael Feathers’ requirements for unit tests, so I continue to call them that.)
Definitely some of my best tests have been “component” tests – things that check real behaviour of relatively large areas of code, but still satisfy Feathers’ requirements for unit tests.
Also, I think driving high-level design from high-level tests and low-level design from low-level tests is the right way to think about things.