Catch2 released

I've been talking about Catch2 for a while - but now it's finally here! The big news for Catch2 is that it drops all support for pre-C++11 compilers. That means some users will no longer be supported (although you can still use Catch "Classic" (1.x), which will get some bug-fix updates for a while, at least). Otherwise it's mostly an internal change - but it enables a number of user-facing changes, both now and in the future. Let's take a look at what they are.

New and shiny

New, composable, command line processor

Clara is the command line parser in Catch. In Catch 1.0 I spun it out into its own library (but it's still embedded in the Catch single header). Like Catch 1.0 itself, Clara was constrained to C++98 compatibility. For Catch2 I've rewritten Clara from the ground up, not only to fully embrace C++11, but also to be composable. What that means here is that each individual command line option or argument can be represented using its own, self-contained, parser. A composite parser of all the options is then assembled, or composed, from those smaller parsers.

The main advantage of this approach is that the set of available options is now trivially extensible from outside of Catch, so users can easily add their own command line options to tune their test code.
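To make the idea of composition concrete, here is a minimal, self-contained sketch. It is not Clara's actual interface: Opt, Parser, intOpt and the use of operator| are hypothetical names chosen for illustration, and only options of the form "flag followed by a value" are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical mini-parser for illustration only; this is not Clara's interface.
// Each Opt is a self-contained parser for one "--flag value" pair.
struct Opt {
    std::string flag;                                // e.g. "--width"
    std::function<void( const std::string& )> store; // receives the flag's value
};

// A composite parser is just the collection of the small parsers it was built from.
struct Parser {
    std::vector<Opt> opts;

    // Returns false on an unknown flag or a flag that is missing its value.
    bool parse( const std::vector<std::string>& args ) const {
        for( std::size_t i = 0; i < args.size(); ++i ) {
            bool matched = false;
            for( const Opt& opt : opts ) {
                if( args[i] == opt.flag && i + 1 < args.size() ) {
                    opt.store( args[++i] );
                    matched = true;
                    break;
                }
            }
            if( !matched )
                return false;
        }
        return true;
    }
};

// Composition: adding an option to a parser yields a new, larger parser.
inline Parser operator|( Parser lhs, const Opt& rhs ) {
    lhs.opts.push_back( rhs );
    return lhs;
}

// Convenience factory that binds a flag to an int variable owned by the caller.
inline Opt intOpt( const std::string& flag, int& target ) {
    return Opt{ flag, [&target]( const std::string& text ) { target = std::stoi( text ); } };
}

// Usage: the "framework" defines its options, then a user adds one of their own.
inline void usageExample() {
    int width = 0, seed = 0;
    Parser frameworkCli = Parser{} | intOpt( "--width", width );
    Parser userCli = frameworkCli | intOpt( "--seed", seed );

    assert( userCli.parse( { "--seed", "42", "--width", "80" } ) );
    assert( width == 80 && seed == 42 );
}
```

The point of the design is that userCli is built without touching the code that defined frameworkCli, which is the kind of outside-of-the-framework extension described above.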

See my lightning talk at CppCon this year for a bit more.

Commas in assertions

As Catch assertions are implemented using macros, they were susceptible to the old problem of how macros interpret commas within macro arguments. Commas may occur in contexts that the preprocessor doesn't know about, such as within angle brackets (e.g. for template instantiations) - and so get interpreted as argument separators for the macro itself.

Now that we can rely on C++11, which includes variadic macros, we can make the assertions variadic, and just reassemble all the arguments again inside. That means we can now write code like the following:

  REQUIRE( getPair() == std::pair<bool, std::string>( true, "banana" ) );
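The mechanism can be seen in isolation with a pair of toy macros. This is only a sketch of the technique: CHECK_OLD and CHECK_NEW are hypothetical names, not Catch's macros, and they just evaluate the expression rather than decomposing and reporting it.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Single-parameter macro: a comma outside parentheses splits the argument in
// two, so passing an expression that mentions std::pair<bool, std::string>
// directly would fail to compile ("macro passed 2 arguments, but takes just 1").
#define CHECK_OLD( expr ) ( static_cast<bool>( expr ) )

// Variadic macro (available since C++11): however many pieces the preprocessor
// splits the argument into, __VA_ARGS__ pastes them back together, commas included.
#define CHECK_NEW( ... ) ( static_cast<bool>( __VA_ARGS__ ) )

inline std::pair<bool, std::string> getPair() {
    return std::pair<bool, std::string>( true, "banana" );
}

inline void usageExample() {
    // With the old form the whole expression needs an extra pair of parentheses.
    assert( CHECK_OLD( ( getPair() == std::pair<bool, std::string>( true, "banana" ) ) ) );

    // With the variadic form it can be written naturally.
    assert( CHECK_NEW( getPair() == std::pair<bool, std::string>( true, "banana" ) ) );
}
```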

Microbenchmarking (experimental)

Catch2 gains initial support for micro-benchmarking. This is where small pieces of code are timed, usually in a loop so they are repeated enough times to be significant compared to the system clock accuracy. Some extra adjustments need to be made to allow for other sources of jitter and slowdown on the host machine - and, even then, multiple samples should be taken so they can be subject to statistical analysis.
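As a rough illustration of the "timed in a loop" idea, the sketch below keeps doubling the iteration count until one run lasts long enough to dwarf the clock's resolution. It is not how Catch2 implements benchmarking: nanosPerIteration is a hypothetical helper, and it takes a single sample only, with none of the jitter adjustments or statistics mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Times `fn` by running it in a loop, doubling the iteration count until a
// whole run lasts at least `minRun`. Returns mean nanoseconds per iteration.
template <typename Fn>
double nanosPerIteration( Fn fn, std::chrono::nanoseconds minRun ) {
    assert( minRun.count() > 0 );
    using Clock = std::chrono::steady_clock; // monotonic, so unaffected by clock adjustments

    for( std::uint64_t iterations = 1; ; iterations *= 2 ) {
        const auto start = Clock::now();
        for( std::uint64_t i = 0; i < iterations; ++i )
            fn();
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>( Clock::now() - start );

        if( elapsed >= minRun )
            return static_cast<double>( elapsed.count() ) / static_cast<double>( iterations );
    }
}

// Usage: time a trivial operation. The volatile sink stops the compiler from
// optimising the loop body away entirely.
inline double usageExample() {
    volatile unsigned sink = 0;
    return nanosPerIteration( [&sink] { sink = sink + 1; }, std::chrono::milliseconds( 1 ) );
}
```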

There are many shortcomings with micro-benchmarks - not least that the performance of a piece of code in isolation can often be drastically different to how well it performs in conjunction with other code. This is not only due to the way the compiler may inline or otherwise optimise code together, but even on the CPU instructions can be reordered, pipelined or run in parallel - and with cache levels and branch prediction, the relationship between these things becomes hugely unpredictable.

Nonetheless they can still be useful - and it can be convenient to use the same test framework that you use to write functional tests - not least because there is much shared infrastructure.

Catch's benchmarking support is incomplete at the time of writing, lacking the multi-sampling, statistical analysis and richer reporting that fully-fledged frameworks offer. The intention is to grow this, but only if it can be done without any significant impact on non-benchmarking tests. In lieu of full documentation, see the example tests for now.

Performance

Both runtime and compile time performance are becoming increasingly important for Catch, and a lot of work is going into improving both. Runtime performance was a non-goal initially, so there has been plenty of low-hanging fruit. As a result we're already seeing some significant improvements, and there is more to come.

Compile time is harder. This has always been important, but as Catch has grown over the years, it has begun to suffer. Improving it significantly means making some trade-offs. So far some features that drag down compile times have been made configurable - e.g. whether breaking into the debugger on a failed assertion happens in the code that caused it (meaning the debugger code gets compiled into every assertion macro) or one level up the stack (so it can be "hidden" in a function). Another area to look at is whether to use (non-standard, potentially brittle) forward declarations of some standard library types. Again, this is an ongoing area of active development - but much is already in Catch2 at launch.

See some of the toggle macros for more details.

A new name

Believe it or not, "Catch2" is now the name - it's not just a version reference! In fact the current intention is that even when we move to v3.x it will still be called Catch2 (e.g. Catch2 v3.0).

Why? Well, there have been calls for a name change, for searchability reasons. "Catch" is obviously a common keyword in C++, and also a common term in unit testing, so getting search terms sufficiently narrow has been tricky. But I didn't want to use an entirely different name (although I did toy with the idea of "Catfish" for a while) - because that would lose too much of the momentum behind Catch overall. A derivative name doesn't fully solve the problem, because people will still casually refer to it as Catch - but at least it gives a slight advantage. So that's what I've gone with.

There are also a few interesting numerological aspects to it. It stops short of being Catch22 - but if you consider the C++11 requirement you could multiply them (2 x 11) to get 22. And you can add the digits in C++11 (1 + 1) to get 2.

Upcoming features

It's always dangerous to talk about what's planned - and I've fallen into this trap with Catch before. So there are some feature promises that have been outstanding for a long time now. In fact most of those have been deferred to Catch2 for quite some time - either because C++11 has features that make them much easier, or even possible, to implement (e.g. threading) - or just because they involved a lot of code that gets less noisy in C++11. So we'll talk about those again here.

Threading

This was infeasible in Catch 1.x due to not having C++11 threading primitives, nor being able to use external dependencies like Boost for threading. Providing a basic level of support should now be fairly straightforward, as a lot of groundwork has been laid (e.g. how singletons are organised).

The idea is that if, within a test case, you use additional threads, you should be able to make assertions from those threads - as long as the test case is still in scope at the time. The aim is for this to be done without locks in the assertions. Running multiple test cases in parallel is not immediately planned (and may be best implemented at the process level, anyway).

Generators/ Property Based Testing

Generators give you what other frameworks might call (Data-)Parameterised Tests - i.e. being able to use the same test code with different inputs. An experimental version of generators was included in Catch from very early on. As well as being incomplete and having some limitations, it also had a serious issue in that it didn't work at all with Sections! This is because both features relied on the ability to re-enter test cases - but they were independent of each other.

I rewrote the test-case tracking code a couple of years ago to be able to support this properly - and had a proof-of-concept new implementation of Generators working with it - enough to give a demo at a talk I gave. However the implementation was getting noisy with C++98 syntax, so I deferred work on it for Catch2. Now that Catch2 is released I'll be looking at this again.

Closely related to Generators - in fact it builds on them - is the idea of Property Based Testing. The proof-of-concept I mentioned actually had an initial version of this, too. There's more work involved here to get it right, but having Generators is a first step.
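For readers unfamiliar with the two terms, the relationship can be sketched in a few lines of plain C++. This says nothing about how Catch2's generators will look: randomInts, failingInputs and clampToByte are hypothetical names, and a "generator" is modelled here as nothing more than a vector of inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: produces `count` pseudo-random ints in [lo, hi].
// A fixed seed keeps any failure reproducible.
inline std::vector<int> randomInts( unsigned seed, std::size_t count, int lo, int hi ) {
    std::mt19937 engine( seed );
    std::uniform_int_distribution<int> dist( lo, hi );
    std::vector<int> out;
    for( std::size_t i = 0; i < count; ++i )
        out.push_back( dist( engine ) );
    return out;
}

// Runs the same check against every input and collects the inputs that fail.
template <typename T, typename Check>
std::vector<T> failingInputs( const std::vector<T>& inputs, Check check ) {
    std::vector<T> failures;
    for( const T& input : inputs )
        if( !check( input ) )
            failures.push_back( input );
    return failures;
}

// Code under test for the example.
inline int clampToByte( int v ) { return v < 0 ? 0 : ( v > 255 ? 255 : v ); }

inline void usageExample() {
    auto staysInRange = []( int v ) { int c = clampToByte( v ); return c >= 0 && c <= 255; };

    // Data-parameterised: the same test code, a hand-picked list of inputs.
    std::vector<int> handPicked = { -5, 0, 17, 255, 300 };
    assert( failingInputs( handPicked, staysInRange ).empty() );

    // Property-based: the same test code, but the inputs are generated and the
    // check states a property that should hold for all of them.
    assert( failingInputs( randomInts( 1234u, 100, -1000, 1000 ), staysInRange ).empty() );
}
```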

Breaking Changes

As a major version change we've taken advantage of the permission that Semantic Versioning gives us to introduce a few breaking changes. These should have little, if any, impact on most users - but it's worth checking these before making the move to be sure you're ready.

toString() has been removed

This is probably the biggest change, and the most likely to affect people. For a long time there have been three ways to tell Catch how to convert values into strings for reporting purposes. In order of precedence, with a final fallback when none of the three applied, the pipeline was like this:
  1. toString() overload
  2. StringMaker<> specialisation
  3. ostream& operator << overload
  4. give up and use {?}

If your types already have << overloads for ostream then you're good. If not then, in theory, overloading toString() was the simplest option.

However toString() had a number of limitations - mostly due to the point of template instantiation. Compiler differences with two-phase lookup, and other factors which are implementation defined, mean that toString() overloads were unreliable and caused a lot of confusion - hardly the simplest option after all!

Specialising StringMaker<> is slightly more work, but is more reliable, stable, and flexible. So this is now the recommended way to provide string conversion functionality for your types. In Catch2, toString() has been completely removed!

If you have code that calls toString() there is a new function that plays that role: Catch::Detail::stringify(). However, note that (a) this should never be overloaded - it just wraps the call into the pipeline that starts with StringMaker<> and (b) the Detail part of the namespace should be a clue that this is really an internal part of Catch and is subject to change.

To specialise StringMaker<> see the documentation.
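The shape of the remaining pipeline (a StringMaker<> specialisation first, then an ostream operator <<, then {?}) can be sketched without the Catch header. Everything below lives in a stand-in namespace called sketch and is an assumption made for illustration, not Catch's source; Point and Opaque are hypothetical user types. In real test code the specialisation would go in namespace Catch, as the documentation describes.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch { // stand-in namespace, not Catch's own code

// Detects at compile time whether `std::ostream << T` is valid.
template <typename T>
class IsStreamable {
    template <typename U>
    static auto test( int )
        -> decltype( std::declval<std::ostream&>() << std::declval<U const&>(), std::true_type() );
    template <typename>
    static std::false_type test( ... );
public:
    static const bool value = decltype( test<T>( 0 ) )::value;
};

// Step 2 of the pipeline: use operator << if the type has one...
template <typename T>
typename std::enable_if<IsStreamable<T>::value, std::string>::type fallback( T const& value ) {
    std::ostringstream oss;
    oss << value;
    return oss.str();
}
// ...and step 3: otherwise give up.
template <typename T>
typename std::enable_if<!IsStreamable<T>::value, std::string>::type fallback( T const& ) {
    return "{?}";
}

// Step 1: the primary template is only reached when no specialisation exists.
template <typename T>
struct StringMaker {
    static std::string convert( T const& value ) { return fallback( value ); }
};

// Entry point that simply forwards into the pipeline; never overloaded.
template <typename T>
std::string stringify( T const& value ) { return StringMaker<T>::convert( value ); }

} // namespace sketch

struct Point { int x, y; }; // no operator <<, but specialised below
struct Opaque {};           // neither a specialisation nor operator <<

namespace sketch {
template <>
struct StringMaker<Point> {
    static std::string convert( Point const& p ) {
        return "(" + std::to_string( p.x ) + ", " + std::to_string( p.y ) + ")";
    }
};
} // namespace sketch
```

Because a specialisation is selected by type rather than found by overload resolution at the point of instantiation, it avoids the lookup-order surprises that made toString() overloads unreliable.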

Other removals and changes

As well as C++98 support and toString(), a number of deprecated features and interfaces have been removed, as well as a few other tweaks and changes that may impact some code-bases. See the "Breaking Changes" section in the release notes for the full list. In fact the release notes in general give a good overview of all the many small changes and improvements that have gone into Catch2 that have not been mentioned here.

A new home

The Catch(2) repository has moved! You may not have noticed, as it has been transferred within GitHub, and that means GitHub maintains redirects for all the old links. However they do recommend updating your own URLs in bookmarks, direct download links and, of course, git remotes. We've made this move for two reasons:

1. As Catch has grown it has become more of a community effort. It already has an additional lead maintainer in Martin Hořeňovský, and others may be added. But as the sole owner of my own personal GitHub account there are some things that only I could do (webhooks and other integrations, for example). So as not to be a bottleneck we've created a GitHub "Organization" account, CatchOrg, which allows multiple admin users. That's where we've moved it to.

2. For Catch2 to get any traction as a new name it was important for it to be reflected in the repo name, so we've taken advantage of the move to change the repo name, too. Catch "Classic" (1.x) has also moved here, but is now on a branch. If you cannot move to Catch2 for C++98 compatibility reasons you can stay on Catch Classic on this branch. It will continue to receive critical fixes, at least for now, but is no longer the active development branch. Please try to move to Catch2 as soon as possible.

If you notice anything broken as a result of this move, please let us know so we can fix it.

Thanks!

As always, a huge thanks to all who have supported and contributed to Catch and Catch2 - especially for your patience when I wasn't getting to issues and PRs as quickly as was needed!

An extra thanks to Martin, who has been doing the majority of the work on Catch this year!

Injecting Singletons in Objective-C Unit Tests

I've promised to write this up a few times now. As I've just given another talk that covers it I thought it was time to make good on that promise.

The topic is the use of singletons in UIKit (and AppKit) and how that makes code using them hard to test. These APIs are riddled with singletons and you can't really avoid them. In case you need convincing that singletons are problematic take this contrived function:

NSString* makeWidget() {
    NSString* colour = 
        [[NSUserDefaults standardUserDefaults] stringForKey: @"defaultColour"];
    return [colour stringByAppendingString: @"Widget"];
}

NSUserDefaults is a singleton - the sole instance of which is returned when you call standardUserDefaults.

A perturbing problem

Now consider how we might test this code. Obviously in an example this trivial there are various ways we could change the code to make the problem go away. Consider this a scaled down example of a problem that may be deeper in the code - perhaps a legacy code-base (or even some third party library!).

A naive test might set the "defaultColour" key in NSUserDefaults prior to calling makeWidget(). The problem with that is that the environment is left in a changed state after the test. Subsequent tests may now pick up a different value if they use NSUserDefaults. Worse: NSUserDefaults is backed by persistent storage that can potentially leave your whole user account in a changed state!

So, at the very least, we should restore the prior value at the end of the test. This leads to further problems: if the test fails, or an exception is otherwise thrown, the clean-up would not be called. So we'd need to wrap it in a @try-@finally too. Then, can we be sure we know what value to restore it to? It's probably nil - but if it's not, the environment is still in a different state. So we should capture the prior value first and hold it in a variable.

Now what if you need to set more than one value? Or you change the keys used? We're starting to do a lot of bookkeeping just to compensate for the fact that a singleton is being used. Not only is it ugly but it's increasingly error prone.

It's better if we can avoid this in the first place. If we have the option, we should prefer to pass dependencies in, rather than have our code reach out to these Dependency Singularities. In our example either pass in the default colour or, failing that, pass in NSUserDefaults.

NSString* makeWidget( NSUserDefaults* defaults ) {
    NSString* colour = [defaults stringForKey: @"defaultColour"];
    return [colour stringByAppendingString: @"Widget"];
}

At first this doesn't seem to buy us much. We still need an instance of NSUserDefaults. Even if we alloc-init it we'll get a copy of the global one. That's better, but we'd still be dependent on the environment and have to take steps to compensate. And in other cases we may not even have that option.

If you can't make it - fake it!

We might not be able to create completely fresh instances of NSUserDefaults - but we can create instances of a stand-in class. Due to Objective-C's dynamic nature we don't even need to subclass - and we only have to implement the methods that are actually called - in this case stringForKey:. We could do that with a Mock Object. Or we can build our own Fake. Let's assume you've written a Fake called FakeUserDefaults, which contains an NSMutableDictionary, a means to populate it (perhaps via an initialiser) and an implementation of stringForKey: that looks the key up in the dictionary. Now we can test like this:

TEST_CASE() {
    id defaults =
        [[FakeUserDefaults alloc] initWithValue: @"Red" 
                                         forKey: @"defaultColour"];
    REQUIRE_THAT( makeWidget( defaults ), StartsWith( @"Red" ) );    
}

Great. That seems to tick all the boxes. We have complete control of the default value and we haven't perturbed our environment. No clean-up is required at the end of the test (not even memory, if we're using ARC).

That assumes, of course, that you have the freedom to change the code under test. If makeWidget() were buried deep in some legacy code, for example, it may not be feasible to make such a change (yet). Even if we can make the change, it can be useful to be able to put the test in first to watch your back while you change it. If, for whatever reason, we need to leave the call to [NSUserDefaults standardUserDefaults] baked into the code under test, what else can we do?

To catch a singleton we must think like a singleton

What we'd like is that, when standardUserDefaults is called on NSUserDefaults deep in the bowels of the code under test, it returns an instance of our fake class instead - but only while we're testing. Again, due to Objective-C's dynamic nature we can achieve this. But it starts to get messier. It involves gritty low-level functions from objc/runtime.h. Can we package that away somewhere?

Of course we can! Enter TBCSingletonInjector. I've uploaded the code to GitHub, but there's actually not much to it. It exposes one public (class) method:

+(void) injectSingleton: (id) injectedSingleton
              intoClass: (Class) originalClass
            forSelector: (SEL) originalSelector
              withBlock: (void (^)(void)) code;

The usage is best explained by example:

TEST_CASE() {
    id defaults =
        [[FakeUserDefaults alloc] initWithValue:@"Red" forKey:@"defaultColour"];

    [TBCSingletonInjector injectSingleton: defaults
                                intoClass: [NSUserDefaults class]
                              forSelector: @selector(standardUserDefaults)
                                withBlock: ^ {
            REQUIRE_THAT( makeWidget(), StartsWith( @"Red" ) );
        } ];
}

Magic! How does it work? It uses a technique known as "method swizzling" (Rubyists and Pythonistas know it as "monkey patching"). In short, we replace a singleton accessor method (such as standardUserDefaults) with one we control (actually another, not otherwise exposed, class method of TBCSingletonInjector). More specifically, we swap the two implementations, so that we can swap them back again when we're done. Then we call the code block - all within a @try-@finally - so no matter what happens we always restore everything to its previous state.

What does the method we swap in do? It returns a global variable.

Wait, what? I thought globals and singletons were basically the same thing? Aren't we out of the frying pan into the fire?

In the war against singletons we must fight them with singletons! Well it's not all bad. This global is only in our test code and we have full control over it. It gets set to our "injected" singleton instance (and set back to nil at the end). It's not perfect - we can only use this implementation to handle one singleton at a time. I've not yet needed to handle more than one but I daresay the implementation could be extended to handle it.
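The swap / run / restore pattern is not specific to Objective-C, so here is a language-neutral sketch of it written in C++. It is an analogy only and not TBCSingletonInjector's code: there is no runtime to swizzle, so a replaceable function pointer stands in for the swappable method implementation, and a destructor stands in for @finally. All the names (Defaults, standardDefaults, injectSingleton, FakeDefaults) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Stand-in for the singleton's class; the real one returns no colour.
struct Defaults {
    virtual ~Defaults() {}
    virtual std::string stringForKey( const std::string& ) const { return ""; }
};
inline Defaults& realDefaults() { static Defaults instance; return instance; }

// The accessor goes through a replaceable function pointer. This plays the
// role of the method implementation that gets swapped in Objective-C.
using Accessor = Defaults& (*)();
inline Accessor& currentAccessor() { static Accessor accessor = &realDefaults; return accessor; }
inline Defaults& standardDefaults() { return currentAccessor()(); }

// The global that holds the injected instance. It exists only in test code.
inline Defaults*& injectedSlot() { static Defaults* slot = nullptr; return slot; }
inline Defaults& injectedAccessor() { return *injectedSlot(); }

// Swaps the accessor, runs `code`, then restores everything, even if `code` throws.
inline void injectSingleton( Defaults& fake, const std::function<void()>& code ) {
    struct Restore {
        Accessor previous;
        ~Restore() { currentAccessor() = previous; injectedSlot() = nullptr; }
    } restore{ currentAccessor() };

    injectedSlot() = &fake;
    currentAccessor() = &injectedAccessor;
    code();
}

// A hand-rolled fake, and code under test that reaches for the singleton.
struct FakeDefaults : Defaults {
    std::string stringForKey( const std::string& ) const override { return "Red"; }
};
inline std::string makeWidget() {
    return standardDefaults().stringForKey( "defaultColour" ) + "Widget";
}

inline void usageExample() {
    FakeDefaults fake;
    std::string seen;
    injectSingleton( fake, [&seen] { seen = makeWidget(); } );
    assert( seen == "RedWidget" );       // the fake was used inside the block
    assert( makeWidget() == "Widget" );  // and the original is back afterwards
}
```

As in the Objective-C version, the single global slot means only one singleton can be injected at a time.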

Keep it clean

Since we've hand rolled our own fake class here (FakeUserDefaults) we can tidy things up further if we encapsulate the use of the singleton injector within it. Just adding a method like this should do the trick:

-(void) use:(void (^)(void) ) code
{
    [TBCSingletonInjector injectSingleton: self
                                intoClass: [NSUserDefaults class]
                              forSelector: @selector(standardUserDefaults)
                                withBlock: code ];
}

Now the test code becomes:

    FakeUserDefaults* defs = 
        [[FakeUserDefaults alloc] initWithValue: @"Red" 
                                         forKey: @"defaultColour"];
    [defs use:^{
            REQUIRE_THAT( makeWidget(), StartsWith( @"Red" ) );
         }];

Or, if you prefer, even:

    [[[FakeUserDefaults alloc] initWithValue: @"Red"
                                      forKey: @"defaultColour"]
        use:^{
            REQUIRE_THAT( makeWidget(), StartsWith( @"Red" ) );
        }];

Not too bad, really. But, still, prefer to avoid the singletons in the first place if you have the option.

Mocking a monster

Rather than hand rolling a Fake you might prefer to use a Mock object too. I've found OCMock does the job well enough. I'm sure other mocking frameworks would do so at least as well. I prefer to use mocks when I want to test the behaviour, though. In this context that might equate to testing that some code under test sets a value in a singleton (e.g. sets a key in NSUserDefaults). The Singleton Injector works just as well for that, of course.

So there we have it. When you really have to deal with the beast you now have some tools to do so. If you do, please treat it as a stopgap, and only rely on it until you are able to replace the singularity with something better behaved.