Best GCC warning flags for compiling C++

A recent discussion on ACCU-general gave people an opportunity to share the warning flags they like to use with g++.

I thought I’d write down the consensus as I understood it, mainly for my own reference:

-Wredundant-decls
-Wcast-align
-Wmissing-declarations
-Wmissing-include-dirs
-Wswitch-enum
-Wswitch-default
-Wextra
-Wall
-Werror
-Winvalid-pch
-Wredundant-decls
-Wmissing-prototypes
-Wformat=2
-Wmissing-format-attribute
-Wformat-nonliteral

We were advised by Jonathan Wakely that -Weffc++ is not very useful since it is mostly based on the first edition of the book Effective C++, many of whose recommendations were improved in the second edition, and also apparently GCC doesn’t do a great job of warning about them.

Update: thanks to an article[1] by Roger Orr in CVu these flags are highly recommended in GCC 5.2+:

-flto
-Wodr

[1] Orr, Roger “One Definition Rule“, in CVu Volume 27, Issue 5 p16 (editor: Steve Love)

Simple template programming

C++ template meta-programming sounds harder than it is.

So long as you can fight through some horrific syntax and understand recursive functions, you can write any algorithm you like and have it run at compile time in your C++ programs.

Slides: Simple Template Programming

Andrei Alexandrescu’s amazing book on using template meta-programming for really useful, cool stuff is: Modern C++ Design.

Avoid mocks by refactoring to functional

At this week’s ACCU Conference I went to an excellent talk by Sven Rosvall entitled “Unit Testing Beyond Mock Objects”.

The talk covered the newer Java and C# unit testing frameworks that allow inserting mock objects even where legacy code is using some undesired dependency directly, meaning there is no seam where you can insert a different implementation.

These tools solve a real problem, and could be useful.

However, I want to move the discussion in a different direction: can we avoid mocking altogether, and end up with better code?

Sven gave us an example of some legacy code a little bit like this. (I translated into C++ just for fun.)

// Return true if it's between 12:00 and 14:00 now.
bool is_it_lunch_time_now()
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    tm local_tm = ( *localtime( &tt ) );
    int hour = local_tm.tm_hour;

    return (
        12 <= hour      &&
              hour < 14
    );
}

To test this code, we would have something like this:

TEST_CASE( "Now: It's lunchtime now (hopefully)" )
{
    // REQUIRE( is_it_lunch_time_now() ); // NOTE: only  run at lunch time!
    REQUIRE( !is_it_lunch_time_now() );   // NOTE: never run at lunch time!
}

Which is agony. But the production code is fine:

if ( is_it_lunch_time_now() )
{
    eat( sandwich );
}

So, the normal way to allow mocking out a dependency like this would be to add an interface, making our lunch-related code take in a TimeProvider. To avoid coupling the choice of which TimeProvider is used to the calling code, we pass it into the constructor of a LunchCalculator that we plan to make early on:

class TimeProvider
{
public:
    virtual tm now_local() const = 0;
};

class RealTimeProvider : public TimeProvider
{
public:
    RealTimeProvider()
    {
    }
    virtual tm now_local() const
    {
        system_clock::time_point now = system_clock::now();
        time_t tt = system_clock::to_time_t( now );
        return ( *localtime( &tt ) );
    }
};

class HardCodedHourTimeProvider : public TimeProvider
{
public:
    HardCodedHourTimeProvider( int hour )
    {
        tm_.tm_hour = hour;
    }
    virtual tm now_local() const
    {
        return tm_;
    }
private:
    tm tm_;
};

class LunchCalc
{
public:
    LunchCalc( const TimeProvider& prov )
    : prov_( prov )
    {
    }
    bool is_it_lunch_time()
    {
        int hour = prov_.now_local().tm_hour;
        return (
            12 <= hour      &&
                  hour < 14
        );
    }
private:
    const TimeProvider& prov_;
};

and now we can write tests like this:

TEST_CASE( "TimeProvider: Calculate lunch time when it is" )
{
    HardCodedHourTimeProvider tp( 13 ); // 1pm (lunch time!)
    REQUIRE( LunchCalc( tp ).is_it_lunch_time() );
}

TEST_CASE( "TimeProvider: Calculate lunch time when it isn't" )
{
    HardCodedHourTimeProvider tp( 10 ); // 10am (not lunch :-( )
    REQUIRE( ! LunchCalc( tp ).is_it_lunch_time() );
}

Innovatively, these tests will pass at all times of day.

However, look at the price we’ve had to pay: 4 new classes, inheritance, and class names ending with unKevlinic words like “Provider” and “Calculator”. (Even worse, in a vain attempt to hide my embarrassment I abbreviated to “Calc”.)

We’ve paid a bit of a price in our production code too:

    // Near the start of the program:
    config.time_provider = RealTimeProvider();

    // Later, with config passed via PfA
    if ( LunchCalc( config.time_provider ).is_it_lunch_time() )
    {
        eat( sandwich );
    }

The code above where we create the RealTimeProvider is probably not usefully testable, and the class RealTimeProvider is also probably not usefully testable (or safely testable in the case of some more dangerous dependency). The rest of this code is testable, but there is a lot of it, isn’t there?

The advantage of this approach is that we have been driven to a better structure. We can now switch providers on startup by providing config, and even code way above this stuff can be fully exercised safe in the knowledge that no real clocks were poked at any time.

But. Sometimes don’t you ever think this might be better?

tm tm_local_now()  // Not testable
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    return ( *localtime( &tt ) );
}

bool is_lunch_time( const tm& time )  // Pure functional - eminently testable
{
    int hour = time.tm_hour;
    return (
        12 <= hour      &&
              hour < 14
    );
}

The tests look like this:

TEST_CASE( "Functional: Calculate lunch time when it is" )
{
    tm my_tm;
    my_tm.tm_hour = 13; // 1pm (lunch time!)
    REQUIRE( is_lunch_time( my_tm ) );
}

TEST_CASE( "Functional: Calculate lunch time when it isn't" )
{
    tm my_tm;
    my_tm.tm_hour = 10; // 10am (not lunch time :-( )
    REQUIRE( ! is_lunch_time( my_tm ) );
}

and the production code looks very similar to what we had before:

if ( is_lunch_time( tm_local_now() ) )
{
    eat( sandwich );
}

Where possible, let’s not mock stuff we can just test as pure, functional code.

Of course, the production code shown here is now untestable, so there may well be work needed to avoid calling it in a test. That work may involve a mock. Or we may find a nicer way to avoid it.

Goodness in programming languages, part 4 – Ownership & Memory

Posts in this series: Syntax, Deployment, Metaprogramming, Ownership

There is often a trade-off between programming language features and how fast (and predictably) the programs run. From web sites that serve millions of visitors to programs running on small devices we need to be able to make our programs run quickly.

One trade-off that is made in many modern programming languages (including Python, Ruby, C#, Java and JVM-based languages) is that the system owns all the memory. This avoids the need for the programmer to think about how long pieces of memory need to live, but it means a lot of memory can hang around a lot longer than it really needs to. In addition, it can mean the CPU has to jump around to lots of different memory locations to find pieces of dynamically-allocated memory in different locations. Where this jumping around causes caches to be invalidated that can really slow things down.

While these garbage collection-based languages have been evolving, C++ has been developing along a different track. C++ allows the programmer to allocate and free up memory manually (as in C), but over time the community of C++ programmers has been developing a new way of thinking about memory, and developing tools in the C++ language to make it easier to work in this way.

Modern C++ code rarely or never uses “delete” or “free” to deallocate memory, but instead defines clearly which object owns each other object. When the owning object is no longer needed, everything it owns can be deleted, immediately freeing their memory. The top-level objects are owned by the current scope, so when the function or block of code we are in ends, the system knows these objects and the ones they own can be deleted. Objects that last for the whole life of the program are owned by the scope of the main function or equivalent.

One advantage of explicit ownership is that the right thing happens automatically when something unexpected happens (e.g. an exception is thrown, or we return early from a function). Because the objects are owned by a scope, as soon as we exit that scope they are automatically deleted, and no memory is “leaked”.

Because ownership is explicit, we can often group owned objects in memory immediately next to the objects that own them. This means we jump around to different memory locations less often, and we have to do less work to find and delete regions of memory. This makes our programs faster.

Here are some things I like:

  • Modern C++’s clarity about who owns what. By expressing ownership explicitly we make clear our intentions, and avoid memory leaks.
  • Modern C++’s fast and cache-friendly memory handling. Allocating memory for several objects together reduces time spent looking for space, and means caches are more likely to be used.

In my experience, the most frequent performance problems I have had to solve have really been memory problems. Explicit ownership can reduce unnecessary memory management overhead by taking back the work from the system (the garbage collector) and allowing programmers to be explicit about who owns what.

C++14 “Terse” Templates – an argument against the proposed syntax

Today I attended two excellent talks by Bjarne Stroustrup at the ACCU Conference 2013. The first was an inspiring explanation of the recent C++11 standard, and the second, “C++14 Early thoughts” was an exciting description of some of the features that might go into the next standard.

One of those features, which Bjarne called “Terse” Templates, might be a good idea, but the syntax Bjarne proposed seems like a bad idea to me, because it leaks unwanted names into the namespace containing the function you are writing.

Allow me to explain.

Background – Concepts Lite

I attended another excellent talk before Bjarne’s, called “Concepts Lite-Constraining Templates with Predicates” by Andrew Sutton, introducing “Concepts Lite”, which is an attempt to salvage a manageable language feature from the very large “Concepts” feature that failed to make it into C++11.

My (so far very basic) understanding of Concepts Lite is that it is a way of defining conditions that state whether a template will be expanded for a given type.

So, in C++11 (and C++98), we can declare a (stupid) template function like so:

template<typename ListOfInt>
int first( ListOfInt& list ) { return list.size() > 0 ? list[0] : 0; }

The code in this function template assumes that list has a size method, and an operator[] method. We tried to “suggest” this, by naming our template parameter ListOfInt, but the poor programmer may not realise exactly what we meant.

If we do the wrong thing, and try to use the first function with an int argument:

int i = 3;
first( i );

It goes wrong, because ints don’t have a size method:

In function 'int first(ListOfInt&) [with ListOfInt = int]':
error: subscripted value is neither array nor pointer
error: request for member 'size' in 'list', which is of non-class type 'int'

This error is not too obscure, but in complex cases the errors can be extremely long, and point to problems that appear to be unrelated to the code we are writing.

Really what we want to know is that int is not a ListOfInt.

Concepts Lite give us the ability to define what a ListOfInt means, and only expand the template for types that match that definition.

In our example we would do something like this:

template<typename ListOfInt> requires SizeAndIndex<ListOfInt>()
int first( ListOfInt& list ) { return list.size() > 0 ? list[0] : 0; }

(There is actually a neater syntax, but we’ll do it like this for now because we need the more verbose form later.)

What this means is that this template function will only be expanded for types that satisfy the constraint.

The definition of SizeAndIndex is outside the scope of this article – it allows us to check whether types satisfy some conditions. In this case we assume it checks that the type contains the methods we use.

Now when we do the wrong thing:

int i = 3;
first( i );

We get a simple error message, that properly tells us what’s wrong:

error: no matching call to ‘first(int list)’
note: candidate is ‘first(ListOfInt& list)’
note: where ListOfInt = int
note: template constraints not satisfied
note: ‘ListOfInt’ is not a/an ‘SizeAndIndex’ type since
note: ‘list.size()’ is not valid syntax

(The above is fiction, but Andrew assures us he gets real errors like this with his prototype.)

So Concepts Lite gives us the optional ability to check that our template parameters are what we expected them to be, giving a decent error message, instead of waiting for something to fail much later when we compile the instantiated template.

So far so utterly cool. (And, in my ill-informed opinion, the only bit of Concepts I really wanted anyway.)

There’s more information on this feature here: Concepts Lite: Constraining Templates with Predicates and here: Concepts-Lite.

Constraints on multiple types

The Concepts Lite feature as proposed allows us to specify constraints that describe how multiple types relate to each other, by doing something like this:

template<typename Victim1, typename Victim2> requires Lakosable<Victim1, Victim2>
void lakos( Victim1 a, Victim2 b );

Here the Lakosable constraint can specify conditions that describe how the two types relate to each other, for example that Victim1::value_type is equal to the type of Victim2.

This is very good.

Now, the bit I want to argue against.

“Terse” Templates – the syntax I don’t like

Bjarne gave us an example of the std::merge function, which has lots of arguments, and very complex constraints on them. He showed us that these could all be nicely wrapped into a single Mergeable constraint (similar to the Lakosable constraint above) but he argued that there was still too much repetition. The repetition comes from the fact that several functions in the standard library have the exact same template parameters, with the exact same constraints on them, and that you have to mention the whole list of template parameters twice: once after the template keyword, and once in the requires condition.

This led him to look for a terser syntax.

So, he proposed a modest new construct that looks like this:

using Lakosable{Victim1,Victim2}; // (1)

that allows a radical departure from everything that has gone before in terms of declaring templates. After we’ve made the declaration (1), we can declare the exact function we declared above with this little line:

void lakos( Victim1 a, Victim2 b ); // (2)

The using declaration in (1) makes the names Victim1 and Victim2 available in the current namespace, and gives them special powers that mean functions taking parameters of type Victim1 or Victim2 are automatically function templates, even though the template keyword is nowhere to be seen.

There was some resistance in the room to this proposal. Most of it focussed on (2), and the fact that templates were being declared without it being visible because of the lack of the template keyword.

I’m actually ok with (2). In fact, my ficticious programming language Pepper (which represents everything I think is Right in programming languages) provides a feature very much like this – all non-definite parameter types act as “implicit” templates in Pepper (see “implicit_templates.pepper” on the Examples page).

Bjarne made a reasonable defence of (2), arguing that we often want new features to be “signposted” by new keywords (he cited user-defined types as an example – apparently some people wanted to require “class MyClass” instead of just “MyClass” every time we referred to a user-defined type) but later when they are familiar we want less verbose syntax. (Presumably the “new” feature he was talking about here is templates.)

My problem is with (1).

As my neighbour in the talk (whose name I missed, sorry) pointed out, what (1) does is dump 2 new names Victim1 and Victim2 in the namespace containing the lakos function template.

No-one wants these names.

In fact, why are we doing any of this?

The sole purpose of the exercise is to constrain the lakos function template. Why is the result putting 2 names into the namespace?

More seriously, in the case of the standard library, these names will go into the std:: namespace, and there could easily be clashes. If the std::merge function uses the name For for one of its template parameters (a Forward_iterator), and std::copy wants to use one with the same name, but with different constraints, it will override the definition of For.

I.e. If we do this:

namespace std {
using Mergeable{For,For2,Out};
// define std::merge
}

// and somewhere else:

namespace std {
using Copyable{For,Out};
// define std::copy
}

then the (useless) value of std::For will be different depending on the order in which we import the header files.

I Think

I think.

Please correct me if I’m wrong.

Conclusion

If I’m right, this all seems bad and Wrong.

What was wrong with:

template<typename Victim1, typename Victim2> requires Lakosable<Victim1, Victim2>
void lakos( Victim1 a, Victim2 b );

anyway?