Absolute Truth in programming languages

Is enforcing truthfulness the opposite of beauty?

Can 2 + 2 = 5?

Improvements, corrections, further contributions are welcome.

$ cat five.cpp 
#include <iostream>
int operator+( int x, int y ) { return 5; }
int main() {
    std::cout << 2 + 2 << std::endl;
}
$ g++ five.cpp 
five.cpp:2:29: error: ‘int operator+(int, int)’ must have an argument of class or enumerated type
$ python
>>> int.__add__ = lambda y: 5
TypeError: can't set attributes of built-in/extension type 'int'
$ cat five.hs
import Prelude hiding ((+))
x + y = 5
main = print ( 2 + 2 )
$ ghc five.hs && ./five
5
$ cat five.rb
class Fixnum
    def +(y)
        5
    end
end
print 2 + 2
$ ruby five.rb
5
$ mzscheme 
> (define (+ x y) 5)
> (+ 2 2)
5

Best GCC warning flags for compiling C++

A recent discussion on ACCU-general gave people an opportunity to share the warning flags they like to use with g++.

I thought I’d write down the consensus as I understood it, mainly for my own reference:

-Wredundant-decls
-Wcast-align
-Wmissing-declarations
-Wmissing-include-dirs
-Wswitch-enum
-Wswitch-default
-Wextra
-Wall
-Werror
-Winvalid-pch
-Wredundant-decls
-Wmissing-prototypes
-Wformat=2
-Wmissing-format-attribute
-Wformat-nonliteral

We were advised by Jonathan Wakely that -Weffc++ is not very useful since it is mostly based on the first edition of the book Effective C++, many of whose recommendations were improved in the second edition, and also apparently GCC doesn’t do a great job of warning about them.

Update: thanks to an article[1] by Roger Orr in CVu these flags are highly recommended in GCC 5.2+:

-flto
-Wodr

[1] Orr, Roger “One Definition Rule“, in CVu Volume 27, Issue 5 p16 (editor: Steve Love)

Simple template programming

C++ template meta-programming sounds harder than it is.

So long as you can fight through some horrific syntax and understand recursive functions, you can write any algorithm you like and have it run at compile time in your C++ programs.

Slides: Simple Template Programming

Andrei Alexandrescu’s amazing book on using template meta-programming for really useful, cool stuff is: Modern C++ Design.

Avoid mocks by refactoring to functional

At this week’s ACCU Conference I went to an excellent talk by Sven Rosvall entitled “Unit Testing Beyond Mock Objects”.

The talk covered the newer Java and C# unit testing frameworks that allow inserting mock objects even where legacy code is using some undesired dependency directly, meaning there is no seam where you can insert a different implementation.

These tools solve a real problem, and could be useful.

However, I want to move the discussion in a different direction: can we avoid mocking altogether, and end up with better code?

Sven gave us an example of some legacy code a little bit like this. (I translated into C++ just for fun.)

// Return true if it's between 12:00 and 14:00 now.
bool is_it_lunch_time_now()
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    tm local_tm = ( *localtime( &tt ) );
    int hour = local_tm.tm_hour;

    return (
        12 <= hour      &&
              hour < 14
    );
}

To test this code, we would have something like this:

TEST_CASE( "Now: It's lunchtime now (hopefully)" )
{
    // REQUIRE( is_it_lunch_time_now() ); // NOTE: only  run at lunch time!
    REQUIRE( !is_it_lunch_time_now() );   // NOTE: never run at lunch time!
}

Which is agony. But the production code is fine:

if ( is_it_lunch_time_now() )
{
    eat( sandwich );
}

So, the normal way to allow mocking out a dependency like this would be to add an interface, making our lunch-related code take in a TimeProvider. To avoid coupling the choice of which TimeProvider is used to the calling code, we pass it into the constructor of a LunchCalculator that we plan to make early on:

class TimeProvider
{
public:
    virtual tm now_local() const = 0;
};

class RealTimeProvider : public TimeProvider
{
public:
    RealTimeProvider()
    {
    }
    virtual tm now_local() const
    {
        system_clock::time_point now = system_clock::now();
        time_t tt = system_clock::to_time_t( now );
        return ( *localtime( &tt ) );
    }
};

class HardCodedHourTimeProvider : public TimeProvider
{
public:
    HardCodedHourTimeProvider( int hour )
    {
        tm_.tm_hour = hour;
    }
    virtual tm now_local() const
    {
        return tm_;
    }
private:
    tm tm_;
};

class LunchCalc
{
public:
    LunchCalc( const TimeProvider& prov )
    : prov_( prov )
    {
    }
    bool is_it_lunch_time()
    {
        int hour = prov_.now_local().tm_hour;
        return (
            12 <= hour      &&
                  hour < 14
        );
    }
private:
    const TimeProvider& prov_;
};

and now we can write tests like this:

TEST_CASE( "TimeProvider: Calculate lunch time when it is" )
{
    HardCodedHourTimeProvider tp( 13 ); // 1pm (lunch time!)
    REQUIRE( LunchCalc( tp ).is_it_lunch_time() );
}

TEST_CASE( "TimeProvider: Calculate lunch time when it isn't" )
{
    HardCodedHourTimeProvider tp( 10 ); // 10am (not lunch :-( )
    REQUIRE( ! LunchCalc( tp ).is_it_lunch_time() );
}

Innovatively, these tests will pass at all times of day.

However, look at the price we’ve had to pay: 4 new classes, inheritance, and class names ending with unKevlinic words like “Provider” and “Calculator”. (Even worse, in a vain attempt to hide my embarrassment I abbreviated to “Calc”.)

We’ve paid a bit of a price in our production code too:

    // Near the start of the program:
    config.time_provider = RealTimeProvider();

    // Later, with config passed via PfA
    if ( LunchCalc( config.time_provider ).is_it_lunch_time() )
    {
        eat( sandwich );
    }

The code above where we create the RealTimeProvider is probably not usefully testable, and the class RealTimeProvider is also probably not usefully testable (or safely testable in the case of some more dangerous dependency). The rest of this code is testable, but there is a lot of it, isn’t there?

The advantage of this approach is that we have been driven to a better structure. We can now switch providers on startup by providing config, and even code way above this stuff can be fully exercised safe in the knowledge that no real clocks were poked at any time.

But. Sometimes don’t you ever think this might be better?

tm tm_local_now()  // Not testable
{
    system_clock::time_point now = system_clock::now();
    time_t tt = system_clock::to_time_t( now );
    return ( *localtime( &tt ) );
}

bool is_lunch_time( const tm& time )  // Pure functional - eminently testable
{
    int hour = time.tm_hour;
    return (
        12 <= hour      &&
              hour < 14
    );
}

The tests look like this:

TEST_CASE( "Functional: Calculate lunch time when it is" )
{
    tm my_tm;
    my_tm.tm_hour = 13; // 1pm (lunch time!)
    REQUIRE( is_lunch_time( my_tm ) );
}

TEST_CASE( "Functional: Calculate lunch time when it isn't" )
{
    tm my_tm;
    my_tm.tm_hour = 10; // 10am (not lunch time :-( )
    REQUIRE( ! is_lunch_time( my_tm ) );
}

and the production code looks very similar to what we had before:

if ( is_lunch_time( tm_local_now() ) )
{
    eat( sandwich );
}

Where possible, let’s not mock stuff we can just test as pure, functional code.

Of course, the production code shown here is now untestable, so there may well be work needed to avoid calling it in a test. That work may involve a mock. Or we may find a nicer way to avoid it.

Goodness in programming languages, part 4 – Ownership & Memory

Posts in this series: Syntax, Deployment, Metaprogramming, Ownership

There is often a trade-off between programming language features and how fast (and predictably) the programs run. From web sites that serve millions of visitors to programs running on small devices we need to be able to make our programs run quickly.

One trade-off that is made in many modern programming languages (including Python, Ruby, C#, Java and JVM-based languages) is that the system owns all the memory. This avoids the need for the programmer to think about how long pieces of memory need to live, but it means a lot of memory can hang around a lot longer than it really needs to. In addition, it can mean the CPU has to jump around to lots of different memory locations to find pieces of dynamically-allocated memory in different locations. Where this jumping around causes caches to be invalidated that can really slow things down.

While these garbage collection-based languages have been evolving, C++ has been developing along a different track. C++ allows the programmer to allocate and free up memory manually (as in C), but over time the community of C++ programmers has been developing a new way of thinking about memory, and developing tools in the C++ language to make it easier to work in this way.

Modern C++ code rarely or never uses “delete” or “free” to deallocate memory, but instead defines clearly which object owns each other object. When the owning object is no longer needed, everything it owns can be deleted, immediately freeing their memory. The top-level objects are owned by the current scope, so when the function or block of code we are in ends, the system knows these objects and the ones they own can be deleted. Objects that last for the whole life of the program are owned by the scope of the main function or equivalent.

One advantage of explicit ownership is that the right thing happens automatically when something unexpected happens (e.g. an exception is thrown, or we return early from a function). Because the objects are owned by a scope, as soon as we exit that scope they are automatically deleted, and no memory is “leaked”.

Because ownership is explicit, we can often group owned objects in memory immediately next to the objects that own them. This means we jump around to different memory locations less often, and we have to do less work to find and delete regions of memory. This makes our programs faster.

Here are some things I like:

  • Modern C++’s clarity about who owns what. By expressing ownership explicitly we make clear our intentions, and avoid memory leaks.
  • Modern C++’s fast and cache-friendly memory handling. Allocating memory for several objects together reduces time spent looking for space, and means caches are more likely to be used.

In my experience, the most frequent performance problems I have had to solve have really been memory problems. Explicit ownership can reduce unnecessary memory management overhead by taking back the work from the system (the garbage collector) and allowing programmers to be explicit about who owns what.