Talk in code

Last week we had an extended discussion at work about how we were going to implement a specific feature.

This discussion hijacked our entire Scrum sprint planning meeting (yes, I know, we should have time-boxed it). It was painful, but the guy who was going to implement it (yes, I know, we should all collectively own our tasks) needed the discussion: otherwise it wasn’t going to get implemented. It certainly wasn’t going to get broken into short tasks until we knew how we were going to do it.

Anyway, asides aside, I came out of that discussion bruised but triumphant. We had a plan not only on how to write the code, but also how to test it. I believe the key thing that slowly led the discussion from a FUD-throwing contest into a constructive dialogue was the fact that we began to talk in code.

There are two facets to this principle:

1. Show me the code

As Linus once said, “Talk is cheap. Show me the code.”

If you disagree at all about how what you’re doing will work, open up the source files in question. Write example code – modify the existing methods or sketch a new one. Outline the classes you will need. Code is inherently unambiguous. Whiteboard diagrams and hand-waving are not.

Why wouldn’t you do this? Fear you might be wrong? Perhaps you should have phrased your argument a little less strongly?

Is this slower than drawing boxes on a whiteboard? Not if you include time spent resolving the confusion caused by the ambiguities inherent in line drawings.

Does UML make whiteboards less ambiguous? Yes, if all your developers can be bothered to learn it. But why learn a new language when you can communicate using the language you all speak all day – code?

2. Create a formal language to describe the problem

If your problem is sufficiently complex, you may want to codify the problem into a formal (text-based) language.

In last week’s discussion we were constantly bouncing back and forth between different corner cases until we started writing them down in a formal language.

The language I chose was an adaptation of a domain-specific language I wrote to test a different part of our program. I would love to turn the cases we wrote down in that meeting into real tests that run after every build (in fact I am working on it), but their immediate value was to turn very confusing “what-if”s into concrete cases we could discuss.

Before we started using the formal language, the conversations went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “He clearly doesn’t understand what I mean.”

Repeat

After we started using the formal language they went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Me: “Write it down, I tell you.”

Developer: (Typing) “See, this will happen!”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “Flipping managers.”

Summary

The conversation progresses if all parties believe the others understand what they are saying. It is not disagreement that paralyses conversations – it is misunderstanding.

To avoid misunderstanding, talk in code – preferably a real programming language, but if that’s too verbose, a text-based code that is unambiguous and understood by everyone involved.

Note on whiteboards

You can’t copy and paste them, and you can’t (easily) keep what you did with them, and you can’t use them to communicate over long distances.

And don’t even try and suggest an electronic whiteboard. In a few years they may solve all of the above problems, but not now. They fail the “can I draw stuff?” test at the moment.

Even when electronic whiteboards solve those problems, they won’t solve the fact that lines and boxes are more ambiguous and less detailed than code in text form.

If you all know and like UML, that makes your diagrams less ambiguous, but still they often don’t allow enough detail: why bother?

Debugging memory use and fragmentation on Windows using Address Space Monitor

At work at the moment we are putting a lot of effort into making our program not crash. Sensible, eh?

It’s crashing because a) it uses an enormous amount of memory and b) it tends to fragment your remaining memory. One of the characteristics of this program is that from time to time it needs a large contiguous chunk of memory so that it can pass a big bit of XML to someone else. This tends to be a problem when your memory is fragmented.

For example today I was running it, and it was using 1GB of memory (+about 0.3GB reserved), leaving 0.7GB free in the 2GB address space. However, the largest available contiguous block of memory was about 60MB.

The long-term answer is obviously to use less memory, but in the shorter term we have had quite a bit of success by writing code that works around the fragmentation. For example, we have a custom mutable string-like class that can store its chars in several separate blocks of memory instead of all in one contiguous block.
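
To make the idea concrete, here is a minimal sketch of a string-like class that stores its chars in several blocks. It is not our real class: the name ChunkedString, the 4096-char block size and the method names are all invented for illustration. The point it shows is that appending never asks for a single allocation bigger than one block (apart from the small list of blocks itself), so it can cope with a fragmented address space.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: a mutable string that stores its chars in many
// small blocks rather than in one contiguous allocation.
class ChunkedString
{
public:
    // Assumed block size; a real class would tune this.
    static const std::size_t BLOCK_SIZE = 4096;

    ChunkedString() : size_( 0 ) {}

    // Append chars, starting a new block whenever the last one is full.
    void append( const std::string& text )
    {
        for ( std::size_t i = 0; i < text.size(); ++i )
        {
            if ( size_ % BLOCK_SIZE == 0 )
            {
                blocks_.push_back( std::vector<char>() );
                blocks_.back().reserve( BLOCK_SIZE );
            }
            blocks_.back().push_back( text[i] );
            ++size_;
        }
    }

    std::size_t size() const { return size_; }
    std::size_t block_count() const { return blocks_.size(); }

    // Find a char by working out which block holds it.
    char at( std::size_t pos ) const
    {
        return blocks_[pos / BLOCK_SIZE][pos % BLOCK_SIZE];
    }

private:
    std::vector< std::vector<char> > blocks_;
    std::size_t size_;
};
```

Appending 5000 chars to an empty ChunkedString, for example, fills one block and part of a second. The price is that you can no longer hand out a plain pointer to the whole string, which is exactly why passing a big bit of XML to someone else remains the hard part.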

Debugging this kind of thing is always a very difficult process. For me it has transformed from an almost-impossible task into a tractable one because of Charles Bailey’s Address Space Monitor tool. This gives you a clear picture of your process’s address space, and version 0.6a adds recording functionality that makes reports to impress managers a lot easier to produce.

Without it, we wouldn’t know where to start.

C++ is an expert language

Update: I should point out at the beginning that I love C++. Anything below which sounds bitter or critical is born of a deep and growing love. C++ is a journey into worlds of beauty and strength.

I assert that C++ is an expert language. What do I mean?

You shouldn’t be allowed to use C++ in anger unless you’ve used it for 2 years in anger.

Aside: what should we do about this?

In practice this means that no-one should recruit newbie C++ developers.

The only alternative is to have some kind of apprenticeship system, where all the code written by a newbie is re-written by their mentor for about 2 years. This is a great learning experience, and could weed out people with insufficient capacity for humility to be a C++ programmer. (Note I say capacity for humility, because the actual humility will be provided by the constant crushing of your spirit provided by someone tearing your code apart every day.)

In Java, to create an “array”, and add an object to it, you do this:

MyObj obj = new MyObj();
ArrayList<MyObj> arr = new ArrayList<MyObj>();
arr.add( obj );

In Python, you do this:

obj = MyObj()
arr = []
arr.append( obj )

In C, you do this:

MyStruct obj;
MyStruct arr[50] = { obj };

In Perl, you do this:

my $obj = MyObj->new();
my @arr;
push(@arr,$obj); 

In Pascal, you do this:

var
    arr : ARRAY [1..50] of integer;
begin
    arr[1] := 7;
end

In Haskell, you might do something like this (thanks to Neil Mitchell):

[myObj]

Don’t get het up about the fact that these examples do different things: my point is, they are reasonable ways of performing the task (add something to an array) in the languages chosen. (Please do send in corrections, though – none of these were checked and they are probably wrong.)

Note that the C example hides a little more complexity because you need to make sure you tidy up your memory afterwards.

Now, what do you do in C++? It’s just as easy, right?

MyObj obj;
std::vector<MyObj> arr;
arr.push_back( obj );

WRONG!

In the examples above, you don’t need to know what is going on under the covers.

In fact, in general I suggest there are broadly two types of programming language around at the moment: those where you have to know how things work, but where how things work is quite simple (e.g. C, assembly languages, maybe FORTRAN and COBOL) and those which isolate you from how things work (all the rest?).

Where does C++ fit in to this scheme? It is in the unique position of being a language which has incredibly complex things going on under the covers, and you have to know about it!

What do I mean by saying you have to know what’s going on under the covers?

Let’s look at our example again, and ask this question: what types of object can you put in the array? In the other programming languages above, you can essentially put any “normal” objects (where normal is different for each example) into the array.

In C++, here are some of the rules you need to understand about objects you can put into std::vector. You should understand these before you try using std::vector. If you can’t understand them, you should think hard until you do.

Default constructor

If you want to give the size of the vector when you create it (or resize it later), your object must have a default constructor [Stroustrup §16.3.4].

(Note: if you don’t define any other constructors, the compiler will automatically define a default constructor for you (which may or may not do what you want). If you do define other constructors, you must provide the default constructor yourself: it is the one that can be called without any arguments [Stroustrup §10.4.2].)

Example:

$ cat default_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject( int num )
        {
        }
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ default_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:
In constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’:
default_constructor_required.cpp:12:   instantiated from here
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: no matching function for call to ‘MyObject::MyObject()’
default_constructor_required.cpp:5: note: candidates are: MyObject::MyObject(int)
default_constructor_required.cpp:4: note:                 MyObject::MyObject(const MyObject&)

Nice error message, eh?

(Note: if your array contains a built-in type (e.g. int), each element will be initialised to 0, as if by a default constructor [Stroustrup §4.9.5].)

If you don’t want to give the size of the vector when you create it (not recommended), then you don’t need a default constructor in your object [Stroustrup §16.3.4].
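
As a sketch of that last point, here is a hypothetical class NoDefault (the name and the make_vector function are made up for this example) whose only constructor takes an argument. It can live in a std::vector perfectly well, as long as you never ask the vector to conjure up elements by itself. You can also give a size if you supply a prototype object to copy.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class: its only constructor takes an argument,
// so it has no default constructor.
class NoDefault
{
public:
    explicit NoDefault( int num ) : num_( num ) {}
    int num() const { return num_; }
private:
    int num_;
};

// Build a vector without ever needing NoDefault().
std::vector<NoDefault> make_vector()
{
    std::vector<NoDefault> arr;          // no size given: fine
    arr.push_back( NoDefault( 7 ) );     // copies the object in

    // A size plus a prototype to copy also avoids the default constructor.
    std::vector<NoDefault> filled( 5, NoDefault( 3 ) );
    arr.insert( arr.end(), filled.begin(), filled.end() );
    return arr;
}
```

Note that this still relies on the copy constructor and assignment operator, which is where the next two sections come in.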

Copy constructor

Your object must also have a copy constructor. Example:

$ cat copy_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject()
        {
        }
private:
        MyObject( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ copy_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h: In
constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp = MyObject,
_Alloc = std::allocator<MyObject>]’:
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:
In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = MyObject, _T2 = MyObject]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:194:
   instantiated from ‘void std::__uninitialized_fill_n_aux(_ForwardIterator, _Size,
 const _Tp&, __false_type) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:218:
   instantiated from ‘void std::uninitialized_fill_n(_ForwardIterator, _Size,
 const _Tp&) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:310:
   instantiated from ‘void std::__uninitialized_fill_n_a(_ForwardIterator, _Size,
 const _Tp&, std::allocator<_Tp2>) [with _ForwardIterator = MyObject*, _Size
 = unsigned int, _Tp = MyObject, _Tp2 = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
   instantiated from ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp =
 MyObject, _Alloc = std::allocator<MyObject>]’
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const
 MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:81:
 error: within this context

(Aside: this program is 181 bytes, and the error message is 1950 bytes.)

Assignment operator

You also need operator=. Example:

$ cat assignment_operator_required.cpp
#include <vector>

class MyObject
{
private:
        MyObject& operator=( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        MyObject obj;
        std::vector<MyObject> arr;
        arr.push_back( obj );
}

$ g++ assignment_operator_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:
In member function ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:260:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:
In static member function ‘static _BI2 std::__copy_backward<_BoolType,
std::random_access_iterator_tag>::copy_b(_BI1, _BI1, _BI2) [with _BI1 =
MyObject*, _BI2 = MyObject*, bool _BoolType = false]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:443:
instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1
= MyObject*, _BI2 = MyObject*]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:482:
instantiated from ‘static _BI2 std::__copy_backward_normal<true,
true>::copy_b_n(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:517:
instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:257:
instantiated from ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:412:
error: within this context

(Aside: program is 202 bytes, error is 2837 bytes.)

Excuses

Don’t give me that “the compiler will provide them for you” excuse. What the compiler provides is often wrong, unless you’ve been careful to ensure you don’t own any members by holding pointers to them: i.e. if you’ve fully understood the problem I am setting out and avoided the pitfalls.
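
To show what “often wrong” means, here is a sketch of a hypothetical class Buffer (invented for this example) that owns memory through a raw pointer. The compiler-provided copy constructor and operator= would simply copy the pointer, leaving two objects that both delete the same memory, which is exactly what happens when std::vector copies your object. So you have to write them yourself, something like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class that owns a heap-allocated array.
class Buffer
{
public:
    explicit Buffer( std::size_t size )
    : size_( size ), data_( new int[size]() )   // () zero-initialises
    {
    }

    // Copy constructor: make a deep copy, not a second pointer
    // to the same memory.
    Buffer( const Buffer& other )
    : size_( other.size_ ), data_( new int[other.size_] )
    {
        std::copy( other.data_, other.data_ + size_, data_ );
    }

    // Assignment: copy first, then swap, so self-assignment is safe.
    Buffer& operator=( const Buffer& other )
    {
        Buffer tmp( other );
        std::swap( size_, tmp.size_ );
        std::swap( data_, tmp.data_ );
        return *this;
    }

    ~Buffer() { delete[] data_; }

    int& operator[]( std::size_t i ) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};
```

With these in place, pushing a Buffer into a std::vector gives the vector its own independent copy. Without them, the program compiles without complaint and then corrupts memory at run time, which is rather my point.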

Conclusion

I assert that C++ is an expert language. Quite apart from the fact that the method names on STL objects use arcane phrases like “push_back” rather than “add”, and the error messages you get from popular compilers are huge and almost incomprehensible, my main point is that you have to understand the basics of how the standard library is implemented before you can use it. This is expert behaviour.

I have illustrated this point by showing what you need to know to use the standard resizeable array type in C++. You need to know a lot.

More on whether the fact that C++ is an expert language is a bad thing, later.

Update: simplified the C example thanks to Edmund’s suggestion.

Update 2: corrected the Java example thanks to Anon’s comment.

Update 3: corrected the Haskell example thanks to Neil Mitchell’s comment.

Templated test code?

At work at the moment, as part of an initiative to get with the 21st century, we are waking up to testing our code.

Thus, I am writing a lot of unit tests for old code, which can be soul-destroyingly repetitive and very pointless-feeling (even though really I do see a great value in the end result – tested code is refactorable code).

Often, tests have a lot in common with each other, so it feels right to reduce code repetition, and factor things into functions etc. The Right Way of doing this is to leave your tests as straightforward as possible, with preferably no code branches at all, just declarative statements.

Contemplating writing unit tests for the same method on 20+ very similar classes, using a template function “feels” right, for normal code values of “feel”. However, for test code, maybe it’s wrong?

My question is: is it ok to write a test function like this?:

// The template must be declared before the code that calls it.
template< class T >
void test_One_Thingy()
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.isSomething() );
}

void test_all_thingies()
{
    test_One_Thingy<Thingy1>();
    test_One_Thingy<Thingy2>();
    test_One_Thingy<Thingy3>();
    test_One_Thingy<Thingy4>();
}

Worse still, is this ok?

// The template must be declared before the code that calls it.
template< class T >
void test_One_Thingy( const std::string& expected_output )
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.getOutput() == expected_output );
}

void test_all_thingies()
{
    test_One_Thingy<Thingy1>( "Thingy1 expected output" );
    test_One_Thingy<Thingy2>( "Thingy2 expected output" );
    test_One_Thingy<Thingy3>( "Thingy3 expected output" );
    test_One_Thingy<Thingy4>( "Thingy4 expected output" );
}

Reasons for: otherwise I’m going to be writing huge amounts of copy-pasted code (unless someone can suggest a better way?).

Reasons against: how clear is it going to be which class failed the test when it fails?
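
One possible answer to the “which class failed?” worry, sketched under assumptions: pass the class name in alongside the type, and put it in the failure report. Everything here (the stand-in Thingy1 and Thingy2 classes, check_thingy, the CHECK_THINGY macro and the failures list) is invented for illustration. A real test framework will have its own way of attaching a message to an assertion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in classes for illustration only.
struct Thingy1
{
    std::string getOutput() const { return "one"; }
};

struct Thingy2
{
    std::string getOutput() const { return "two"; }
};

// Records a failure message naming the class under test,
// instead of a bare assertion with no context.
template< class T >
void check_thingy( const std::string& class_name,
                   const std::string& expected_output,
                   std::vector<std::string>& failures )
{
    T thingy;
    if ( thingy.getOutput() != expected_output )
    {
        failures.push_back( class_name + ": expected '" + expected_output
            + "' but got '" + thingy.getOutput() + "'" );
    }
}

// Macro so the type and its printed name cannot drift apart.
#define CHECK_THINGY( T, expected, failures ) \
    check_thingy<T>( #T, expected, failures )

std::vector<std::string> test_all_thingies()
{
    std::vector<std::string> failures;
    CHECK_THINGY( Thingy1, "one", failures );
    CHECK_THINGY( Thingy2, "TWO", failures );  // deliberately wrong
    return failures;
}
```

Here the one failure message starts with “Thingy2:”, so at least you know where to look. Whether the macro is a price worth paying for that is another matter.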

Update: fixed unescaped diagonal brackets.