FreeGuide 0.10.8

I am still working slowly on moving FreeGuide forward. Somehow it seems my itches for FreeGuide are all about making it less annoying for people who are trying it the first time. I guess this is motivated by my desire for world domination.

Anyway, we are one small step closer to my mum being able to use FreeGuide – when the “Choose channels” step (i.e. the XMLTV grabber configuration) goes wrong, you can now see a real genuine error message, and hopefully figure out what went wrong.

Actually, it always used to work that way but the error-catching got refactored away at some point. Anyway, I am slowly taking the ground back…

As I do more and more test-driven development at work I am becoming completely addicted. For this FreeGuide code I wrote a couple of unit tests but they are not within a proper framework, and can’t be launched easily as a test suite. I am considering JUnit.

I also want to set up some component-level tests e.g. for downloading listings for each country and checking everything works as expected. It’s brilliant fun having tests in place, but when you have as little time as I have for FreeGuide at the moment, it’s difficult to decide to spend a long time working on a test framework when I could be fixing a “real user problem” or adding a cool new feature.

But I’ve got the testing bug badly, so watch this space.

Debugging memory use and fragmentation on Windows using Address Space Monitor

At work at the moment we are putting a lot of effort into making our program not crash. Sensible, eh?

It’s crashing because a) it uses an enormous amount of memory and b) it tends to fragment your remaining memory. One of the characteristics of this program is that from time to time it needs a large contiguous chunk of memory so that it can pass a big bit of XML to someone else. This tends to be a problem when your memory is fragmented.

For example today I was running it, and it was using 1GB of memory (+about 0.3GB reserved), leaving 0.7GB free in the 2GB address space. However, the largest available contiguous block of memory was about 60MB.

The long-term answer is obviously to use less memory, but in the shorter term we have had quite a bit of success by writing code that works around the fragmentation. For example, we have a custom mutable string-like class that can store its chars in several separate blocks of memory instead of all in one contiguous block.

Debugging this kind of thing is always a very difficult process. For me it has transformed from an almost-impossible task into a tractible one because of Charles Bailey‘s Address Space Monitor tool. This gives you a clear picture of your process’ address space, and version 0.6a adds recording functionality that makes reports to impress managers a lot easier to produce.

Without it, we wouldn’t know where to start.

C++ is an expert language

Update: I should point out at the beginning that I love C++. Anything below which sounds bitter or critical is borne of a deep and growing love. C++ is a journey into worlds of beauty and strength.

I assert that C++ is an expert language. What do I mean?

You shouldn’t be allowed to use C++ in anger unless you’ve used it for 2 years in anger.

Aside: what should we do about this?

In practice this means that no-one should recruit newbie C++ developers.

The only alternative is to have some kind of apprenticeship system, where all the code written by a newbie is re-written by their mentor for about 2 years. This is a great learning experience, and could weed out people with insufficient capacity for humility to be a C++ programmer. (Note I say capacity for humility, because the actual humility will be provided by the constant crushing of your spirit provided by someone tearing your code apart every day.)

In Java, to create an “array”, and add an object to it, you do this:

MyObj obj = new MyObj();
ArrayList<MyObj> arr = new ArrayList<MyObj>();
arr.add( obj );

In Python, you do this:

obj = MyObj()
arr = []
arr.append( obj )

In C, you do this:

MyStruct obj;
MyStruct arr[50] = { obj };

In Perl, you do this:

my $obj = MyObj->new();
my @arr;
push(@arr,$obj); 

In Pascal, you do this:

var
    arr : ARRAY [1..50] of int;
begin
    arr[1] := 7;
end

In Haskell, you might do something like this (thanks to Neil Mitchell):

[myObj]

Don’t get het up about the fact that these examples do different things: my point is, they are reasonable ways of performing the task (add something to an array) in the languages chosen. (Please do send in corrections, though – none of these were checked and they are probably wrong.)

Note that the C example hides a little more complexity because you need to make sure you tidy up your memory afterwards.

Now, what do you do in C++? It’s just as easy, right?

MyObj obj;
std::vector<MyObj> arr;
arr.push_back( obj );

WRONG!

In the examples above, you don’t need to know what is going on under the covers.

In fact, in general I suggest there are broadly two types of programming language around at the moment: those where you have to know how things work, but where how things work is quite simple (e.g. C, assembly languages, maybe FORTRAN and COBOL) and those which isolate you from how things work (all the rest?).

Where does C++ fit in to this scheme? It is in the unique position of being a language which has incredibly complex things going on under the covers, and you have to know about it!

What do I mean by saying you have to know what’s going on under the covers?

Let’s look at our example again, and ask this question: what types of object can you put in the array? In the other programming languages above, you can essentially put any “normal” objects (where normal is different for each example) into the array.

In C++, here are some of the rules you need to understand about objects you can put into std::vector. You should understand these before you try using std::vector. If you can’t understand them, you should think hard until you do.

Default constructor

If you want to give the size of the vector when you create it (or resize it later), your object must have a default constructor [Stroustrup §16.3.4].

(Note: if you don’t define any other constructors, the compiler will automatically define a default constructor for you (which may or may not do what you want). If you do, the default constructor is the one that can be called without any arguments [Stroustrup §10.4.2].

Example:

$ cat default_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject( int num )
        {
        }
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ default_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:
In constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’:
default_constructor_required.cpp:12:   instantiated from here
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: no matching function for call to ‘MyObject::MyObject()’
default_constructor_required.cpp:5: note: candidates are: MyObject::MyObject(int)
default_constructor_required.cpp:4: note:                 MyObject::MyObject(const MyObject&)

Nice error message, eh?

(Note: if your array contains a built-in type (e.g. int) it will be initialised to 0 in its default constructor [Stroustrup §4.9.5].

If you don’t want to give the size of the vector when you create it (not recommended), then you don’t need a default constructor in your object [Stroustrup §16.3.4].

Copy constructor

Your object must also have a copy constructor. Example:

$ cat copy_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject()
        {
        }
private:
        MyObject( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ copy_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h: In
constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp = MyObject,
_Alloc = std::allocator<MyObject>]’:
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:
In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = MyObject, _T2 = MyObject]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:194:
   instantiated from ‘void std::__uninitialized_fill_n_aux(_ForwardIterator, _Size,
 const _Tp&, __false_type) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:218:
   instantiated from ‘void std::uninitialized_fill_n(_ForwardIterator, _Size,
 const _Tp&) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:310:
   instantiated from ‘void std::__uninitialized_fill_n_a(_ForwardIterator, _Size,
 const _Tp&, std::allocator<_Tp2>) [with _ForwardIterator = MyObject*, _Size
 = unsigned int, _Tp = MyObject, _Tp2 = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
   instantiated from ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp =
 MyObject, _Alloc = std::allocator<MyObject>]’
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const
 MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:81:
 error: within this context

(Aside: this program is 181 bytes, and the error message is 1950 bytes.)

Assignment operator

You also need operator=. Example:

$ cat assignment_operator_required.cpp
#include <vector>

class MyObject
{
private:
        MyObject& operator=( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        MyObject obj;
        std::vector<MyObject> arr;
        arr.push_back( obj );
}

$ g++ assignment_operator_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:
In member function ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:260:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:
In static member function ‘static _BI2 std::__copy_backward<_BoolType,
std::random_access_iterator_tag>::copy_b(_BI1, _BI1, _BI2) [with _BI1 =
MyObject*, _BI2 = MyObject*, bool _BoolType = false]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:443:
instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1
= MyObject*, _BI2 = MyObject*]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:482:
instantiated from ‘static _BI2 std::__copy_backward_normal<true,
true>::copy_b_n(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:517:
instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:257:
instantiated from ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:412:
error: within this context

(Aside: program is 202 bytes, error is 2837 bytes.)

Excuses

Don’t give me that “the compiler will provide them for you” excuse. What the compiler provides is often wrong, unless you’ve been careful to ensure you don’t own any members by holding pointers to them: i.e. if you’ve fully understood the problem I am setting out and avoided the pitfalls.

Conclusion

I assert that C++ is an expert language. Quite apart from the fact that the method names on STL objects use archane phrases like “push_back” rather than “add”, and the error messages you get from popular compilers are huge and almost incomprehensible, my main point is that you have to understand the basics of how the standard library is implemented, before you can use it. This is expert behaviour.

I have illustrated this point by showing what you need to know to use the standard resizeable array type in C++. You need to know a lot.

More on whether the fact that C++ is an expert language is a bad thing, later.

Update: simplified the C example thanks to Edmund’s suggestion.

Update 2: corrected the Java example thanks to Anon’s comment.

Update 3: corrected the Haskell example thanks to Neil Mitchell’s comment.

Templated test code?

At work at the moment, as part of an initiative to get with the 21st century, we are waking up to testing our code.

Thus, I am writing a lot of unit tests for old code, which can be soul-destroyingly repetitive and very pointless-feeling (even though really I do see a great value in the end result – tested code is refactorable code).

Often, tests have a lot in common with each other, so it feels right to reduce code repetition, and factor things into functions etc. The Right Way of doing this is to leave your tests as straightforward as possible, with preferably no code branches at all, just declarative statements.

Contemplating writing unit tests for the same method on 20+ very similar classes, using a template function “feels” right, for normal code values of “feel”. However, for test code, maybe it’s wrong?

My question is: is it ok to write a test function like this?:

void test_all_thingies()
{
    test_One_Thingy<Thingy1>();
    test_One_Thingy<Thingy2>();
    test_One_Thingy<Thingy3>();
    test_One_Thingy<Thingy4>();
}

template< class T >
void test_One_Thingy()
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.isSomething() );
}

Worse still, is this ok?

void test_all_thingies()
{
    test_One_Thingy<Thingy1>( "Thingy1 expected output" );
    test_One_Thingy<Thingy2>( "Thingy2 expected output" );
    test_One_Thingy<Thingy3>( "Thingy3 expected output" );
    test_One_Thingy<Thingy4>( "Thingy4 expected output" );
}

template< class T >
void test_One_Thingy( std::string expected_output )
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.getOutput() == expected_output );
}

Reasons for: otherwise I’m going to be writing huge amounts of copy-pasted code (unless someone can suggest a better way?).

Reasons against: how clear is it going to be which class failed the test when it fails?

Update: fixed unescaped diagonal brackets.