NNDB 0.1

I’ve managed to get NNDB, my C++ data storage library which is almost, but not entirely, unlike SQL, into a fit state for a release.

You can create tables, set indices on columns, insert data, retrieve data using something like a SELECT, filter it using something like WHERE (which uses indices where available), and order it using something like an ORDER BY.

So far it has been a fantastic way to get my hands dirty with some template metaprogramming, and some C++ as it should be*, but the reason I started this was to help me think about how databases work, so I’m really looking forward to working out how to implement JOINs. At the moment I have only very vague ideas.

NNDB is based heavily on the STL (part of C++’s standard library), Boost (a playground for things that might one day be in C++’s standard library, and hang-out for some of the cleverest people alive), and Loki (the continuation of Andrei Alexandrescu’s template metaprogramming library, hopefully used for good rather than evil, which he wrote for and explained in Modern C++ Design; that book ranks in the top 5 most exciting books I have read). The more I learn about all three, the more impressed I am.

I have even been having discussions with the Loki devs about some code I needed for NNDB that I think might be helpful for other people using Loki. It’s called ForEachType and it allows you to loop (at runtime) through all the types in a Typelist and do something for each one.
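Loki’s actual ForEachType code isn’t shown here, so the following is only a rough sketch of the idea. It assumes a hand-rolled Loki-style typelist: NullType, Typelist, ForEachType and RecordSizes are all defined locally for illustration and are not Loki’s headers or API. The compiler unrolls the recursion, but the functor is called once per type at runtime:

```cpp
#include <cstddef>
#include <vector>

// A minimal Loki-style typelist, written out here so the sketch is
// self-contained (this is NOT Loki's own header or its ForEachType).
struct NullType {};

template <class H, class T>
struct Typelist
{
    typedef H Head;
    typedef T Tail;
};

// ForEachType<TList>::Run( func ) calls func.Visit<U>() once for every
// type U in TList, in order. The recursion is resolved at compile time;
// the calls themselves happen at runtime.
template <class TList>
struct ForEachType;

// End of the list: nothing left to visit.
template <>
struct ForEachType<NullType>
{
    template <class Func>
    static void Run( Func& ) {}
};

// Visit the head type, then recurse into the tail.
template <class H, class T>
struct ForEachType< Typelist<H, T> >
{
    template <class Func>
    static void Run( Func& func )
    {
        func.template Visit<H>();
        ForEachType<T>::Run( func );
    }
};

// Example functor: records sizeof() of each type it is given.
struct RecordSizes
{
    std::vector<std::size_t> sizes;

    template <class U>
    void Visit() { sizes.push_back( sizeof( U ) ); }
};
```

In something like NNDB the typelist might be a table’s column types, with the functor doing some per-column job.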

The project is already working in terms of helping me think about databases. For example, I really hadn’t thought before about how expensive ORDER BY is. To implement it I needed to create a temporary std::map covering the entire result set – in a real database this obviously requires reading every single row before we can even begin to return any results. The way to avoid this is to have an index. Which reminds me: the next thing I need to do is make ORDER BY able to use indices (at the moment it’s only WHEREs that take advantage of them).
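To make the cost concrete, here is a minimal sketch of an index-less ORDER BY, assuming a hypothetical Person row type. It uses std::multimap rather than the std::map mentioned above, so that two rows with the same key don’t overwrite each other:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type, for illustration only.
struct Person
{
    unsigned long id;
    std::string last_name;
};

// ORDER BY last_name with no index available: every row has to be visited
// and put into a temporary sorted container before the first result can be
// handed back. A multimap keeps all rows that share a surname.
std::vector<Person> OrderByLastName( const std::vector<Person>& rows )
{
    typedef std::multimap<std::string, const Person*> SortedRows;

    SortedRows sorted;
    for ( std::size_t i = 0; i < rows.size(); ++i )
    {
        sorted.insert( std::make_pair( rows[i].last_name, &rows[i] ) );
    }

    std::vector<Person> result;
    for ( SortedRows::const_iterator it = sorted.begin(); it != sorted.end(); ++it )
    {
        result.push_back( *it->second );
    }
    return result;
}
```

Nothing can be returned until the first loop has finished, which is exactly the problem: the whole result set is read and sorted before the caller sees the first row.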

So next on my list are:

  • ORDER BY uses indices
  • Non-unique indices (presumably implemented with a std::multimap)
  • Joins
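A non-unique index could look something like the following sketch. It assumes a hypothetical NonUniqueIndex class that maps a column value to row numbers with std::multimap; equal_range then finds every row matching a WHERE on that column:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical non-unique index: several rows may share a key, so a
// std::multimap from column value to row number is used.
class NonUniqueIndex
{
public:
    void Add( const std::string& key, std::size_t row )
    {
        index_.insert( std::make_pair( key, row ) );
    }

    // WHERE column = key: equal_range finds the matching entries in
    // logarithmic time, plus the time to walk the matches.
    std::vector<std::size_t> Find( const std::string& key ) const
    {
        typedef std::multimap<std::string, std::size_t>::const_iterator Iter;

        std::vector<std::size_t> rows;
        std::pair<Iter, Iter> range = index_.equal_range( key );
        for ( Iter it = range.first; it != range.second; ++it )
        {
            rows.push_back( it->second );
        }
        return rows;
    }

private:
    std::multimap<std::string, std::size_t> index_;
};
```

Because a multimap also keeps its keys sorted, walking it from begin() to end() visits rows already ordered by that column. That is roughly how an index could let ORDER BY skip the temporary map.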

I am still very excited so you may see more releases over the next few months.

[* NNDB so far contains zero (0) calls to new and zero (0) calls to delete. Obviously the code it uses (e.g. std::vector) calls them, but that code manages all the memory for me, and most of it uses custom allocators to make it very fast. I have no idea how fast NNDB is, but maybe it could be quite fast. I am pretty confident it doesn’t contain any memory errors. Famous last words…]

NNDB’s Not a Database

My latest project is called NNDB.

I’ve worked with databases for quite a long time now, and for a while I’ve been thinking about how they work under the hood. I know very little about it, but I thought I could learn a bit by trying to implement something similar myself.

I’m interested in how queries work against joined tables, how to implement indices and so on.

I’ve also been feeling that I want to do some C++ as an open source project. I do it all day at work, and for some problems it feels like the right tool for the job.

NNDB is sort of like an in-memory database, but it uses C++ types for its columns, instead of a fixed set like varchar, int, etc. You can put your own value-typed classes in the columns, and all values are type-checked at compile time.

It’s always struck me as strange that with a traditional code+SQL setup you have to keep your SQL in sync with your code manually. Of course, there are lots of trendy Object-Relational-Mapping thingies that solve that problem, but I felt it could be approached from another direction: instead of generating code to match your data, or generating SQL to match your code, why not specify your data structure in code?

In NNDB you define a table something like this:

typedef nndb::Values< unsigned long, std::string, std::string, MyDate >
    PersonValues;

class PersonTable : public nndb::Table<PersonValues>
{
public:
    enum Columns
    {
        id,
        first_name,
        last_name,
        date_of_birth
    };
};

Actually, defining your own class is unnecessary, but it’s nice to have an enum to name your columns, and making a class gives you a nice place to put it.
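NNDB’s own nndb::Values and nndb::Table aren’t shown here, so as a rough analogy only, the sketch below uses C++11’s std::tuple in place of nndb::Values. The MyDate and PersonTable definitions are hypothetical stand-ins, not NNDB’s implementation. The point is that the column enum can double as a compile-time index into a typed row:

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the MyDate class used in the post.
class MyDate
{
public:
    explicit MyDate( long days ) : days_( days ) {}
    long Days() const { return days_; }
private:
    long days_;
};

// std::tuple playing the role of nndb::Values: the column types are
// fixed at compile time.
typedef std::tuple<unsigned long, std::string, std::string, MyDate> PersonValues;

// Hypothetical table: the enum names the columns and is also the
// tuple index used with std::get.
class PersonTable
{
public:
    enum Columns { id, first_name, last_name, date_of_birth };

    void Insert( const PersonValues& values ) { rows_.push_back( values ); }

    const PersonValues& Row( std::size_t i ) const { return rows_[i]; }

private:
    std::vector<PersonValues> rows_;
};
```

Reading a column back by name is then std::get<PersonTable::last_name>( row ). Because the column enum is a template argument, a misspelt column name or an index past the last column fails to compile rather than failing at runtime.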

To insert a row you do something like this:

PersonTable person_table;
person_table.Insert( PersonValues( 0,
    "Andy", "Balaam", MyDate( 12000000 ) ) );

You can do simple queries with WHERE and ORDER BY clauses, and I’m working on indexes.

After that will come JOINs, and anything else that takes my fancy.

I don’t anticipate NNDB being useful to anyone – it’s really for me to understand why things are as they are in the world of databases. However, you never know – it may turn out to be a fast and convenient way to store data in the C++ world. I think some of the applications that use databases don’t really need the kind of concurrent multi-user network-accessible features they have, but really just want to search, join and store reliably, and NNDB might one day grow into something that can find a niche.

To explore more, check out the complete example.

Firefox keyword search for finding C++ keywords

I often want to search the SGI C++ reference for a keyword. The best way I have found to jump straight to the page I want is to use Google’s “I’m Feeling Lucky” search limited to searching within sgi.com.

You can create a Firefox keyword search to allow you to do this quickly from the location bar. Now I just type Ctrl-L then e.g. “c vector” to jump straight to the page about std::vector.

To do this, make a bookmark (to anything) and then right-click it in the Bookmarks menu and choose Properties. Edit it to look like this:

(Screenshot: Firefox C++ search keyword bookmark)

The Location field is set to http://www.google.com/search?as_q=%s&as_sitesearch=sgi.com&btnI=1, and the Keyword field is just c.
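What Firefox does with the keyword is essentially string substitution. As a rough model, the sketch below replaces each %s with the text typed after the keyword. ExpandKeywordUrl is a hypothetical helper, not Firefox’s code. Firefox also URL-encodes the typed text first; that step is left out here, so the sketch only handles simple terms such as vector:

```cpp
#include <string>

// Rough model of a Firefox keyword bookmark: the text typed after the
// keyword replaces every %s in the Location. URL-encoding of the terms,
// which Firefox also performs, is deliberately omitted.
std::string ExpandKeywordUrl( const std::string& location, const std::string& terms )
{
    std::string url = location;
    std::string::size_type pos = url.find( "%s" );
    while ( pos != std::string::npos )
    {
        url.replace( pos, 2, terms );
        // Continue searching after the text just inserted.
        pos = url.find( "%s", pos + terms.size() );
    }
    return url;
}
```

So typing “c vector” turns the Location above into http://www.google.com/search?as_q=vector&as_sitesearch=sgi.com&btnI=1.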

This post was largely for my own benefit to remember this next time I need to set it up, but I thought it might be useful to others as the information about how to set these searches up is not easy to find.

If you want to go to a Google search results page instead of jumping to the “I’m Feeling Lucky” result, remove the &btnI=1 part.

IGCC – a read-eval-print loop for C/C++

When you first hear about the Read-Eval-Print Loop you might well think “So what?” as I did.

What’s so great about being able to type commands interactively?

But the thing is that it creeps up on you.

Everyone already knows programming is an interactive thing – we need constant feedback to validate our ideas. Programming on paper is incredibly frustrating because you have to plough on with assumptions that are probably wrong.

It’s just so comfortable to be able to try out ideas in an interactive interpreter.

(Image: Foosball)

I mean, it’s really not much hassle to create a new directory, make a new file, edit the file to contain the code you want to try, remember the right command to compile it, then run the program and see the results, is it?

Well, no, it isn’t, but it’s enough of a hassle that sometimes you don’t bother and you try it out in the code you are really working on, and if your work is like mine that means a minimum of 5 minutes to compile and link, and there you are playing foosball again when you could be getting something done.

The REPL gives you a place to try throwaway things extremely quickly, and when you’re working with something beautiful like Python it’s easy to get addicted.

So my mind started to wander and it struck me that a pale imitation of the REPL could be made for us poor C++ programmers, and it would generally serve the purposes I’ve described above.

So IGCC was born. Its name means “Interactive GCC”, and it’s a read-eval-print loop for C++ (in most cases it works for C too).

It uses the real GCC underneath, so you know you are running exactly the code you would get from a normal build (and it’s somewhat easier to write than a custom C/C++ interpreter). All it does is take away the hassle of creating a simple program and compiling it with GCC.

It wraps your code in a standard C program, includes some common dependencies, compiles it, and immediately prints the results of running it. Using it looks like this:

$ ./igcc 
g++> int a = 5;
g++> a += 2;
g++> cout << a << endl;
7
g++> --a;
g++> cout << a << endl;
6
g++> 

Apart from all the sugar that I’d love to add, the main missing features are some kind of equivalent of the Python dir command, and code completion.

It’s not rocket science, but it might make you a little bit more interactive in your C and C++ coding, which might save you valuable foosball time.

Enjoy, improve, etc. IGCC.

Foosball image taken from http://en.wikipedia.org/wiki/File:Baby_foot_artlibre_jnl.jpg

C++ is an expert language

Update: I should point out at the beginning that I love C++. Anything below which sounds bitter or critical is born of a deep and growing love. C++ is a journey into worlds of beauty and strength.

I assert that C++ is an expert language. What do I mean?

You shouldn’t be allowed to use C++ in anger unless you’ve used it for 2 years in anger.

Aside: what should we do about this?

In practice this means that no-one should recruit newbie C++ developers.

The only alternative is to have some kind of apprenticeship system, where all the code written by a newbie is re-written by their mentor for about 2 years. This is a great learning experience, and could weed out people with insufficient capacity for humility to be a C++ programmer. (Note I say capacity for humility, because the actual humility will be provided by the constant crushing of your spirit provided by someone tearing your code apart every day.)

In Java, to create an “array”, and add an object to it, you do this:

MyObj obj = new MyObj();
ArrayList<MyObj> arr = new ArrayList<MyObj>();
arr.add( obj );

In Python, you do this:

obj = MyObj()
arr = []
arr.append( obj )

In C, you do this:

MyStruct obj;
MyStruct arr[50] = { obj };

In Perl, you do this:

my $obj = MyObj->new();
my @arr;
push(@arr,$obj); 

In Pascal, you do this:

var
    arr : ARRAY [1..50] of integer;
begin
    arr[1] := 7;
end

In Haskell, you might do something like this (thanks to Neil Mitchell):

[myObj]

Don’t get het up about the fact that these examples do different things: my point is, they are reasonable ways of performing the task (add something to an array) in the languages chosen. (Please do send in corrections, though – none of these were checked and they are probably wrong.)

Note that the C example hides a little more complexity: its array has a fixed size of 50, and if you allocated it dynamically instead, you would need to make sure you tidied up your memory afterwards.

Now, what do you do in C++? It’s just as easy, right?

MyObj obj;
std::vector<MyObj> arr;
arr.push_back( obj );

WRONG!

In the examples above, you don’t need to know what is going on under the covers.

In fact, in general I suggest there are broadly two types of programming language around at the moment: those where you have to know how things work, but where how things work is quite simple (e.g. C, assembly languages, maybe FORTRAN and COBOL) and those which isolate you from how things work (all the rest?).

Where does C++ fit in to this scheme? It is in the unique position of being a language which has incredibly complex things going on under the covers, and you have to know about it!

What do I mean by saying you have to know what’s going on under the covers?

Let’s look at our example again, and ask this question: what types of object can you put in the array? In the other programming languages above, you can essentially put any “normal” objects (where normal is different for each example) into the array.

In C++, here are some of the rules you need to understand about objects you can put into std::vector. You should understand these before you try using std::vector. If you can’t understand them, you should think hard until you do.

Default constructor

If you want to give the size of the vector when you create it (or resize it later), your object must have a default constructor [Stroustrup §16.3.4].

(Note: if you don’t define any constructors at all, the compiler automatically defines a default constructor for you, which may or may not do what you want. If you do define a constructor, the compiler won’t generate one, and your class only has a default constructor if one of its constructors can be called without any arguments [Stroustrup §10.4.2].)

Example:

$ cat default_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject( int num )
        {
        }
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ default_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:
In constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’:
default_constructor_required.cpp:12:   instantiated from here
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: no matching function for call to ‘MyObject::MyObject()’
default_constructor_required.cpp:5: note: candidates are: MyObject::MyObject(int)
default_constructor_required.cpp:4: note:                 MyObject::MyObject(const MyObject&)

Nice error message, eh?

(Note: if your vector contains a built-in type such as int, each element will be initialised to 0 [Stroustrup §4.9.5].)

If you don’t want to give the size of the vector when you create it (not recommended), then you don’t need a default constructor in your object [Stroustrup §16.3.4].
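If you do want a sized vector of a class with no default constructor, there are a couple of ways round it. Here is a sketch using a variant of MyObject above; the Num() accessor and the helper function names are hypothetical:

```cpp
#include <vector>

// A variant of MyObject from the example above: still no default
// constructor, but it stores its number so the results can be checked.
class MyObject
{
public:
    explicit MyObject( int num ) : num_( num ) {}
    int Num() const { return num_; }

private:
    int num_;
};

// Option 1: give the vector a prototype value to copy into every slot.
// vector( size, value ) needs a copy constructor, not a default one.
std::vector<MyObject> MakeFilled()
{
    return std::vector<MyObject>( 5, MyObject( 42 ) );
}

// Option 2: start empty and add the elements one at a time.
std::vector<MyObject> MakeGrown()
{
    std::vector<MyObject> arr;
    for ( int i = 0; i < 5; ++i )
    {
        arr.push_back( MyObject( i ) );
    }
    return arr;
}

// For a built-in element type, the elements are initialised to 0.
std::vector<int> MakeZeroes()
{
    return std::vector<int>( 5 );
}
```

Both options lean on the copy constructor instead, which brings us to the next requirement.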

Copy constructor

Your object must also have a copy constructor. Example:

$ cat copy_constructor_required.cpp
#include <vector>

class MyObject
{
public:
        MyObject()
        {
        }
private:
        MyObject( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        std::vector<MyObject> arr( 5 );
}

$ g++ copy_constructor_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h: In
constructor ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp = MyObject,
_Alloc = std::allocator<MyObject>]’:
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:
In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = MyObject, _T2 = MyObject]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:194:
   instantiated from ‘void std::__uninitialized_fill_n_aux(_ForwardIterator, _Size,
 const _Tp&, __false_type) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:218:
   instantiated from ‘void std::uninitialized_fill_n(_ForwardIterator, _Size,
 const _Tp&) [with _ForwardIterator = MyObject*, _Size = unsigned int, _Tp = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_uninitialized.h:310:
   instantiated from ‘void std::__uninitialized_fill_n_a(_ForwardIterator, _Size,
 const _Tp&, std::allocator<_Tp2>) [with _ForwardIterator = MyObject*, _Size
 = unsigned int, _Tp = MyObject, _Tp2 = MyObject]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:219:
   instantiated from ‘std::vector<_Tp, _Alloc>::vector(size_t) [with _Tp =
 MyObject, _Alloc = std::allocator<MyObject>]’
copy_constructor_required.cpp:15:   instantiated from here
copy_constructor_required.cpp:10: error: ‘MyObject::MyObject(const
 MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_construct.h:81:
 error: within this context

(Aside: this program is 181 bytes, and the error message is 1950 bytes.)
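To see why the copy constructor matters, here is a small sketch with a hypothetical Counted class that counts its copies. It shows that push_back stores a copy of your object, not the object itself:

```cpp
#include <vector>

// Hypothetical instrumented class: counts its copy constructions.
class Counted
{
public:
    Counted() {}
    Counted( const Counted& ) { ++copies; }

    static int copies;
};

int Counted::copies = 0;

// How many copies does one push_back make when the vector already has
// room? reserve() first, so that no reallocation happens.
int CopiesForOnePushBack()
{
    Counted obj;
    std::vector<Counted> arr;
    arr.reserve( 1 );
    Counted::copies = 0;
    arr.push_back( obj );
    return Counted::copies;
}
```

The answer is one: the vector copy-constructs its own element from obj, and later changes to obj don’t affect it.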

Assignment operator

You also need operator=. Example:

$ cat assignment_operator_required.cpp
#include <vector>

class MyObject
{
private:
        MyObject& operator=( const MyObject& );
};

int main( int argc, const char* argv[] )
{
        MyObject obj;
        std::vector<MyObject> arr;
        arr.push_back( obj );
}

$ g++ assignment_operator_required.cpp
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:
In member function ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:260:
error: within this context
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:
In static member function ‘static _BI2 std::__copy_backward<_BoolType,
std::random_access_iterator_tag>::copy_b(_BI1, _BI1, _BI2) [with _BI1 =
MyObject*, _BI2 = MyObject*, bool _BoolType = false]’:
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:443:
instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1
= MyObject*, _BI2 = MyObject*]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:482:
instantiated from ‘static _BI2 std::__copy_backward_normal<true,
true>::copy_b_n(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:517:
instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 =
__gnu_cxx::__normal_iterator<MyObject*, std::vector<MyObject,
std::allocator<MyObject> > >, _BI2 = __gnu_cxx::__normal_iterator<MyObject*,
std::vector<MyObject, std::allocator<MyObject> > >]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/vector.tcc:257:
instantiated from ‘void std::vector<_Tp,
_Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename _Alloc::pointer,
std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = MyObject, _Alloc =
std::allocator<MyObject>]’
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_vector.h:610:
instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with
_Tp = MyObject, _Alloc = std::allocator<MyObject>]’
assignment_operator_required.cpp:13: instantiated from here
assignment_operator_required.cpp:6: error: ‘MyObject&
MyObject::operator=(const MyObject&)’ is private
/usr/lib/gcc/i486-linux-gnu/4.0.3/../../../../include/c++/4.0.3/bits/stl_algobase.h:412:
error: within this context

(Aside: program is 202 bytes, error is 2837 bytes.)
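A similar sketch, again with a hypothetical class (Assigned), counts assignments. In the GCC 4.0.3 library above, push_back itself needed operator=. Newer standard libraries no longer need it for push_back, but inserting anywhere other than the end still does, because the existing elements have to be shuffled along:

```cpp
#include <vector>

// Hypothetical instrumented class: counts calls to its assignment operator.
class Assigned
{
public:
    Assigned() {}
    Assigned( const Assigned& ) {}
    Assigned& operator=( const Assigned& )
    {
        ++assignments;
        return *this;
    }

    static int assignments;
};

int Assigned::assignments = 0;

// Insert at the front of a vector that already has spare capacity: the
// existing elements move along one slot, and typical library
// implementations do that shuffling with operator=.
int AssignmentsForInsertAtFront()
{
    std::vector<Assigned> arr( 3 );
    arr.reserve( 4 );
    Assigned::assignments = 0;
    arr.insert( arr.begin(), Assigned() );
    return Assigned::assignments;
}
```

The exact count depends on the library implementation, but it will not be zero, which is why a private operator= stops this from compiling.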

Excuses

Don’t give me that “the compiler will provide them for you” excuse. What the compiler provides is often wrong, unless you’ve been careful to ensure you don’t own any members by holding pointers to them: i.e. if you’ve fully understood the problem I am setting out and avoided the pitfalls.
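Here is what that looks like in practice: a sketch of a hypothetical OwnedName class that owns a heap buffer through a raw pointer. It has to write its own copy constructor, operator= and destructor (the “Rule of Three”) before it can safely live in a std::vector:

```cpp
#include <algorithm>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical class that owns a heap buffer through a raw pointer. The
// compiler-generated copy constructor and operator= would copy only the
// pointer, so two objects would end up deleting the same buffer.
class OwnedName
{
public:
    explicit OwnedName( const char* name )
        : name_( new char[ std::strlen( name ) + 1 ] )
    {
        std::strcpy( name_, name );
    }

    // Deep copy: the new object gets its own buffer.
    OwnedName( const OwnedName& other )
        : name_( new char[ std::strlen( other.name_ ) + 1 ] )
    {
        std::strcpy( name_, other.name_ );
    }

    // Copy-and-swap: 'other' is already a deep copy, so swap buffers with
    // it and let its destructor free our old one.
    OwnedName& operator=( OwnedName other )
    {
        std::swap( name_, other.name_ );
        return *this;
    }

    ~OwnedName() { delete[] name_; }

    const char* Get() const { return name_; }
    void SetFirstChar( char c ) { name_[0] = c; }

private:
    char* name_;
};
```

If name_ were a std::string instead of a char*, the compiler-generated versions would already do the right thing.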

Conclusion

I assert that C++ is an expert language. Quite apart from the fact that the method names on STL objects use arcane phrases like “push_back” rather than “add”, and the error messages you get from popular compilers are huge and almost incomprehensible, my main point is that you have to understand the basics of how the standard library is implemented before you can use it. This is expert behaviour.

I have illustrated this point by showing what you need to know to use the standard resizeable array type in C++. You need to know a lot.

More on whether the fact that C++ is an expert language is a bad thing, later.

Update: simplified the C example thanks to Edmund’s suggestion.

Update 2: corrected the Java example thanks to Anon’s comment.

Update 3: corrected the Haskell example thanks to Neil Mitchell’s comment.