Archive for the ‘Test Driven’ Category

Anatomy of an interpreter: the Evaluator

Saturday, February 12th, 2011

Posts in this series: Lexer, Parser, Evaluator

I’m still really enjoying writing my Scheme interpreter Subs, which can now succesfully run all the example code from SICP up to section 2.3.4. I’ve made the changes I mentioned I would in the Lexer article, so now the Lexer returns Tokens that contain information about their basic types, and I’ve gone through a significant refactoring to replace one of the several massive switch statements with a virtual function call (Martin Fowler would be proud).

Last time I explained how the Parser takes the stream of tokens coming from the Lexer and returns a hierarchical tree of Values, each of which represents an operation or thing in the program.

The Evaluator takes in a tree of Values, “evaluates” it, and returns another Value object, which is the answer. The Evaluator class is by far the most complex part of Subs, so in this post we’ll start with an overview of how it works. Future posts will break down the different parts in more detail.

The most interesting parts of the Evaluator class interface look like this:

class Evaluator
{
public:
    std::auto_ptr<Value> EvalInContext( const Value* value,
        boost::shared_ptr<Environment>& environment );
};

The EvalInContext method takes in a Value to evaluate, and an “environment” *in which to evaluate it. Note that the in the real code it takes a couple more arguments, including a mysterious and annoying boolean called is_tail_call which will be explained later.

* More on environments later. All you need to know for now is that they provide a way of keeping hold of all the things we currently know about, identified by name.

A very simplified version of EvalInContext would look like this:

std::auto_ptr<Value> Evaluator::EvalInContext( const Value* value,
    boost::shared_ptr<Environment>& environment )
{
    if( is_symbol( value ) )
    {
        return eval_symbol( value, environment );
    }

    if( !is_combination( value ) )
    {
        return auto_ptr<Value>( value->Clone() );
    }

    const CombinationValue* combo = to_combination( value );
    CombinationValue::const_iterator it = combo->begin();

    auto_ptr<Value> evaldoptr = EvalInContext( *it, environment );

    if( special_symbol( evaldoptr ) )
    {
        return process_special_symbol( evaldoptr, combo );
    }
    else
    {
        ++it;

        CombinationValue argvalues;
        for( ; it != combo->end(); ++it )
        {
            argvalues.push_back( EvalInContext( *it, environment ).release() );
        }

        return run_procedure( evaldoptr.get(), &argvalues, *this, environment );
    }
}

If the Value to be evaluated is just a symbol, we call eval_symbol which basically looks up the symbol’s name in the environment and returns the value it finds.

If the Value is not a combination (i.e. the root of a tree of other values) it must be a basic type such as a string or an integer. In this case we simple copy the Value and return it.

Otherwise, it’s a combination. To evaluate a combination, we follow the “eval-apply” pattern. The principle is to evaluate all the Values in the combination separately, and then “apply” (run) the first value (the “operator”) as a procedure, using the other values as arguments. The first value must evaluate to something that is recognisable as a procedure, or this doesn’t make sense and we will throw an error.

In practice it’s a tiny bit more complicated. We evaluate the first Value in the combination (by calling EvalInContext recursively), then we check whether it’s a special symbol such as if or let and if so, deal with it separately. Otherwise, we evaluate all the other Values (calling EvalInContext recursively again) and put them into a new CombinationValue, and pass the operator and the arguments to run_procedure, which looks something like this:

std::auto_ptr<Value> run_procedure( const Value* operator,
    const CombinationValue* args, Evaluator& ev,
    boost::shared_ptr<Environment>& environment )
{
    if( is_builtin_procedure( operator ) )
    {
        return handle_builtin_procedure( operator, args, environment );
    }
    else
    {
        std::auto_ptr<Value> ret;

        const CompoundProcedureValue* proc = to_compound_procedure( operator );

        boost::shared_ptr<Environment> new_env =
            proc->ExtendEnvironmentWithArgs( args );

        for( CombinationValue::const_iterator it = proc->GetBody()->begin();
            it != proc->GetBody()->end(); ++it )
        {
            ret = ev.EvalInContext( *it, new_env );
        }

        return ret;
    }
}

Running a procedure means either doing something built-in (for example adding up two numbers and returning the result) or evaluating some other code, which comes from the definition of the procedure being run. First we call ExtendEnvironmentWithArgs to create a new Environment, which contains the argument Values that were supplied, and then we loop through all the sections of the body of the procedure, evaluating each one. Note that we throw away the returned Values for all sections except the last one (this is how Scheme works).

Once we’ve evaluated and applied our procedure, which of course potentially includes numerous recursive calls to EvalInContext, we end up with a Value that is returned, and we’re done.

Simple eh?

But now I must make a confession: almost everything I have written above is lie. Why would I lie to you? Because I missed out something wonderful and strange called “tail-call optimisation”. I’ll explain next time.

Anatomy of an interpreter: the Parser

Thursday, September 30th, 2010

Posts in this series: Lexer, Parser, Evaluator

Subs has reached version 1.3.4, which means that it can successfully run all the tests from chapter 1 of SICP. This is very exciting.

Last time I explained a bit about the Lexer, which takes in a stream of characters and emits a stream of tokens: individual elements of code such as a bracket, a keyword or a symbol.

Generally, parsers emit some kind of tree structure – they understand the raw tokens as a hierarchical structure which (conceptually, at least) will be executed from the bottom up, with each branch-point in the tree being an operation of some kind.

Our parser takes in a stream of tokens, and emits a stream of parsed trees.

Parsing Scheme is very easy, because (except for a couple of exceptions I haven’t implemented yet) there is essentially one rule: start with an open bracket, see a list of things, and then find a close bracket. Of course, one of the “things” you see may itself be another bracketted list, so after parsing you get a tree structure of nested lists.

The parser in Subs looks like this:

class Parser
{
public:
    Parser( ILexer& lexer );
    std::auto_ptr<Value> NextValue();
private:
    ILexer& lexer_;
};

We supply a Lexer in the constructor, which we know will provide us with tokens when we need them via its NextToken() method. The Parser’s NextValue() method returns a pointer to a Value, which is the base class for all the “things” in the Subs interpreter.

There are lots of types of things that inherit from the Value class, but the “parse tree” (the output of the parser) will only consist of a very small subset of them:

  • CombinationValue
  • DecimalValue
  • IntegerValue
  • StringValue
  • SymbolValue

The CombinationValue class forms the tree structure. Its declaration looks like this:

class CombinationValue : public Value, public std::vector<Value*>
{
    // ...
};

It is simply a list of other Values.

Note that it “owns” those Values in the sense that it deletes them when it is deleted. I have recently made the jump to make Subs depend on BOOST, so it’s on my TODO list to make containers like this use the BOOST smart containers to manage this job for me.

DecimalValue, IntegerValue and StringValue are relatively self-explanatory: they contain numbers and strings that were found as literals in the source code.

SymbolValue is essentially everything else – if the code that recognises the type of a token can’t recognise it as a bracket, a number or a string, we assume it is a symbol, and tuck it away in a SymbolValue to be understood later.

The core of the Parser looks like this (with some error-checking removed):

std::auto_ptr<Value> next_value( ILexer& lexer, Token token )
{
    if( token.Name() == "(" )
    {
        auto_ptr<CombinationValue> ret( new CombinationValue );
        while( true )
        {
            token = lexer.NextToken();
            if( token.Name() == ")" )
            {
                break;
            }
            // Recursive call
            ret->push_back( next_value( lexer, token ).release() );
        }
        return auto_ptr<Value>( ret.release() );
    }
    else
    {
        return ValueFactory::CreateValue( token );
    }
}

(Full code here: Parser.cpp) It’s a simple recursive function that creates a CombinationValue whenever it finds a bracket, and otherwise uses a ValueFactory to create an individual value.

Side note: the wisdom of using recursion could certainly be questioned, since it limits the depth of bracketting we can handle to the size of the C++ stack, but the only other way to get the same result would be to keep our own manual stack of unfinished combinations, and it just seems perverse to re-implement language features like that. What might well be more interesting would be to consider whether we can actually evaluate parts of the tree as we go, without parsing it all at once. This might make the whole setup scale rather better, but would most likely be quite complex. The implementation presented here will work fine for almost any imaginable program – remember we would need not just code whose execution is deeply nested, but whose expression in code had thousands of levels of nesting before the parser would fail.

The ValueFactory uses some basic rules such as “starts and ends with a quote” or “consists of only numbers and a decimal point” to recognise what type of Value to create, and if no rules match it defaults to a SymbolValue.

When we have completed a bracketted expression, we return a complete tree of numbers, strings and symbols, and it is ready to be evaluated, which you can think of as simply expanding the tree we already have into the full expression of itself, and then reducing it back down again to an answer.

Next time, the Evaluator and the famous eval-apply loop.

NNDB’s Not a Database

Saturday, October 10th, 2009

My latest project is called NNDB.

I’ve worked with databases for quite a long time now, and for a while I’ve been thinking about how they work under the hood. I know very little about it, but I thought I could learn a bit by trying to implement something similar myself.

I’m interested in how queries work against joined tables, how to implement indices and so on.

I’ve also been feeling that I want to do some C++ as an open source project. I do it all day at work, and for some problems it feels like the right tool for the job.

NNDB is sort-of like an in-memory database, but it works with C++ types for its columns, instead of a fixed set like varchar, int etc. You can put your own value-typed classes in the columns, and all values are type-checked at compile time.

It’s always struck me as strange that with a traditional code+SQL setup you have to keep your SQL in sync with your code manually. Of course, there are lots of trendy Object-Relational-Mapping thingies that solve that problem, but I felt it could be approached from another direction: instead of generating code to match your data, or generating SQL to match your code, why not specify your data structure in code?

In NNDB you define a table something like this:

typedef nndb::Values< unsigned long, std::string, std::string, MyDate >
    PersonValues;

class PersonTable : public nndb::Table
{
public:
    enum Columns
    {
        id,
        first_name,
        last_name,
        date_of_birth
    };
};

Actually, defining your own class is unnecessary, but it’s nice to have an enum to name your columns, and making a class gives you a nice place to put it.

To insert a row you do something like this:

PersonTable person_table;
person_table.Insert( PersonValues( 0,
    "Andy", "Balaam", MyDate( 12000000 ) ) );

You can do simple queries with WHERE and ORDER BY clauses, and I’m working on indexes.

After that will come JOINs, and anything else that takes my fancy.

I don’t anticipate NNDB being useful to anyone – it’s really for me to understand why things are as they are in the world of databases. However, you never know – it may turn out to be a fast and convenient way to store data in the C++ world. I think some of the applications that use databases don’t really need the kind of concurrent multi-user network-accessible features they have, but really just want to search, join and store reliably, and NNDB might one day grow into something that can find a niche.

To explore more, check out the complete example.

Talk in code

Thursday, March 5th, 2009

Last week we had an extended discussion at work about how we were going to implement a specific feature.

This discussion hijacked our entire Scrum sprint planning meeting (yes, I know, we should have time-boxed it). It was painful, but the guy who was going to implement it (yes, I know, we should all collectively own our tasks) needed the discussion: otherwise it wasn’t going to get implemented. It certainly wasn’t going to get broken into short tasks until we knew how we were going to do it.

Anyway, asides aside, I came out of that discussion bruised but triumphant. We had a plan not only on how to write the code, but also how to test it. I believe the key thing that slowly led the discussion from a FUD-throwing contest into a constructive dialogue was the fact that we began to talk in code.

There are two facets to this principle:

1. Show me the code

As Linus once said, “Talk is cheap. Show me the code.“.

If you are at all disagreeing about how what you’re doing will work, open up the source files in question. Write example code – modify the existing methods or sketch a new one. Outline the classes you will need. Code is inherently unambiguous. White board diagrams and hand-waving are not.

Why wouldn’t you do this? Fear you might be wrong? Perhaps you should have phrased your argument a little less strongly?

Is this slower than drawing boxes on a whiteboard? Not if you include time spent resolving the confusion caused by the ambiguities inherent in line drawings.

Does UML make whiteboards less ambiguous? Yes, if all your developers can be bothered to learn it. But why learn a new language when you can communicate using the language you all speak all day – code?

2. Create a formal language to describe the problem

If your problem is sufficiently complex, you may want to codify the problem into a formal (text-based) language.

In last week’s discussion we were constantly bouncing back and forth between different corner cases until we started writing them down in a formal language.

The language I chose was an adaptation of a Domain-specific language I wrote to test a different part of our program. I would love to turn the cases we wrote down in that meeting into real tests that run after every build (in fact I am working on it) but their immediate value was to turn very confusing “what-if”s into concrete cases we could discuss.

Before we started using the formal language, the conversations went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “He clearly doesn’t understand what I mean.”

Repeat

After we started using the formal language they went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Me: “Write it down, I tell you.”

Developer: (Typing) “See, this will happen!”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “Flipping managers.”

Summary

The conversation progresses if all parties believe the others understand what they are saying. It is not disagreement that paralyses conversations – it is misunderstanding.

To avoid misunderstanding, talk in code – preferably a real programming language, but if that’s too verbose, a text-based code that is unambiguous and understood by everyone involved.

Note on whiteboards

You can’t copy and paste them, and you can’t (easily) keep what you did with them, and you can’t use them to communicate over long distances.

And don’t even try and suggest an electronic whiteboard. In a few years they may solve all of the above problems, but not now. They fail the “can I draw stuff?” test at the moment.

Even when electronic whiteboards solve those problems, they won’t solve the fact that lines and boxes are more ambiguous and less detailed than code in text form.

If you all know and like UML, that makes your diagrams less ambiguous, but still they often don’t allow enough detail: why bother?

An actual difficult bug fixed

Wednesday, October 8th, 2008

Of course, I am bound to get a bug report immediately I have posted this telling me my fix breaks everything, but for the moment I am chuffed that I found, tested, and fixed a genuinely difficult bug.

I am particularly proud because I wrote an automated test to ensure it can never happen again, and I used that test to make the debugging process much easier than it otherwise would be. The code that reads, processes and stores listings in FreeGuide is a spider’s web of interfaces and helper classes (because of the arguably over-engineered plugins framework used for all the moving parts), and tracking this down with plain old-fashioned debugging would have been a huge job.

Anyway, I bet you are dying to hear what the bug was, aren’t you?

When you have a programme already stored in FreeGuide, and then a new one comes along that overlaps it, the old programme is deleted. For example if we start off with:

... 19:00 Top Gear ................... 20:00 Charlie and Lola .............

but then later download listings again and get these programmes:

... 19:00 Pocoyo .. 19:15 Round the World ...... 

Then Top Gear will disappear, and be replaced by the 2 new programmes. In fact, any old programme that overlaps any new incoming programme will be automatically deleted.

At least, that is what is supposed to happen. In fact, the real situation is a little more complex because the programmes are stored in separate files (.ser files) for different days and times. Actually, there are 4 files for each day, named things like “day-2008-09-15-A.ser”, where the suffix A, B, C and D indicate which part of each day a file is for.

So imagine what happens when the first set of programmes comes in looking like this:

19:00 Programme 1A ......... 21:15 Programme 1B .. 21:30 Programme 1C ........... 22:00

and then the second comes in like this:

19:00 Programme 2A......................................... 21:45 Programme 2C .. 22:00

So obviously the old 3 programmes should be completey deleted, and the new 2 should be what you see.

But you don’t. In fact what you see is programme 1B and programme 2C, before 1B and between the two. Weird huh?

“Why?” I hear you ask. Well, it’s simple when you consider how the programmes are split into files.

Programme 1A goes into file day-2008-09-14-D.ser, and programmes 1B and 1C go into day-2008-09-15-A.ser.

[Side note: this is true in this case because the bug reporter is in the GMT -0400 timezone and the file boundaries are quarters of a day in GMT.]

Then, when the new programmes come along, 2A goes into 14-D – wiping out 1A, and 2C goes into 15-A – wiping out 1C but not 1B.

Then, when the files get read back in again later, 2A is read from 14-D, but then 1B is read from 15-A, wiping out 2A, and finally 2C is read in as well from 15-A, so we end up with 1B and 2C.

How to fix it? Well, what I did was leave everything as it is, and then do the final read in the reverse order. This means we read in 1B and 2C, but then we read 2A later, and it wipes out 1B, leaving 2A and 2C as we would expect.

Neat fix eh? It works because this kind of wrongness in the .ser files will only exist when a programme hanging off the end should have wiped out something in a later file. Because programmes are classified into files by their start time, they can only hang off the end of a file, not the beginning, so reading the files in backwards will always read the hanging-over file last, wiping out anything which should have been wiped out earlier.

There is a little bug/feature remaining, but it only applies when you get some really weird listings from your provider. If you had a programme like 1A (19:00 – 21:15), and downloaded new listings, which ONLY contained a programme overlapping it, but falling into a later file (so maybe it starts at 21:00), and didn’t contain any programme starting at 19:00, then the backwards reading would mean you would never see your new programme because it would be wiped out by 1A.

This is a very unusual case though, since normally if you get a new programme at 21:00, you will also get new programmes leading up to it, if only to reflect the fact that 1A is now a different length. So this is really a theoretical bug, which explains why I’ve decided not to fix it…

Anyway, by the time I’d fiddled with my test for this to get the bug to trigger (which took a long time – working out which bits to fake out and which to test at all was tricky), the actual fix was easily implemented (1 line of code I chose to break out into 3), and then validated in a single click.

Just in case I hadn’t mentioned it, I love tests.

FreeGuide 0.10.8

Monday, July 28th, 2008

I am still working slowly on moving FreeGuide forward. Somehow it seems my itches for FreeGuide are all about making it less annoying for people who are trying it the first time. I guess this is motivated by my desire for world domination.

Anyway, we are one small step closer to my mum being able to use FreeGuide – when the “Choose channels” step (i.e. the XMLTV grabber configuration) goes wrong, you can now see a real genuine error message, and hopefully figure out what went wrong.

Actually, it always used to work that way but the error-catching got refactored away at some point. Anyway, I am slowly taking the ground back…

As I do more and more test-driven development at work I am becoming completely addicted. For this FreeGuide code I wrote a couple of unit tests but they are not within a proper framework, and can’t be launched easily as a test suite. I am considering JUnit.

I also want to set up some component-level tests e.g. for downloading listings for each country and checking everything works as expected. It’s brilliant fun having tests in place, but when you have as little time as I have for FreeGuide at the moment, it’s difficult to decide to spend a long time working on a test framework when I could be fixing a “real user problem” or adding a cool new feature.

But I’ve got the testing bug badly, so watch this space.

Templated test code?

Wednesday, March 19th, 2008

At work at the moment, as part of an initiative to get with the 21st century, we are waking up to testing our code.

Thus, I am writing a lot of unit tests for old code, which can be soul-destroyingly repetitive and very pointless-feeling (even though really I do see a great value in the end result – tested code is refactorable code).

Often, tests have a lot in common with each other, so it feels right to reduce code repetition, and factor things into functions etc. The Right Way of doing this is to leave your tests as straightforward as possible, with preferably no code branches at all, just declarative statements.

Contemplating writing unit tests for the same method on 20+ very similar classes, using a template function “feels” right, for normal code values of “feel”. However, for test code, maybe it’s wrong?

My question is: is it ok to write a test function like this?:

void test_all_thingies()
{
    test_One_Thingy<Thingy1>();
    test_One_Thingy<Thingy2>();
    test_One_Thingy<Thingy3>();
    test_One_Thingy<Thingy4>();
}

template< class T >
void test_One_Thingy()
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.isSomething() );
}

Worse still, is this ok?

void test_all_thingies()
{
    test_One_Thingy<Thingy1>( "Thingy1 expected output" );
    test_One_Thingy<Thingy2>( "Thingy2 expected output" );
    test_One_Thingy<Thingy3>( "Thingy3 expected output" );
    test_One_Thingy<Thingy4>( "Thingy4 expected output" );
}

template< class T >
void test_One_Thingy( std::string expected_output )
{
    T thingy;
    thingy.doSomething();
    TEST_ASSERT( thingy.getOutput() == expected_output );
}

Reasons for: otherwise I’m going to be writing huge amounts of copy-pasted code (unless someone can suggest a better way?).

Reasons against: how clear is it going to be which class failed the test when it fails?

Update: fixed unescaped diagonal brackets.