Dry run mode for Ant (ant -n, ant --dry-run)

I am working on the problem of writing Ant build files in a test-driven way. One thing I found myself needing was a “dry run” mode, like many Unix tools have. For example, make has the -n or --dry-run option, which shows what it would have done, but doesn’t really do it.

Today I found a partial solution to this problem, so that you can at least see which dependencies will be run when you run a particular Ant target.

It’s a horrific hack, but it’s the best I can do at the moment.

We write some code in a <script> tag that hacks all the targets in our project at runtime. We modify the targets so they all have an “unless” attribute, set to a property named “DRY.RUN”. Then we set the “DRY.RUN” property and execute our target.

Ant prints out the names of all the targets in the dependency chain, even if they are not executed because of an unless attribute.

Note: this code makes use of Ant’s <script> tag, which is an Ant 1.8+ feature. Using JavaScript inside this tag seems to be supported in the Oracle, OpenJDK and IBM versions of Java, but is not guaranteed.

<?xml version="1.0" encoding="UTF-8"?>
<project default="build">

    <target name="targetA"/>
    <target name="targetB" depends="targetA">
        <echo message="DON'T RUN ME"/>
    </target>
    <target name="targetC" depends="targetB"/>

    <target name="build" depends="targetB"/>

    <target name="dry-run">
        <do-dry-run target="build"/>
    </target>

    <macrodef name="do-dry-run">
        <attribute name="target"/>
        <sequential>
            <script language="javascript"><![CDATA[

                // Hack every target in the project to have an
                // "unless" attribute, so its tasks are skipped
                // when the DRY.RUN property is set
                var targs = project.getTargets().elements();
                while( targs.hasMoreElements() ) {
                    var targ = targs.nextElement();
                    targ.setUnless( "DRY.RUN" );
                }
                // Set the property and run the requested target:
                // Ant still prints each target name in the
                // dependency chain, but runs no tasks
                project.setProperty( "DRY.RUN", "1" );
                project.executeTarget( "@{target}" );

            ]]></script>
        </sequential>
    </macrodef>

</project>

When we run this build file normally, the tasks in the targets execute, so we can see that the <echo> happens:

$ ant
Buildfile: build.xml

targetA:

targetB:
     [echo] DON'T RUN ME

build:

BUILD SUCCESSFUL
Total time: 0 seconds

But when we run the dry-run target, only the target names are printed; the <echo> task (and any other tasks) does not run:

$ ant dry-run
Buildfile: build.xml

dry-run:

targetA:

targetB:

build:

BUILD SUCCESSFUL
Total time: 0 seconds

A lot of pain, for a partial implementation of very simple functionality that you’d expect to be a built-in feature? I couldn’t possibly comment.

Running Dojo DOH tests in a browser without a web server

Dojo’s DOH requires a web server to run tests in a browser. But never fear:

$ cd ~/code/dojo
$ ls
docs  dojo  util
$ python -m SimpleHTTPServer &
$ xdg-open http://localhost:8000/util/doh/runner.html

Note that you will see some test failures, because the Python web server doesn’t do PHP.

When finished:

$ kill %1

to stop your web server.

On Python 3, use this instead of the SimpleHTTPServer line:

python3 -m http.server &

Yes, Python includes a little web server that serves files in your current directory. Batteries included. Thanks to Lyle Backenroth and commandlinefu for making me aware of this.

Running Dojo 1.7+ DOH unit tests on the command line with Rhino

To run your own DOH-based unit tests on the command line using Rhino:

NOTE: this is for Dojo 1.7 and above. For 1.6, there was a whole other cryptic incantation.

Project layout

Imagine your code is somewhere different from dojo, and another library you use is somewhere else:

C:/code/mycode/org/me/mytests/
                             ...
                             mytestmodule.js
                             ...
C:/code/mycode/org/them/nicelib/
                             ...
C:/libs/dojo/dojo/
                 ...
                 dojo.js
                 ...
             dijit/
                 ...
             dojox/
                 ...
             util/doh/
                     ...
                     main.js
                     ...

Config file

Yes, you need a config file. Imagine it’s at C:/code/mycode/dohconfig.js and it looks like this:

require({
    paths: {
        "org/me" : "../../../code/mycode/org/me",
        "org/them" : "../../../code/mycode/org/them/nicelib"
    }
});

Command line

Now you can run your tests like this:

java -jar C:/libs/dojo/util/shrinksafe/js.jar C:/libs/dojo/dojo/dojo.js baseUrl=file:///C:/libs/dojo/dojo load=file:///C:/code/mycode/dohconfig.js load=doh test=org/me/mytests/mytestmodule

Explanation

  • java -jar – run Java and execute a JAR file.
  • C:/libs/dojo/util/shrinksafe/js.jar – path to the JAR file that is Rhino, a JavaScript interpreter written in Java (and included in Dojo’s source distribution).
  • C:/libs/dojo/dojo/dojo.js – the Dojo “loader” – unlike in 1.6 and earlier, you don’t run DOH’s runner.js. Instead you run dojo.js and pass “load=doh” as an argument.
  • baseUrl=file:///C:/libs/dojo/dojo – a URL form of the location of the directory that contains dojo.js.
  • load=file:///C:/code/mycode/dohconfig.js – the location of your config file, which defines the “paths” variable, previously (in 1.6) done with registerModulePath. This variable helps Dojo find your code based on its module name (here “org/me”).
  • load=doh – after you’ve read (actually, executed) the config file, execute DOH.
  • test=org/me/mytests/mytestmodule – the module name of your test (not the path – a module name which can be found using the lookups in the paths defined in your config file).

Anatomy of an interpreter: the Evaluator

Posts in this series: Lexer, Parser, Evaluator

I’m still really enjoying writing my Scheme interpreter Subs, which can now successfully run all the example code from SICP up to section 2.3.4. I’ve made the changes I mentioned I would in the Lexer article, so now the Lexer returns Tokens that contain information about their basic types, and I’ve gone through a significant refactoring to replace one of the several massive switch statements with a virtual function call (Martin Fowler would be proud).

Last time I explained how the Parser takes the stream of tokens coming from the Lexer and returns a hierarchical tree of Values, each of which represents an operation or thing in the program.

The Evaluator takes in a tree of Values, “evaluates” it, and returns another Value object, which is the answer. The Evaluator class is by far the most complex part of Subs, so in this post we’ll start with an overview of how it works. Future posts will break down the different parts in more detail.

The most interesting parts of the Evaluator class interface look like this:

class Evaluator
{
public:
    std::auto_ptr<Value> EvalInContext( const Value* value,
        boost::shared_ptr<Environment>& environment );
};

The EvalInContext method takes in a Value to evaluate, and an “environment”* in which to evaluate it. Note that in the real code it takes a couple more arguments, including a mysterious and annoying boolean called is_tail_call, which will be explained later.

* More on environments later. All you need to know for now is that they provide a way of keeping hold of all the things we currently know about, identified by name.
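
To make that concrete, here is a minimal sketch of what an environment might look like. This is not the real Subs class: the names InsertSymbol, FindSymbol and outer_ are my inventions, and ownership of the stored Values is glossed over. The essential idea is a map from names to Values, with a link to an enclosing environment so lookups fall through to outer scopes.

class Environment
{
public:
    Environment() {}
    explicit Environment( const boost::shared_ptr<Environment>& outer )
        : outer_( outer ) {}

    // Define (or redefine) a name in this scope
    void InsertSymbol( const std::string& name, Value* value )
    {
        symbols_[name] = value;
    }

    const Value* FindSymbol( const std::string& name ) const
    {
        std::map<std::string, Value*>::const_iterator it =
            symbols_.find( name );
        if( it != symbols_.end() )
        {
            return it->second;
        }
        // Not found here: fall through to the enclosing scope, if any
        return outer_ ? outer_->FindSymbol( name ) : NULL;
    }

private:
    std::map<std::string, Value*> symbols_;
    boost::shared_ptr<Environment> outer_;
};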

A very simplified version of EvalInContext would look like this:

std::auto_ptr<Value> Evaluator::EvalInContext( const Value* value,
    boost::shared_ptr<Environment>& environment )
{
    if( is_symbol( value ) )
    {
        return eval_symbol( value, environment );
    }

    if( !is_combination( value ) )
    {
        return auto_ptr<Value>( value->Clone() );
    }

    const CombinationValue* combo = to_combination( value );
    CombinationValue::const_iterator it = combo->begin();

    auto_ptr<Value> evaldoptr = EvalInContext( *it, environment );

    if( special_symbol( evaldoptr ) )
    {
        return process_special_symbol( evaldoptr, combo );
    }
    else
    {
        ++it;

        CombinationValue argvalues;
        for( ; it != combo->end(); ++it )
        {
            argvalues.push_back( EvalInContext( *it, environment ).release() );
        }

        return run_procedure( evaldoptr.get(), &argvalues, *this, environment );
    }
}

If the Value to be evaluated is just a symbol, we call eval_symbol which basically looks up the symbol’s name in the environment and returns the value it finds.
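
As a rough sketch (again, not the real Subs code: to_symbol, GetStringValue and the error handling are guesses), eval_symbol might look something like this, assuming an Environment with a FindSymbol method as sketched above:

std::auto_ptr<Value> eval_symbol( const Value* value,
    boost::shared_ptr<Environment>& environment )
{
    const SymbolValue* sym = to_symbol( value );

    // Look the name up in the environment (and, via FindSymbol's
    // fall-through, in any enclosing environments)
    const Value* found = environment->FindSymbol( sym->GetStringValue() );
    if( !found )
    {
        throw std::runtime_error(
            "Undefined symbol '" + sym->GetStringValue() + "'" );
    }

    // Return a copy: the caller owns the returned Value
    return std::auto_ptr<Value>( found->Clone() );
}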

If the Value is not a combination (i.e. the root of a tree of other values) it must be a basic type such as a string or an integer. In this case we simply copy the Value and return it.

Otherwise, it’s a combination. To evaluate a combination, we follow the “eval-apply” pattern. The principle is to evaluate all the Values in the combination separately, and then “apply” (run) the first value (the “operator”) as a procedure, using the other values as arguments. The first value must evaluate to something that is recognisable as a procedure, or this doesn’t make sense and we will throw an error.

In practice it’s a tiny bit more complicated. We evaluate the first Value in the combination (by calling EvalInContext recursively), then we check whether it’s a special symbol such as if or let and if so, deal with it separately. Otherwise, we evaluate all the other Values (calling EvalInContext recursively again) and put them into a new CombinationValue, and pass the operator and the arguments to run_procedure, which looks something like this:

std::auto_ptr<Value> run_procedure( const Value* op,
    const CombinationValue* args, Evaluator& ev,
    boost::shared_ptr<Environment>& environment )
{
    if( is_builtin_procedure( op ) )
    {
        return handle_builtin_procedure( op, args, environment );
    }
    else
    {
        std::auto_ptr<Value> ret;

        const CompoundProcedureValue* proc = to_compound_procedure( op );

        boost::shared_ptr<Environment> new_env =
            proc->ExtendEnvironmentWithArgs( args );

        for( CombinationValue::const_iterator it = proc->GetBody()->begin();
            it != proc->GetBody()->end(); ++it )
        {
            ret = ev.EvalInContext( *it, new_env );
        }

        return ret;
    }
}

Running a procedure means either doing something built-in (for example adding up two numbers and returning the result) or evaluating some other code, which comes from the definition of the procedure being run. First we call ExtendEnvironmentWithArgs to create a new Environment, which contains the argument Values that were supplied, and then we loop through all the sections of the body of the procedure, evaluating each one. Note that we throw away the returned Values for all sections except the last one (this is how Scheme works).
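
A sketch of how ExtendEnvironmentWithArgs might work, assuming the procedure stores its parameter names in a member like arg_names_ and remembers the environment it was defined in as definition_environment_ (both names are my inventions):

boost::shared_ptr<Environment>
CompoundProcedureValue::ExtendEnvironmentWithArgs(
    const CombinationValue* args ) const
{
    // The new scope's parent is the environment the procedure was
    // defined in, not the one it is being called from
    boost::shared_ptr<Environment> new_env(
        new Environment( definition_environment_ ) );

    // Bind each supplied argument Value to the corresponding
    // parameter name from the procedure's definition
    CombinationValue::const_iterator itval = args->begin();
    std::vector<std::string>::const_iterator itname = arg_names_.begin();
    for( ; itval != args->end() && itname != arg_names_.end();
        ++itval, ++itname )
    {
        new_env->InsertSymbol( *itname, (*itval)->Clone() );
    }

    return new_env;
}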

Once we’ve evaluated and applied our procedure, which of course potentially includes numerous recursive calls to EvalInContext, we end up with a Value that is returned, and we’re done.

Simple eh?

But now I must make a confession: almost everything I have written above is a lie. Why would I lie to you? Because I missed out something wonderful and strange called “tail-call optimisation”. I’ll explain next time.

Anatomy of an interpreter: the Parser

Posts in this series: Lexer, Parser, Evaluator

Subs has reached version 1.3.4, which means that it can successfully run all the tests from chapter 1 of SICP. This is very exciting.

Last time I explained a bit about the Lexer, which takes in a stream of characters and emits a stream of tokens: individual elements of code such as a bracket, a keyword or a symbol.

Generally, parsers emit some kind of tree structure – they understand the raw tokens as a hierarchical structure which (conceptually, at least) will be executed from the bottom up, with each branch-point in the tree being an operation of some kind.

Our parser takes in a stream of tokens, and emits a stream of parsed trees.

Parsing Scheme is very easy, because (apart from a couple of exceptions I haven’t implemented yet) there is essentially one rule: start with an open bracket, see a list of things, then find a close bracket. Of course, one of the “things” you see may itself be another bracketed list, so after parsing you get a tree structure of nested lists.

The parser in Subs looks like this:

class Parser
{
public:
    Parser( ILexer& lexer );
    std::auto_ptr<Value> NextValue();
private:
    ILexer& lexer_;
};

We supply a Lexer in the constructor, which we know will provide us with tokens when we need them via its NextToken() method. The Parser’s NextValue() method returns a pointer to a Value, which is the base class for all the “things” in the Subs interpreter.
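
The interface the Parser relies on is tiny. Here is a sketch of its shape: NextToken() and Name() are the methods actually used by the code in this post; the rest is guesswork.

class Token
{
public:
    // The raw text of the token, e.g. "(", ")", "foo" or "1.5"
    std::string Name() const;
    // ...
};

class ILexer
{
public:
    virtual ~ILexer() {}
    // Return the next token from the input character stream
    virtual Token NextToken() = 0;
};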

There are lots of types of things that inherit from the Value class, but the “parse tree” (the output of the parser) will only consist of a very small subset of them:

  • CombinationValue
  • DecimalValue
  • IntegerValue
  • StringValue
  • SymbolValue

The CombinationValue class forms the tree structure. Its declaration looks like this:

class CombinationValue : public Value, public std::vector<Value*>
{
    // ...
};

It is simply a list of other Values.

Note that it “owns” those Values in the sense that it deletes them when it is deleted. I have recently made the jump to make Subs depend on Boost, so it’s on my TODO list to make containers like this use the Boost smart containers (e.g. boost::ptr_vector) to manage this job for me.
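
That ownership amounts to a destructor that deletes each element, something like this (a sketch of the idea, not necessarily the real code):

CombinationValue::~CombinationValue()
{
    // We own our children: delete each contained Value
    // (a Boost pointer container would do this for us)
    for( iterator it = begin(); it != end(); ++it )
    {
        delete *it;
    }
}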

DecimalValue, IntegerValue and StringValue are relatively self-explanatory: they contain numbers and strings that were found as literals in the source code.

SymbolValue is essentially everything else – if the code that recognises the type of a token can’t recognise it as a bracket, a number or a string, we assume it is a symbol, and tuck it away in a SymbolValue to be understood later.

The core of the Parser looks like this (with some error-checking removed):

std::auto_ptr<Value> next_value( ILexer& lexer, Token token )
{
    if( token.Name() == "(" )
    {
        auto_ptr<CombinationValue> ret( new CombinationValue );
        while( true )
        {
            token = lexer.NextToken();
            if( token.Name() == ")" )
            {
                break;
            }
            // Recursive call
            ret->push_back( next_value( lexer, token ).release() );
        }
        return auto_ptr<Value>( ret.release() );
    }
    else
    {
        return ValueFactory::CreateValue( token );
    }
}

(Full code here: Parser.cpp) It’s a simple recursive function that creates a CombinationValue whenever it finds a bracket, and otherwise uses a ValueFactory to create an individual value.

Side note: the wisdom of using recursion could certainly be questioned, since it limits the depth of bracketing we can handle to the size of the C++ stack, but the only other way to get the same result would be to keep our own manual stack of unfinished combinations, and it just seems perverse to re-implement language features like that. What might well be more interesting would be to consider whether we can actually evaluate parts of the tree as we go, without parsing it all at once. This might make the whole setup scale rather better, but would most likely be quite complex. The implementation presented here will work fine for almost any imaginable program – remember we would need not just code whose execution is deeply nested, but whose expression in code had thousands of levels of nesting before the parser would fail.
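
For the curious, the manual-stack version hinted at above might look roughly like this. It is a sketch, not code from Subs, and like the recursive version it has the error checking removed: keep a stack of unfinished CombinationValues, push on “(” and pop on “)”.

std::auto_ptr<Value> next_value_iterative( ILexer& lexer, Token token )
{
    // Combinations we have opened but not yet seen the ")" for
    std::vector<CombinationValue*> unfinished;

    while( true )
    {
        if( token.Name() == "(" )
        {
            // Open a new combination; nested combinations are
            // added to their parent as soon as they are created
            CombinationValue* combo = new CombinationValue;
            if( !unfinished.empty() )
            {
                unfinished.back()->push_back( combo );
            }
            unfinished.push_back( combo );
        }
        else if( token.Name() == ")" )
        {
            // Close the current combination; if it was the
            // outermost one, the parse tree is complete
            CombinationValue* combo = unfinished.back();
            unfinished.pop_back();
            if( unfinished.empty() )
            {
                return std::auto_ptr<Value>( combo );
            }
        }
        else
        {
            std::auto_ptr<Value> leaf = ValueFactory::CreateValue( token );
            if( unfinished.empty() )
            {
                // A bare value outside any brackets, e.g. "5"
                return leaf;
            }
            unfinished.back()->push_back( leaf.release() );
        }
        token = lexer.NextToken();
    }
}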

The ValueFactory uses some basic rules such as “starts and ends with a quote” or “consists of only numbers and a decimal point” to recognise what type of Value to create, and if no rules match it defaults to a SymbolValue.
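
A sketch of what CreateValue might do with those rules (the constructors and the exact checks are my guesses; the real code will differ, and sign handling is elided):

std::auto_ptr<Value> ValueFactory::CreateValue( const Token& token )
{
    const std::string& name = token.Name();

    // "Starts and ends with a quote": a string literal
    if( name.size() >= 2 && name[0] == '"' &&
        name[name.size() - 1] == '"' )
    {
        return std::auto_ptr<Value>(
            new StringValue( name.substr( 1, name.size() - 2 ) ) );
    }

    // Only digits: an integer
    if( name.find_first_not_of( "0123456789" ) == std::string::npos )
    {
        return std::auto_ptr<Value>(
            new IntegerValue( atoi( name.c_str() ) ) );
    }

    // "Only numbers and a decimal point": a decimal
    if( name.find_first_not_of( "0123456789." ) == std::string::npos )
    {
        return std::auto_ptr<Value>(
            new DecimalValue( atof( name.c_str() ) ) );
    }

    // No rules matched: assume it is a symbol,
    // to be understood later by the Evaluator
    return std::auto_ptr<Value>( new SymbolValue( name ) );
}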

When we have completed a bracketed expression, we return a complete tree of numbers, strings and symbols, ready to be evaluated. You can think of evaluation as expanding the tree we already have into the full expression of itself, and then reducing it back down to an answer.

Next time, the Evaluator and the famous eval-apply loop.