Analog literals

I love this: C++ Multi-Dimensional Analog Literals.

I quote:

Have you ever felt that integer literals like “4” don’t convey the true size of the value they denote? If so, use an analog integer literal instead:

unsigned int b = I---------I;

It goes on to explain that you can use 2- and 3-dimensional “analog literals”. Genius. Read the article. Try to read the code :)

Isn’t C++ … erm … powerful?

You’ll notice that 9 dashes are used to denote 4. This is because the trick relies on operator--. I’m sure the original author could have done this in his/her sleep and thought it too trivial to post (or has posted it before?), but I thought: if we use operator! instead, can’t we create analog literals that use the same number of symbols as the number we want?

The answer is yes, and it’s pretty simple:

notliterals.h:

class NotLiteral
{
public:
	NotLiteral( unsigned int ival )
	: val_( ival )
	{
	}

	NotLiteral operator!() const
	{
		return NotLiteral( val_ + 1 );
	}

	operator unsigned int() const
	{
		return val_;
	}

	unsigned int val_;
};


const NotLiteral NL( 0 );

test_notliterals.cpp:

#include "notliterals.h"
#include <cassert>

int main()
{
	assert( !!!!NL == 4 );
	assert( !!NL == 2 );

	assert( !!!!!!!!!!!!!!!NL == 15 );
}

With this simpler form, it’s almost believable that there might be some kind of useful application?

Extending this to 3 dimensions is left as an exercise for the reader. For 2 dimensions, if you just want the area (not the width and height), how about this?:

	assert( !!!
	        !!!
	        !!!NL == 9 );

Update: By the way, if you don’t like all the emphasis! of! using! exclamation! marks! you can do the same thing with the unary one’s complement operator, ~. Just replace “!” everywhere above with “~” and you’re done. Unfortunately, you can’t do the same with - or + because the parser recognises “--” as the decrement operator instead of seeing that it is clearly two calls to the unary minus operator.
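Since ~ works just as well, here is a rough translation of the same trick into Python (my own sketch, not from the original post), using __invert__ so that each ~ counts up by one exactly like operator! does above:

```python
# A Python sketch of the NotLiteral trick: each unary ~ adds one,
# so the literal uses exactly as many symbols as the number it denotes.
class NotLiteral:
    def __init__(self, val):
        self.val = val

    def __invert__(self):
        # ~ returns a new literal one bigger, like operator! in the C++ version
        return NotLiteral(self.val + 1)

    def __eq__(self, other):
        return self.val == other

NL = NotLiteral(0)

assert ~~~~NL == 4
assert ~~NL == 2
```

Python applies unary operators right-to-left, so ~~~~NL builds up 1, 2, 3, 4 just as the C++ version does.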

Separate regular expressions, or one more complex one?

I have asked myself this question several times, so I thought it was about time I did a test and found an answer.

If the user of your program can supply you with a list of regular expressions to match against some text, should you combine those expressions into one big one, or treat them separately?

In my case I need an OR relationship, so combining them just means putting a pipe symbol between them.*

So: one expression made by ORing, or looping through several – which is better? There’s only one way to find out:

import re, sys

line_with_match_foo = "This line contains foo."
line_with_match_baz = "This line contains baz."
line_without_match = "This line does not contain it."

re_strings = ( "foo", "bar1", "bar2", "baz", "bar3", "bar4", )

piped_re = re.compile( "|".join( re_strings ) )

separate_res = list( re.compile( r ) for r in re_strings )

NUM_ITERATIONS = 1000000

def piped( line ):
    for i in range( NUM_ITERATIONS ):
        if piped_re.search( line ):
            print( "match!" ) # do something

def separate( line ):
    for i in range( NUM_ITERATIONS ):
        for s in separate_res:
            if s.search( line ):
                print( "match!" ) # do something
                break # stop looping because we matched

arg = sys.argv[1]

if arg == "--piped-nomatch":
    piped( line_without_match )
elif arg == "--piped-match-begin":
    piped( line_with_match_foo )
elif arg == "--piped-match-middle":
    piped( line_with_match_baz )
elif arg == "--separate-nomatch":
    separate( line_without_match )
elif arg == "--separate-match-begin":
    separate( line_with_match_foo )
elif arg == "--separate-match-middle":
    separate( line_with_match_baz )

And here are the results:

$ time python re_timings.py --piped-nomatch > /dev/null

real    0m0.987s
user    0m0.943s
sys     0m0.032s
$ time python re_timings.py --separate-nomatch > /dev/null

real    0m3.695s
user    0m3.641s
sys     0m0.037s

So when no regular expressions match, the combined expression is about 3.7 times faster.

$ time python re_timings.py --piped-match-middle > /dev/null

real    0m1.900s
user    0m1.858s
sys     0m0.033s
$ time python re_timings.py --separate-match-middle > /dev/null

real    0m3.543s
user    0m3.439s
sys     0m0.042s

And when an expression near the middle of the list matches, the combined expression is 1.8 times faster.

$ time python re_timings.py --piped-match-begin > /dev/null

real    0m1.847s
user    0m1.797s
sys     0m0.035s
$ time python re_timings.py --separate-match-begin > /dev/null

real    0m1.649s
user    0m1.597s
sys     0m0.032s

But in the (presumably much rarer) case where all lines match the first expression in the list, the separate expressions are marginally faster.

A clear win for combining the expressions, unless you think it’s likely that most lines will match expressions early in the list.

Note also that if you combine the expressions, performance is similar wherever the matching expression sits in the list (whereas with separate expressions, list order matters a lot), so neither you nor your user needs to second-guess what order to put the expressions in, which makes life easier for everyone.

I would guess the results would be similar in other programming languages. I certainly found it to be similar in C# on .NET when I tried it a while ago.

By combining the expressions we ask the regular expression engine to do the heavy lifting for us, and it is specifically designed to be good at that job.
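If you’d rather not drive the comparison from the shell, Python’s timeit module gives a quick sanity check of the same measurement (a sketch; absolute numbers will vary by machine):

```python
import re
import timeit

re_strings = ("foo", "bar1", "bar2", "baz", "bar3", "bar4")

piped_re = re.compile("|".join(re_strings))
separate_res = [re.compile(s) for s in re_strings]

line = "This line does not contain it."

# Time one combined search against six separate searches on a non-matching line
piped_time = timeit.timeit(lambda: piped_re.search(line), number=100000)
separate_time = timeit.timeit(
    lambda: any(s.search(line) for s in separate_res), number=100000
)

print("piped:    %.3fs" % piped_time)
print("separate: %.3fs" % separate_time)
```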

Open questions:

1. Have I made a mistake that makes these results invalid?

2. * Can arbitrary regular expressions be ORed together simply by concatenating them with a pipe symbol in between?

3. Can we do something similar if the problem requires us to AND expressions?
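On question 2: a bare pipe usually works because | has the lowest precedence of any regex operator, but wrapping each expression in a non-capturing group makes the combination safer when the sub-expressions contain their own alternations. This is only a sketch of the idea – numbered groups and inline flags inside the sub-expressions can still interact:

```python
import re

patterns = ["foo+", "ba[rz]"]

# (?:...) groups each sub-expression without creating a capturing group,
# so the pipe cannot bind to part of a neighbouring expression.
combined = re.compile("|".join("(?:%s)" % p for p in patterns))

assert combined.search("contains fooo")
assert combined.search("contains bar")
assert not combined.search("nothing to see here")
```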

Talk in code

Last week we had an extended discussion at work about how we were going to implement a specific feature.

This discussion hijacked our entire Scrum sprint planning meeting (yes, I know, we should have time-boxed it). It was painful, but the guy who was going to implement it (yes, I know, we should all collectively own our tasks) needed the discussion: otherwise it wasn’t going to get implemented. It certainly wasn’t going to get broken into short tasks until we knew how we were going to do it.

Anyway, asides aside, I came out of that discussion bruised but triumphant. We had a plan not only on how to write the code, but also how to test it. I believe the key thing that slowly led the discussion from a FUD-throwing contest into a constructive dialogue was the fact that we began to talk in code.

There are two facets to this principle:

1. Show me the code

As Linus once said, “Talk is cheap. Show me the code.”

If you disagree at all about how what you’re doing will work, open up the source files in question. Write example code – modify the existing methods or sketch a new one. Outline the classes you will need. Code is inherently unambiguous. Whiteboard diagrams and hand-waving are not.

Why wouldn’t you do this? Fear you might be wrong? Perhaps you should have phrased your argument a little less strongly?

Is this slower than drawing boxes on a whiteboard? Not if you include time spent resolving the confusion caused by the ambiguities inherent in line drawings.

Does UML make whiteboards less ambiguous? Yes, if all your developers can be bothered to learn it. But why learn a new language when you can communicate using the language you all speak all day – code?

2. Create a formal language to describe the problem

If your problem is sufficiently complex, you may want to codify the problem into a formal (text-based) language.

In last week’s discussion we were constantly bouncing back and forth between different corner cases until we started writing them down in a formal language.

The language I chose was an adaptation of a domain-specific language I wrote to test a different part of our program. I would love to turn the cases we wrote down in that meeting into real tests that run after every build (in fact I am working on it), but their immediate value was to turn very confusing “what-if”s into concrete cases we could discuss.

Before we started using the formal language, the conversations went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “He clearly doesn’t understand what I mean.”

Repeat

After we started using the formal language they went something like this:

Developer: “If we implement it like that, this bad thing will happen.”

Me: “Write it down, I tell you.”

Developer: (Typing) “See, this will happen!”

Manager: “That’s fine – it’s a corner case that we can tidy up later if we need it.”

Developer: (Muttering) “Flipping managers.”

Summary

The conversation progresses if all parties believe the others understand what they are saying. It is not disagreement that paralyses conversations – it is misunderstanding.

To avoid misunderstanding, talk in code – preferably a real programming language, but if that’s too verbose, a text-based code that is unambiguous and understood by everyone involved.

Note on whiteboards

You can’t copy and paste them, and you can’t (easily) keep what you did with them, and you can’t use them to communicate over long distances.

And don’t even try and suggest an electronic whiteboard. In a few years they may solve all of the above problems, but not now. They fail the “can I draw stuff?” test at the moment.

Even when electronic whiteboards solve those problems, they won’t solve the fact that lines and boxes are more ambiguous and less detailed than code in text form.

If you all know and like UML, that makes your diagrams less ambiguous, but they still often don’t allow enough detail: why bother?

Fixing the vertical panel window list on Ubuntu Hardy

For a long time now I’ve been bugged by GNOME Bug 86382.

Recent versions of GNOME have featured a window list applet which behaves badly when situated on a vertical panel (which is where I like to have mine). Older versions were OK, except that the buttons would expand and contract to make enough vertical space to hold the window title written vertically, even though the titles were actually displayed horizontally. Recent versions behave much worse: once you have more than about 7 windows open, the window list flickers rapidly between displaying in one or two columns, making it almost impossible to click on the button you want.

There have been various patches attached to the bug, and today I managed to get the latest one installed on my Ubuntu Hardy 8.04 machine by building custom .debs for the relevant components. I’m posting exactly what I did here in case it’s useful to anyone who wants to install this patch, and also as a general reference for how to apply a patch and install it through Ubuntu’s (or Debian’s) package management system as a well-behaved package.

The patches are to libwnck and gnome-panel, and they implement the behaviour I want, which is horizontal text, and buttons of fixed height in a window list which expands as needed to fill the available space. With this patch installed, I find a vertical panel is the best way to manage a large number of open windows.

I don’t recommend running this script – better to read and understand it yourself. Nevertheless (and with no guarantees whatsoever) if you do run it as is on a Hardy machine, it should install patched versions of libwnck and gnome-panel which work better with vertical panels.

# Get all the libraries and source code we need
sudo apt-get install build-essential fakeroot dpkg-dev
sudo apt-get build-dep libwnck gnome-panel

# Create a directory to do all our messing about in
mkdir build-libwnck
cd build-libwnck/

# Get the source code for the 2 components we are building
apt-get source libwnck
apt-get source gnome-panel

# Do libwnck first
cd libwnck-2.22.3

# Create a changelog entry
cat <<END > new-entry.txt
libwnck (2.22.3-0ubuntu2) hardy-proposed; urgency=low

  * applied libwnck vertical window list fix from Alexey Rusakov and Vincent Untz

 -- Anon <anon@example.com>  Wed, 10 Dec 2008 20:50:00 +0000

END

cat new-entry.txt debian/changelog > debian/changelog-new
rm debian/changelog
mv debian/changelog-new debian/changelog
rm new-entry.txt

# Get the libwnck patch and apply it
wget -O libwnck-fix-vertical.patch "http://bugzilla.gnome.org/attachment.cgi?id=124112"
patch -p2 < libwnck-fix-vertical.patch

# Build the package
aclocal
automake
dpkg-buildpackage -rfakeroot -b

# Install the libwnck packages
cd ..
sudo dpkg -i libwnck*.deb

# Now do gnome-panel
cd gnome-panel-2.22.2

# Create a changelog entry
cat <<END > new-entry.txt
gnome-panel (1:2.22.2-0ubuntu1.2) hardy-proposed; urgency=low

  * applied libwnck vertical window list fix from Alexey Rusakov and Vincent Untz

 -- Anon <anon@example.com>  Wed, 10 Dec 2008 20:50:00 +0000

END

cat new-entry.txt debian/changelog > debian/changelog-new
rm debian/changelog
mv debian/changelog-new debian/changelog
rm new-entry.txt

# Get the gnome-panel patch and apply it
wget -O gnome-panel-fix-vertical.patch "http://bugzilla.gnome.org/attachment.cgi?id=107133"
patch -p2 < gnome-panel-fix-vertical.patch

# Build the package
aclocal
automake
dpkg-buildpackage -rfakeroot -b

# Install the gnome-panel packages
cd ..
sudo dpkg -i *panel*.deb

Update: fixed incorrect comment – thanks Stefan.

Update: changed -p0 to -p2 in the patch commands.

An actual difficult bug fixed

Of course, I am bound to get a bug report the moment I post this telling me my fix breaks everything, but for now I am chuffed that I found, tested, and fixed a genuinely difficult bug.

I am particularly proud because I wrote an automated test to ensure it can never happen again, and I used that test to make the debugging process much easier than it otherwise would have been. The code that reads, processes and stores listings in FreeGuide is a spider’s web of interfaces and helper classes (because of the arguably over-engineered plugin framework used for all the moving parts), and tracking this down with plain old-fashioned debugging would have been a huge job.

Anyway, I bet you are dying to hear what the bug was, aren’t you?

When you have a programme already stored in FreeGuide, and then a new one comes along that overlaps it, the old programme is deleted. For example if we start off with:

... 19:00 Top Gear ................... 20:00 Charlie and Lola .............

but then later download listings again and get these programmes:

... 19:00 Pocoyo .. 19:15 Round the World ...... 

Then Top Gear will disappear, and be replaced by the 2 new programmes. In fact, any old programme that overlaps any new incoming programme will be automatically deleted.

At least, that is what is supposed to happen. In fact, the real situation is a little more complex because the programmes are stored in separate files (.ser files) for different days and times. There are 4 files for each day, with names like “day-2008-09-15-A.ser”, where the suffixes A, B, C and D indicate which part of the day a file covers.

So imagine what happens when the first set of programmes comes in looking like this:

19:00 Programme 1A ......... 21:15 Programme 1B .. 21:30 Programme 1C ........... 22:00

and then the second comes in like this:

19:00 Programme 2A......................................... 21:45 Programme 2C .. 22:00

So obviously the 3 old programmes should be completely deleted, and the 2 new ones should be what you see.

But you don’t. In fact, what you see is programmes 1B and 2C, with gaps before 1B and between the two. Weird, huh?

“Why?” I hear you ask. Well, it’s simple when you consider how the programmes are split into files.

Programme 1A goes into file day-2008-09-14-D.ser, and programmes 1B and 1C go into day-2008-09-15-A.ser.

[Side note: this is true in this case because the bug reporter is in the GMT -0400 timezone and the file boundaries are quarters of a day in GMT.]

Then, when the new programmes come along, 2A goes into 14-D – wiping out 1A, and 2C goes into 15-A – wiping out 1C but not 1B.

Then, when the files get read back in again later, 2A is read from 14-D, but then 1B is read from 15-A, wiping out 2A, and finally 2C is read in as well from 15-A, so we end up with 1B and 2C.

How to fix it? Well, what I did was leave everything as it is, and then do the final read in the reverse order. This means we read in 1B and 2C, but then we read 2A later, and it wipes out 1B, leaving 2A and 2C as we would expect.

Neat fix eh? It works because this kind of wrongness in the .ser files will only exist when a programme hanging off the end should have wiped out something in a later file. Because programmes are classified into files by their start time, they can only hang off the end of a file, not the beginning, so reading the files in backwards will always read the hanging-over file last, wiping out anything which should have been wiped out earlier.
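To make the fix concrete, here is a minimal Python model of the storage scheme (my own sketch – the slot size, times, and programme names are illustrative, not the real FreeGuide code). Programmes are (start, end, name) tuples in minutes, binned into “files” by start time; storing and reading both wipe out overlapping programmes seen earlier:

```python
SLOT = 360  # quarter of a day, in minutes (an assumption for illustration)

def overlaps(a, b):
    # Two (start, end, name) programmes overlap if their intervals intersect
    return a[0] < b[1] and b[0] < a[1]

def store(files, prog):
    # A new programme wipes out any old overlapping programme in its own file
    slot = files.setdefault(prog[0] // SLOT, [])
    slot[:] = [p for p in slot if not overlaps(p, prog)]
    slot.append(prog)

def read_all(files, reverse):
    # Merge all files; a later-read programme wipes out overlapping earlier ones
    result = []
    for key in sorted(files, reverse=reverse):
        for prog in files[key]:
            result[:] = [p for p in result if not overlaps(p, prog)]
            result.append(prog)
    return sorted(p[2] for p in result)

files = {}
# First download: 1A in one file, 1B and 1C in the next
for prog in [(1380, 1515, "1A"), (1515, 1530, "1B"), (1530, 1560, "1C")]:
    store(files, prog)
# Second download: 2A hangs over the file boundary, 2C lands in the later file
for prog in [(1380, 1545, "2A"), (1545, 1560, "2C")]:
    store(files, prog)

print(read_all(files, reverse=False))  # buggy forward read: ['1B', '2C']
print(read_all(files, reverse=True))   # fixed reverse read: ['2A', '2C']
```

Reading forwards reproduces the bug (stale 1B survives); reading the files in reverse lets the hanging-over 2A wipe it out, as described above.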

There is a little bug/feature remaining, but it only applies when you get some really weird listings from your provider. If you had a programme like 1A (19:00 – 21:15), and downloaded new listings, which ONLY contained a programme overlapping it, but falling into a later file (so maybe it starts at 21:00), and didn’t contain any programme starting at 19:00, then the backwards reading would mean you would never see your new programme because it would be wiped out by 1A.

This is a very unusual case though, since normally if you get a new programme at 21:00, you will also get new programmes leading up to it, if only to reflect the fact that 1A is now a different length. So this is really a theoretical bug, which explains why I’ve decided not to fix it…

Anyway, by the time I’d fiddled with my test for this to get the bug to trigger (which took a long time – working out which bits to fake out and which to test at all was tricky), the actual fix was easily implemented (1 line of code I chose to break out into 3), and then validated in a single click.

Just in case I hadn’t mentioned it, I love tests.