Archive for the ‘Python’ Category

My Address Book 1.9.0 – rewritten from scratch

Wednesday, May 11th, 2011

Ordinarily, my motivations for doing open source work are clear: peer recognition and the satisfaction of knowing people are using my work.

However, I’ve been distracted from that stuff recently because of my desire to scratch my own itch, by re-writing My Address Book from scratch to allow sharing the same address book between the web interface and the various email programs I use on different computers.

The only way to make it work with those email programs is to store the data in an LDAP server, instead of the custom MySQL database I had used in the original version (which is still available, of course: My Address Book version 1).

While I was doing it, I took the opportunity (or made the mistake) of re-writing in my favourite language, Python. Joel would not approve, but that’s the fun of open source: I can do whatever I like, no matter how unwise.

I learned a lot of things on the way:

  1. I like web.py. It is a simple and helpful library in the Python tradition of being concise but powerful.
  2. Setting up Python running as FastCGI on either Apache or lighttpd is much harder than creating a page of PHP. However, once it’s done, it’s done. (The results of my research are shown at the bottom of the Install Guide.)
  3. Setting up an LDAP server to act as a little address book is unbelievably complicated. See the Install Guide for how to do it. One day, I or someone else will turn all those instructions into the preinstall step of a .deb, and no-one will ever have to worry about it again. Volunteers, please.
  4. Templetor, web.py’s templating system, is fine, but too slow for large pages. I had to reimplement the main address list rendering in plain Python, which made me sad.
  5. I am compulsive about getting my work out into the world.
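
To illustrate point 4: rendering a long list in plain Python can be as simple as escaping each field and joining the pieces once at the end. This is only a sketch, not the code from My Address Book. It assumes Python 3, and the function name `render_address_list` and the `name` and `email` keys are hypothetical.

```python
from html import escape


def render_address_list(entries):
    """Build an HTML list from an iterable of dicts in a single pass."""
    parts = ["<ul>"]
    for entry in entries:
        # Escape every value so stored text cannot inject markup.
        name = escape(entry.get("name", ""))
        email = escape(entry.get("email", ""))
        parts.append(
            '<li>%s (<a href="mailto:%s">%s</a>)</li>' % (name, email, email)
        )
    parts.append("</ul>")
    # A single join at the end avoids repeated string concatenation.
    return "\n".join(parts)
```

The trade-off is the one the list already hints at: this gives up the readability of a template in exchange for speed.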

It has been a struggle to write all the documentation and think through the installation procedure and all the other stuff that comes with making a public release, but I have been unable to move on to other things until I have got it done. I hope I’ve done an ok job – the installation procedure is far too complex and should be automated, but if the volume of documentation is any indicator of how helpful it is, it should be reasonable.

I think I found it more difficult than normal because, unlike normal, getting it out to other people was not my main motivation for doing the project. I’m glad I’ve worked through it though, and I really hope some people get interested enough to help me make it much easier to use.

I think this project is a good example of how it can be much better to live in the open source world than the proprietary one. If I want to create a small address book for my family in the open source world, I end up with an industrial-strength LDAP server providing it, meaning that so long as someone does the work to make it simple to use for my simple use case, it can scale to solve pretty much any problem I might have in the future. So if I start a small business that eventually grows and needs to track millions of addresses, I can keep using the same LDAP server, on the same hardware, as my original address book.

Of course, My Address Book would have broken long before that. It doesn’t even do paging, and the home page lists all your addresses in one page.

I must write a rant one day about how I hate paging.

Diffident 0.3

Saturday, October 10th, 2009

My original plan for Diffident, the side-by-side diff viewer and editor that works in a terminal, was to implement basic editing capabilities before making another release.

Of course, that turned out to be quite ambitious. It involves essentially implementing a full text editor, which is not really what I want to do. I may actually implement a “jump out to $EDITOR” option before the basic text editing facilities.

What I have implemented for this release is the ability to add and remove lines, and copy lines from one side to the other. For my personal use, this covers about 90% of cases, so I think it’s worthy of a release.

There is no undo/redo as yet, but the framework for that is in place, so I may make another release sometime soonish that is just for that.
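
The post does not describe the undo/redo framework, so what follows is only a generic sketch of one common approach, not Diffident’s design: each edit knows how to apply and revert itself, and a history object keeps two stacks. All class names here are hypothetical.

```python
class InsertLine:
    """Insert `text` at position `index` in a list of lines."""

    def __init__(self, index, text):
        self.index = index
        self.text = text

    def apply(self, lines):
        lines.insert(self.index, self.text)

    def revert(self, lines):
        del lines[self.index]


class RemoveLine:
    """Remove the line at `index`, remembering it so it can be restored."""

    def __init__(self, index):
        self.index = index
        self.removed = None

    def apply(self, lines):
        self.removed = lines.pop(self.index)

    def revert(self, lines):
        lines.insert(self.index, self.removed)


class History:
    """Track applied edits so they can be undone and redone in order."""

    def __init__(self, lines):
        self.lines = lines
        self.done = []
        self.undone = []

    def do(self, edit):
        edit.apply(self.lines)
        self.done.append(edit)
        self.undone = []  # a new edit invalidates the redo stack

    def undo(self):
        if self.done:
            edit = self.done.pop()
            edit.revert(self.lines)
            self.undone.append(edit)

    def redo(self):
        if self.undone:
            edit = self.undone.pop()
            edit.apply(self.lines)
            self.done.append(edit)
```

Copying a line from one side to the other could be expressed as an `InsertLine` on the target side, so it would need no special undo support of its own.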

So the dream of a diff viewer and editor is starting to come true…

Random background in GNOME

Friday, July 31st, 2009

I’ve written a little script to show a different random background in GNOME every day, by looking inside your Photos directory.

It will rotate them to be the right way up, if you have exiftran installed.

Get it here: Random Background.
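
For the curious, the "different picture every day" part can be sketched as below. This is not the linked script, just an illustration assuming Python 3: the function names and the extension list are made up, and both rotating the image and telling GNOME to use it are left out.

```python
import datetime
import os
import random

# Assumed set of file types; the real script may look for others.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def find_images(directory):
    """Return a sorted list of image paths found under `directory`."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(root, name))
    return sorted(found)


def pick_daily_image(images, today=None):
    """Pick one image, keeping the same choice for the whole of `today`."""
    if not images:
        return None
    today = today or datetime.date.today()
    # Seeding with the date makes the choice stable within a day
    # and (usually) different on the next one.
    rng = random.Random(today.toordinal())
    return rng.choice(images)
```

Seeding from the date means the script can be run at every login without changing the background more than once a day.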

Separate regular expressions, or one more complex one?

Friday, April 3rd, 2009

I have asked myself this question several times, so I thought it was about time I did a test and found an answer.

If the user of your program can supply you with a list of regular expressions to match against some text, should you combine those expressions into one big one, or treat them separately?

In my case I need an OR relationship, so combining them just means putting a pipe symbol between them.*

So: one expression made by ORing, or looping through several – which is better? There’s only one way to find out:

import re, sys

line_with_match_foo = "This line contains foo."
line_with_match_baz = "This line contains baz."
line_without_match = "This line does not contain it."

re_strings = ( "foo", "bar1", "bar2", "baz", "bar3", "bar4", )

piped_re = re.compile( "|".join( re_strings ) )

separate_res = list( re.compile( r ) for r in re_strings )

NUM_ITERATIONS = 1000000

def piped( line ):
    for i in range( NUM_ITERATIONS ):
        if piped_re.search( line ):
            print( "match!" ) # do something

def separate( line ):
    for i in range( NUM_ITERATIONS ):
        for s in separate_res:
            if s.search( line ):
                print( "match!" ) # do something
                break # stop looping because we matched

arg = sys.argv[1]

if arg == "--piped-nomatch":
    piped( line_without_match )
elif arg == "--piped-match-begin":
    piped( line_with_match_foo )
elif arg == "--piped-match-middle":
    piped( line_with_match_baz )
elif arg == "--separate-nomatch":
    separate( line_without_match )
elif arg == "--separate-match-begin":
    separate( line_with_match_foo )
elif arg == "--separate-match-middle":
    separate( line_with_match_baz )

And here are the results:

$ time python re_timings.py --piped-nomatch > /dev/null

real    0m0.987s
user    0m0.943s
sys     0m0.032s
$ time python re_timings.py --separate-nomatch > /dev/null

real    0m3.695s
user    0m3.641s
sys     0m0.037s

So when no regular expressions match, the combined expression is about 3.7 times faster.

$ time python re_timings.py --piped-match-middle > /dev/null

real    0m1.900s
user    0m1.858s
sys     0m0.033s
$ time python re_timings.py --separate-match-middle > /dev/null

real    0m3.543s
user    0m3.439s
sys     0m0.042s

And when an expression near the middle of the list matches, the combined expression is 1.8 times faster.

$ time python re_timings.py --piped-match-begin > /dev/null

real    0m1.847s
user    0m1.797s
sys     0m0.035s
$ time python re_timings.py --separate-match-begin > /dev/null

real    0m1.649s
user    0m1.597s
sys     0m0.032s

But in the (presumably much rarer) case where all lines match the first expression in the list, the separate expressions are marginally faster.

A clear win for combining the expressions, unless you think it’s likely that most lines will match expressions early in the list.

Note also that if you combine the expressions, performance is similar wherever the matching expression sits in the list (whereas with separate expressions, list order matters a lot). So there is probably no need for you or your user to second-guess what order to put the expressions in, which makes life easier for everyone.

I would guess the results would be similar in other programming languages. I certainly found it to be similar in C# on .NET when I tried it a while ago.

By combining the expressions we ask the regular expression engine to do the heavy lifting for us, and it is specifically designed to be good at that job.
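
The timings above include interpreter start-up and the cost of printing each match. A rough way to check the comparison without those is the standard `timeit` module. This is a sketch assuming Python 3. The helper names are made up for illustration, and no results are claimed for it here.

```python
import re
import timeit

RE_STRINGS = ("foo", "bar1", "bar2", "baz", "bar3", "bar4")
PIPED_RE = re.compile("|".join(RE_STRINGS))
SEPARATE_RES = [re.compile(r) for r in RE_STRINGS]


def piped_matches(line):
    """True if the single combined expression matches."""
    return PIPED_RE.search(line) is not None


def separate_matches(line):
    """True if any individual expression matches (stops at the first hit)."""
    return any(r.search(line) for r in SEPARATE_RES)


def compare(line, number=100000):
    """Return (piped_seconds, separate_seconds) for `number` searches each."""
    piped = timeit.timeit(lambda: piped_matches(line), number=number)
    separate = timeit.timeit(lambda: separate_matches(line), number=number)
    return piped, separate


if __name__ == "__main__":
    cases = (
        ("no match", "This line does not contain it."),
        ("match at start of list", "This line contains foo."),
        ("match in middle of list", "This line contains baz."),
    )
    for label, line in cases:
        piped, separate = compare(line)
        print("%-24s piped %.3fs  separate %.3fs" % (label, piped, separate))
```

Because nothing is printed inside the timed functions, any difference between the match and no-match cases here comes from the searching alone.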

Open questions:

1. Have I made a mistake that makes these results invalid?

2. * Can arbitrary regular expressions be ORed together simply by concatenating them with a pipe symbol in between?

3. Can we do something similar if the problem requires us to AND expressions?
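
On questions 2 and 3, here is a hedged sketch rather than a definitive answer. In Python’s `re` syntax the pipe has the lowest precedence, so plain joining usually works. Wrapping each expression in a non-capturing group `(?:...)` is a defensive habit that keeps each one visibly self-contained. Neither approach rescues expressions that rely on numbered backreferences or inline flags, which can change meaning or fail once combined. For AND, one option is a lookahead per expression, assuming the text is a single line. The names `combine_or` and `combine_and` are made up for this example.

```python
import re


def combine_or(patterns):
    """Match if ANY of the patterns matches somewhere in the text."""
    return re.compile("|".join("(?:%s)" % p for p in patterns))


def combine_and(patterns):
    """Match if ALL of the patterns match somewhere in a single line.

    Each lookahead scans ahead from the same position without consuming
    text, so the order of the patterns in the text does not matter.
    """
    return re.compile("".join("(?=.*(?:%s))" % p for p in patterns))


any_of = combine_or(["foo", "ba[rz]"])
all_of = combine_and(["foo", "baz"])

print(bool(any_of.search("This line contains baz.")))
print(bool(all_of.search("baz comes before foo here")))
print(bool(all_of.search("This line contains foo.")))
```

Whether the AND form is faster than looping over separate expressions is untested here. It would need the same kind of measurement as the OR case.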

duckmaze 0.2

Saturday, November 10th, 2007

I’ve finally released to the world the secret extra duckmaze levels. They’re contained in the 0.2 release, which is also faster, and doesn’t freeze time.

(As far as I know.)

It’s got 18 levels, and a level editor, which means you can make more.

  • Do it.
  • Send them to me.
  • ???
  • Profit.

Announcing Record TV

Monday, September 24th, 2007

Last night I uploaded the first public version of my latest project, Record TV. Record TV is a system for recording TV (on a Linux desktop computer) that is designed to allow lots of different user interfaces all to use the same back end. It is currently only useful for people who are quite familiar with the Linux command line. It essentially has no user interface at all, but the back end stuff works for recording TV.

Perhaps more excitingly, I have also managed to get my recorded programmes to play back on my Nintendo Wii, so I can watch them on my TV.

Find out more on the project page linked above. I’ve released this code very early, in the spirit of “release early, release often,” so expect to hack on it a bit to get it working.

If you think MythTV just goes about things the wrong way, and you’d like to help do it right, it might be of interest.

It’s mostly Python, with some PHP and shell scripts.

A job I’d like to do is to be able to use FreeGuide as a UI for selecting programmes to record.