How to write a programming language – Part 2, The Parser

Series: Lexer, Parser, Evaluator

My little programming language, Cell (Cell Elementary Learning Language), is designed to be simple. I want to use it to explain how to write a programming language. The parser is only 81 lines long, so hopefully it’s not too hard to understand.

Here’s the explanation of the parser, which is the second part of a compiler or interpreter.

Slides: How to write a programming language – Part 2, The Parser

If you want to, you can Support me on Patreon.

How to write a programming language – Part 1, The Lexer

Series: Lexer, Parser, Evaluator

I wrote a little programming language, Cell, which is supposed to be simple enough to help explain how a programming language works.

Here’s the explanation of the lexer, which is the first part of a compiler or interpreter.

Slides: How to write a programming language – Part 1, The Lexer

Ambiguous names in Java due to non-normalised Unicode – but all OK in Python

In Java and several other languages, identifiers (e.g. method names) are allowed to contain Unicode characters.

Unfortunately, some sequences of Unicode characters are logically identical. For example, á (one character: Latin Small Letter A with Acute, U+00E1) is the same as á (two characters: Latin Small Letter A, U+0061, followed by Non-spacing Acute Accent, U+0301, which current Unicode charts call Combining Acute Accent). These sequences are not just similar – the Unicode standard defines them as canonically equivalent.
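
To make the equivalence concrete, here is a minimal sketch using Python's standard `unicodedata` module. The variable names are just for illustration. It shows that the two spellings are different sequences of code points, but become equal once both are put into the same normal form.

```python
import unicodedata

precomposed = "\u00e1"  # one code point: a with acute
decomposed = "a\u0301"  # two code points: a, then the combining acute accent

# As raw strings they differ, and they have different lengths.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

# After normalising to a common form (NFC composes, NFD decomposes) they match.
composed_again = unicodedata.normalize("NFC", decomposed)
decomposed_again = unicodedata.normalize("NFD", precomposed)
print(composed_again == precomposed)
print(decomposed_again == decomposed)
```

A compiler that compares identifiers code point by code point, without a normalisation step like this, will see two different names.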

Java does not do any normalisation on your code before compiling it, so two identifiers containing equivalent but different Unicode sequences are considered different (ref: JLS 7 section 3.8).

$ cat U.java 
public class U {
    static String \u00e1() { return "A WITH ACUTE"; }
    static String a\u0301() { return "A + NON-SPACING ACUTE"; }
    public static void main(String[] a) {
        System.out.println(á());
        System.out.println(á());
    }
}
$ javac U.java && java U
A WITH ACUTE
A + NON-SPACING ACUTE

We can define and use two methods called á and á, and they are totally independent entities.

But don’t do this.
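
If you want to catch this kind of thing before it bites, one possible approach is to flag source lines that are not already in NFC form. The sketch below is only an assumption about how such a check might look: `non_nfc_lines` is a hypothetical helper, not part of any real tool, and it would not catch names written with `\u` escapes like the ones in U.java above.

```python
import unicodedata


def non_nfc_lines(source_text):
    """Return (line_number, line) pairs whose text changes under NFC normalisation."""
    flagged = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if unicodedata.normalize("NFC", line) != line:
            flagged.append((number, line))
    return flagged


# Two method declarations, written with the literal characters rather than escapes.
sample = (
    'static String \u00e1() { return "A WITH ACUTE"; }\n'
    'static String a\u0301() { return "A + NON-SPACING ACUTE"; }\n'
)

for number, line in non_nfc_lines(sample):
    # ascii() makes the combining character visible as an escape.
    print(number, ascii(line))
```

Only the second declaration is flagged, because the first already uses the composed character.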

Python 3 also allows Unicode characters in identifiers, but it avoids the above problem by normalising them to NFKC form while parsing (ref: Python 3 Reference, section 2.3):

$ cat U.py 
#!/usr/bin/env python3

def á():
    print("A WITH ACUTE")

def á():
    print("A + NON-SPACING ACUTE")

á()
á()

$ hexdump -C U.py 
23 21 2f 75 73 72 2f 62  69 6e 2f 65 6e 76 20 70  |#!/usr/bin/env p|
79 74 68 6f 6e 33 0a 0a  64 65 66 20 c3 a1 28 29  |ython3..def ..()|
3a 0a 20 20 20 20 70 72  69 6e 74 28 22 41 20 57  |:.    print("A W|
49 54 48 20 41 43 55 54  45 22 29 0a 0a 64 65 66  |ITH ACUTE")..def|
20 61 cc 81 28 29 3a 0a  20 20 20 20 70 72 69 6e  | a..():.    prin|
74 28 22 41 20 2b 20 4e  4f 4e 2d 53 50 41 43 49  |t("A + NON-SPACI|
4e 47 20 41 43 55 54 45  22 29 0a 0a c3 a1 28 29  |NG ACUTE")....()|
0a 61 cc 81 28 29 0a 0a                           |.a..()..|
$ ./U.py 
A + NON-SPACING ACUTE
A + NON-SPACING ACUTE

The second definition overwrites the first because the two names are considered identical. You can call the function using either spelling of its name.
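
The same behaviour can be observed from inside Python. This is a small sketch, with the source text built from escapes so the two spellings stay distinct on the page. The names `source` and `namespace` are mine, chosen for illustration.

```python
# Two definitions whose names differ only in how the accent is encoded.
source = (
    "def \u00e1():\n"
    "    return 'A WITH ACUTE'\n"
    "\n"
    "def a\u0301():\n"
    "    return 'A + NON-SPACING ACUTE'\n"
)

namespace = {}
exec(source, namespace)

# Ignore the __builtins__ entry that exec() adds.
names = [name for name in namespace if not name.startswith("__")]

print(len(names))                     # how many functions survived
print([ascii(name) for name in names])
print(namespace["\u00e1"]())          # which body the surviving name runs
print("a\u0301" in namespace)         # plain string keys are not normalised
```

Only one function survives, stored under the composed name, and it runs the second body. Note the last line: the normalisation applies to identifiers in source code, not to ordinary strings, so looking a name up by string (in a dict, or with `getattr`) needs the normalised spelling.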

Both ways of working are scary, but I’d definitely choose the Python 3 way if I had to.

Snake in Python 3 + Qt 5

Series: Groovy, Ruby, BASIC, Dart, Elm, Python3+Qt5

I’m writing the game Snake in lots of programming languages, for fun, and to try out new languages.

Python 3 broke compatibility to fix some mistakes – was it worth it? Qt 5 continues to offer more and more features – can it win me over?

Slides: Snake in Python 3 + Qt 5

Which Raspberry Pi photo was funniest?

We had a great day at the Egham Raspberry Pi Jam, and Rabbit Escape and our Photo Booth seemed to go down well.

But which photo was funniest? Here are some of the entries (I had to choose kids’ ones without faces to go on here, but there were some other great ones!).

But the winner has to be the eyeball wearing a hat!

Thanks everyone, see you next time!