How to write a programming language – Part 1, The Lexer

Series: Lexer, Parser, Evaluator

I wrote a little programming language, Cell, which is designed to be simple enough to help explain how programming languages work.

Here’s the explanation of the lexer, the first stage of a compiler or interpreter, which turns the raw characters of a program into a sequence of tokens.
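To make the idea concrete, here is a minimal lexer sketch in Python. It is not Cell's actual lexer. The `lex` function and the token kinds (`"number"`, `"symbol"`, `"string"`, `"operation"`, `"punctuation"`) are hypothetical, chosen only to show how a lexer walks through the characters of a program and groups them into tokens.

```python
# Hypothetical token classes for illustration, not Cell's real token set
OPERATORS = "+-*/="
PUNCTUATION = "(){},;:"


def lex(source):
    """Yield (kind, value) tokens from source text."""
    i = 0
    while i < len(source):
        c = source[i]
        if c.isspace():
            # Whitespace separates tokens but is not a token itself
            i += 1
        elif c.isdigit():
            # A run of digits becomes one number token
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            yield ("number", source[start:i])
        elif c.isalpha() or c == "_":
            # Names: a letter or underscore, then letters, digits or underscores
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield ("symbol", source[start:i])
        elif c == '"':
            # Strings run to the next double quote (no escapes in this sketch);
            # str.index raises ValueError if the string is never closed
            end = source.index('"', i + 1)
            yield ("string", source[i + 1:end])
            i = end + 1
        elif c in OPERATORS:
            yield ("operation", c)
            i += 1
        elif c in PUNCTUATION:
            yield ("punctuation", c)
            i += 1
        else:
            raise ValueError(f"Unexpected character {c!r} at position {i}")


print(list(lex('x = 3 + 4;')))
# [('symbol', 'x'), ('operation', '='), ('number', '3'), ('operation', '+'), ('number', '4'), ('punctuation', ';')]
```

A parser then works on this flat list of tokens instead of individual characters, which is what makes the next stage simpler.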

Slides: How to write a programming language – Part 1, The Lexer

If you want to, you can Support me on Patreon.

Simple example of Netty 4 usage

I feel the title of this post over-promises, since I was not able to make an example that seemed simple to me.

Anyway, here is a near-minimal example of how to use Netty to make a server that shouts back at you whatever you say:

NettyExample.java:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.CharsetUtil;
import io.netty.util.ReferenceCountUtil;
import java.nio.charset.Charset;

class NettyExample
{
    public static void main( String[] args ) throws Exception
    {
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try
        {
            new ServerBootstrap()
                .group( bossGroup, workerGroup )
                .channel( NioServerSocketChannel.class )
                .childHandler( new Init() )
                .bind( 1337 ).sync().channel().closeFuture().sync();
        }
        finally
        {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }

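    // The type parameter is required: with a raw ChannelInitializer,
    // initChannel( SocketChannel ) would not override initChannel( Channel )
    private static class Init extends ChannelInitializer<SocketChannel>
    {
        @Override
        public void initChannel( SocketChannel ch ) throws Exception
        {
            // Each new connection gets its own ShoutyHandler
            ch.pipeline().addLast( new ShoutyHandler() );
        }
    }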

    private static class ShoutyHandler extends ChannelInboundHandlerAdapter
    {
        @Override
        public void channelRead( ChannelHandlerContext ctx, Object msg )
        {
            try
            {
                Charset utf8 = CharsetUtil.UTF_8;
                String in = ( (ByteBuf)msg ).toString( utf8 );
                String out = in.toUpperCase(); // Shout!
                ctx.writeAndFlush( Unpooled.copiedBuffer( out, utf8 ) );
            }
            finally
            {
                ReferenceCountUtil.release( msg );
            }
        }

        @Override
        public void exceptionCaught(
            ChannelHandlerContext ctx, Throwable cause )
        {
            cause.printStackTrace();
            ctx.close();
        }
    }
}

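Most of this is boilerplate. The lines that actually do something useful are the ones that bind to port 1337, add a ShoutyHandler to each new connection's pipeline, and decode, upper-case and write back each message in channelRead. If anyone knows how to make it shorter, please comment below. It seems a lot to me.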
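One caveat about channelRead: TCP delivers a stream of bytes, not messages, so what the client sent as one line can arrive split across several reads (rarely, for a short test line). Decoding each ByteBuf on its own, as ShoutyHandler does, can then break a multi-byte UTF-8 character that straddles two reads. The exact Java result depends on the decoder's error handling, but the character is lost either way. This Python sketch simulates such a split by hand; the chunk boundary is an assumption chosen to land inside "é", and the incremental decoder shows one way to keep incomplete bytes until the rest arrives.

```python
import codecs

# "héllo" in UTF-8: the é is the two bytes c3 a9
data = "h\u00e9llo".encode("utf-8")

# Simulate TCP delivering the bytes in two reads, split inside the é
chunks = [data[:2], data[2:]]

# Decoding each read on its own (like calling toString(utf8) per ByteBuf):
# each half of the é becomes a replacement character, U+FFFD
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# An incremental decoder holds back an incomplete sequence until the next chunk
decoder = codecs.getincrementaldecoder("utf-8")()
streamed = "".join(decoder.decode(chunk) for chunk in chunks)
streamed += decoder.decode(b"", final=True)

print(streamed.upper())  # HÉLLO
```

In Netty the usual remedy is to frame the input first (for example, splitting on newlines) before decoding it, so each decoded buffer holds whole characters.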

To run this, do:

sudo apt-get install openjdk-8-jdk
wget 'http://search.maven.org/remotecontent?filepath=io/netty/netty-all/4.1.5.Final/netty-all-4.1.5.Final.jar' -O netty-all-4.1.5.Final.jar
javac -Werror -cp netty-all-4.1.5.Final.jar:. NettyExample.java && java -cp netty-all-4.1.5.Final.jar:. NettyExample

Then in another terminal:

echo "Hello, world" | nc localhost 1337

and observe the response:

HELLO, WORLD

Comparison with Node.js

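Just for comparison, here is an approximate equivalent in Node.js. Unlike the Netty version, it listens only on localhost and closes the connection after its first reply (socket.end):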

shouty.js:

var net = require('net');

var server = net.createServer(
    function( socket ) {
        socket.setEncoding('utf8');
        socket.on(
            'data',
            function( data ) {
                socket.end( data.toUpperCase() );
            }
        );
    }
);

server.listen( 1337, "localhost" );

To run it, do:

sudo apt-get install nodejs-legacy
node shouty.js

Then in another terminal:

echo "Hello, world" | nc localhost 1337

and observe the response:

HELLO, WORLD

Elm resizeable SVG canvas filling the screen

I am toying with writing an SVG-based game in (exciting-looking JavaScript-replacement) Elm, and I wanted an SVG that filled the whole screen and resized when the screen resized. I found it harder than I expected, so here is what I came up with for your information and comment.

Try the demo.

Because I was using Html.App.programWithFlags, I could not take the shortcut of just running elm-reactor on my Elm file: I needed to create an HTML file that passes in the flags, and compile my code with elm-make.

index.html sets up a full-screen app and passes in the window size:

<!DOCTYPE HTML>
<html>
<head>
    <meta charset="UTF-8"/>
    <title>Sootl</title>
    <script src="sootl.js"></script>
    <style>
        html, body, svg
        {
            margin: 0px;
            padding: 0px;
            border: 0px;
            overflow: hidden;
        }
    </style>
</head>
<body>
</body>
<script>
var app = Elm.Main.fullscreen(
    {
        width:  window.innerWidth,
        height: window.innerHeight
    }
);
</script>
</html>

elm-package.json requires the Html, Svg and Window packages:

{
    "version": "1.0.0",
    "summary": "Stay out of the light!",
    "repository": "https://github.com/andybalaam/sootl.git",
    "license": "GPL2",
    "source-directories": [
        "."
    ],
    "exposed-modules": [],
    "dependencies": {
        "elm-lang/core": "4.0.5 <= v < 5.0.0",
        "elm-lang/html": "1.1.0 <= v < 2.0.0",
        "elm-lang/svg": "1.1.1 <= v < 2.0.0",
        "elm-lang/window": "1.0.0 <= v < 2.0.0"
    },
    "elm-version": "0.17.1 <= v < 0.18.0"
}

Main.elm contains the Elm code, which starts off with the window size from JavaScript, and then listens to resize events using the Window module.

import Html exposing (Html)
import Html.App exposing (programWithFlags)
import Svg exposing (..)
import Svg.Attributes exposing (..)
import Window


type alias Flags =
    { width : Int
    , height : Int
    }


type Msg = Resize Int Int


type alias Model =
    { screen :
        { width : Int
        , height : Int
        }
    }


init : Flags -> (Model, Cmd Msg)
init flags =
    (
        { screen =
            { width = flags.width
            , height = flags.height
            }
        }
    , Cmd.none
    )



view : Model -> Html Msg
view model =
    let
        -- (/) only works on Floats, so convert the Int sizes first
        sw = toFloat model.screen.width
        sh = toFloat model.screen.height
    in
        svg
        [ width  <| toString sw
        , height <| toString sh
        ]
        [ rect
            [ x "0"
            , y "0"
            , width (toString model.screen.width)
            , height (toString model.screen.height)
            , fill "#eeffee"
            ]
            []
        , text'
            [ x <| toString <| sw / 2
            , y <| toString <| sh / 2
            , fontSize <| toString <| sh / 10
            , textAnchor "middle"
            ]
            [ text
                ((toString model.screen.width)
                ++ ", "
                ++ (toString model.screen.height))
            ]
        ]


update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
    let m =
        case msg of
            Resize w h -> {model | screen = {width = w, height = h}}
    in
        (m, Cmd.none)


subscriptions : Model -> Sub Msg
subscriptions model =
    Window.resizes (\size -> Resize size.width size.height)


main =
   programWithFlags
     { init = init
     , view = view
     , update = update
     , subscriptions = subscriptions
     }

I installed all the packages with:

elm-package install

Then I compiled the code with:

elm-make --output=sootl.js Main.elm

Then I launched elm-reactor to serve the files:

elm-reactor

And navigated my browser to http://localhost:8000/index.html to see it working.

Ambiguous names in Java due to non-normalised Unicode – but all OK in Python

In Java and several other languages, identifiers (e.g. method names) are allowed to contain Unicode characters.

Unfortunately, some combinations of Unicode characters are canonically equivalent. For example, á (one character: Latin Small Letter A with Acute, U+00E1) is equivalent to á (two characters: Latin Small Letter A, U+0061, followed by Combining Acute Accent, U+0301). These combinations are not just similar: the Unicode standard defines them as the same text.
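Python's standard unicodedata module makes the equivalence easy to check. This short sketch shows that the two spellings are different sequences of code points, yet normalising them produces the same result:

```python
import unicodedata

precomposed = "\u00e1"  # one code point
decomposed = "a\u0301"  # two code points

print([unicodedata.name(c) for c in precomposed])
# ['LATIN SMALL LETTER A WITH ACUTE']
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING ACUTE ACCENT']

# Compared code point by code point, the strings differ...
print(precomposed == decomposed)  # False

# ...but normalisation maps them to the same sequence
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```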

Java does not normalise your code before compiling it, so two identifiers written with equivalent but differently encoded Unicode sequences are treated as different identifiers (ref: JLS 7 section 3.8):

$ cat U.java 
public class U {
    static String \u00e1() { return "A WITH ACUTE"; }
    static String a\u0301() { return "A + NON-SPACING ACUTE"; }
    public static void main(String[] a) {
        System.out.println(á());
        System.out.println(á());
    }
}
$ javac U.java && java U
A WITH ACUTE
A + NON-SPACING ACUTE

We can define and use two methods called á and á, and they are completely independent.

But don’t do this.

Python 3 also allows Unicode characters in identifiers, but it avoids the above problem by normalising every identifier to NFKC form while parsing (ref: Python 3 Reference, section 2.3):

$ cat U.py 
#!/usr/bin/env python3

def á():
    print("A WITH ACUTE")

def á():
    print("A + NON-SPACING ACUTE")

á()
á()

$ hexdump -C U.py 
23 21 2f 75 73 72 2f 62  69 6e 2f 65 6e 76 20 70  |#!/usr/bin/env p|
79 74 68 6f 6e 33 0a 0a  64 65 66 20 c3 a1 28 29  |ython3..def ..()|
3a 0a 20 20 20 20 70 72  69 6e 74 28 22 41 20 57  |:.    print("A W|
49 54 48 20 41 43 55 54  45 22 29 0a 0a 64 65 66  |ITH ACUTE")..def|
20 61 cc 81 28 29 3a 0a  20 20 20 20 70 72 69 6e  | a..():.    prin|
74 28 22 41 20 2b 20 4e  4f 4e 2d 53 50 41 43 49  |t("A + NON-SPACI|
4e 47 20 41 43 55 54 45  22 29 0a 0a c3 a1 28 29  |NG ACUTE")....()|
0a 61 cc 81 28 29 0a 0a                           |.a..()..|
$ ./U.py 
A + NON-SPACING ACUTE
A + NON-SPACING ACUTE


The second definition replaces the first, because after normalisation both names are the same identifier. You can call the function using either spelling of its name.
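The same effect can be reproduced without a separate file by passing source code to exec. In this sketch, the \u escapes in the Python string build source text containing one precomposed and one decomposed á. After parsing, only one name survives, in its NFKC (precomposed) form. The normalisation applies only to identifiers in source code: a plain dictionary lookup with the decomposed string does not find the function.

```python
# Source text with two definitions whose names differ only in how "á" is encoded
source = (
    "def \u00e1():\n"
    "    return 'A WITH ACUTE'\n"
    "def a\u0301():\n"
    "    return 'A + NON-SPACING ACUTE'\n"
)

namespace = {}
exec(source, namespace)

# Both names were normalised to NFKC while parsing, so only one function exists
names = [n for n in namespace if n != "__builtins__"]
print(names == ["\u00e1"])  # True

# Calling it by either spelling in source code reaches the same function
exec("r1 = \u00e1()\nr2 = a\u0301()", namespace)
print(namespace["r1"])  # A + NON-SPACING ACUTE
print(namespace["r2"])  # A + NON-SPACING ACUTE

# Dictionary keys are plain strings, so they are not normalised
print("a\u0301" in namespace)  # False
```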

Both ways of working are scary, but I’d definitely choose the Python 3 way if I had to.

Snake in Python 3 + Qt 5

Series: Groovy, Ruby, BASIC, Dart, Elm, Python3+Qt5

I’m writing the game Snake in lots of programming languages, for fun, and to try out new languages.

Python 3 broke compatibility to fix some mistakes – was it worth it? Qt 5 continues to offer more and more features – can it win me over?

Slides: Snake in Python 3 + Qt 5
