This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.

How to write a programming language

Part 1: The Lexer

Andy Balaam
artificialworlds.net/blog

Contents

What is a programming language?

Lexers emit tokens

foo = "bar";

becomes:

("symbol", "foo")
("=", "")
("string", "bar")
(";", "")

Lexers emit tokens

200 - 158

becomes:

("number", "200")
("operation", "-")
("number", "158")
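As a rough illustration of the idea (this is not Cell's lexer, which comes later in the talk), the second example can be reproduced by splitting on whitespace and labelling each piece. The function name `naive_tokens` is hypothetical, and the type name "operation" is borrowed from the lexer code shown later.

```python
def naive_tokens(text):
    """Label each whitespace-separated piece as a (type, value) pair."""
    for piece in text.split():
        if piece.isdigit():
            yield ("number", piece)
        elif piece in ("+", "-", "*", "/"):
            yield ("operation", piece)
        else:
            raise ValueError("Unrecognised piece: " + piece)


print(list(naive_tokens("200 - 158")))
```

A real lexer cannot rely on whitespace like this (`200-158` should give the same tokens), which is why the lexer below works one character at a time.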

Brief intro to Cell

Cell is a programming language with:

There is nothing else good about it.

github.com/andybalaam/cell

Brief intro to Cell

num1 = 3;
square = {:(x) x * x;};
num2 = square( num1 );
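For readers more familiar with Python, here is a rough analogy of the Cell snippet. It assumes that `{:(x) x * x;}` defines a function of one argument whose value is its last expression; that reading is an assumption here, not something these slides spell out.

```python
# Python analogy of the Cell snippet (assumed semantics, see above).
num1 = 3


def square(x):
    # Cell: square = {:(x) x * x;};
    return x * x


num2 = square(num1)
print(num2)
```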

Cell's Lexer

Lexing in Cell consists of identifying:

Cell's Lexer

import re

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            ...
        elif c in "+-*/":
            ...
        elif c in "(){},;=:":
            ...
        elif c in ("'", '"'):
            ...
        elif re.match("[.0-9]", c):
            ...
        elif re.match("[_a-zA-Z]", c):
            ...
        else:
            ...
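The loop relies on `chars` having a `next` attribute (the upcoming character, or `None` at the end) and a `move_next()` method (return that character and advance). The slides do not show where these come from, so the class below is only a minimal sketch of an object with that interface; the name `PeekableStream` and its internals are assumptions.

```python
class PeekableStream:
    """Wrap an iterable so the upcoming item can be inspected via .next."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.next = None
        self._fill()

    def _fill(self):
        # Load the upcoming item into self.next, or None at the end.
        self.next = next(self._iterator, None)

    def move_next(self):
        """Return the upcoming item and advance the stream by one."""
        current = self.next
        self._fill()
        return current


chars = PeekableStream("ab")
print(chars.next)         # a (peek without consuming)
print(chars.move_next())  # a
print(chars.move_next())  # b
print(chars.next)         # None (end of input)
```

Being able to look one character ahead without consuming it is what lets the lexer decide where a number or symbol ends.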

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        elif c in "+-*/":
            ...
        elif c in "(){},;=:":
            ...
        elif c in ("'", '"'):
            ...
        elif re.match("[.0-9]", c):
            ...
        elif re.match("[_a-zA-Z]", c):
            ...
        else:
            ...

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        elif c in "+-*/":
            yield ("operation", c)
        elif c in "(){},;=:":
            ...
        elif c in ("'", '"'):
            ...
        elif re.match("[.0-9]", c):
            ...
        elif re.match("[_a-zA-Z]", c):
            ...
        else:
            ...

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        elif c in "+-*/":
            yield ("operation", c)
        elif c in "(){},;=:":
            yield (c, "")
        elif c in ("'", '"'):
            ...
        elif re.match("[.0-9]", c):
            ...
        elif re.match("[_a-zA-Z]", c):
            ...
        else:
            ...

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        elif c in "+-*/":
            yield ("operation", c)
        elif c in "(){},;=:":
            yield (c, "")
        elif c in ("'", '"'):
            yield ("string", _scan_string(c, chars))
        ...

Cell's Lexer

def _scan_string(delim, chars):
    ret = ""
    while chars.next != delim:
        c = chars.move_next()
        if c is None:
            raise Exception(...)
        ret += c
    chars.move_next()
    return ret

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        elif c in "+-*/":
            yield ("operation", c)
        elif c in "(){},;=:":
            yield (c, "")
        elif c in ("'", '"'):
            yield ("string...
        elif re.match("[.0-9]", c):
            yield ("number", _scan(c, chars, "[.0-9]"))
        ...

Cell's Lexer

def _scan(first_char, chars, allowed):
    ret = first_char
    p = chars.next
    while p is not None and re.match(allowed, p):
        ret += chars.move_next()
        p = chars.next
    return ret

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass
        ...
        elif re.match("[_a-zA-Z]", c):
            yield (
                "symbol",
                _scan(c, chars, "[_a-zA-Z0-9]")
            )
        ...

Cell's Lexer

def lex(chars):
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            ...
        elif c in "+-*/":
            ...
        elif c in "(){},;=:":
            ...
        elif c in ("'", '"'):
            ...
        elif re.match("[.0-9]", c):
            ...
        elif re.match("[_a-zA-Z]", c):
            ...
        else:
            raise Exception(...)

Cell's Lexer

assert (
    list(lex('print("Hello, world!");'))
    ==
    [
        ("symbol", "print"),
        ("(", ""),
        ("string", "Hello, world!"),
        (")", ""),
        (";", ""),
    ]
)
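Putting the pieces from the slides together gives a lexer that can actually be run. This is a sketch under some assumptions: the assertion above passes a plain string to `lex`, so this version wraps its argument in a peekable stream first; the `PeekableStream` class and the exception messages are not shown in the slides and are made up here.

```python
import re


class PeekableStream:
    """Assumed helper: exposes the upcoming item as .next (None at end)."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self.next = next(self._iterator, None)

    def move_next(self):
        current = self.next
        self.next = next(self._iterator, None)
        return current


def _scan(first_char, chars, allowed):
    ret = first_char
    p = chars.next
    while p is not None and re.match(allowed, p):
        ret += chars.move_next()
        p = chars.next
    return ret


def _scan_string(delim, chars):
    ret = ""
    while chars.next != delim:
        c = chars.move_next()
        if c is None:
            raise Exception("A string ran off the end of the program.")
        ret += c
    chars.move_next()  # consume the closing delimiter
    return ret


def lex(chars_iter):
    chars = PeekableStream(chars_iter)  # assumption: wrap the raw input
    while chars.next is not None:
        c = chars.move_next()
        if c in " \n":
            pass  # whitespace separates tokens but produces none
        elif c in "+-*/":
            yield ("operation", c)
        elif c in "(){},;=:":
            yield (c, "")
        elif c in ("'", '"'):
            yield ("string", _scan_string(c, chars))
        elif re.match("[.0-9]", c):
            yield ("number", _scan(c, chars, "[.0-9]"))
        elif re.match("[_a-zA-Z]", c):
            yield ("symbol", _scan(c, chars, "[_a-zA-Z0-9]"))
        else:
            raise Exception("Unrecognised character: '" + c + "'.")


for token in lex('foo = "bar";'):
    print(token)
```

Because `lex` is a generator, nothing is read until the tokens are iterated, and an unrecognised character only raises once iteration reaches it.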

Discussion

Donate

Donate! patreon.com/andybalaam

Play

Play! artificialworlds.net/rabbit-escape

More info

Videos youtube.com/user/ajbalaam
Twitter @andybalaam
Blog artificialworlds.net/blog
Projects artificialworlds.net
GitHub github.com/andybalaam