Streaming to Twitch and PeerTube simultaneously using nginx on Oracle cloud

Simulcasting RTMP using NGINX

I want people to be able to watch my Matrix and Rust live coding streams using free software, so I’d like to simulcast to PeerTube as well as Twitch.

This is possible using NGINX and its RTMP module. It does involve building NGINX from source, but I actually found that reasonably easy to do.

Why Oracle cloud?

I would never recommend using Oracle for anything, but they do provide up to two virtual machines in their cloud for free, and the one I am using has been consistently available, with very good connectivity, in a London data centre since I set it up several months ago.

So, we are making our lives more difficult by trying to do this on Oracle Linux, which is a derivative of RHEL.

Building NGINX and its RTMP module on Oracle Linux

I ran these commands on my Oracle cloud instance (running Oracle Linux):

sudo yum install git pcre-devel openssl-devel
mkdir nginx
cd nginx
wget http://nginx.org/download/nginx-1.21.4.tar.gz
git clone https://github.com/arut/nginx-rtmp-module.git
cd nginx-1.21.4
./configure --add-module=../nginx-rtmp-module/
make
sudo make install

After all this, NGINX was installed to /usr/local/nginx/.
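You can check that the RTMP module really was compiled in by asking the new binary how it was configured (and if configure or make failed earlier, note that depending on what your instance already has installed you may also need wget, gcc, make and zlib-devel):

/usr/local/nginx/sbin/nginx -V
# should print the version, plus: configure arguments: --add-module=../nginx-rtmp-module/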

Creating the NGINX config file for RTMP simulcasting

Next I edited the NGINX config file by typing:

sudo nano /usr/local/nginx/conf/nginx.conf

And pasted in this config at the bottom of the file:

rtmp {
    server {
        listen 1935;
        chunk_size 4096;
        application live {
            live on;
            record off;
            push rtmp://live.twitch.tv/app/live_INSERT_TWITCH_STREAM_KEY;
            push rtmp://diode.zone:1935/live/INSERT_PEERTUBE_STREAM_KEY;
        }
    }
}

Note that you will need to get your Twitch stream key from Twitch -> Creator Dashboard -> Settings -> Stream, then click Copy next to the Primary Stream Key.

To get a PeerTube stream key, you will need to go to your PeerTube page and click Publish, then Go Live, choose your channel and click Go Live. Note that if you want the streams to be recorded and available later, you have to create a new stream key each time you start a stream, and change it in nginx.conf.

If you use a different PeerTube server (I use diode.zone) then you’ll need to change the server name in the config file above too.

Make sure your config file is saved with the right URLs in it.

Opening ports

To send RTMP traffic to my server, I needed to open the right port to the Oracle cloud instance. That involved creating an ingress rule, and adding a firewall rule.

Creating an ingress rule

In the web interface, I went to the menu in the top left, clicked Compute, then Instances.

I clicked on my instance’s name, then I clicked on the name of the subnet in the details (on the right).

I clicked on Default security list for…, then Add Ingress Rules.

I made an ingress rule with Source Type=CIDR, Source CIDR=0.0.0.0/0, IP Protocol=TCP, Source Port Range=(blank, meaning all), Destination Port Range=1935

Adding a firewall rule

Then I ssh’d into the machine and ran these commands to create a firewall rule allowing the traffic:

sudo firewall-cmd --zone=public --permanent --add-port=1935/tcp
sudo firewall-cmd --reload
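To double-check that the rule took effect, firewalld can list the open ports for the zone:

sudo firewall-cmd --zone=public --list-ports
# expect to see 1935/tcp in the output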

Stop and Start NGINX

After creating the config file and opening the right port, I needed to start NGINX.

Every time I change the config file, I need to restart it.

If it’s already running, I stop it with:

sudo /usr/local/nginx/sbin/nginx -s stop

and then I start it up again with

sudo /usr/local/nginx/sbin/nginx

I can check whether it’s happy by looking at the log files, for example to see any errors:

less /usr/local/nginx/logs/error.log
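It can also save a restart cycle to ask NGINX to validate the config file before stopping and starting it:

sudo /usr/local/nginx/sbin/nginx -t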

Starting the stream

Now I go into OBS and go to File -> Settings -> Stream and choose the type as Custom, and the Server as rtmp://1.1.1.1/live. (But instead of 1.1.1.1 I put the public IP address of my instance, which I found by clicking the name of the instance in the Oracle cloud management console.)

New game: Tron – frantic multiplayer retro action

My newest game is out now on Smolpxl Games – Tron:

Pixellated lines fight each other to stay alive

Play at smolpxl.gitlab.io/tron.

It’s a frantic multiplayer retro pixellated thingy playable in your browser. Try to stay alive longer than everyone else!

This version allows many players (up to 16 if you can manage it), and is quite pure in its implementation.

There are bots to play against, and you can gather your friends around a keyboard to play together.

Part of the motivation for writing this game was to test my new smolpxl-remote remote-play system, but this is not enabled yet, so watch this space…

I love playing games with other people – preferably at least 3 other people. In theory you could have 8 players around a keyboard playing this – send me a picture if you try!

One feature I worked on in the Smolpxl library for this game: saving configuration to local storage (and asking permission to do so). I ended up with a very ugly hack to do this, so a bit more work is needed before I merge it into the library.

Preventing Virgin Media hijacking my DNS

Yesterday I learned that Virgin Media is inserting itself into some of my DNS requests. Much as I am not a fan of how powerful Cloudflare are, if they are telling the truth about their DNS then it is safe, so I followed their instructions on how to use their DNS, removed the default DNS settings, and hopefully my Internet will work properly now.

From the serverfault answer by lauc.exon.nod:

nmcli con mod "Wired connection 1" ipv4.dns "1.1.1.1 1.0.0.1"
nmcli con mod "Wired connection 1" ipv4.ignore-auto-dns yes
nmcli con down "Wired connection 1"
nmcli con up "Wired connection 1"
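To confirm the change took effect, NetworkManager can show which DNS servers are now in use (the grep just trims the output):

nmcli dev show | grep -i dns
# should list 1.1.1.1 and 1.0.0.1 rather than the Virgin Media servers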

New Job at Element (Matrix)

I started a new job today at Element!

It has been a long-standing ambition of mine to work in Free and Open Source software, and I am very excited to work for a company that is the main developer of a really important project: the Matrix communication network.

I don’t know much about what I’ll be doing yet, but finding an open source company with a decent business model that is prepared to pay me is very exciting. The fact that they have offices that are close enough for me to visit is another huge bonus.

Wish me luck, and I’ll let you know what I’m working on when it becomes more clear.

What to cache when building Rust using Gitlab CI or similar

When building your project with Gitlab CI or a similar build tool, you can end up spending a lot of time watching your build repeat the same steps over and over. This is especially frustrating when it mostly consists of downloading and compiling the same things we downloaded and compiled last time.

To mitigate this, we can ask Gitlab CI to cache things that will be the same next time.

For a Rust project, the most important thing to cache is the local target directory.

But, if you are installing tools using rustup or cargo, it will really help if you cache those too. Fortunately, Rust tracks where they live using the CARGO_HOME and RUSTUP_HOME environment variables, and these are defined in the standard Rust Docker image.
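If you want to double-check what those variables point at inside the image (assuming you have Docker available locally), you can ask the container directly:

docker run --rm rust:latest env | grep -E 'CARGO_HOME|RUSTUP_HOME'
# typically prints CARGO_HOME=/usr/local/cargo and RUSTUP_HOME=/usr/local/rustup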

We can make sure we’re caching as much as possible by adding a section like this to .gitlab-ci.yml:

    cache:
        key: shared-cache
        paths:
            - target/
            - $CARGO_HOME/
            - $RUSTUP_HOME/

If you add this to all your jobs, they will share a single cache between them, and cache the local target directory as well as any tools installed with rustup or cargo.

Here is a full example from my Evolve SVGs project:

image: rust:latest

before_script:
    - rustup component add rustfmt
    - rustup target add wasm32-unknown-unknown
    - cargo install trunk wasm-bindgen-cli

pages:
    stage: deploy
    script:
        - echo "Publishing pages to" $CI_PAGES_URL
        - make deploy
        - mv dist public
    artifacts:
      paths:
        - public
    only:
        - main
    cache:
        key: shared-cache
        paths:
            - target/
            - $CARGO_HOME/
            - $RUSTUP_HOME/

test:
    stage: test
    script:
        - make test
    cache:
        key: shared-cache
        paths:
            - target/
            - $CARGO_HOME/
            - $RUSTUP_HOME/

Minimal example of a Maven pom for a mixed Kotlin and Java project

The Kotlin docs describe some things you need in your pom.xml to create a project that is a mix of Kotlin and Java code, but there is no complete example, so here is mine:

pom.xml:

<project>
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example.kj</groupId>
    <artifactId>kotlin-and-java</artifactId>
    <version>1.0.0-SNAPSHOT</version>

    <properties>
        <kotlin.version>1.5.21</kotlin.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.jetbrains.kotlin</groupId>
                <artifactId>kotlin-maven-plugin</artifactId>
                <version>${kotlin.version}</version>
                <executions>
                    <execution>
                        <id>compile</id>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                        <configuration>
                            <sourceDirs>
                                <sourceDir>${project.basedir}/src/main/kotlin</sourceDir>
                                <sourceDir>${project.basedir}/src/main/java</sourceDir>
                            </sourceDirs>
                        </configuration>
                    </execution>
                    <execution>
                        <id>test-compile</id>
                        <goals> <goal>test-compile</goal> </goals>
                        <configuration>
                            <sourceDirs>
                                <sourceDir>${project.basedir}/src/test/kotlin</sourceDir>
                                <sourceDir>${project.basedir}/src/test/java</sourceDir>
                            </sourceDirs>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <executions>
                    <!-- Replacing default-compile as it is treated specially by maven -->
                    <execution>
                        <id>default-compile</id>
                        <phase>none</phase>
                    </execution>
                    <!-- Replacing default-testCompile as it is treated specially by maven -->
                    <execution>
                        <id>default-testCompile</id>
                        <phase>none</phase>
                    </execution>
                    <execution>
                        <id>java-compile</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>java-test-compile</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.jetbrains.kotlin</groupId>
            <artifactId>kotlin-stdlib</artifactId>
            <version>${kotlin.version}</version>
        </dependency>
    </dependencies>
</project>

src/main/java/MyJava.java:

public class MyJava {
    public static void main(String[] args) {
        MyKotlin k = new MyKotlin();  // Use Kotlin from Java
        System.out.println(k.message());
    }
}

src/main/kotlin/MyKotlin.kt:

class MyKotlin : MyJava() {  // Use Java from Kotlin
    fun message(): String {
        return "Hello from Kotlin!"
    }
}

src/test/java/MadeInJavaTest.java:

class MadeInJavaTest {
    public void testCanUseJava() {
        MyJava j = new MyJava();
    }

    public void testCanUseKotlin() {
        MyKotlin k = new MyKotlin();
        assertEquals(k.message(), "Hello from Kotlin!");
    }

    static void assertEquals(String left, String right) {
        if (!left.equals(right)) {
            throw new AssertionError(left + " != " + right);
        }
    }
}

src/test/kotlin/MadeInKotlinTest.kt:

class MadeInKotlinTest {
    fun testCanUseJava() {
        MyJava()
    }

    fun testCanUseKotlin() {
        val k = MyKotlin();
        assertEquals(k.message(), "Hello from Kotlin!");
    }
}

fun assertEquals(left: String, right: String) {
    if (left != right) {
        throw AssertionError("${left} != ${right}");
    }
}

Importing/migrating from one peertube server to another

My Peertube server is shutting down, so I need to move my videos to another one. The official scripts don’t seem to cover this case very well, so here is what I did.

My script fetches videos and their details and uploads them to the new server via the Peertube API.

Contributions welcome: I was not able to copy video descriptions across, and I was too lazy so I hard-coded the number of tags. Also, I didn’t make a git repo for all this because I felt it needs more thought, but feel free to start one and I will happily contribute this.

This script copies all videos in a single Peertube channel to a different server. You must find the numeric ID of the channel on the new server to copy into, which I did by looking at the responses in the Network tab of Firefox’s developer tools when I clicked on its name in the web interface. It requires bash, curl, youtube-dl and jq.
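If you prefer not to dig through developer tools, the same PeerTube API the script uses can report the numeric channel ID directly – a sketch, with the example server and channel name from the script standing in for yours:

curl -s "https://video.hardlimit.com/api/v1/video-channels/trials_fusion" | jq .id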

Here’s peertube-migrate-channel.bash:

#!/bin/bash

set -u
set -e

# Modify these for your setup
OLD_SERVER="INSERT OLD SERVER e.g. https://peertube.social"
OLD_CHANNEL="INSERT CHANNEL URL-NAME e.g. trials_fusion"
NEW_SERVER="INSERT NEW SERVER e.g. https://video.hardlimit.com"
NEW_CHANNEL="INSERT NEW CHANNEL ID e.g. 4103"
USERNAME="INSERT_USERNAME for new server e.g. trials"
PASSWORD="INSERT PASSWORD for new server"
TAG1="INSERT_A_TAG e.g trials"
TAG2="INSERT_A_TAG e.g. gaming"
TAG3="INSERT_A_TAG e.g. gameing"

DIR=$(mktemp -d)
API_PATH="${NEW_SERVER}/api/v1"

# Find out how many videos are in the channel
curl -s \
    "${OLD_SERVER}/api/v1/video-channels/${OLD_CHANNEL}/videos/?count=1" \
    > "${DIR}/videos-total.json"

TOTAL=$(jq .total < "${DIR}/videos-total.json")
CURRENT=0
PAGE_SIZE=10

# Get a list of the UUIDs of all the videos

echo -n "" > "${DIR}/urls.txt"

while (( CURRENT < TOTAL )); do
    FILE="${DIR}/videos-page-${CURRENT}.json"

    curl -s \
        "${OLD_SERVER}/api/v1/video-channels/${OLD_CHANNEL}/videos/?start=${CURRENT}&count=${PAGE_SIZE}&skipCount=true" \
        > "${FILE}"

    jq ".data | map(.uuid) | .[]" -r < "${FILE}" >> "${DIR}/urls.txt"

    CURRENT=$((CURRENT + PAGE_SIZE))
done

# Log in to the new server

client_id=$(curl -s "${API_PATH}/oauth-clients/local" | jq -r ".client_id")
client_secret=$(curl -s "${API_PATH}/oauth-clients/local" | jq -r ".client_secret")
token=$(curl -s "${API_PATH}/users/token" \
  --data client_id="${client_id}" \
  --data client_secret="${client_secret}" \
  --data grant_type=password \
  --data response_type=code \
  --data username="${USERNAME}" \
  --data password="${PASSWORD}" \
  | jq -r ".access_token")

# Download and upload each video

tac "${DIR}/urls.txt" \
    | while read ID; do
        URL="${OLD_SERVER}/api/v1/videos/${ID}"
        curl -s "${URL}" > "${DIR}/info-${ID}.json"
        NAME=$(jq -r .name < "${DIR}/info-${ID}.json")
        CATEGORY=$(jq -r .category.id < "${DIR}/info-${ID}.json")
        LICENCE=$(jq -r .licence.id < "${DIR}/info-${ID}.json")
        LANGUAGE=$(jq -r .language.id < "${DIR}/info-${ID}.json")
        PRIVACY=$(jq -r .privacy.id < "${DIR}/info-${ID}.json")

        OLD_VIDEO="https://peertube.social/videos/watch/${ID}"
	mkdir "${DIR}/dl-${ID}"
	youtube-dl "${OLD_VIDEO}" --output="${DIR}/dl-${ID}/dl.mp4"

        echo "Uploading ${OLD_VIDEO}"
        curl "${API_PATH}/videos/upload" \
	  --silent \
          -H "Authorization: Bearer ${token}" \
          --output "${DIR}/curl-out-${ID}.txt" \
          --max-time 6000 \
	  --form name="${NAME}" \
	  --form videofile=@"${DIR}/dl-${ID}/dl.mp4" \
          --form channelId=${NEW_CHANNEL} \
          --form category=${CATEGORY} \
          --form licence=${LICENCE} \
          --form description="TODO VIDEO DESCRIPTION" \
          --form language=${LANGUAGE} \
          --form privacy=${PRIVACY} \
          --form tags="${TAG1}" \
          --form tags="${TAG2}" \
          --form tags="${TAG3}"
    done

Matrix is the only (chat) game in town

On my phone and computer I use WhatsApp, Signal, Slack, Keybase, Discord, IRC, XMPP/Jabber and Element/Matrix. In addition, I occasionally use the messaging features of Mastodon, Twitter and even LinkedIn. I’ve never used Telegram, Line, WeChat, Session, Wire or Status chat, but they exist too, along with many others.

It would be better if I could chat with people using the app I prefer, rather than the one I am forced to use.

Of course, the only useful chat app is the one your friends and family are on, so it’s pointless to debate the finer points in each service’s favour, but here I go anyway.

Only Matrix is all of these at once:

  • free and open source software
  • decentralised, so no single company controls the network
  • end-to-end encrypted

The importance of decentralisation has been re-emphasised for me this week after the freenode IRC debacle. A single controlling entity, even one that is currently benign (as some people believe Signal is), is no guarantee that things will stay this way. Thank goodness you can connect to libera.chat with your usual IRC client: imagine what would happen to Signal users if they realised someone unscrupulous had acquired control.

Matrix does not solve all our problems. Notably:

  • Its security is probably not good enough for people threatened by powerful interests – at the moment it’s quite easy to see who’s talking to whom, and when.
  • Not all clients support end-to-end encryption, and not all turn it on by default (but the most-used ones do).

Despite these limitations, Matrix is the only chat network that is even attempting to provide what users need, and it seems to be doing a pretty good job of it.

I think we should work together to address its weaknesses, and adopt it wherever we can.

So, I recommend Matrix (specifically element.io) for group and individual chat.

Suspending the computer using Kupfer

I have recently started using Kupfer again as my application launcher in Ubuntu MATE, and I found it lacked the ability to suspend the computer.

Here is the plugin I wrote to support this.

To install it, quit Kupfer, create a directory in your home dir called .local/share/kupfer/plugins, and create this file suspend.py inside:

__kupfer_name__ = _("Power management")
__kupfer_sources__ = ("PowerManagementItemsSource", )
__description__ = _("Actions to suspend the computer")
__version__ = "2021-05-05"
__author__ = "Andy Balaam "


from kupfer.plugin import session_support as support


class Suspend (support.CommandLeaf):
    def __init__(self, commands):
        support.CommandLeaf.__init__(self, commands, "Suspend")
    def get_description(self):
        return _("Suspend the computer")
    def get_icon_name(self):
        return "system-suspend"


class PowerManagementItemsSource (support.CommonSource):
    def __init__(self):
        support.CommonSource.__init__(self, _("Power management"))
    def get_items(self):
        return (Suspend((["systemctl", "suspend"],)),)

# Copyright 2021 Andy Balaam, released under the MIT license.
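For reference, creating that directory and file from a terminal looks like this (assuming Kupfer’s default plugin location mentioned above):

mkdir -p ~/.local/share/kupfer/plugins
nano ~/.local/share/kupfer/plugins/suspend.py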

Now restart Kupfer, go to Preferences, Plugins, and tick “Power management”.

You should now see a “Suspend” item if you search for it in the Kupfer interface.

Inspired by: Mate Session Management – Kupfer Plugin.

Reference docs: Kupfer Plugin API

Uploading to PeerTube from the command line

PeerTube’s API documentation gives an example of how to upload a video, but it is missing a couple of important aspects, most notably how to provide multiple tags using form-encoded input, so my more complete script is below. Use it like this:

# First, make sure jq is installed
echo "myusername" > username
echo "mypassword" > password
./upload "video_file.mp4"

Downsides:

  1. Your username and password are visible via ps to users on the same machine (tips to avoid this are welcome – one partial mitigation is sketched after this list)
  2. I can’t work out how to include newlines in the video description (again, tips welcome)
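Here is one partial mitigation for the first downside, as a sketch only: build the form body with shell builtins (variable assignments and printf don’t show up in ps) and let curl read it from stdin rather than from its command line. In the AUTH section of the script below, the token request would become something like:

# Build the login form body without putting secrets on any command line
auth_body="client_id=${client_id}&client_secret=${client_secret}"
auth_body="${auth_body}&grant_type=password&response_type=code"
auth_body="${auth_body}&username=${USERNAME}&password=${PASSWORD}"

# --data @- makes curl read the POST body from stdin
token=$(printf '%s' "${auth_body}" \
  | curl -s "${API_PATH}/users/token" --data @- \
  | jq -r ".access_token")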

You will need to edit the script to provide your own PeerTube server URL, channel ID (a number), video description, tags etc. Output and errors from the script will be placed in curl-out.txt. Read the API docs to see what numbers you need to use for category, license etc.

Here is upload:

#!/bin/bash

set -e
set -u

USERNAME="$(cat username)"
PASSWORD="$(cat password)"
FILE_PATH="$1"
CHANNEL_ID=MY_CHANNEL_ID_EG_1234
NAME="${FILE_PATH%.*}"
NAME="${NAME#*/}"

API_PATH="https://MY_PEERTUBE_SERVER_URL/api/v1"
## AUTH
client_id=$(curl -s "${API_PATH}/oauth-clients/local" | jq -r ".client_id")
client_secret=$(curl -s "${API_PATH}/oauth-clients/local" | jq -r ".client_secret")
token=$(curl -s "${API_PATH}/users/token" \
  --data client_id="${client_id}" \
  --data client_secret="${client_secret}" \
  --data grant_type=password \
  --data response_type=code \
  --data username="${USERNAME}" \
  --data password="${PASSWORD}" \
  | jq -r ".access_token")

echo "Uploading ${FILE_PATH}"
curl "${API_PATH}/videos/upload" \
  -H "Authorization: Bearer ${token}" \
  --output curl-out.txt \
  --max-time 6000 \
  --form videofile=@"${FILE_PATH}" \
  --form channelId=${CHANNEL_ID} \
  --form name="$NAME" \
  --form category=15 \
  --form licence=7 \
  --form description="MY_VIDEO_DESCRIPTION" \
  --form language=en \
  --form privacy=1 \
  --form tags="TAG1" \
  --form tags="TAG2" \
  --form tags="TAG3" \
  --form tags="TAG4"

Republishing Bartosz Milewski’s Category Theory lectures

Category Theory is an incredibly exciting and challenging area of Maths, that (among other things) can really help us understand what programming is on a fundamental level, and make us better programmers.

By far the best explanation of Category Theory that I have ever seen is a series of videos by Bartosz Milewski.

The videos on YouTube have quite a bit of background noise, and they were not available on PeerTube, so I asked for permission to edit and repost them, and Bartosz generously agreed! The conversation was in the comments section of Category Theory 1.1: Motivation and Philosophy and I reproduce it below.

So, I present these awesome videos, with background noise removed using Audacity, for your enjoyment:

Category Theory by Bartosz Milewski

Permission details:

Andy Balaam: Utterly brilliant lecture series. Is it available under a free license? I’d like to try and clean up audio and repost it to PeerTube, if that is permitted.
Bartosz Milewski: You have my permission. I consider my lectures public domain.

Announcing I-DUNNO 1.0 and web-i-dunno

It’s hard to believe it’s already a year since the release of RFC 8771 (The Internationalized Deliberately Unreadable Network NOtation), which, for me at least, made me think about IP addresses in a whole new way.

So, it seems fitting for the anniversary to be able to release proper support for this standard in the Rust universe, with Rust I-DUNNO version 1.0 released. You can find it on Rust’s crates.io at crates.io/crates/i-dunno and the API documentation is at docs.rs/i-dunno.

Also, for a standard like this to receive the wide adoption it deserves, it’s important that young people have a chance to interact with it, playing with encodings to get a real feel for what it’s like to use in practice. So I’m proud to announce the I-DUNNO Creator. On that page you can enter an IP address (IPv4 or IPv6) and see it transformed immediately into a candidate I-DUNNO, with live information about the Confusion Level of the I-DUNNO, as specified in the standard. You can find the source code for the I-DUNNO Creator in the web-i-dunno repo.

The I-DUNNO Creator is built on the Rust package, making use of Rust’s highly-developed WASM support to compile the code into a form that works naturally in a web browser.

I hope that by offering both systems programmers and the young people of today and their new-fangled web sites the opportunity to create I-DUNNOs, I can contribute a little to spreading the word about deliberately unreadable notations to new audiences.

Note: the current implementation is limited to generating only I-DUNNOs with no padding bits. As specified in the standard, I-DUNNOs may end with arbitrary padding, and adding this functionality to rust-i-dunno is left as an exercise for the reader: merge requests welcome!

Automatically filling in the UK COVID test results page with Selenium IDE

Lots of people are filling in the extremely detailed UK government COVID test result page twice every week.

It asks you to fill in a very large list of details, most of which are the same every time, but it doesn’t remember what you typed last time.

I didn’t want to write a Python script or similar to enter my results, because I wanted to check I’d done it right, and because there is a captcha at the end that is clearly intended to prevent automation like that.

However, with a Selenium IDE script, I can drive my browser, watching what it does and checking the input, and manually filling in the captcha and final double-check page.

In case it’s helpful, here is the script I created: report-covid-test.side.

You can create one for each child if you have several, filling in school name, NHS number, names, date of birth etc. in the script and re-using it, modifying it each time to enter the bar code number for the test itself.

To use it you’ll need the Selenium IDE plugin for firefox, or Selenium IDE plugin for another browser.

I’d recommend loading this script into the Selenium IDE plugin in Firefox, looking through it and editing the values that say “ENTER…HERE”, then clicking Run Script and watching it fill in values.

It doesn’t actually submit the result, so you can always check and modify it manually if something doesn’t work out, before clicking the last couple of buttons to submit.

Toggle window decorations on Linux GTK3 with Python3

The Internet is full of outdated Python code for doing things with windows, so here is what I got working today in a Python 3, GTK 3 environment.

This script toggles the window decorations on the active window on and off. I have it bound to Ctrl+NumPadMinus for easy access.

#!/usr/bin/env python3

import gi
gi.require_version('Gdk', '3.0')
gi.require_version('GdkX11', '3.0')
gi.require_version('Wnck', '3.0')
from gi.repository import Gdk
from gi.repository import GdkX11
from gi.repository import Wnck


def active_window(screen):
    for window in screen.get_windows():
        if window.is_active():
            return window


def toggle_decorations(w):
    if w.get_decorations().decorations == 0:
        w.set_decorations(Gdk.WMDecoration.ALL)
    else:
        w.set_decorations(0)


screen = Wnck.Screen.get_default()
screen.force_update()
display = GdkX11.X11Display.get_default()
window = active_window(screen)
window_id = window.get_xid()

w = GdkX11.X11Window.foreign_new_for_display(display, window_id)
toggle_decorations(w)


window = None
screen = None
Wnck.shutdown()

Questions about RFC 8771

During my work on RFC 8771 The Internationalized Deliberately Unreadable Network NOtation (I-DUNNO) I have come across a number of questions. I am documenting them here so I can send them to the authors and try to improve my understanding of the intention.

This is an excellent RFC, and I thank the authors for their efforts in creating it.

1. Non-printable characters

In 4.2. Satisfactory Confusion Level, the RFC states that encodings may be deemed Satisfactory if they contain ‘At least one non-printable character’ (as well as one other condition in this section).

Both of the existing implementations that I know of (the Python implementation and my Rust implementation) interpret “printable” to mean the same as Python’s isprintable() function: that is, “Nonprintable characters are those characters defined in the Unicode character database as ‘Other’ or ‘Separator’, excepting the ASCII space (0x20) which is considered printable.”
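To make that definition concrete, here is a quick check with Python 3 (just an illustration of the behaviour the implementations rely on):

python3 -c 'print("\u00a0".isprintable())'   # False: U+00A0 is category Zs ("Separator")
python3 -c 'print(" ".isprintable())'        # True: the ASCII space is explicitly excepted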

However, the definition of this function may be rather Python-specific, since its intention appears to be related to language internals like the repr function.

It would be helpful to find out exactly what is meant by “non-printable character” in the RFC.

2. Are Modifier Symbols, Symbols?

Also in section 4.2, the RFC mentions ‘”A character from the “Symbol” category’.

The Python implementation excludes Modifier Symbols from its definition. I believe this is incorrect, and have logged an issue on the topic: Some Symbol characters not recognised as such.

It would be helpful to have clarification on this point.

3. What does “different directionalities” mean?

Unicode classifies characters into several Bidi_Classes (for example, U+CED6E is Left_To_Right). In section 4.3. Delightful Confusion Level, the RFC refers to ‘Characters from scripts with different directionalities’.

As far as I can see there are two possible interpretations of this phrase:

  1. The encoding should contain characters from at least two different Bidi_Classes, or
  2. The encoding should contain characters that are both left-to-right and right-to-left in direction, either weakly or strongly.

Both current implementations interpret this statement like number 1, but I suspect the intention may actually be something more like number 2.

If number 2 was meant, I think that would mean ignoring characters with Neutral directionality, and treating weakly directional characters as the same directionality as strongly directional ones.

4. What is a Confusable character?

Section 4.3 mentions ‘Character classified as “Confusables”‘. Both implementations interpret this quite loosely, as meaning something like ‘the encoding contains any character or substring which might be confused with any other character or substring’.

This means a lot of “normal” characters are included: all of the ASCII digits, and many of the Latin letters.

Was this the intention?

That’s all my questions. It has been great fun working on this RFC.

Announcing Rust I-DUNNO

At the ACCU Conference last week I learned about RFC 8771 The Internationalized Deliberately Unreadable Network NOtation (I-DUNNO) from Jim Hague, and thought it would be fun to knock up a Rust implementation.

The project is here: gitlab.com/andybalaam/rust-i-dunno and the docs are published at https://docs.rs/i-dunno.

It’s not done yet, but encoding an IP address as I-DUNNO appears to be working:

$ i-dunno 216.58.205.46
lYԮ

$ i-dunno 216.58.205.46 | hexdump -C
00000000  db 81 6b 1a 2e 0a                                 |..k...|

Decoding is still to be done.

The implementation is seriously slow at the moment, so I am looking forward to improving it.

I am hoping it is reasonably correct – I based it on the existing Python I-DUNNO implementation and in the process found several potential bugs in that, and created some merge requests to fix bugs and help with testability.

Speaking of testability, I am building up a collection of test cases that could be a potential resource for other implementors, and would welcome suggestions of how this could be shared between projects. The examples so far were generated using the Python implementation, and then manually corrected where I found bugs in that, so I do not have 100% confidence that they are correct.

Anyway, have a play, and send patches and feedback!

Making Smolpxl work on phones and tablets

I’ve added the first features intended to make Smolpxl games work well on touch interfaces like phones and tablets:

Spring game with touch controls

I’ve added a button bar at the bottom (and moved the navigation buttons to the top).

I’m looking for feedback on this:

  • Does it work on your device?
  • Are the buttons the right size?
  • Do they look ok? If not, how could they look better?
  • For games that require arrow keys, do you need them in the normal arrow-keys layout, or is a simple row fine?

Duckmaze game with touch controls in a single row

If you’re writing a game and you want to add buttons like this, you just need to add a single line like this:

game.showControls(["MENU", "SELECT", "BUTTON1", "BUTTON2"]);

or this:

game.showControls(["MENU", "SELECT", "LEFT", "DOWN", "UP", "RIGHT"]);

and they should appear.

Recommendation against the use of WhatsApp in your company

Here is the email I just sent to the organisation I volunteer for. Feel free to adapt and use in your context.

Dear [organisation leaders],

Much of the tech industry (e.g. [1]) is warning against the use of WhatsApp due to its policy of collecting and sharing user information with third parties and the poor track record of its parent company (Facebook) on ethical issues (see examples [2] and [3], and many more).

The situation was made considerably worse with a recent change to the WhatsApp terms and conditions [4].

So, as your IT person I recommend not using WhatsApp for our work.

We already have an alternative available, and I would be really happy to help anyone who needs help setting it up.

[Details here of the alternative we use (Zulip) and how to use it. The simplest alternative to recommend is Signal.]

Thanks, Andy

[1] What Facebook and WhatsApp’s Data Sharing Plans Really Mean for User Privacy

[2] Facebook experimented with modifying people’s moods

[3] Facebook paid teens for total access to their phone activity

[4] If you’re a WhatsApp user, you’ll have to share your personal data

Streaming video with Owncast on a free Oracle Cloud computer

I just streamed about 40 minutes of me playing Trials Fusion using Owncast. Owncast is a self-hosted alternative to streaming services like Twitch and YouTube live.

Normally, you would need to pay for a computer to self-host it on. Owncast suggest this will cost about $5/month.

But, Oracle Cloud has an “Always Free” tier that includes a “Compute Instance” (a virtual machine running Linux) that is capable of running Owncast.

Here’s how I did it:

Register for Oracle Cloud

This was probably the worst bit.

I went to oraclecloud.com and clicked “Sign up for free cloud tier”. It didn’t work in Firefox(!) so I had to use Chromium.

I had to enter my name, address, email address, phone number and credit card details. The email was verified, the phone number was verified (with a text message), and the credit card was verified (with a real transaction), so there was no getting around any of it.

They promise that they won’t charge my card. I’ll let you know if I discover differently.

Create a Compute Instance

Once I was logged in to the Oracle “console” (web site), I clicked the burger menu in the top left, chose “Compute” and then “Instances” to create a new instance. I followed all the default settings (including using the default “image”, which meant my instance was running Oracle Linux, which I think is similar to Red Hat), and when I got to the ssh keys part, I supplied the public key of my existing SSH key pair. Read the docs there if you don’t have one of these.

Once that was done and the instance had been created and started, I was able to SSH in to it using the username opc and the Public IP Address listed:

ssh opc@PUBLIC_IP

(Note: here and below, if I say “PUBLIC_IP”, I mean the IP address listed in the information about your compute instance. It should be a list of four numbers separated by dots.)

Allow connecting to the instance on different ports

Owncast listens for HTTP connections on port 8080, and RTMP streams on 1935, so I needed to do two things to make that work.

Modify the Security List to add Ingress Rules

  • On the information about my instance, I clicked on the name of the Subnet (under Primary VNIC).
  • In the subnet, I clicked the name of the Security List (“Default Security List for …”) in the Security Lists list.
  • In the Security List I clicked Add Ingress Rules and entered:
    Stateless: unchecked
    Source Type: CIDR
    Source CIDR: 0.0.0.0/0
    IP Protocol: TCP
    Source Port Range: (blank)
    Destination Port Range: 8080
    Description: (blank)

    and then clicked Add Ingress Rules to create the rule.

  • I then added another Ingress Rule that was identical, except Destination Port Range was 1935.

Allow ports 8080 and 1935 on the instance’s own firewall

It took me a long time to figure out, but it turns out the Oracle Linux running on the Compute Instance has its own firewall. Eventually, thanks to a blog post by meinside: When Oracle Cloud’s Ubuntu instance doesn’t accept connections to ports other than 22, and some Oracle docs on ways to secure resources, I found that I needed to SSH in to the machine (like I showed above) and run these commands:

sudo firewall-cmd --zone=public --permanent --add-port=8080/tcp
sudo firewall-cmd --zone=public --permanent --add-port=1935/tcp
sudo firewall-cmd --reload

Now I was able to connect to the services I ran on the machine on those ports.

Install Owncast

The Owncast install was incredibly easy. I just followed the instructions at Owncast Quickstart. I SSHd in to the instance as before, and ran:

curl -s https://owncast.online/install.sh | bash

and then edited the file owncast/config.yaml to have a custom stream key in it. You can do that by typing:

nano owncast/config.yaml

There is information about this file at: owncast.online/docs/configuration.

Run Owncast

I ran the service like this:

cd owncast
./owncast

In future, if I want to leave it running, I may run it inside screen, or even use systemd or similar.
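For example, a detached screen session would look something like this (assuming the screen package is installed, and that Owncast lives in your home directory as above):

screen -dmS owncast bash -c 'cd ~/owncast && ./owncast'
# re-attach later to see the logs
screen -r owncast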

Open the web site

I could now see the web site by typing this into my browser’s address bar:

http://PUBLIC_IP:8080

(Where PUBLIC_IP is the Public IP copied from the Instance info as before.)

Stream some video

Finally, in OBS’s Settings I chose the Stream section and entered:

Service: Custom...
Server: rtmp://PUBLIC_IP/live
Stream key: STREAM_KEY

Where “STREAM_KEY” means the stream key I added to config.yaml earlier.

Now, when I clicked “Start Streaming” in OBS, my stream appeared on the web site!

Costs and limits

Oracle stated during sign-up that I would not be charged unless I explicitly chose to use a different tier.

The Compute Instance is part of the “Always Free” tier, so in theory it should stay up and working.

However, if you use lots of resources (which streaming for a long time probably does), I would expect services to be throttled and/or stopped completely. I have no idea whether they will allow enough resources for regular streaming, or whether this is all a waste of time. We shall see.

Pinephone update

I got a Pinephone for Christmas!

Here is a quick summary of my experience with it. (Originally published on mastodon.)


Update on the pinephone as promised.

I love it, but I would definitely not recommend expecting to use it as your actual phone.

I have the Manjaro Phosh edition. Phosh is GNOME customised for mobile.

It turns on, you can unlock it, and you get a launcher. It has apps, and some of them work.

Firefox works really well. I can use it for Youtube and loads of other sites. I installed uBlock Origin, and it works.

Adding my Nextcloud config to Phosh seamlessly gave me Calendar, Contact and TODO list apps working, with my data in them.

The Maps app found me easily via GPS. I could bring up directions by entering a from and to, but it didn't seem to want to guide me via GPS.

Several apps don't fit properly on screen, and there doesn't seem to be a way to scroll or move the windows.

The camera technically works but the picture looks terrible (squashed, wibbly and blue-coloured).

Scrolling around on the launcher updates at about 5-10 fps, which is fine but would put many people off.

Many of the apps available to install in the Software app don't really work. I assume the list of apps is the standard for GNOME or Manjaro, so many are not adapted for phones.

I _love_ the fact that all the work that has been put into desktop Linux can be re-used on phones. Why wasn't it always this way?

It's great to be able to buy hardware that is specifically designed to run properly free software.

The Terminal app works nicely and presents a keyboard with extra keys that you need in a terminal.

The settings app works nicely.

My biggest frustration was not being able to find software in the Software app that worked nicely.

I was looking for a Youtube app that protected my privacy. On Android I use NewPipe Legacy. On desktop I use Freetube. I couldn't find Freetube in Software. I tried Minitube but it was unusable (window didn't fit).

I haven't tried installing software from the command line. Maybe I can find (or build) Freetube via a Manjaro repo?

Or maybe I should investigate NewPipe Legacy via anbox, although that seems to miss the point a little :-)

Is your program a function or a service?

Maybe everyone knows this already, but for my own clarity, I think there are really two types of computer program:

  • A function: something that you run, and get back a result. Example: a command-line tool like ls
  • A service: something that sits around waiting for things to happen, and responds to them. Example: a web server

How functions work

Programs that are essentially functions should:

  • Validate their input and stop if it is wrong
  • Stop when they have finished their job*
  • Let you know whether they succeeded or failed

*The Halting Problem shows that you can’t prove this in general, so I won’t ask you to do that.

Writing functions is relatively easy.

How services work

Programs that are services should:

  • Start when you tell them to start, even when things are not right
  • Keep running until you tell them to stop, even when bad things happen
  • Tell the user about problems via some communication mechanism

Writing services seems a little harder than writing functions.

What about UIs?

I suggest that programs with UIs are just a special case of services. Do you agree?

What about let-it-crash?

I think that let-it-crash is a good way to build services, but when you build a service that way, I consider the whole system to be the real service: this means the code we are writing, plus the runtime. In this case, the runtime is responsible for keeping the service running (by restarting it), and telling the user about problems.

In effect, let-it-crash allows us to write programs that look like functions (which I claim is easier), and still have them behave like services, because the runtime does the extra work for us. Erlang seems like a good example of this.

What are the implications?

If you are writing a service, your program should start when asked, and keep running until it is asked to stop, even if things are bad.

For example:

  • a service that relies on a data source should keep running when that data source is unavailable, and emit errors saying that it is unable to work (there is a tiny sketch of this after this list). It should start working again when the data source becomes available. (Again, if you implement this behaviour by using a runtime that allows you to write in a let-it-crash style, good for you.)
  • a service that relies on the existence of a directory should probably create that directory if it doesn’t exist.
  • a service that needs config might want to start up with sane defaults if the config is not supplied. Or maybe it should complain loudly and poll for the file to be created?
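Here is a tiny shell sketch of that first point – fetch-data is a stand-in for whatever your service actually does:

# Keep running even when the data source is down: report the problem and retry,
# rather than exiting
while true; do
    if ! ./fetch-data; then
        echo "data source unavailable, will retry in 10 seconds" >&2
    fi
    sleep 10
done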

Why not stop when things are wrong?

  • Using this approach, the order in which services are started doesn’t matter. The more services we have, the more painful it is to have an order we must follow.
  • It’s nice when things are predictable. We expect services to keep running under normal circumstances. Using this approach, our expectations are not wrong when things go wrong.

What are the down sides?

  • You must pay attention to the error reporting coming from running services – they may not be working.
  • Services will still stop, due to bugs, or at least due to hardware failures, so you still have to pay attention to whether services are running.

More: 12 Fractured Apps

Shutdown order consistency: how Rust helps

Some Java code with bugs

Here’s my main method (in Java). Can you guess the bugs?

Db db = new Db();
Monitoring monitoring = new Monitoring();
Monitoring mon2 = new Monitoring();
Billing billing = new Billing(db, monitoring);
monitoring.setDb(db);

runMainLoop(billing, mon2);

db.stop();
billing.stop();
monitoring.stop();

If you would like to hunt down the 2 bugs manually, try reading the full code here: ShutdownOrder.java

But maybe you have an idea already? Maybe you’ve seen code like this before? If you have, you probably have an instinct that there’s some kind of bug, even if you can’t say for sure what it is. Code like this almost always has bugs!

This code compiles fine, but it contains two bugs.

First, we forgot to setDb() on mon2. This causes a NullPointerException, because Monitoring expects always to have a working Db.

Second, and in general harder to spot, we shut down our services in the wrong order. It turns out that Monitoring uses its Db during shutdown, so we get an exception. Even worse, if some other code needed to run after monitoring.stop(), it won’t, because the exception prevents us getting any further.

Of course, this is toy code, but this kind of problem is common (and much harder to spot) in real-life code. In fact, my team dealt with a similar bug this week.

It’s fundamentally hard to figure out your shutdown order. It’s complicated further if classes have start() methods too, which I have seen in lots of Java code.

Given that this is just a hard problem, maybe there’s no point looking for tools to make it easier?

Some Rust code without those bugs

Let’s try writing this code in Rust. Here’s the main method:

let db = Db::new();
let monitoring = Monitoring::new(&db);
let mon2 = Monitoring::new(&db);
let billing = Billing::new(&db, &monitoring);

run_main_loop(&billing, &mon2);

// drop() is called automatically on all objects here

Here’s the full code: shutdown_order.rs

This code shuts down all the services automatically at the end, and any mistakes we make in the order are compile errors, not things we find later when our code is running.

The code to shut down each service looks like this:

impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        // [Disconnect from monitoring API]
        self.db.add_record("MonitorShutDown");
    }
}

This is us implementing the Drop trait for the struct Monitoring (traits are a bit like Java Interfaces). The Drop trait is special: it indicates what to do when an instance of this struct is dropped. In Rust, this is guaranteed to happen when the instance goes out of scope, which is why our comment at the end of the main method sounds so confident.

Furthermore, Rust’s compiler shuts down everything in the reverse order in which it was created, and guarantees that nothing gets used after it has been dropped.

Rust’s lovely world gives us two relevant treats: no unexpected nulls, and lifetimes.

Treat number 1: no unexpected nulls

First, in Rust, like in other modern languages like Kotlin, we have to be explicit about items that could be missing. In our example, we were able to re-arrange the code so that db can never be missing (or null), and the compiler encouraged us to do so. If we really needed it to be missing some of the time, we could have used the Option type, and the compiler would have forced us to handle the case when it was missing, instead of unexpectedly getting a NullPointerException like we did in Java. (In fact, if we’d structured our code to use final in as many places as possible, we could have been encouraged towards basically the same solution in Java too.)

Treat number 2: lifetimes

Second, if you look a bit more closely at the full code of shutdown_order.rs you’ll see lots of confusing-looking annotations like <'a> and &'a:

struct Monitoring<'a> {
    db: &'a Db,
}

The approximate meaning of those annotations is: a Monitoring holds a reference to a Db, and that Db must last longer than the Monitoring.

This “lasts longer than” wording is what Rust Lifetimes are for. Lifetimes are a way of saying how long something lasts.

Lifetimes are really confusing when you start with Rust, and have caused me a lot of pain. Code like this is where they are both most painful and most helpful. As I mentioned earlier, the problem of shutdown order is fundamentally hard. Rust gives you that pain at the beginning, and until you understand what’s going on, the pain is very confusing and acute. But, once your code compiles, it is correct, at least as far as problems like this are concerned.

I love the sense of security it gives me to write Rust code and know the compiler has checked my code for this kind of problem, meaning it can’t crop up at 3am on Christmas Day…

Final note/caveat

This Rust code is probably over-simplified, because all the references are immutable (you can’t change the objects they point to). In practice, we may well have mutable references, and if we do we’re going to have to deal with the further difficulty that Rust won’t allow two different objects to hold references to an object if any of those references are mutable. So it would object to Billing and Monitoring using the Db object at the same time.

We’d need to make it immutable (as we have here), or find a different way of structuring the code: for example, we could hold the Db instance only within the run_main_loop code, and pass it in temporarily to the Billing and Monitoring objects when we called their methods.

A large part of the art, fun and pain of learning Rust is finding new patterns for your code that do what you need to do and also keep the compiler happy. When you manage it, you get amazing benefits!

Edge computing providers

I’m looking into Edge computing at work. By Edge computing I mean running WASM programs in lots and lots of smallish computers in places near to actual people (rather than in huge cloud data centres). I think it’s cool because I love Rust, and Rust is the leading language to compile to WASM.

Here are some companies providing Edge computing services:

  • Fastly – good links with WASM community (hired Mozilla devs), and early adopters – custom WASM engine wasmtime.
  • Cloudflare – huge, and early adopters – WASM engine is Google V8.
  • AWS Lambda@Edge – docs are light on detail, but it looks like a real offering, probably.

Also-rans:

Who did I miss?

Schema upgrades should be reversible (also other transformations, actually)

Are you writing schema upgrade code? Then I humbly suggest you take the time to write schema downgrade code too.

“Why would I do that?” you might well ask, “I won’t ever need to downgrade.”

Now, I imagine you’re expecting me to say you actually will need to downgrade, but that isn’t what I’m saying.

Can you please get on with what you are actually saying?

Whenever you write code to transform something, be it a schema upgrade, some serialisation, or something else, I would highly recommend that you write code to transform it in both directions.

Reasons:

  • It makes testing easier. The best kinds of tests for things like this are round-trips, where you transform something in both directions and check it hasn’t changed. It’s really hard to mess up tests like that.
  • It often uncovers bugs, because it enforces clear thinking about what the transformation actually means.
  • It may improve your code, because it gets annoying writing similar-but-different code to transform in both directions, so you are pushed towards some kind of abstraction.

Also:

  • You almost certainly are going to need it. Sometimes things go wrong and you need to roll back.
  • It will be incredibly useful for testing other parts of your code.

Bidirectional schema up/downgrades are not easy in SQL, but probably worth it. If you’re writing transformation code in a normal programming language, it’s really not that difficult, and I predict it will be worth it.
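As a concrete example of the round-trip testing mentioned above, for a SQL schema the check can be as simple as this sketch (upgrade.sql, downgrade.sql and the scratch database are stand-ins for your own):

# Round-trip check: upgrading then downgrading should give back the original schema
mysqldump --no-data scratch > before.sql
mysql scratch < upgrade.sql
mysql scratch < downgrade.sql
mysqldump --no-data scratch > after.sql
diff before.sql after.sql && echo "round-trip OK"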

Announcing Smolpxl Scores – a high score table for your game

It’s a very early beta for now, but I’m ready to announce Smolpxl Scores, which provides high-score tables for Free and Open Source games.

Each game can have multiple high-score tables – for example, you might want one for each level.

At the moment it’s deployed in my own web hosting and therefore written using the technologies that are most convenient for me to deploy there, which is PHP+MySQL. If it becomes more widely used and the performance suffers I guess I’ll ask for donations to host it somewhere else, and use more fashionable technologies.

To add a score you make a POST request like this:

curl https://scores.artificialworlds.net/api/v1/myappname/mytablename/ -d \
    '{"appId":"myappid","name":"Megan Tria", "score": 13.5, "notes": ""}'

and to look at some existing scores you can request them by pages:

curl 'https://scores.artificialworlds.net/api/v1/myappname/mytablename/?startRank=11&num=20'

or by name:

curl 'https://scores.artificialworlds.net/api/v1/myappname/mytablename/?startName=David%20Lloyd%20Geo&offset=-5&num=10'

The results are ordered by players’ scores, and are provided as JSON.

Each table stores only one score per player.

Of course, the API will evolve over time, but I hope that what I have now will be good enough to support some real-life games, and provide enough feedback to make it better.

As soon as people are actually using it, I will ensure the current API version (v1) remains stable, and release any incompatible updates as later versions.

If you’d like to use Smolpxl Scores to add a high-score table to your game, please create an issue at gitlab.com/smolpxl/smolpxl-scores/-/issues.

This service is only available to Free and Open Source games. Also, if someone abuses it (accidentally or on purpose) I will talk to them, and may eventually have to remove their access if we can’t fix the problem.

Dovecot not working after upgrade to Ubuntu 20.04.1 (dh key too small)

I upgraded to Ubuntu 20.04.1 and chose to keep my existing config files, and my mail server stopped working. In the log I saw:

Nov 25 09:07:57 machine dovecot: imap-login: Error: Failed to initialize SSL server context: Can't load DH parameters: error:1408518A:SSL routines:ssl3_ctx_ctrl:dh key too small: user=<>, rip=someip, lip=someip, session=<someid>

I was able to fix this by modifying /etc/dovecot/conf.d/10-ssl.conf and adding this line:

ssl_dh = </usr/share/dovecot/dh.pem
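If you would rather not rely on the distribution-provided parameters, an alternative (just a sketch – generating the parameters takes a while) is to create your own and point the same setting at them:

sudo openssl dhparam -out /etc/dovecot/dh.pem 4096
# and then in /etc/dovecot/conf.d/10-ssl.conf:
# ssl_dh = </etc/dovecot/dh.pem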

Please let me know if I’ve introduced an horrific security bug, won’t you?

Play and create little retro games at Smolpxl

I love simple games: playing them and writing them.

But, it can be overwhelming getting started in the complex ecosystems of modern technology.

So, I am writing the Smolpxl library, which is some JavaScript code that makes it quite simple to write simple, pixellated games. It gives you a fixed-size screen to draw on, and it handles your game loop and frames-per-second, so you can focus on two things: updating your game “model” – the world containing all the things that exist in your game, and drawing onto the screen.

I am also writing some games with this library, and publishing them at smolpxl.artificialworlds.net:

I am hoping this site will gradually gain more and more Free and Open Source games, and start to rival some of the advertising-supported sites for the attention of casual gamers, especially kids.

Smolpxl is done for fun, and for its educational value, so it should be a safer place for kids than a world of advertising, loot boxes and site-wide currencies.

As I write games, I want to show how easy and fun it can be, so I will be streaming myself live doing it on twitch.tv/andybalaam, and putting the recordings up on peertube.mastodon.host/accounts/andybalaam and youtube.com/user/ajbalaam.

I am hoping these videos will serve as tutorials that help other people get into writing simple games.

Would you like to help? If so:

shareon.js.org now has a Share to Mastodon button

I was looking for the right way to make a “Share This”-style button for my tiny games site Smolpxl, and I found shareon which worked exactly the way I wanted (load the JavaScript and call a function to display the buttons, with no privacy concerns), and looked really nice.

The only thing that was missing was a Mastodon button.

“Share to Mastodon” is more complicated than something like Share to Twitter, because Mastodon is not one web site, but lots of web sites that all talk to each other.

So, after someone clicks “Share to Mastodon”, you need to ask them which web site (or Mastodon instance) they meant.

I started out by hacking a Mastodon button in after the shareon ones, and prompting the user for their instance. This was a little unfriendly, but it worked:

Click Share, Mastodon, enter instance URL into ugly browser prompt, and end up at Mastodon

But luckily I didn’t stick with that. Because I think shareon is awesome, and because I want more people to use Mastodon, I decided to try adding a Mastodon button to shareon. I wrote the code to work similarly to my original hack, and submitted a Pull Request.

I am really glad I did that, because what followed was a really positive Free and Open Source Software mini-interaction. Nick Karamov responded with lots of improvements and bug fixes to my original change, and as we discussed the problem more, I expressed the desire for a proper page to choose Mastodon instance, that would be more friendly than a simple prompt. I also said that it would be difficult.

In retrospect, maybe suggesting it would be difficult was a clever trick, because the next thing I knew, Nick had implemented just such a page: toot.karamoff.dev!

Because toot.karamoff.dev now existed, the “Share to Mastodon” button became much simpler: we can send our post information to toot.karamoff.dev, and it asks which Mastodon instance you want to use, and passes it on to the correct place.

So my new Pull Request was much simpler than the original, and with a few more improvements suggested by Nick, it’s merged and I have a usable Share to Mastodon button without hacking it in.

The cake has a little more icing too, because I was also able to improve toot.karamoff.dev by adding code that downloads the up-to-date list of Mastodon instances from joinmastodon.org and provides them as suggestions, which can be really helpful if you can’t remember the exact name of your instance. Again, Nick’s suggestions on my Pull Request were incredibly helpful and made the code way better than what I originally wrote. Now it works really smoothly:

Click Share, Mastodon, choose instance from a friendly list on toot, and end up at Mastodon

In a small way, this was a fantastic example of how effective and fun working on Free and Open Source Software can be.

short – command line tool to truncate lines to fit in the terminal

Sometimes I run grep commands that search files with hugely-long lines. If those lines match, they are printed out and spam my terminal with huge amounts of information that I probably don’t need.

I couldn’t find a tool that limits the line-length of its output, so I wrote a tiny one.

It’s called short.

You use it like this (my typical usage):

grep foo myfile.txt | short

Or specify the column width like this:

short -w 5 myfile.txt

It’s written in Rust. Feel free to add features, fix bugs and package it for your operating system/distribution at gitlab.com/andybalaam/short.

Example Android project with repeatable tests running inside an emulator

I’ve spent the last couple of days fighting the Android command line to set up a simple project that can run automated tests inside an emulator reliably and repeatably.

To make the tests reliable and independent from anything else on my machine, I wanted to store the Android SDK and AVD files in a local directory.

To do this I had to define a lot of inter-related environment variables, and wrap the tools in scripts that ensure they run with the right flags and settings.

The end result of this work is here: gitlab.com/andybalaam/android-skeleton

You need all the utility scripts included in that repo for it to work, but some highlights include:

The environment variables that I source in every script, scripts/paths:

PROJECT_ROOT=$(dirname $(dirname $(realpath ${BASH_SOURCE[${#BASH_SOURCE[@]} - 1]})))
export ANDROID_SDK_ROOT="${PROJECT_ROOT}/android_sdk"
export ANDROID_SDK_HOME="${ANDROID_SDK_ROOT}"
export ANDROID_EMULATOR_HOME="${ANDROID_SDK_ROOT}/emulator-home"
export ANDROID_AVD_HOME="${ANDROID_EMULATOR_HOME}/avd"

Creation of a local.properties file that tells Gradle and Android Studio where the SDK is, by running something like this:

echo "# File created automatically - changes will be overwritten!" > local.properties
echo "sdk.dir=${ANDROID_SDK_ROOT}" >> local.properties

The wrapper scripts for Android tools e.g. scripts/sdkmanager:

#!/bin/bash

set -e
set -u

source scripts/paths

"${ANDROID_SDK_ROOT}/tools/bin/sdkmanager" \
    "--sdk_root=${ANDROID_SDK_ROOT}" \
    "$@"

The wrapper for avdmanager is particularly interesting since it seems we need to override where it thinks the tools directory is for it to work properly – scripts/avdmanager:

#!/bin/bash

set -e
set -u

source scripts/paths

# Set toolsdir to include "bin/" since avdmanager seems to go 2 dirs up
# from that to find the SDK root?
AVDMANAGER_OPTS="-Dcom.android.sdkmanager.toolsdir=${ANDROID_SDK_ROOT}/tools/bin/" \
    "${ANDROID_SDK_ROOT}/tools/bin/avdmanager" "$@"

An installation script that must be run once before using the project scripts/install-android-tools:

#!/bin/bash

set -e
set -u
set -x

source scripts/paths

mkdir -p "${ANDROID_SDK_ROOT}"
mkdir -p "${ANDROID_AVD_HOME}"
mkdir -p "${ANDROID_EMULATOR_HOME}"

# Download sdkmanager, avdmanager etc.
cd "${ANDROID_SDK_ROOT}"
test -f commandlinetools-*.zip || \
    wget -q 'https://dl.google.com/android/repository/commandlinetools-linux-6200805_latest.zip'
unzip -q -u commandlinetools-*.zip
cd ..

# Ask sdkmanager to update itself
./scripts/sdkmanager --update

# Install the emulator and tools
yes | ./scripts/sdkmanager --install 'emulator' 'platform-tools'

# Platforms
./scripts/sdkmanager --install 'platforms;android-21'
./scripts/sdkmanager --install 'platforms;android-29'

# Install system images for our oldest and newest supported API versions
yes | ./scripts/sdkmanager --install 'system-images;android-21;default;x86_64'
yes | ./scripts/sdkmanager --install 'system-images;android-29;default;x86_64'

# Create AVDs to run the system images
echo no | ./scripts/avdmanager -v \
    create avd \
    -f \
    -n "avd-21" \
    -k "system-images;android-21;default;x86_64" \
    -p ${ANDROID_SDK_ROOT}/avds/avd-21
echo no | ./scripts/avdmanager -v \
    create avd \
    -f \
    -n "avd-29" \
    -k "system-images;android-29;default;x86_64" \
    -p ${ANDROID_SDK_ROOT}/avds/avd-29
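
Once the tools and AVDs are installed, running the tests looks roughly like the sketch below. This is not a script from the repo – it’s my own illustration of the idea, and the exact emulator flags and Gradle task may differ from what the project actually uses:

#!/bin/bash

set -e
set -u

source scripts/paths

# Boot the API 29 AVD headlessly in the background
"${ANDROID_SDK_ROOT}/emulator/emulator" -avd avd-29 -no-window -no-audio &

# Wait for the emulator to appear to adb, then run the instrumented tests
"${ANDROID_SDK_ROOT}/platform-tools/adb" wait-for-device
./gradlew connectedAndroidTest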

Please do contribute to the project if you know easier ways to do this stuff.

Creating a tiny Docker image of a Rust project

I am building a toy project in Rust to help me learn how to deploy things in AWS. I’m considering using Elastic Beanstalk (AWS’s platform-as-a-service) and also Kubernetes. Both of these support deploying via Docker containers, so I am learning how to package a Rust executable as a Docker image.

My program is a small web site that uses Redis as a back end database. It consists of some Rust code and a couple of static files.

Because Rust has good support for building executables with very few dependencies, we can actually build a Docker image with almost nothing in it, except my program and the static files.

Thanks to Alexander Brand’s blog post How to Package Rust Applications Into Minimal Docker Containers I was able to build a Docker image that:

  1. Is very small
  2. Does not take too long to build

The main concern for making the build faster is that we don’t download and build all the dependencies every time. To achieve that we make sure there is a layer in the Docker build process that includes all the dependencies being built, and is not re-built when we only change our source code.

Here is the Dockerfile I ended up with:

# 1: Build the exe
FROM rust:1.42 as builder
WORKDIR /usr/src
# 1a: Prepare for static linking
RUN apt-get update && \
    apt-get dist-upgrade -y && \
    apt-get install -y musl-tools && \
    rustup target add x86_64-unknown-linux-musl

# 1b: Download and compile Rust dependencies (and store as a separate Docker layer)
RUN USER=root cargo new myprogram
WORKDIR /usr/src/myprogram
COPY Cargo.toml Cargo.lock ./
RUN cargo install --target x86_64-unknown-linux-musl --path .

# 1c: Build the exe using the actual source code
COPY src ./src
RUN cargo install --target x86_64-unknown-linux-musl --path .

# 2: Copy the exe and extra files ("static") to an empty Docker image
FROM scratch
COPY --from=builder /usr/local/cargo/bin/myprogram .
COPY static .
USER 1000
CMD ["./myprogram"]

The FROM rust:1.42 as builder line uses the newish Docker feature multi-stage builds – we create one Docker image (“builder”) just to build the code, and then copy the resulting executable into the final Docker image.

In order to allow us to build a stand-alone executable that does not depend on the standard libraries in the operating system, we use the “musl” target, which is designed to be statically linked.

The final Docker image produced is pretty much the same size as the release build of myprogram, and the build is fast, so long as I don’t change the dependencies in Cargo.toml.

A couple more tips to make the build faster:

1. Use a .dockerignore file. Here is mine:

/target/
/.git/

2. Use Docker BuildKit, by running the build like this:

DOCKER_BUILDKIT=1 docker build .
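
For completeness, here is roughly how I build and run the image locally – the tag, the published port and the note about Redis are illustrative, not part of the Dockerfile above:

# Build with BuildKit and give the image a name
DOCKER_BUILDKIT=1 docker build -t myprogram .

# Run it; the program also needs to be able to reach a Redis instance,
# configured however your program expects (e.g. via environment variables)
docker run --rm -p 8080:8080 myprogram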

Keeping track of podcast times with a simple bash script on Linux

I was recording some podcast audio tonight and wanted to be able to press a single key when I reached a significant moment, so I could add the times to the show notes.

I couldn’t find anything that already did this, so I wrote a tiny bash script. I ran this script and pressed Enter whenever I wanted a time recorded:

T=0
echo
while sleep 1; do
    echo -n -e "\e[1A"
    echo $(($T / 60))m $(($T % 60))s
    T=$(($T + 1))
done

The output looks like this:

$ ./times.bash 
0m 41s
6m 16s
9m 59s
13m 30s

The time ticks along, and when you press Enter that time stamp is recorded. You can copy this text out of your terminal to use in show notes. On most terminals you can copy text by selecting it with the mouse and pressing Ctrl-Shift-C.

Custom Bash tab completion for my program

I love Bash tab completion, and I want it for the command I am writing, so it can automatically complete parts of the command line when I run my program.

Code

Here is the script (install-bash-completion) I wrote to set it up (no need to be root – it installs in ~/.local):

#!/bin/bash

set -u
set -e

DIR=${BASH_COMPLETION_USER_DIR:-${XDG_DATA_HOME:-$HOME/.local/share}/bash-completion}/completions

mkdir -p ${DIR}

cp _myprogram ${DIR}

The actual completion script (_myprogram) it installs looks like this:

_myprogram_commands()
{
    local cur prev opts
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    prev="${COMP_WORDS[COMP_CWORD-1]}"
    opts=$(bash -c "./myprogram --commands")

    COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
    return 0
}
complete -F _myprogram_commands ./myprogram

Installing

To install it, run:

./install-bash-completion

Then log out and log in again.

Now when you type ./myprogram and press TAB a couple of times, you should see the possible completions listed.

Notes

The completion script must be named to match the program name, with a leading underscore.

If I wanted it to work when it was installed in my PATH, I would need to change ./myprogram to just myprogram in 2 places.

Notice the line opts=$(bash -c "./myprogram --commands") – it actually runs my program to get the list of completions. This means my program needs to accept a --commands option which prints the valid commands. Alternatively, I could have hard-coded it by replacing that line with just:

opts="cmd1 cmd2 --help --etc"

More info:

Struggling with Rust to figure out the right types for a function signature

I am loving writing code in Rust. So many things about the language and its ecosystem feel so right*.

* For example: ownership of objects, expressive type system, compile to native, offline API docs, immutability, high quality libraries.

One of the things I like about it is that I don’t feel like I need to use an IDE, so I can happily code in Vim with no clever plugins.

One thing an IDE might give me would be an “extract function” refactoring. In most languages I am happy to do that manually, because I can let the compile errors guide me on what my function should look like.

However, in Rust I sometimes find it’s hard to find the right signature for a function I want to extract, and I am struggling to persuade the compiler to help me.

Here is an example from my new listsync project, in listsync-client-rust.rs:

use actix_web::{middleware, App, HttpServer};
use listsync_client_rust;
// ...
#[actix_rt::main]
async fn main() -> std::io::Result<()> {
//...
    HttpServer::new(|| {
        App::new()
            .wrap(listsync_client_rust::cookie_session::new_session())
            .wrap(middleware::Logger::default())
            .configure(listsync_client_rust::config)
    })
//...

I would like to extract the code highlighted above, the creation of an App, into a separate function, like this:

fn new_app() -> ??? {
    App::new()
        .wrap(listsync_client_rust::cookie_session::new_session())
        .wrap(middleware::Logger::default())
        .configure(listsync_client_rust::config)
}
//...
    HttpServer::new(|| {
        new_app()
    })

Simple, right? To find out what the return type of the function should be, I can just make a bad guess, and get the compiler to tell me what I did wrong. In this case, I will guess by changing the question marks above into i32, and run cargo test. I get quite a few errors, one of which is:

error[E0277]: the trait bound `i32: actix_service::IntoServiceFactory<_>` is not satisfied
  --> src/bin/listsync-client-rust.rs:27:5
   |
27 | /     HttpServer::new(|| {
28 | |         new_app()
29 | |     })
   | |______^ the trait `actix_service::IntoServiceFactory<_>` is not implemented for `i32`
   |
   = note: required by `actix_web::server::HttpServer`

So the first problem I see is that the error message I am seeing is about the later code, and there are no errors about my new function.

I obviously went a little too fast. Let’s change the HttpServer::new code back to how it was before, and only make a new function new_app. Now I get an error that should help me:

error[E0308]: mismatched types
  --> src/bin/listsync-client-rust.rs:12:5
   |
11 |   fn new_app() -> i32 {
   |                   --- expected `i32` because of return type
12 | /     App::new()
13 | |         .wrap(listsync_client_rust::cookie_session::new_session())
14 | |         .wrap(middleware::Logger::default())
15 | |         .configure(listsync_client_rust::config)
   | |________________________________________________^ expected i32, found struct `actix_web::app::App`
   |
   = note: expected type `i32`
              found type `actix_web::app::App<impl actix_service::ServiceFactory, actix_web::middleware::logger::StreamLog<actix_http::body::Body>>`

So the compiler has told us what type we are returning! Let’s copy that into the type signature of the function:

use actix_service::ServiceFactory;
use actix_http::body::Body;
// ...
fn new_app() -> App<impl ServiceFactory, middleware::logger::StreamLog<Body>> {
// ...

The first error I get from the compiler is a distraction:

error[E0432]: unresolved import `actix_service`
 --> src/bin/listsync-client-rust.rs:1:5
  |
1 | use actix_service::ServiceFactory;
  |     ^^^^^^^^^^^^^ use of undeclared type or module `actix_service`

I can fix it by adding actix-service = "1.0.5" to Cargo.toml. (I found the version by looking in Cargo.lock, since this dependency was already implicitly used – I just need to make it explicit if I am going to use it directly.)

Once I do that I get the next error:

error[E0603]: module `logger` is private
  --> src/bin/listsync-client-rust.rs:13:54
   |
13 | fn new_app() -> App<impl ServiceFactory, middleware::logger::StreamLog<Body>> {
   |                                                      ^^^^^^

This leaves me a bit stuck: I can’t use StreamLog because it’s in a private module.

More importantly, it makes the point that I don’t actually want to be as specific as I am being: I don’t care what the exact type parameters for App are – I just want to return an App of some kind and have the compiler fill in the blanks. Ideally, if I change the body of new_app later, for example to add another wrap call that changes the type of App we are returning, I’d like to leave the return type the same and have it just work.

With that in mind, I took a look at the type that HttpServer::new takes in. Here is impl HttpServer:

impl<F, I, S, B> HttpServer<F, I, S, B> where
    F: Fn() -> I + Send + Clone + 'static,
    I: IntoServiceFactory<S>,
    S: ServiceFactory<Config = AppConfig, Request = Request>,
    S::Error: Into<Error> + 'static,
    S::InitError: Debug,
    S::Response: Into<Response<B>> + 'static,
    <S::Service as Service>::Future: 'static,
    B: MessageBody + 'static, 

and HttpServer::new looks like:

pub fn new(factory: F) -> Self

So it takes in a function which actually makes the App, and the type of that function is F, which is a Fn which returns a I + Send + Clone + 'static. From the declaration of impl HttpServer we can see that the type of I depends on S and B, which have quite complex types. Let’s paste the whole thing in:

use actix_http::{Error, Request, Response};
use actix_service::IntoServiceFactory;
use actix_web::body::MessageBody;
use actix_web::dev::{AppConfig, Service};
use core::fmt::Debug;
// ...
fn new_app<I, S, B>() -> I
where
    I: IntoServiceFactory<S> + Send + Clone + 'static,
    S: ServiceFactory<Config = AppConfig, Request = Request>,
    S::Error: Into<Error> + 'static,
    S::InitError: Debug,
    S::Response: Into<Response<B>> + 'static,
    <S::Service as Service>::Future: 'static,
    B: MessageBody + 'static,
{
    App::new()
        .wrap(listsync_client_rust::cookie_session::new_session())
        .wrap(middleware::Logger::default())
        .configure(listsync_client_rust::config)
}

Note that I had to modify I to include the extra requirements on the return type of F from the definition of HttpServer. (I think I did the right thing, but I’m not sure. If I just remove the + Send + Clone + 'static it seems to behave similarly.)

Now I get this error from the compiler:

error[E0308]: mismatched types
  --> src/bin/listsync-client-rust.rs:27:5
   |
17 |   fn new_app<I, S, B>() -> I
   |                            - expected `I` because of return type
...
27 | /     App::new()
28 | |         .wrap(listsync_client_rust::cookie_session::new_session())
29 | |         .wrap(middleware::Logger::default())
30 | |         .configure(listsync_client_rust::config)
   | |________________________________________________^ expected type parameter, found struct `actix_web::app::App`
   |
   = note: expected type `I`
              found type `actix_web::app::App<impl actix_service::ServiceFactory, actix_web::middleware::logger::StreamLog<actix_http::body::Body>>`
   = help: type parameters must be constrained to match other types
   = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters

The compiler really tries to help here, suggesting I read a chapter of the Rust Book, but even after reading it I could not figure out how to do what I am trying to do.

Can anyone help me?

Wouldn’t it be amazing if there were some way the compiler could give me easier-to-understand help to figure this out?

Converting HTML slides to a PDF with Firefox

I have not found an automated way to generate a nice PDF from some slides written in HTML – if you know of one please add a comment!

In the meantime, if you create slides using my HTML Slides template, then you can make a decent-ish-looking PDF like this:

  1. View your slides in Firefox and open the Print dialog (press Ctrl-p).
  2. Select Print to File and choose a filename to save to.
  3. Under Options deselect “Ignore Scaling…” and select “--blank--” for all the headers and footers. Ensure “Print Background Colours” and “Print Background Images” are selected.
  4. Under Page Setup set Scale to 70.0 and Orientation to Landscape.
  5. Select the dropdown next to Paper size and choose Manage Custom Sizes…
  6. Create a new custom size called “Screen 16:9” with Width 157.5 mm and Height 280.0 mm. Set all the Paper Margins to 5.0 mm. You can modify the name of the custom size by clicking on it in the list on the left.
    Click Close.
  7. Back in Page Setup, make sure your new custom size is selected next to Paper Size.
  8. Click Print.

If that does not work well for you, try experimenting with different Scale settings.

Support the Software Freedom Conservancy

The Software Freedom Conservancy helps Free/Open Source software projects by providing infrastructure, financial structures, and legal help. It is a not-for-profit organisation that is dedicated to software freedom, something that I think is an important prerequisite for a decent world in the future.

Conservancy looks after lots of projects, including these ones that I personally use: Boost, BusyBox, Etherpad, Git, Godot Engine, Inkscape, phpMyAdmin, QEMU, Racket, Samba, Selenium, Squeak and Wine.

It provides an easy way for projects to accept donations, hold assets and negotiate contracts, as well as mentoring and legal advice. It also leads the difficult and thankless work to persuade (and force) companies to comply with the GPL (a Free Software license). Without this work a great deal more Free Software would be distributed without the required access to source code, meaning it is not free at all.

Further, it is a key supporter of Outreachy, which does important work to address the under-representation, systemic bias, and discrimination in the technology industry.

I encourage you to join me and support the Software Freedom Conservancy.

You might also like to listen to the Free As In Freedom podcast.

Coding workshop example worksheets

This week we did a coding workshop exercise: 2 teams implemented the different sides of the SMPP protocol (without speaking to each other) and this morning we tried out connecting them together.

We successfully sent a message and received an acknowledgement!

It was a lot of fun and we learned a surprising amount about SMPP (and quite how … interesting … the standard is).

In case they’re useful to anyone, here are the worksheets I made up: Team 1 ODT, Team 1 PDF, Team 2 ODT, Team 2 PDF.

Idea for a team who are less interested in SMPP (!) – try a similar exercise implementing FTP, which is a nice simple text-based protocol. I did this once and found it extremely rewarding.

Building an all-in-one Jar in Gradle with the Kotlin DSL

To build a “fat” Jar of your Java or Kotlin project that contains all the dependencies within a single file, you can use the shadow Gradle plugin.

I found it hard to find clear documentation on how it works using the Gradle Kotlin DSL (with a build.gradle.kts instead of build.gradle) so here is how I did it:

$ cat build.gradle.kts 
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

plugins {
    kotlin("jvm") version "1.3.41"
    id("com.github.johnrengelman.shadow") version "5.1.0"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation(kotlin("stdlib"))
}

tasks.withType<ShadowJar>() {
    manifest {
        attributes["Main-Class"] = "HelloKt"
    }
}

$ cat src/main/kotlin/Hello.kt 
fun main() {
    println("Hello!")
}

$ gradle wrapper --gradle-version 5.5
BUILD SUCCESSFUL in 0s
1 actionable task: 1 executed

$ ./gradlew shadowJar
BUILD SUCCESSFUL in 1s
2 actionable tasks: 2 executed

$ java -jar build/libs/hello-all.jar 
Hello!

Creating a self-signed certificate for Apache and connecting to it from Java

Our mission: to create a self-signed certificate for an Apache web server that allows us to connect to it over HTTPS (SSL/TLS) from a Java program.

The tricky bit for me was generating a certificate that contains Subject Alternative Names for my server, which is needed to connect to it from Java.

We will use the openssl command.

Creating a self-signed certificate for Apache HTTPD

First create a config file cert.conf:

[ req ]
distinguished_name  = subject
x509_extensions     = x509_ext
prompt = no

[ subject ]
commonName = Example Company

[ x509_ext ]
subjectAltName = @alternate_names

[ alternate_names ]
DNS.1 = example.com

In the above, replace “example.com” with the name you will use for the host when you connect from Java. This is important, because Java requires the name in the certificate to match the name it is using to connect to the server. If you’re connecting to it as localhost, just put “localhost”. Note: do not include “https://” or any port or path after the hostname, so “example.com:8080/mypath” is wrong – it should be just “example.com”.

The alternate_names section above gives the “Subject Alternative Names” for this certificate. You can add more as “DNS.2”, “DNS.3”, etc.

Next, generate the server key and self-signed certificate:

openssl genrsa 2048 > server.key
chmod 400 server.key
openssl req -new -x509 -config cert.conf -nodes -sha256 -days 365 -key server.key -out server.crt

Now you have two new files: server.key and server.crt. These are the files that will be used by Apache HTTPD, so put them somewhere useful (e.g. inside /usr/local/apache2/conf/) and refer to them in the Apache config file using keys “SSLCertificateKeyFile” and “SSLCertificateFile” respectively. For more info see the SSL/TLS How-To.

Checking the certificate is being used

Start up your Apache and ensure you can connect to it over HTTPS using curl:

curl -v --insecure https://example.com:8080

Replace “https://example.com:8080” above with the full URL (this time, include “https://” and the port and path).

To examine the certificate that is being returned, run:

openssl s_client -showcerts -connect example.com:8080

Replace “example.com:8080” above with the hostname and port (no “https://” this time!).

Connecting from Java

To be able to connect from Java, we need a Trust Store. We can create one in PKCS#12 format with:

openssl pkcs12 -export -passout pass:000000 -out trust.pkcs12 -inkey server.key -in server.crt

Note: Java 8 onwards is able to use .pkcs12 (PKCS#12) files for its trust store. The old .jks (Java Key Store) format can also be used, but is deprecated.

Now you have a file we can use as a trust store, follow my other article to connect from Java over HTTPS with a self-signed certificate.
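
As a quick sketch of what that looks like (the full details are in that article), one standard way is to point the JVM at the trust store using the javax.net.ssl system properties – the jar name here is made up:

java \
    -Djavax.net.ssl.trustStore=trust.pkcs12 \
    -Djavax.net.ssl.trustStoreType=PKCS12 \
    -Djavax.net.ssl.trustStorePassword=000000 \
    -jar my-https-client.jar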

Convert a video to a GIF with reasonable colours

Here’s a little script I wrote to avoid copy-pasting the ffmpeg command from superuser every time I needed it.

It converts a video to a GIF file by pre-calculating a good palette, then using that palette.

Usage:

./to_gif input.mp4 output.gif

The file to_gif (which should be executable):

#!/bin/bash

set -e
set -u

# Credit: https://superuser.com/questions/556029/how-do-i-convert-a-video-to-gif-using-ffmpeg-with-reasonable-quality

INPUT="$1"
OUTPUT="$2"

PALETTE=$(tempfile --suffix=.png)

ffmpeg -y -i "${INPUT}" -vf palettegen "${PALETTE}"
ffmpeg -y -i "${INPUT}" -i "${PALETTE}" \
    -filter_complex "fps=15,paletteuse" "${OUTPUT}"

rm -f "${PALETTE}"

Note: you might want to modify the number after fps= to adjust how fast the video plays.

London Python Meetup January 2019 – Async Python and GeoPandas

It was a pleasure to go to the London Python Meetup organised by @python_london. There were plenty of friendly people and interesting conversations.

I gave a talk “Making 100 million requests with Python aiohttp” (slides, Blog post) explaining the basics of writing async code in Python 3 and how I used that to make a very large number of HTTP requests.

Andy giving the presentation

(Photo by CB Bailey.)

Hopefully it was helpful – there were several good questions, so I am optimistic that people were engaged with it.

After that, there was an excellent talk by Gareth Lloyd called “GeoPandas, the geospatial extension for Pandas” in which he explained how to use the very well-developed geo-spatial data tools available in the Python ecosphere to transform, combine, plot and analyse data which includes location information. I was really impressed with how easy the libraries looked to use, and also with the cool Jupyter notebook Gareth used to explain the ideas using live demos.

London Python Meetups seem like a cool place to meet Pythonistas of all levels of experience in a nice, low-pressure environment!

Meetup link: aiohttp / GeoPandas

Run bash inside any version of Linux using Docker

Docker is useful for some things, and not as useful as you think for others.

Here’s something massively useful: get a throwaway bash prompt inside any version of any Linux distribution in one command:

docker run -i -t --mount "type=bind,src=$HOME/Desktop,dst=/Desktop" ubuntu:18.10 bash

This command downloads a recent Ubuntu 18.10 image, mounts my desktop as /Desktop in the container, and gives me a bash prompt. From here I can install any packages I want and then use them.

For example, today I used it to decrypt a file that was encrypted with a cipher my main OS did not have a package for.

When I exit bash, the container stops and I can find it with docker ps -a then remove it with docker rm. To really clean up I can find the downloaded images with docker image ls and remove them with docker image rm.
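
A typical clean-up session looks something like this – the container ID and image tag are examples, so use whatever docker actually prints for you:

docker ps -a                  # find the stopped container's ID
docker rm 1a2b3c4d5e6f        # remove that container
docker image ls               # list the downloaded images
docker image rm ubuntu:18.10  # remove the Ubuntu image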

Elm JSON decoder examples

I find JSON decoding in Elm confusing, so here are some thoughts and examples.

Setup

$ elm --version
0.19.0
$ mkdir myproj; cd myproj
$ elm init
...
$ elm install elm/json
...

To run the “Demo” parts of the examples below, type them into the interactive Elm interpreter. To try them out, start it like this:

$ elm repl

and import the library you need:

import Json.Decode as D

Scroll to “Concepts” at the bottom for lots of waffling about what is really going on, but if you’re looking to copy and paste concrete examples, here we are:

Examples

JSON object to Record

type alias MyRecord =
    { i : Int
    , s : String
    }

recordDecoder : D.Decoder MyRecord
recordDecoder =
    D.map2
        MyRecord
        (D.field "i" D.int)
        (D.field "s" D.string)

Note: this uses a special trick – it turns out you can construct a record like MyRecord writing e.g. MyRecord 3 "qux" – the arguments have to be in the same order that the properties are declared in the record. The above only works because you can do this.

Demo:

> type alias MyRec = {i: Int, s: String}
> myRecDec = D.map2 MyRec (D.field "i" D.int) (D.field "s" D.string)
<internals> : D.Decoder MyRec
> D.decodeString myRecDec "{\"i\": 3, \"s\": \"bar\"}"
Ok { i = 3, s = "bar" }
    : Result D.Error MyRec

JSON array of ints to List

intArrayDecoder : D.Decoder (List Int)
intArrayDecoder =
    D.list D.int

Demo:

> myArrDec = D.list D.int
<internals> : D.Decoder (List Int)
> D.decodeString myArrDec "[3, 4]"
Ok [3,4] : Result D.Error (List Int)

JSON array of strings to List

stringArrayDecoder : D.Decoder (List String)
stringArrayDecoder =
    D.list D.string

Demo:

> myArrDec2 = D.list D.string
<internals> : D.Decoder (List String)
> D.decodeString myArrDec2 "[\"a\", \"b\"]"
Ok ["a","b"] : Result D.Error (List String)

JSON object to Dict

intDictDecoder : D.Decoder (Dict String Int)
intDictDecoder =
    D.dict D.int

Demo:

> myDictDecoder = D.dict D.int
<internals> : D.Decoder (Dict.Dict String Int)
> D.decodeString myDictDecoder "{\"a\": \"b\"}"
Err (Field "a" (Failure ("Expecting an INT") <internals>))
    : Result D.Error (Dict.Dict String Int)
> D.decodeString myDictDecoder "{\"a\": 3}"
Ok (Dict.fromList [("a",3)])
    : Result D.Error (Dict.Dict String Int)

To build a Dict of String to String, replace D.int above with
D.string.

JSON array of objects to List of Records

type alias MyRecord =
    { i : Int
    , s : String
    }

recordDecoder : D.Decoder MyRecord
recordDecoder =
    D.map2
        MyRecord
        (D.field "i" D.int)
        (D.field "s" D.string)


listOfRecordsDecoder : D.Decoder (List MyRecord)
listOfRecordsDecoder =
    D.list recordDecoder

Demo:

> import Json.Decode as D
> type alias MyRec = {i: Int, s: String}
> myRecDec = D.map2 MyRec (D.field "i" D.int) (D.field "s" D.string)
<internals> : D.Decoder MyRec
> listOfMyRecDec = D.list myRecDec
<internals> : D.Decoder (List MyRec)
> D.decodeString listOfMyRecDec "[{\"i\": 4, \"s\": \"one\"}, {\"i\": 5, \"s\":\"two\"}]"
Ok [{ i = 4, s = "one" },{ i = 5, s = "two" }]
    : Result D.Error (List MyRec)

Concepts

What is a Decoder?

A Decoder is something that describes how to take in JSON and spit out something. The “something” part is written after Decoder, so e.g. Decoder Int describes how to take in JSON and spit out an Int.

The Json.Decode module contains a function that is a Decoder Int. It’s called int:

> D.int
<internals> : D.Decoder Int

In some not-at-all-true way, a Decoder is sort of like a function:

-- This is a lie, but just pretend with me for a sec
Decoder a : SomeJSON -> a
-- That was a lie

To actually run your Decoder, provide it to a function like decodeString:

> D.decodeString D.int "45"
Ok 45 : Result D.Error Int

So the actually-true way of getting an actual function is to combine decodeString and a decoder like int:

> D.decodeString D.int
<function> : String -> Result D.Error Int

When you apply decodeString to int you get a function that takes in a String and returns either an Int or an error. The error could be because the string you passed was not valid JSON:

> D.decodeString D.int "foo bar"
Err (Failure ("This is not valid JSON! Unexpected token o in JSON at position 1") )
    : Result D.Error Int

or because the parsed JSON does not match what the Decoder you supplied expects:

> D.decodeString D.int "\"45\""
Err (Failure ("Expecting an INT") )
    : Result D.Error Int

(We supplied a String containing a JSON string, but the int Decoder expects to find a JSON int.)

Side note: ints and floats are treated as different, even though the JSON Spec treats them all as just “Numbers”:

> D.decodeString D.int "45.2"
Err (Failure ("Expecting an INT") )
    : Result D.Error Int

What is a Value?

Elm has a type that represents JSON that has been parsed (actually, parsed and stored in a JavaScript object) but not interpreted into a useful Elm type. You can make one using the functions inside Json.Encode:

> import Json.Encode as E
> foo = E.string "foo"
 : E.Value

You can even turn one of these into a String containing JSON using encode:

> E.encode 0 foo
"\"foo\"" : String

or interpret the Value as useful Elm types using decodeValue:

> D.decodeValue D.string foo
Ok "foo" : Result D.Error String

(When JSON values come from JavaScript, e.g. via flags, they actually come as Values, but you don’t usually need to worry about that.)

However, what you can’t do is pull Values apart in any way, other than the standard ways Elm gives you. So any custom Decoder that you write has to be built out of existing Decoders.

How do I write my own Decoder?

If you want to make a Decoder that does custom things, build it from the existing magic Decoders, give it a type that describes the type it outputs, and insert your code using one of the mapN functions.

For example, to decode only ints that are below 100:

> under100 i = if i < 100 then D.succeed i else (D.fail "Not under 100")
<function> : number -> D.Decoder number
> intUnder100 = D.int |> D.andThen under100
<internals> : D.Decoder Int
> D.decodeString intUnder100 "50"
Ok 50 : Result D.Error Int
> D.decodeString intUnder100 "500"
Err (Failure ("Not under 100") <internals>)
    : Result D.Error Int

Here, we use the andThen function to transform the Int value coming from calling the int function into a Decoder Int that expresses success or failure in terms of decoding. When we do actual decoding using the decodeString function, this is transformed into the more familiar Result values like Ok or Err.

If you want to understand the above, pay close attention to the types of under100 and intUnder100.

If you want to write a Decoder that returns some complex type, you should build it using the mapN functions.

For example, to decode strings into arrays of words split by spaces:

> splitIntoWords = String.split " "
<function> : String -> List String
> words = D.map splitIntoWords D.string
<internals> : D.Decoder (List String)
> D.decodeString words "\"foo bar baz\""
Ok ["foo","bar","baz"]
    : Result D.Error (List String)

Above we used map to transform a Decoder String (the provided string function) into a Decoder (List String) by mapping it over a function (splitIntoWords) that transforms a String into a List String.

Again, to understand this, look carefully at the types of splitIntoWords
and words.

How do I build up complex Decoders?

Complex decoders are built by combining simple ones. Many functions that make decoders take another decoder as an argument. A good example is “JSON array of objects to List of Records” above – there we make a Decoder MyRecord and use it to decode a whole list of records by passing it as an argument to list, so that it returns a Decoder (List MyRecord) which can take in a JSON array of JSON objects, and return a List of MyRecords.

Why is this so confusing?

Because Decoders are not functions, but they feel like functions. In fact they are opaque descriptions of how to interpret JSON that the Elm runtime uses to make Elm objects for you out of Values, which are opaque objects that underneath represent a piece of parsed JSON.

Graft Animation Language on Raspberry Pi

Because the Raspberry Pi uses a slightly older Python version, there is a special version of Graft for it.

Here’s how to get it:

  • Open a terminal window by clicking the black icon with a “>” symbol on it at the top near the left.
  • First we need to install a couple of things Graft needs, so type this, then press Enter:
    sudo apt install python3-attr at-spi2-core
  • If you want to be able to make animated GIFs, install one more thing:
    sudo apt install imagemagick
  • To download Graft and switch to the Raspberry Pi version, type in these commands, pressing Enter after each line.
    git clone https://github.com/andybalaam/graft.git
    cd graft
    git checkout raspberry-pi
  • Now, you should be able to run Graft just like on another computer, for example, like this:
    ./graft 'd+=10 S()'
  • If you’re looking for a fun way to start, why not try the worksheet “Tell a story by making animations with code”?

    For more info, see Graft Raspberry Pi Setup.

My experience upgrading to Elm 0.19

Elm is unstable, so upgrading to the next version can be painful. Here’s what I needed to do to upgrade from 0.18 to 0.19.

  • Replace elm-package.json and tests/elm-package.json with elm.json – e06f5a1728
  • Switch to the new elm-test – b964b7c7a
  • Re-arrange Main, and how we call it from JavaScript – 0c118c49f
  • Stop using eeue56/elm-all-dict (since it’s not ported to 0.19 and porting it looked hard due to a lack of Debug.crash) – fe100f256
  • Replace toString with String.fromX or Debug.toString – 9e78163d0a3
  • Stop “shadowing” names by making new variables with the same name as another in the scope – 9688a621de
  • Adapt to the changed Html.style function – b991ab4f
  • Stop using Debug.crash – f98a70ad1
  • Adapt to the changes in the Regex module – 856762a4
  • Stop using tuples with more than 3 parts – 472c0bb7

The lack of Debug.crash is really, really painful, especially for a library like eeue56/elm-all-dict that has lots of invariants that are hard or impossible to enforce via the type system. On the other hand, if Elm can give a hard guarantee that there will be no runtime errors, this seems pretty cool. The problem is that some code may well have to return the wrong answer silently, instead of crashing, which could be much worse than crashing in some use-cases.

I was annoyed by the lack of more-than-3-part tuples, but even as I did the work to change my code, I saw it get better, so it’s hard to argue with.

The hardest part to work out was how to run the tests. Fortunately the tests themselves needed almost no changes. I just needed to do this:

rm -r tests/elm-stuff
rm tests/elm-package.json
sudo npm install -g elm-test@0.19.0-beta8
elm-test install
elm-test

My next job is to check out the --optimize compiler flag, and the advice on making the code smaller and faster.

Bulk adding items to Wunderlist using wunderline on Ubuntu MATE

If you use Wunderlist and want to be able to bulk-add tasks from a text file, first install and set up wunderline.

Now, to be able to right-click a text file containing one task per line on Ubuntu MATE, create a file called “wunderlist-bulk-add” in ~/.config/caja/scripts/ and make it executable. Paste the code below into that file.

(Note: it’s possible that this would work in GNOME if you replaced “caja” with “nautilus” in that path – let me know in the comments.)
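
In concrete terms, the setup is something like this (a sketch – use whichever editor you like, and swap “caja” for “nautilus” on GNOME as noted above):

mkdir -p ~/.config/caja/scripts
nano ~/.config/caja/scripts/wunderlist-bulk-add    # paste the script below into this file
chmod +x ~/.config/caja/scripts/wunderlist-bulk-add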

#!/usr/bin/env bash

set -e
set -u
set -o pipefail

function nonblank()
{
    grep -v -e '^[[:space:]]*$' "$@"
}

for F in "$@"; do
{
    COUNT=$(nonblank "$F" | wc -l | awk '{print $1}')
    if [ "$COUNT" = "" ]; then
    {
        zenity --info --no-wrap --text="File $F does not exist"
    }
    else
    {
        set +e
        zenity --question --no-wrap \
            --text="Add $COUNT items from $F to Wunderlist?"
        ANSWER=$?
        set -e

        if [ "$ANSWER" = "0" ]; then
        {
            set +e
            nonblank "$F" | \
                wunderline add --stdin | \
                zenity --progress --text "Adding $COUNT items to Wunderlist"
            SUCCEEDED=$?
            set -e
            if [ "$SUCCEEDED" = "0" ]; then
            {
                zenity --info --no-wrap --text "Added $COUNT items."
            }
            else
            {
                zenity --error --no-wrap \
                    --text "Failed to add some or all of these items."
            }; fi
        }; fi
    }; fi
}; done

Writing a new Flarum extension on Ubuntu

In a previous post I described how to install Flarum locally on Ubuntu.

Here is how I set up my development environment on top of that setup so I was able to write a new Flarum extension and test it on my local machine.

Recap: I installed Apache and PHP and enabled mod_rewrite, installed MariaDB and made a database, installed Composer, and used Composer to install Flarum at /var/www/html/flarum.

More info: Flarum Contribution Guide, extension development guide, extension development quick start, workflow discussion.

I decided to call my extension “rabbitescape/flarum-ext-rabbitescape-leveleditor”, so that is the name you will see below.

Here’s what I did:

cd /var/www/html/flarum

Edit /var/www/html/flarum/composer.json (see the workbench explanation for examples of the full file).

At the end of the “require” section I added a line like this:

"rabbitescape/flarum-ext-rabbitescape-leveleditor": "*@dev"

(Remembering to add a comma at the end of the previous line so it remained valid JSON.)

After the “require” section, I added this:

    "repositories": [
        {
            "type": "path",
            "url": "workbench/*/"
        }
    ],

Next, I made the place where my extension would live:

mkdir workbench
cd workbench
mkdir flarum-ext-rabbitescape-leveleditor
cd flarum-ext-rabbitescape-leveleditor

Now I created a file inside flarum-ext-rabbitescape-leveleditor called composer.json like this:

{
    "name": "rabbitescape/flarum-ext-rabbitescape-leveleditor",
    "description": "Allow viewing and editing Rabbit Escape levels in posts",
    "type": "flarum-extension",
    "keywords": ["rabbitescape"],
    "license": "MIT",
    "authors": [
        {
            "name": "Andy Balaam",
            "email": "andybalaam@artificialworlds.net"
        }
    ],
    "support": {
        "issues": "https://github.com/andybalaam/flarum-ext-rabbitescape-leveleditor/issues",
        "source": "https://github.com/andybalaam/flarum-ext-rabbitescape-leveleditor"
    },
    "require": {
        "flarum/core": "^0.1.0-beta.5"
    },
    "extra": {
        "flarum-extension": {
            "title": "Rabbit Escape Level Editor"
        }
    }
}

In the same directory I created a file bootstrap.php like this:

<?php

return function () {
    echo 'Hello, world!';
};

Then I told Composer to refresh the project like this:

cd ../..   # Back up to the main directory
composer update

Among the output I saw a line like this which convinced me it had worked:

  - Installing rabbitescape/flarum-ext-rabbitescape-leveleditor (dev-master): Symlinking from workbench/flarum-ext-rabbitescape-leveleditor

Now I went to http://localhost/flarum, signed in as admin, adminpassword (as I set up in the previous post), clicked my username in the top right and chose Administration, then clicked Extensions on the left.

I saw my extension in the list, and turned it on by clicking the check box below it. This made the whole web site disappear, and instead just the text “Hello, world!” appeared. This showed my extension was being loaded.

Finally, I edited bootstrap.php and commented out the line starting with “echo”. I refreshed the page in my browser and saw the Flarum site reappear.

Now, my extension is installed and ready to be developed! See flarum.org/docs/extend/start for how to get started making it do cool stuff.

Ubuntu “compose” key for easy unicode character input

I found out on Mastodon recently that the Compose key exists, allowing you to enter special characters using easy-to-remember key sequences (e.g. “<compose>/=” gives you “≠”).

To do this on Ubuntu (and probably many other systems), hold SHIFT, then press AltGr, release them both, and type something like “/=”. You should see the ≠ symbol appear.

In Ubuntu MATE, you can customise what key is the compose key by opening “Keyboard”, clicking “Layouts”, then “Options…” and checking a box under “Position of compose key”. I checked “Right Alt”, meaning my “AltGr” key is now the compose key.

More info: Ubuntu ComposeKey docs.

Want to explore? See the list of the default sequences you can type.

Redirecting all requests to https and www using .htaccess in Apache

I want all requests to artificialworlds.net/rabbit-escape/levels/ to get redirected to use the https protocol, and to include “www.” at the beginning of the URL, and I found lots of Stack Overflow articles, but nothing there worked perfectly for me. Here is how I managed it.

I edited the .htaccess file in the directory where I want this to apply, and added this at the top:

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_HOST} ^artificialworlds\.net$ [NC]
  RewriteRule ^ https://www.artificialworlds.net%{REQUEST_URI} [R=301,L]

  RewriteCond %{HTTPS} off
  RewriteRule ^ https://www.artificialworlds.net%{REQUEST_URI} [R=301,L]
</IfModule>
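
To check it is working, I can request the plain-http and non-www variants and look for a 301 pointing at the https://www URL – something like this (expect a Location header in both responses):

curl -I http://artificialworlds.net/rabbit-escape/levels/
curl -I https://artificialworlds.net/rabbit-escape/levels/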

Example of a systemd service file

Here is an almost-minimal example of a systemd service file, that I use to run the Mastodon bot of my generative art playground Graft.

I made a dedicated user just to run this service, and installed Graft into /home/graft/apps/graft under that username. Now, as root, I edited a file called /etc/systemd/system/graft.service and made it look like this:

[Service]
ExecStart=/home/graft/apps/graft/bot-mastodon
User=graft
Group=graft
WorkingDirectory=/home/graft/apps/graft/
[Install]
WantedBy=multi-user.target

Now I can start the graft service like any other service like this:

sudo systemctl start graft

and find out its status with:

sudo systemctl status graft

If I want it to run on startup I can do:

sudo systemctl enable graft

and it will. Easy!

If I want to look at its output, it’s:

sudo journalctl -u graft
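
One thing to remember: whenever I edit the service file itself, systemd needs to re-read it before the change takes effect:

sudo systemctl daemon-reload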

As a reward for reading this far, here’s a little animation you can make with Graft:

Installing Flarum on Ubuntu 18.04

I am setting up a forum for sharing levels for my game Rabbit Escape, and I have decided to try and use Flarum, because it looks really usable and responsive, has features we need like liking posts and following authors, and I think it will be reasonably OK to write the custom features we want.

So, I want a dev environment on my local Ubuntu 18.04 machine, and the first step to that is a standard install.

Warning: at the time of writing the Flarum docs say it does not work with PHP 7.2, which is what is included with Ubuntu 18.04, so this may not work. (So far it looks OK for me.)

Here’s how I got it working (as far as the web installer stage, anyway):

sudo apt install \
    apache2 \
    libapache2-mod-php \
    mariadb-server \
    php-mysql \
    php-json \
    php-gd \
    php-tokenizer \
    php-mbstring \
    php-curl

php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"

# Get the neat line from https://getcomposer.org/download/
# Don't copy it exactly!
php -r "if (hash_file('SHA384', 'composer-setup.php') === '544e09ee996cdf60ece3804abc52599c22b1f40f4323403c44d44fdfdd586475ca9813a858088ffbc1f233e9b180f061') { echo 'Installer verified'; } else { echo 'Installer corrupt'; unlink('composer-setup.php'); } echo PHP_EOL;"

mkdir $HOME/bin
php composer-setup.php --install-dir=$HOME/bin/ --filename=composer
rm composer-setup.php

cd /var/www/html
sudo mkdir flarum
sudo chown $(whoami) flarum

# Log out and in again here to get composer to be in your PATH
cd flarum
composer create-project flarum/flarum . --stability=beta

sudo chgrp -R www-data .
sudo chmod -R 775 .

sudo systemctl restart apache2

Go to http://localhost/flarum in your browser, and follow the instructions there to get set up.

If I get further, I will update this post, including on how to set up the MySQL database.
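
In the meantime, here is a rough sketch of the kind of database and user the web installer will ask for – the names and password below are placeholders I made up:

sudo mysql <<'EOF'
CREATE DATABASE flarum;
CREATE USER 'flarum'@'localhost' IDENTIFIED BY 'choose-a-password';
GRANT ALL PRIVILEGES ON flarum.* TO 'flarum'@'localhost';
FLUSH PRIVILEGES;
EOF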

If you want to find and share levels for Rabbit Escape, check up on our progress setting up the forum at https://artificialworlds.net/rabbit-escape/levels.

Connecting to Slack from an IRC client using slirc

I tried to get back to an IRC interface to Slack using Matrix, and it had some problems. Thanks to Colin Watson’s comment on that post, I tried Daniel Beer’s slirc, and so far it seems to be working pretty well.

Here’s what I did:

Get a Slack legacy token which slirc will use to connect to Slack as you. Follow the instructions given at that link, and you should end up with a token that looks something like “abcd-123456768-ETC-ETC”. Keep a note of it.

Install the prerequisites for slirc, and download it:

sudo cpan AnyEvent AnyEvent::HTTP AnyEvent::Socket AnyEvent::WebSocket::Client URI::Encode Data::Dumper JSON
mkdir slirc
cd slirc
wget -q 'https://www.dlbeer.co.nz/articles/slirc/slirc-20180515.pl'
chmod +x slirc-20180515.pl

Create a file in the slirc directory you created above, called rc.conf, and make it look like this:

slack_token=abcd-123456768-ETC-ETC
password=somepassword
port=6667

Replace “abcd-123456768-ETC-ETC” with the Slack legacy token you noted down earlier.

Replace “somepassword” with something you’ve made up (not your Slack password) – this is what you will type as the password in your IRC client.

Run slirc and leave it running:

./slirc-20180515.pl rc.conf

(Make sure you are inside the slirc dir when you run that.)

Start your IRC client (e.g. HexChat) and add a server with address “localhost” and port 6667, with your slack username and the password you added in the rc.conf (which you wrote instead of “somepassword”).

This mostly works for me, except it has a tendency to open a load of ad-hoc chats as channels, so I have to close them all to get a usable list.

Using Matrix to connect to Slack from an IRC client on Ubuntu

Update: I found a better solution using slirc.

I like using HexChat to talk to my colleagues, like one other guy. It is fast, and it pops up a new window when someone sends me a direct/private message.

Recently, Slack shut down their IRC gateway, forcing me to use their slow UI, and making me unable to be responsive to direct messages without spending my entire life checking said UI, or being disturbed by notifications.

So, I decided to set up a Matrix bridge, because how hard can it be?

My system is Ubuntu MATE 18.04.

What I did

Install Synapse

Install Synapse, a Matrix homeserver.

wget -qO - https://matrix.org/packages/debian/repo-key.asc | sudo apt-key add -
sudo apt-add-repository http://matrix.org/packages/debian/
sudo apt update
sudo apt install matrix-synapse

Edit /etc/matrix-synapse/homeserver.yaml to change the line containing “registration_shared_secret” to look something like:

registration_shared_secret: "some secret password you made up"

(Replacing “some secret password you made up” with something secret.)

Make a new user:

register_new_matrix_user -c /etc/matrix-synapse/homeserver.yaml http://localhost:8008

Enter a username and password, and take a note of them.

Install Riot

(Technically, this step is optional, but I definitely recommend it to check everything is working.)

Install Riot, a client that can talk to a homeserver.

curl https://riot.im/packages/debian/repo-key.asc | sudo apt-key add -
sudo apt-add-repository https://riot.im/packages/debian/
sudo apt update
sudo apt install riot-web

Start Riot from the menu or by typing riot-web in a console.

Choose to log in to a Custom Server (not create a new user), and enter the Home Server URL as http://localhost:8008, leave the Identity Server URL as it is, and enter the username and password you entered in the previous step.

Riot should be able to connect to the Synapse server, even though it won’t have any rooms in it yet.

If this doesn’t work, try running riot-web in a console, and check the Synapse log file at /var/log/matrix-synapse/homeserver.log

Install matrix-puppet-slack

(Note: after this step, the homeserver can only be for you now, because it is logged into Slack as you. If you want to make a new Matrix server for lots of people, I found that matrix-appservice-slack works, but this only connects channels, not direct messages.)

Get a Slack legacy token which matrix-puppet-slack will use to connect to Slack as you. Follow the instructions given at that link, and you should end up with a token that looks something like “abcd-123456768-ETC-ETC”. Keep a note of it.

Install matrix-puppet-slack which allows you to connect a homeserver to a Slack instance, pretending it is you:

cd
git clone https://github.com/matrix-hacks/matrix-puppet-slack.git
cd matrix-puppet-slack/
npm install
cp config.sample.json config.json

Edit config.json so it looks something like this:

{
  "slack": [{
    "team_name": "Name of my team",
    "user_access_token": "abcd-123456768-ETC-ETC"
  }],
  "registrationPath": "slack-registration.yaml",
  "port": 8090,
  "bridge": {
    "homeserverUrl":"http://localhost:8008",
    "domain": "localhost",
    "registration": "slack-registration.yaml"
  }
}

Now generate a config file that Synapse will use, called slack-registration.yaml, by typing this command (still in the matrix-puppet-slack directory):

node index.js -r -u "http://localhost:8090"

When asked, type in the username and password you entered in the first step (when you ran the register_new_matrix_user command).

This creates a config file telling Synapse that matrix-puppet-slack is running on the local machine on port 8090.

Now, run it:

node index.js

and edit Synapse’s config file /etc/matrix-synapse/homeserver.yaml to point at the new config file you just made, by editing the line mentioning app_service_config_files so that it looks like this:

app_service_config_files: [
    "/path/to/matrix-puppet-slack/slack-registration.yaml"
]

Make sure you replace “/path/to/matrix-puppet-slack” in the above with the directory where you put matrix-puppet-slack, which is /home/yourusername/matrix-puppet-slack if you followed the instructions exactly.

Restart Synapse, and when you go back into Riot, you should see your Slack channels appear gradually (as people talk in them):

sudo service matrix-synapse restart

If not, check the output from the node index.js command, and the Synapse log file at /var/log/matrix-synapse/homeserver.log.

Install matrix-ircd

To allow connecting to Matrix from an IRC client, install matrix-ircd:

cd
git clone https://github.com/matrix-org/matrix-ircd.git
cd matrix-ircd
cargo build
cargo run -- --url "http://localhost:8008"

(Note this works on Ubuntu 18.04, but operating systems with older versions of Rust may need to get the latest first – see the matrix-ircd page for details.)

Connect from an IRC client

Now use an IRC client to connect to matrix-ircd. I recommend HexChat if you’re looking for one.

matrix-ircd runs on port 5999 by default, so you should be able to connect to it from your IRC client by setting the server to localhost/5999 and using the username and password from the first step (when you ran the register_new_matrix_user command). Use plain username/password authentication.

If it works, you should see your Slack channels appear in your IRC client. If not check the logs mentioned before, and the error messages and logs from your IRC client.

What works and doesn’t work

  • Messages typed by me and others into IRC, Riot and Slack appear in all the other places, including direct messages.
  • Synapse and Riot are installed fully, so Synapse will be running after a reboot, and Riot is available in the menus, but matrix-puppet-slack and matrix-ircd are just running in a console and have to be started manually. It should be reasonably simple to make custom systemd files to start them automatically (see the sketch below this list).
  • matrix-puppet-slack crashed once and I had to restart it, so this may not be very reliable. I logged an issue.
  • The Slack channel names look terrible in my IRC client. I think matrix-ircd needs to find the pretty names (which are displayed correctly in Riot) and use them instead of the coded names used under the covers in Matrix.
  • My first ever direct message to someone does not work, even though I can choose them and attempt to send a message. As soon as they say something to me, they appear as a channel in Riot and IRC, and I can talk back and forth no problem from then on.
  • Direct messages look like normal channels in Matrix, which means I can’t use HexChat to pop up notifications for them, so this was all pointless.

Please leave comments below if you know how I can fix any of these problems.
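
On the point above about starting matrix-puppet-slack and matrix-ircd automatically: a systemd unit for matrix-puppet-slack might look something like the sketch below. This is my own untested sketch, not part of the original setup; the user, paths and node location are assumptions you will need to adjust, and a unit for matrix-ircd would follow the same pattern.

[Unit]
Description=matrix-puppet-slack bridge
After=network-online.target matrix-synapse.service

[Service]
User=yourusername
WorkingDirectory=/home/yourusername/matrix-puppet-slack
ExecStart=/usr/bin/node index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target

Saved as something like /etc/systemd/system/matrix-puppet-slack.service, it could then be enabled with sudo systemctl enable --now matrix-puppet-slack.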

Conclusions

  • Matrix seems really cool
  • It is basically possible to bridge IRC-Matrix-Slack as I wanted
  • But, some remaining bugs mean I have to keep using Slack’s UI for now :-(
  • Please comment telling me how to do this better

Examples of SQL join types (LEFT JOIN, INNER JOIN etc.)

I have 2 tables like this:

> SELECT * FROM table_a;
+------+------+
| id   | name |
+------+------+
|    1 | row1 |
|    2 | row2 |
+------+------+

> SELECT * FROM table_b;
+------+------+------+
| id   | name | aid  |
+------+------+------+
|    3 | row3 |    1 |
|    4 | row4 |    1 |
|    5 | row5 | NULL |
+------+------+------+

INNER JOIN cares about both tables

INNER JOIN cares about both tables, so you only get a row if both tables have one. If there is more than one matching pair, you get multiple rows.

> SELECT * FROM table_a a INNER JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
+------+------+------+------+------+

It makes no difference to INNER JOIN if you reverse the order, because it cares about both tables:

> SELECT * FROM table_b b INNER JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
+------+------+------+------+------+

You get the same rows, but the columns are in a different order because we mentioned the tables in a different order.

LEFT JOIN only cares about the first table

LEFT JOIN cares about the first table you give it, and doesn’t care much about the second, so you always get the rows from the first table, even if there is no corresponding row in the second:

> SELECT * FROM table_a a LEFT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
|    2 | row2 | NULL | NULL | NULL |
+------+------+------+------+------+

Above you can see all rows of table_a, even though some of them do not match anything in table_b, but not all rows of table_b – only the ones that match something in table_a.

If we reverse the order of the tables, LEFT JOIN behaves differently:

> SELECT * FROM table_b b LEFT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
|    5 | row5 | NULL | NULL | NULL |
+------+------+------+------+------+

Now we get all rows of table_b, but only matching rows of table_a.

RIGHT JOIN only cares about the second table

a RIGHT JOIN b gets you exactly the same rows as b LEFT JOIN a. The only difference is the default order of the columns.

> SELECT * FROM table_a a RIGHT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
| NULL | NULL |    5 | row5 | NULL |
+------+------+------+------+------+

This is the same rows as table_b LEFT JOIN table_a, which we saw in the LEFT JOIN section.

Similarly:

> SELECT * FROM table_b b RIGHT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
| NULL | NULL | NULL |    2 | row2 |
+------+------+------+------+------+

This gives the same rows as table_a LEFT JOIN table_b.

No join at all gives you copies of everything

If you write your tables with no JOIN clause at all, just separated by commas, you get every row of the first table written next to every row of the second table, in every possible combination. This is known as a cross join (or Cartesian product):

> SELECT * FROM table_b b, table_a;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    3 | row3 | 1    |    2 | row2 |
|    4 | row4 | 1    |    1 | row1 |
|    4 | row4 | 1    |    2 | row2 |
|    5 | row5 | NULL |    1 | row1 |
|    5 | row5 | NULL |    2 | row2 |
+------+------+------+------+------+

Fixing Slack emojis in HexChat

If, like me, you are this person:

(Source: xkcd.com/1782)

You may want to fix the stupid :slightly_smiling_face: messages you receive from Slack via the IRC gateway. Obviously, I’d prefer they went away entirely, but it’s still better to see a character than to be spammed with colon abominations all over the place.

You’ll need the Python emoji package, and a HexChat plugin like this:

# Replace all the horrible :slightly_smiling_face: rubbish that Slack inserts
# into horrible Unicode emoji symbols.
# Author: Andy Balaam
# License: CC0 https://creativecommons.org/publicdomain/zero/1.0/

# Requires https://pypi.python.org/pypi/emoji - I used 0.4.5
# I manually copied the emoji dir into:
# /home/andrebal/.local/lib/python2.7/site-packages
import emoji
import hexchat

__module_name__ = "slack-emojis"
__module_version__ = "1.0"
__module_description__ = "Translate emojis from Slack with colons into emojis"

print "Loading slack-emojis"
chmsg = "Channel Message"
prmsg = "Private Message to Dialog"

def preprint(words, word_eol, userdata):
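    # For these print events, words[0] is the nick that sent the message,
    # and word_eol[1] is the message text onwards.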
    txt = word_eol[1]
    replaced = emoji.emojize(txt, use_aliases=True)
    if replaced != txt:
        hexchat.emit_print(
            userdata["msgtype"],
            words[0],
            replaced.encode('utf-8'),
        )
        return hexchat.EAT_HEXCHAT
    else:
        return hexchat.EAT_NONE

hexchat.hook_print(chmsg, preprint, {"msgtype": chmsg})
hexchat.hook_print(prmsg, preprint, {"msgtype": prmsg})
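
To check that the emoji package is working before wiring it into HexChat, a quick test like this (my own sketch, assuming the 0.4.x API and its use_aliases flag) should print a real emoji character rather than the colon name:

import emoji
print(emoji.emojize("Hello :slightly_smiling_face:", use_aliases=True))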

According to the page linked above, Slack are retiring the IRC gateway, which will make me very unhappy.

Update: added support for private messages too.

Ideas on how lexing will work in Pepper3

I am trying to practice documentation-driven development in Pepper3, so every time I start on an area, I will write documentation explaining how it works, and include examples that are automatically verified during the build.

I’ve started work on lexing, since you can’t do much before you do that, but in fact, of course, I need to have a command line interface before I can verify any of the examples, so I’m working on that too.

Lexing is the process that takes a stream of characters (e.g. from a file) and turns it into a stream of “tokens” that are chunks of code like a variable name, a number or a string. (There is more on lexing in my mini programming language, Cell.)

My thoughts so far about lexing are in lexing.md, and current ideas about command line interface are at command_line.md. All very much subject to change.

Headlines:

  • Ordinary programmers can write their own lexing rules.
  • Operators (functions like “+” that find their arguments on their left and right, instead of between brackets like normal functions) are defined at the lexing phase, so any symbol (e.g. “in”) can be an operator if you want.
  • Anything you might want to do with a pepper program, including running it, compiling it, or packaging it for a distribution system, should be available as a sub-command of the main pepper3 command line.
  • The command is “pepper3”, never “pepper”. If a new, incompatible version comes out, it will be called “pepper4”, and they will be parallel-installable, with no confusion.

Deleting commits from the git history

Today I wanted to fix a Git repo that contained some bad commits (i.e. git fsck complained about them). [I wanted to do this because GitLab was not allowing me to push the bad commits.]

I wanted the code to look exactly as it did before, but the history to look different, so the bad commits disappeared, and (presumably) the work done in the bad commits to look like it was done in the commits following them.

Here’s what I ran:

git filter-branch -f --commit-filter '

    if [ "${GIT_COMMIT}" = "abdcef012345abcdef012345etcetcetc" ];
    then
        echo "Skipping GIT_COMMIT=${GIT_COMMIT}" >&2;
        skip_commit "$@";
    else
        git commit-tree "$@";
    fi
' --tag-name-filter cat -- --all

(Where abdcef012345abcdef012345etcetcetc was the ID of the commit I wanted to delete.)

Of course, you can make this cleverer to exclude multiple commits at a time, or run this several times, putting in the right commit ID each time.

Questions and answers about Pepper3

Series: Examples, Questions

My last post Examples of Pepper3 code was a reply to my friend’s email asking what it was all about. They replied with some questions, and I thought the questions and answers might shed some more light:

Questions!

Brilliant ones, thanks.

In general though you’ve said a lot about what Pepper can do without giving design decisions.

Yep, total brain dump.

Remind me again who this language is for :)

It’s a multi-paradigm (generic, functional, OO) language aimed at application programmers who want:

  • “native” performance on their chosen platform (definitely including actual native machine code). This is inspired by C++.
  • easy deployment (preferably a single binary containing everything, with an option to link most dependencies statically), including packaging of installers for major OSes. This is inspired by C++, and the pain of C++.
  • perfect flexibility for creating types – “meta-programming” is just programming. Things you would have done using code generation (e.g. generating a class hierarchy from an XSD) are done by running arbitrary code at compile time. The powerful type system is inspired by Haskell and the book “Modern C++ Design”, and the meta-programming is inspired by Lisp.
  • Simple memory management without GC through ownership. This is inspired by modern C++, and then Rust came along and implemented it before I could, thus proving it works. However, I would remove a lot of the functionality in Rust (lifetimes) to make it much simpler.
  • Strong support for functional programming if you want it. This is inspired by Haskell.
  • The simplest possible core language, with application programmers able to expand it by giving them the same tools as the language designers – e.g. “for” is just a function, so you can make your own. I am hoping I can even make “class” a function. This is inspired by Lisp, and oppositely-inspired by Java.
  • Separation between the idea of Interfaces, which I think I will call “type specifiers” (and will allow arbitrary code execution to determine whether a type satisfies the requirements) and structs/classes, allowing us to make new Interfaces and have old code satisfy them, meaning we can do generic stuff with e.g. ints even if no-one declared that “class Int : public Quaternion” or whatever.
  • Lots of “nudges” towards things that are good: by default things will be functional and immutable – you will have to explicitly say if you want to use more dangerous constructs like side effects and mutable values.
  • No implicit conversions, or really anything happening without you saying so.

Can you assign floats to ints or vice versa?

Yes, but you shouldn’t.

If you’re setting types in code at the start of a file, is this only available in the main file? Are there multiple files per program? Can you have libraries? If so, do these decide the functionality of their types in the library or does this only happen in the main file?

I haven’t totally decided – either by being enforced, or as a matter of style, you will generally do this once at the beginning of the program (and choose on the compiler command line to do it e.g. the debug way or the release way) and it will affect all of your code.

Libraries will be packaged as Pepper3 source code, so choices you make of the type of Int etc. will be reflected through the whole dependency tree. Cool, huh?

This is inspired by Python.

Can you group variables together into structs or similar?

Yes – it will be especially easy to make “value types”, and lots of default methods will be provided, that you will be strongly encouraged to use – e.g. copy and move operations. This is inspired by Elm.

Why are variables immutable by default but mutable with a special syntax? It’s the opposite of C++ const, but why that way around?

This is one of the “nudges” – immutable stuff is much easier to think about, and makes parallel stuff easier, and allows optimisations and so on, so turning it on by default means you have to choose to take the bad path, and are inclined to take the virtuous one. This is inspired by Haskell and Rust.

Why only allow assignments, function calls and operators? I’m sure you have good reasons.

To be as simple as possible, so you only have those things to learn and the rest can be understood by just reading the code. This is inspired by Python.

I wrote more of my (earlier) thoughts in this 4-post series, which is better thought through: Goodness in Programming Languages

Examples of Pepper3 code

Series: Examples, Questions

I have restarted my effort to make a new programming language that fits the way I like things. I haven’t pushed any code yet, but I have made a lot of progress in my head to understand what I want.

Here are some random examples that might get across some of the ways I am thinking:


// You code using general types like "Int" but you can set what
// they really are in the code (usually at the beginning), so
// if you plan to use native ints in the production code, it's
// a good idea to use:
Int = CheckedNativeInt;
// while in dev, since it will crash at runtime if you overflow.

// Then, in production when you're sure you have no errors,
// switch to an unchecked one:
Int = NativeInt;

// But, if you prefer correctness over efficiency, you can use
// mathematical integer that never overflows:
Int = ArbitrarySizeInt;


// Variables are immutable by default, so:
Int x = 4;
x = 3;      // this is a compile error


// But this is OK
Mutable(Int) y = 6;
y = y + x;

// Notice that you can call functions that return types that you
// then use, like Mutable(Int) here.

// Generally, code can run at either compile time or run time.
// Code to do with types has to run at compile time.
// By default, other code runs at run time, but you can force
// it to run early if you want to.


// A main method looks like this - you get hold of e.g. stdout through
// a World instance - I try to avoid any global functions like print, or
// global variables like sys.stdout.

Auto main =
{:(World world)->Int
    //...
};

// (Although note that Int, String etc. actually are global variables,
// which is a bit annoying)

// I wish the main method were simpler-looking.  The only saving grace
// is that for simple examples you don't need a main method -
// Pepper3 just calculates the expression you provide in your file and
// prints it out.


// Expressions in curly brackets are lambda functions, so:

{3};

// is a function taking no arguments, returning 3, and:

{:(Int x)
    x * 2
};

// is a function that doubles a value.

// Obviously, we can tie functions to names:

Auto dbl =
    {:(Int x)
        x * 2
    };

// Meaning we can call dbl like this:
dbl(4);

// Auto is a magic word to say ("use type inference"), so
// this is equivalent to the above:

fn([Int]->Int) dbl =
    {:(Int x)
        x * 2
    };


// Because {} makes an anon function, things like "for" can be
// functions instead of keywords.

for(range(3), {:(Int x)
    world.stdout.println(to(String)(x));
});


// As far as possible, Pepper3 will only contain assignment statements:
String s = "xx";

// and expressions containing function calls and operators:
dbl(3) + 6;


// This means we can make our own constructs like a different type of
// for loop, which would need a new keyword in some languages:

Auto parallel_for = import(multiprocess.parallel_for);

Women Who Code workshop on “Write your own programming language”

On Wednesday 28th June 2017 a group of people from OpenMarket went to the Fora office space in Clerkenwell, London to run a workshop with the Women Who Code group, who work to help women achieve their career goals.

OpenMarket provided the workshop “Write your own programming language” and funded the food, and the venue was provided gratis by Fora.

We started the evening with some networking and food:

networking

food

but most of the time was spent coding:

coding

with lots of help from our OpenMarket helpers:

helpers

The feedback we got was very positive:

Everyone seemed to be having fun, so we hope we might get invited back to do more in future.

Why do this?

At OpenMarket we want to improve our diversity, and we have started by looking at gender diversity specifically. By being involved with events like this we hope to learn how we can make our company better at welcoming and supporting employees, encourage people from under-represented groups to apply to work here, and improve the general climate in our industry.

Thank you

A huge thank you to the OpenMarket people (from London and Guadalajara!) who helped out – I think people felt welcome and there was plenty of help available for the attendees – you did a great job.

Thank you also for the great response from everyone in our London office – several people in the office wanted to come but couldn’t make it on the night – I am hoping we will get more opportunities in future.

We’re also really grateful to OpenMarket for funding the food, to Fora for providing the space, and to Women Who Code for doing such great work to improve our industry.

Links

[Photos by David Lawson.]

Running a virtualenv with a custom-built Python

For my attempt to improve the asyncio.as_completed Python standard library function I needed to build a local copy of cpython (the Python interpreter).

To test it, I needed the aiohttp module, which is not part of the standard library, so the easiest way to get it was using virtualenv.

Here is the recipe I used to get a virtualenv and install packages using pip with a custom-built Python:

$ ~/code/public/cpython/python -m venv env
$ . env/bin/activate
(env) $ pip install aiohttp
(env) $ python mycode.py
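
To double-check that the env really is using the custom-built interpreter rather than the system Python, a quick check like this (my own addition, not part of the recipe above) can help:

(env) $ python -c "import sys; print(sys.executable, sys.version)"

The path printed should point inside the env directory, and the version should match the cpython you built.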

Making 100 million requests with Python aiohttp

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I’ve been working on how to make a very large number of HTTP requests using Python’s asyncio and aiohttp.

Paweł Miech’s post Making 1 million requests with python-aiohttp taught me how to think about this, and got us a long way, with 1 million requests running in a reasonable time, but I need to go further.

Paweł’s approach limits the number of requests that are in progress, but it uses an unbounded amount of memory to hold the futures that it wants to execute.

We can avoid using unbounded memory by using the limited_as_completed function I outlined in my previous post.

Setup

Server

We have a server program “server”:

(Note it differs from Paweł’s version because I am using an older version of aiohttp which has fewer convenient features.)

#!/usr/bin/env python3.5

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

async def init():
    app = web.Application()
    app.router.add_route('GET', '/{name}', handle)
    return await loop.create_server(
        app.make_handler(), '127.0.0.1', 8080)

loop = asyncio.get_event_loop()
loop.run_until_complete(init())
loop.run_forever()

This just responds “Hello, World!” to every request it receives, but after an artificial delay of 0-3 seconds.

Synchronous client

As a baseline, we have a synchronous client “client-sync”:

#!/usr/bin/env python3.5

import requests
import sys

url = "http://localhost:8080/{}"
for i in range(int(sys.argv[1])):
    requests.get(url.format(i)).text

This waits for each request to complete before making the next one. Like the other clients below, it takes the number of requests to make as a command-line argument.

Async client using semaphores

Copied mostly verbatim from Making 1 million requests with python-aiohttp we have an async client “client-async-sem” that uses a semaphore to restrict the number of requests that are in progress at any time to 1000:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(session, r):
    url = "http://localhost:8080/{}"
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(limit)
    for i in range(r):
        # pass Semaphore and session to every GET request
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i), session))
        tasks.append(task)
    responses = asyncio.gather(*tasks)
    await responses

loop = asyncio.get_event_loop()
with ClientSession() as session:
    loop.run_until_complete(asyncio.ensure_future(run(session, int(sys.argv[1]))))

Async client using limited_as_completed

The new client I am presenting here uses limited_as_completed from the previous post. This means it can make a generator that provides the futures to wait for as they are needed, instead of making them all at the beginning.

It is called “client-async-as-completed”:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
from itertools import islice
import sys

def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, limit):
        await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
with ClientSession() as session:
    coros = (fetch(url.format(i), session) for i in range(r))
    loop.run_until_complete(print_when_done(coros))
loop.close()

Again, this limits the number of requests to 1000.

Test setup

Finally, we have a test runner script called “timed”:

#!/usr/bin/env bash

./server &
sleep 1 # Wait for server to start

/usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" "$@"

# %e Elapsed real (wall clock) time used by the process, in seconds.
# %M Maximum resident set size of the process in Kilobytes.

kill %1

This runs each process, ensuring the server is restarted each time it runs, and prints out how long it took to run, and how much memory it used.

Results

When making only 10 requests, the async clients worked faster because they launched all the requests simultaneously and only had to wait for the longest one (3 seconds). The memory usage of all three clients was fine:

$ ./timed ./client-sync 10
Memory usage: 20548KB	Time: 15.16 seconds
$ ./timed ./client-async-sem 10
Memory usage: 24996KB	Time: 3.13 seconds
$ ./timed ./client-async-as-completed 10
Memory usage: 23176KB	Time: 3.13 seconds

When making 100 requests, the synchronous client was very slow, but all three clients worked eventually:

$ ./timed ./client-sync 100
Memory usage: 20528KB	Time: 156.63 seconds
$ ./timed ./client-async-sem 100
Memory usage: 24980KB	Time: 3.21 seconds
$ ./timed ./client-async-as-completed 100
Memory usage: 24904KB	Time: 3.21 seconds

At this point let’s agree that life is too short to wait for the synchronous client.

When making 10000 requests, both async clients worked quite quickly, and both had increased memory usage, but the semaphore-based one used almost twice as much memory as the limited_as_completed version:

$ ./timed ./client-async-sem 10000
Memory usage: 77912KB	Time: 18.10 seconds
$ ./timed ./client-async-as-completed 10000
Memory usage: 46780KB	Time: 17.86 seconds

For 1 million requests, the semaphore-based client took 25 minutes on my (32GB RAM) machine. It only used about 10% of my CPU, and it used a lot of memory (over 3GB):

$ ./timed ./client-async-sem 1000000
Memory usage: 3815076KB	Time: 1544.04 seconds

Note: Paweł’s version only took 9 minutes on his laptop and used all his CPU, so I wonder whether I have made a mistake somewhere, or whether my version of Python (3.5.2) is not as good as a later one.

The limited_as_completed version ran in a similar amount of time but used 100% of my CPU, and used a much smaller amount of memory (162MB):

$ ./timed ./client-async-as-completed 1000000
Memory usage: 162168KB	Time: 1505.75 seconds

Now let’s try 100 million requests. The semaphore-based version lasted 10 hours before it was killed by Linux’s OOM Killer, but it didn’t manage to make any requests in this time, because it creates all its futures before it starts making requests:

$ ./timed ./client-async-sem 100000000
Command terminated by signal 9

I left the limited_as_completed version over the weekend and it managed to succeed eventually:

$ ./timed ./client-async-as-completed 100000000
Memory usage: 294304KB	Time: 150213.15 seconds

So its memory usage was still very bounded, and it managed to do about 665 requests/second (100,000,000 requests in roughly 150,000 seconds) over an extended period, which is almost identical to the throughput of the previous cases.

Conclusion

Making a million requests is usually enough, but when we really need to do a lot of work while keeping our memory usage bounded, it looks like an approach like limited_as_completed is a good way to go. I also think it’s slightly easier to understand.

In the next post I describe my attempt to get something like this added to the Python standard library.

Python – printing UTC dates in ISO8601 format with time zone

By default, when you make a UTC date from a Unix timestamp in Python and print it in ISO format, it has no time zone:

$ python3
>>> from datetime import datetime
>>> datetime.utcfromtimestamp(1496998804).isoformat()
'2017-06-09T09:00:04'

Whenever you talk about a datetime, I think you should always include a time zone, so I find this problematic.

The solution is to mention the timezone explicitly when you create the datetime:

$ python3
>>> from datetime import datetime, timezone
>>> datetime.fromtimestamp(1496998804, tz=timezone.utc).isoformat()
'2017-06-09T09:00:04+00:00'

Note, including the timezone explicitly works the same way when creating a datetime in other ways:

$ python3
>>> from datetime import datetime, timezone
>>> datetime(2017, 6, 9).isoformat()
'2017-06-09T00:00:00'
>>> datetime(2017, 6, 9, tzinfo=timezone.utc).isoformat()
'2017-06-09T00:00:00+00:00'
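
If you already have a naive datetime that you know represents UTC, one option (a small extra example of my own, using the same timestamp as above) is to attach the zone afterwards with replace:

$ python3
>>> from datetime import datetime, timezone
>>> datetime.utcfromtimestamp(1496998804).replace(tzinfo=timezone.utc).isoformat()
'2017-06-09T09:00:04+00:00'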

Python 3 – large numbers of tasks with limited concurrency

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I am interested in running large numbers of tasks in parallel, so I need something like asyncio.as_completed, but taking an iterable instead of a list, and with a limited number of tasks running concurrently. First, let’s try to build something pretty much equivalent to asyncio.as_completed. Here is my attempt, but I’d welcome feedback from readers who know better:

# Note this is not a coroutine - it returns
# an iterator - but it crucially depends on
# work being done inside the coroutines it
# yields - those coroutines empty out the
# list of futures it holds, and it will not
# end until that list is empty.
def my_as_completed(coros):

    # Start all the tasks
    futures = [asyncio.ensure_future(c) for c in coros]

    # A coroutine that waits for one of the
    # futures to finish and then returns
    # its result.
    async def first_to_finish():

        # Wait forever - we could add a
        # timeout here instead.
        while True:

            # Give up control to the scheduler
            # - otherwise we will spin here
            # forever!
            await asyncio.sleep(0)

            # Return anything that has finished
            for f in futures:
                if f.done():
                    futures.remove(f)
                    return f.result()

    # Keep yielding a waiting coroutine
    # until all the futures have finished.
    while len(futures) > 0:
        yield first_to_finish()

The above can be substituted for asyncio.as_completed in the code that uses it in the first article, and it seems to work. It also makes a reasonable amount of sense to me, so it may be correct, but I’d welcome comments and corrections.

my_as_completed above accepts an iterable and returns a generator producing results, but inside it starts all tasks concurrently, and stores all the futures in a list. To handle bigger lists we will need to do better, by limiting the number of running tasks to a sensible number.

Let’s start with a test program:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1.0 / number)
    print("Finishing %d" % number)
    return str(number)

async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)

coros = [mycoro(i) for i in range(1, 101)]

loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This uses asyncio.as_completed to run 100 tasks and, because I adjusted the asyncio.sleep command to wait longer for earlier tasks, it prints something like this:

$ time python3 python-async.py
Starting 47
Starting 93
Starting 48
...
Finishing 93
Finishing 94
Finishing 95
...
Result 93
Result 94
Result 95
...
Finishing 46
Finishing 45
Finishing 42
...
Finishing 2
Result 2
Finishing 1
Result 1

real    0m1.590s
user    0m0.600s
sys 0m0.072s

So all 100 tasks were completed in 1.5 seconds, indicating that they really were run in parallel, but all 100 were allowed to run at the same time, with no limit.

We can adjust the test program to run using our customised my_as_completed function, and pass in an iterable of coroutines instead of a list by changing the last part of the program to look like this:

async def print_when_done(tasks):
    for res in my_as_completed(tasks):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

But we get similar output to last time, with all tasks running concurrently.

To limit the number of concurrent tasks, we limit the size of the futures list, and add more as needed:

from itertools import islice
def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

We start limit tasks at first, and whenever one ends, we ask for the next coroutine in coros and set it running. This keeps the number of running tasks at or below limit until we start running out of input coroutines (when next throws and we don’t add anything to futures), then futures starts emptying until we eventually stop yielding coroutine objects.

I thought this function might be useful to others, so I started a little repo over here and added it: asyncioplus/limited_as_completed.py. Please provide merge requests and log issues to improve it – maybe it should be part of standard Python?

When we run the same example program, but call limited_as_completed instead of the other versions:

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, 10):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

We see output like this:

$ time python3 python-async.py
Starting 1
Starting 2
...
Starting 9
Starting 10
Finishing 10
Result 10
Starting 11
...
Finishing 100
Result 100
Finishing 1
Result 1

real	0m1.535s
user	0m1.436s
sys	0m0.084s

So we can see that the tasks are still running concurrently, but this time the number of concurrent tasks is limited to 10.

See also

To achieve a similar result using semaphores, see Python asyncio.semaphore in async-await function and Making 1 million requests with python-aiohttp.

It feels like limited_as_completed is more re-usable as an approach but I’d love to hear others’ thoughts on this. E.g. could/should I use a semaphore to implement limited_as_completed instead of manually holding a queue?

Basic ideas of Python 3 asyncio concurrency

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

Python 3’s asyncio module and the async and await keywords combine to allow us to do cooperative concurrent programming, where a code path voluntarily yields control to a scheduler, trusting that it will get control back when some resource has become available (or just when the scheduler feels like it). This way of programming can be very confusing, and has been popularised by Twisted in the Python world, and nodejs (among others) in other worlds.

I have been trying to get my head around the basic ideas as they surface in Python 3’s model. Below are some definitions and explanations that have been useful to me as I tried to grasp how it all works.

Futures and coroutines are both things that you can wait for.

You can make a coroutine by declaring it with async def:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1)
    print("Finishing %d" % number)
    return str(number)

Almost always, a coroutine will await something such as some blocking IO. (Above we just sleep for a second.) When we await, we actually yield control to the scheduler so it can do other work and wake us up later, when something interesting has happened.

You can make a future out of a coroutine, but often you don’t need to. Bear in mind that if you do want to make a future, you should use ensure_future, but this actually runs what you pass to it – it doesn’t just create a future:

myfuture1 = asyncio.ensure_future(mycoro(1))
# Runs mycoro!

But, to get its result, you must wait for it – it is only scheduled in the background:

# Assume mycoro is defined as above
myfuture1 = asyncio.ensure_future(mycoro(1))
# We end the program without waiting for the future to finish

So the above fails like this:

$ python3 ./python-async.py
Task was destroyed but it is pending!
task: <Task pending coro=<mycoro() running at ./python-async:10>>
sys:1: RuntimeWarning: coroutine 'mycoro' was never awaited

The right way to block waiting for a future outside of a coroutine is to ask the event loop to do it:

# Keep on assuming mycoro is defined as above for all the examples
myfuture1 = asyncio.ensure_future(mycoro(1))
loop = asyncio.get_event_loop()
loop.run_until_complete(myfuture1)
loop.close()

Now this works properly (although we’re not yet getting any benefit from being asynchronous):

$ python3 python-async.py
Starting 1
Finishing 1

To run several things concurrently, we make a future that is the combination of several other futures. asyncio can make a future like that out of coroutines using asyncio.gather:

several_futures = asyncio.gather(
    mycoro(1), mycoro(2), mycoro(3))
loop = asyncio.get_event_loop()
print(loop.run_until_complete(several_futures))
loop.close()

The three coroutines all run at the same time, so this only takes about 1 second to run, even though we are running 3 tasks, each of which takes 1 second:

$ python3 python-async.py
Starting 3
Starting 1
Starting 2
Finishing 3
Finishing 1
Finishing 2
['1', '2', '3']

asyncio.gather won’t necessarily run your coroutines in order, but it will return a list of results in the same order as its input.

Notice also that run_until_complete returns the result of the future created by gather – a list of all the results from the individual coroutines.

To do the next bit we need to know how to call a coroutine from a coroutine. As we’ve already seen, just calling a coroutine in the normal Python way doesn’t run it, but gives you back a “coroutine object”. To actually run the code, we need to wait for it. When we want to block everything until we have a result, we can use something like run_until_complete but in an async context we want to yield control to the scheduler and let it give us back control when the coroutine has finished. We do that by using await:

import asyncio
async def f2():
    print("start f2")
    await asyncio.sleep(1)
    print("stop f2")
async def f1():
    print("start f1")
    await f2()
    print("stop f1")
loop = asyncio.get_event_loop()
loop.run_until_complete(f1())
loop.close()

This prints:

$ python3 python-async.py
start f1
start f2
stop f2
stop f1

Now we know how to call a coroutine from inside a coroutine, we can continue.

We have seen that asyncio.gather takes in some futures/coroutines and returns a future that collects their results (in order).

If, instead, you want to get results as soon as they are available, you need to write a second coroutine that deals with each result by looping through the results of asyncio.as_completed and awaiting each one.

# Keep on assuming mycoro is defined as at the top
async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)
coros = [mycoro(1), mycoro(2), mycoro(3)]
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This prints:

$ python3 python-async.py
Starting 1
Starting 3
Starting 2
Finishing 3
Result 3
Finishing 2
Result 2
Finishing 1
Result 1

Notice that task 3 finishes first and its result is printed, even though tasks 1 and 2 are still running.

asyncio.as_completed returns an iterable sequence of futures, each of which must be awaited, so it must run inside a coroutine, which must be waited for too.

The argument to asyncio.as_completed has to be a concrete collection of coroutines or futures (it creates a future for every item up front), not a lazy iterable, so you can’t use it with a very large number of items that won’t fit in memory.

Side note: if we want to work with very large lists, asyncio.wait won’t help us here – it also takes a list of futures and waits for all of them to complete (like gather), or, with other arguments, for one of them to complete or one of them to fail. It then returns two sets of futures: done and not-done. Each of these must be awaited to get their results, so:

asyncio.gather

# is roughly equivalent to:

async def mygather(*args):
    ret = []
    for r in (await asyncio.wait(args))[0]:
        ret.append(await r)
    return ret

I am interested in running very large numbers of tasks with limited concurrency – see the next article for how I managed it.

C++ iterator wrapping a stream not 1-1

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

Sometimes we want to write an iterator that consumes items from some underlying iterator but produces its own items slower than the items it consumes, like this:

ColonSep items("aa:foo::x");
// Prints "aa, foo, , x"
for(auto s : items)
{
    std::cout << s << ", ";
}

When we pass a 9-character string (i.e. an iterator that yields 9 items) to ColonSep, above, we only repeat 4 times in our loop, because ColonSep provides an iterable range that yields one value for each whole word it finds in the underlying iterator of 9 characters.

To do something like this I'd recommend consuming the items of the underlying iterator early, so it is ready when requested with operator*. We also need our iterator class to hold on to the end of the underlying iterator as well as the current position.

First we need a small container to hold the next item we will provide:

struct maybestring
{
    std::string value_;
    bool at_end_;

    explicit maybestring(const std::string value)
    : value_(value)
    , at_end_(false)
    {}

    explicit maybestring()
    : value_("--error-past-end--")
    , at_end_(true)
    {}
};

A maybestring either holds the next item we will provide, or at_end_ is true, meaning we have reached the end of the underlying iterator and we will report that we are at the end ourselves when asked.

Like the simpler iterators we have looked at, we still need a little container to return from the postincrement operator:

class stringholder
{
    const std::string value_;
public:
    stringholder(const std::string value) : value_(value) {}
    std::string operator*() const { return value_; }
};

Now we are ready to write our iterator class, which always has the next value ready in its next_ member, and holds on to the current and end positions of the underlying iterator in wrapped_ and wrapped_end_:

class myit
{
private:
    typedef std::string::const_iterator wrapped_t;
    wrapped_t wrapped_;
    wrapped_t wrapped_end_;
    maybestring next_;

The constructor holds on to the underlying iterators, and immediately fills next_ with the next value by calling next_item, passing in true to indicate that this is the first item:

public:
    myit(wrapped_t wrapped, wrapped_t wrapped_end)
    : wrapped_(wrapped)
    , wrapped_end_(wrapped_end)
    , next_(next_item(true))
    {
    }

    // Previously provided by std::iterator
    typedef std::string             value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef std::string*            pointer;
    typedef std::string&            reference;
    typedef std::input_iterator_tag iterator_category;

next_item looks like this:

private:
    maybestring next_item(bool first_time)
    {
        if (wrapped_ == wrapped_end_)
        {
            return maybestring();  // We are at the end
        }
        else
        {
            if (!first_time)
            {
                ++wrapped_;
            }
            return read_item();
        }
    }

next_item recognises whether we've reached the end of the underlying iterator and saves the empty maybestring if so. Otherwise, it skips forward once (unless we are on the first element) and then calls read_item:

    maybestring read_item()
    {
        std::string ret = "";
        for (; wrapped_ != wrapped_end_; ++wrapped_)
        {
            char c = *wrapped_;
            if (c == ':')
            {
                break;
            }
            ret += c;
        }
        return maybestring(ret);
    }

read_item implements the real logic of looping through the underlying iterator and combining those values together to create the next item to provide.

The hard part of the iterator class is done, leaving only the more normal functions we must provide:

public:
    value_type operator*() const
    {
        assert(!next_.at_end_);
        return next_.value_;
    }

    bool operator==(const myit& other) const
    {
        // We only care about whether we are at the end
        return next_.at_end_ == other.next_.at_end_;
    }

    bool operator!=(const myit& other) const { return !(*this == other); }

    stringholder operator++(int)
    {
        assert(!next_.at_end_);
        stringholder ret(next_.value_);
        next_ = next_item(false);
        return ret;
    }

    myit& operator++()
    {
        assert(!next_.at_end_);
        next_ = next_item(false);
        return *this;
    }
};

Note that operator== is only concerned with whether or not we are an end iterator. Nothing else matters for providing correct iteration.

Our final bit of bookkeeping is the range class that allows our new iterator to be used in a for loop:

class ColonSep
{
private:
    const std::string str_;
public:
    ColonSep(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_), std::end(str_)); }
    myit end()   { return myit(std::end(str_),   std::end(str_)); }
};

A lot of the code above is needed for all code that does this kind of job. Next time we'll look at how to use templates to make it usable in the general case.

A Magical new World — Thoughts of a first time ACCU attendee.

I went to my first ACCU Conference last week. It was great.

ACCU Conference

I’d heard about ACCU from Russel Winder several months ago. He recommended I check out the conference (for which he’s on the programme board) since I’m a fan and user of the C and C++ languages.

I arrived in Bristol on Tuesday excited for what the week held.

This post contains a section about the talks and a section about my experience at the bottom.

Be aware that some of the photos might not look as good on here as they should; I think Medium has compressed them a bit. All my photos should be online shortly.

The Talks

Russ Miles opens ACCU 2017

We started the conference proper with a fantastically explosive keynote delivered by Russ Miles who jumped on stage to deliver a programming parody of Highway to Hell accompanied by his own guitar playing.
His keynote was all about modern development and how most of a programmer’s tools currently just shout information at the programmer, rather than actually helping.

Later on the Wednesday I headed into a talk from Kevlin Henney that totally re-jigged how I think about concurrency. Thinking outside the Synchronisation Quadrant was wonderfully entertaining, with Kevlin excitedly bouncing across the floor.
A fantastically engaging speaker.

Lightning talks on Wednesday

Wednesday’s talks continued with several other good talks and a number of great lightning talks too.
The day finished with the welcome reception, where delegates gathered in the hotel for drinks, food and conversation.

It was here that I really got the chance to socialise with a good few people, including Anna-Jayne and Beth, who I’d been excited about meeting since I found out they were going to be there!

Thursday began with an interesting keynote about the Chapel parallel programming language. The talk has encouraged me to try the language out and I’ll certainly be having a good play with that soon.

Peter Hilton’s Documentation for Developers workshop

Thursday’s stand-out talks included the Documentation for Developers workshop by Peter Hilton. I really enjoyed the workshoppy style that Peter used to deliver the talk. He got the audience working in groups, talking to each other and essentially complaining about documentation. He finished by suggesting a method of writing docs called Readme Driven Development, as well as making other suggestions.

The other talk on Thursday which I really loved was “The C++ Type System is your Friend”. Hubert Matthews was a great speaker with clear experience in explaining a complex topic in an easier to understand fashion.
I can’t say I understood everything, but I certainly liked listening to Hubert speak.

Thursday evening I headed out for dinner with Anna-Jayne and Beth before heading back to my accommodation to write up a last minute talk for Friday.

My talk covered Intel Software Guard Extensions — Russel announced that there was an open slot on Friday for a 15-minute topic and I took the chance to speak then.

Friday began with a curious but thought provoking talk from Fran Buontempo called AI: Actual Intelligence.
I’m not entirely sure what the takeaway from the talk was intended to be, but nonetheless it was interesting!

Friday morning was full of 15 minute talks. A format I think is wonderful.
I really loved that amongst the 90 min talks throughout the rest of the week, there was time for these quick fire shorter talks too that were still serious technical talks (unlike the 5min lightning talks).

The talks I went to see were:

Odin Holmes talks about Named Parameters

At Friday lunch time I took part in a bit of an unplanned workshop on sketch noting with Michel Grootjans. It was essentially an hour of trying to make our notes prettier!
It was a lot of fun.

Sketch Noting with Michel Grootjans

Friday was the conference dinner — a rock themed night of fun and frivolities.

This was by far the high point of the conference for me.
It offered a great evening of meeting people and having a lot of fun.
I loved how everyone loosened up and spoke to anyone else there.

I met a whole bunch of people, and got on super well with a few people who I would like to consider friends now.

ACCU made it easy to get to know people too by forcing everyone who isn't a speaker to move tables between each meal course. It's a great idea!

Odin enjoys inflatable instruments

Saturday’s talks started with a really fun talk from Arjan van Leeuwen about string handling in C++1x. Covering the differences between char arrays and std::strings and how best to use them. As well as tantalising us with a C++17 feature called std::string_view (immutable views of a string).

Later I watched a talk from Anthony Williams and another from Odin, both of which went wildly over my head, but all the same I gained a few things from both of them.

Finishing off the conference was a brilliant keynote from renowned speaker and member of the ISO C++ standards committee Herb Sutter.

Herb Sutter on Metaclasses at ACCU 2017

Herb introduced a new feature of C++ that he may be proposing to the standards committee.
He described a feature allowing one to create meta-classes.

Essentially, one could describe a template of a class with certain interfaces, data and operators. Then, one could implement an instance of that class defining all the functionality of the class.
It’s essentially a way to more cleanly describe something akin to inheritance with virtual functions.

I highly suggest you try to catch the talk, since it was so interesting that even an hour or so after the talk there was still quite a crowd of people gathered around Herb asking him questions.

Herb is surrounded by curious programmers

The Conference Environment

As a first-time ACCU attendee, I couldn't write a useful blog post without a few words about the environment at the conference.

As most of my readers know, I’m a young trans woman, so a safe and welcoming environment is something that I very much appreciate, and it makes a huge difference to my experience of an event.

It’s something that’s super hard to achieve in a world like software development, where the workforce is predominantly male.

I’m glad to say that ACCU did a great job of creating a safe and welcoming space. Despite the attendees being predominantly male, as expected, everyone I encountered was not only friendly and helpful, but ever so willing to go out of their way to make me feel welcome and comfortable.
Everyone I met simply accepted me for me and didn’t treat me in any way other than friendly.

I would suggest that offering diversity tickets to ACCU would help make me feel even better there, since I’d feel better with a more diverse set of delegates.

I was especially comforted by Russel mentioning the code of conduct, without fail, every day of the conference, as well as by one of the lightning talks, delivered by a man, which took the form of a spoken-word-ish piece praising the welcoming nature of ACCU and calling for that welcoming nature to be maintained for all people in the community, not just people like himself.

I’d like to especially mention Julie and the Archer-Yates team for checking up on my happiness throughout the conference; they really helped me feel safe there.

I think there still could be work to do about making the conference a good place for younger adults — I was rather overwhelmed by the fact that everyone seemed older than me and clearly had a better idea of how to conduct themselves in the conference setting.
However, I think the only real way of solving this problem would be to make the conference easier for younger people to access (i.e. cheaper tickets for students; it’s still super expensive), which wouldn't always be possible. Additionally, the inclusion of some simpler, easier-to-understand talks would have been great. Lots of the talks were very complicated and easily got to a level that was way over my head.

Thanks to everyone who helped me feel welcome at ACCU — including but not limited to Richard, Antonello, Anna-Jayne, Beth, Jackie, Fran, Russel and Odin.

In conclusion

ACCU was a fantastic experience for me. I would highly recommend it to anyone interested in improving their C and C++ programming skills as well as general programming skills.
I’ll certainly be heading back next year if I can, and am happily a registered ACCU member now!