Migrating videos from YouTube to PeerTube inside a Docker container

I have quite a few videos hosted on YouTube that I would like to upload to my new PeerTube location, but I don’t want to install all the PeerTube dependencies on my machine, so I did it all inside a Docker image.

First I built and started a Docker container:

$ git clone https://github.com/chocobozzz/PeerTube /tmp/peertube
$ cd /tmp/peertube
$ docker build . -f ./support/docker/production/Dockerfile.stretch --tag peertube
$ docker run --tty --interactive peertube bash

Then I ran these commands inside it:

# yarn install --production=false
# node dist/server/tools/import-videos.js -u "https://peertube.mastodon.host" -U "andybalaam" -t "https://www.youtube.com/watch?v=TG0qRDrUPpA"

Of course, it would be better to write this up into its own Dockerfile to make this a one-liner.
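 
Something along these lines might do it (untested – a sketch only; it assumes the image built above was tagged “peertube”, and the file name Dockerfile.import is made up):

FROM peertube
# Add the dev dependencies that the import tool needs
RUN yarn install --production=false
ENTRYPOINT ["node", "dist/server/tools/import-videos.js"]

Then the whole job becomes something like:

docker build . -f Dockerfile.import --tag peertube-import
docker run peertube-import -u "https://peertube.mastodon.host" -U "andybalaam" -t "https://www.youtube.com/watch?v=TG0qRDrUPpA"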

References: PeerTube Docker setup, PeerTube video import.

How to write a programming language articles

Recent Overload journal issues contain my new articles on How to Write a Programming Language.

Part 1: How to Write a Programming Language: Part 1, The Lexer

Part 2: How to Write a Programming Language: Part 2, The Parser

PDF of the latest issue: Overload 146 containing part 2.

This is all creative-commons licensed and developed in public at github.com/andybalaam/articles-how-to-write-a-programming-language

Allow drag-to-side, but not drag-to-top in Ubuntu MATE (Marco)

I love the “tiling” feature in many window managers including Marco that means I can drag windows to the side of the screen and get them covering one half.

However, I never use the similar feature that allows dragging a window to the top, and it often triggers when I just want to move a window upwards.

Today I discovered that Marco (the non-fancy window manager in Ubuntu MATE and probably other places) does allow me to have one without the other, even though the UI configuration tools don’t expose the option.

Here’s what I did:

gsettings set org.mate.Marco.general allow-top-tiling false

Now my windows can be dragged to the side for half-screen maximisation, but not to the top for full-screen!
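 
If you ever want to check the current value, or put the behaviour back, the same key can be read and reset:

gsettings get org.mate.Marco.general allow-top-tiling
gsettings set org.mate.Marco.general allow-top-tiling true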

Connecting to Slack from an IRC client using slirc

I tried to get back to an IRC interface to Slack using Matrix, and it had some problems. Thanks to Colin Watson’s comment on that post, I tried Daniel Beer’s slirc, and so far it seems to be working pretty well.

Here’s what I did:

Get a Slack legacy token which slirc will use to connect to Slack as you. Follow the instructions given at that link, and you should end up with a token that looks something like “abcd-123456768-ETC-ETC”. Keep a note of it.

Install the prerequisites for slirc, and download it:

sudo cpan AnyEvent AnyEvent::HTTP AnyEvent::Socket AnyEvent::WebSocket::Client URI::Encode Data::Dumper JSON
mkdir slirc
cd slirc
wget -q 'https://www.dlbeer.co.nz/articles/slirc/slirc-20180515.pl'
chmod +x slirc-20180515.pl

Create a file in the slirc directory you created above, called rc.conf, and make it look like this:

slack_token=abcd-123456768-ETC-ETC
password=somepassword
port=6667

Replace “abcd-123456768-ETC-ETC” with the Slack legacy token you noted down earlier.

Replace “somepassword” with something you’ve made up (not your Slack password) – this is what you will type as the password in your IRC client.

Run slirc and leave it running:

./slirc-20180515.pl rc.conf

(Make sure you are inside the slirc dir when you run that.)

Start your IRC client (e.g. HexChat) and add a server with address “localhost” and port 6667, with your Slack username and the password you added in rc.conf (the one you wrote instead of “somepassword”).

This mostly works for me, except it has a tendency to open a load of ad-hoc chats as channels, so I have to close them all to get a usable list.

Using Matrix to connect to Slack from an IRC client on Ubuntu

Update: I found a better solution using slirc.

I like using HexChat to talk to my colleagues, like one other guy. It is fast, and it pops up a new window when someone sends me a direct/private message.

Recently, Slack shut down their IRC gateway, forcing me to use their slow UI, and making me unable to be responsive to direct messages without spending my entire life checking said UI, or being disturbed by notifications.

So, I decided to set up a Matrix bridge, because how hard can it be?

My system is Ubuntu MATE 18.04.

What I did

Install Synapse

Install Synapse, a Matrix homeserver.

wget -qO - https://matrix.org/packages/debian/repo-key.asc | sudo apt-key add -
sudo apt-add-repository http://matrix.org/packages/debian/
sudo apt update
sudo apt install matrix-synapse

Edit /etc/matrix-synapse/homeserver.yaml to change the line containing “registration_shared_secret” to look something like:

registration_shared_secret: "some secret password you made up"

(Replacing “some secret password you made up” with something secret.)

Make a new user:

register_new_matrix_user -c /etc/matrix-synapse/homeserver.yaml http://localhost:8008

Enter a username and password, and take a note of them.

Install Riot

(Technically, this step is optional, but I definitely recommend it to check everything is working.)

Install Riot, a client that can talk to a homeserver.

curl https://riot.im/packages/debian/repo-key.asc | sudo apt-key add -
sudo apt-add-repository https://riot.im/packages/debian/
sudo apt update
sudo apt install riot-web

Start Riot from the menu or by typing riot-web in a console.

Choose to log in to a Custom Server (not create a new user), and enter the Home Server URL as http://localhost:8008, leave the Identity Server URL as it is, and enter the username and password you entered in the previous step.

Riot should be able to connect to the Synapse server, even though it won’t have any rooms in it yet.

If this doesn’t work, try running riot-web in a console, and check the Synapse log file at /var/log/matrix-synapse/homeserver.log.

Install matrix-puppet-slack

(Note: after this step, the homeserver can only be used by you, because it is logged into Slack as you. If you want to make a new Matrix server for lots of people, I found that matrix-appservice-slack works, but it only connects channels, not direct messages.)

Get a Slack legacy token which matrix-puppet-slack will use to connect to Slack as you. Follow the instructions given at that link, and you should end up with a token that looks something like “abcd-123456768-ETC-ETC”. Keep a note of it.

Install matrix-puppet-slack which allows you to connect a homeserver to a Slack instance, pretending it is you:

cd
git clone https://github.com/matrix-hacks/matrix-puppet-slack.git
cd matrix-puppet-slack/
npm install
cp config.sample.json config.json

Edit config.json so it looks something like this:

{
  "slack": [{
    "team_name": "Name of my team",
    "user_access_token": "abcd-123456768-ETC-ETC"
  }],
  "registrationPath": "slack-registration.yaml",
  "port": 8090,
  "bridge": {
    "homeserverUrl":"http://localhost:8008",
    "domain": "localhost",
    "registration": "slack-registration.yaml"
  }
}

Now generate a config file that Synapse will use, called slack-registration.yaml, by typing this command (still in the matrix-puppet-slack directory):

node index.js -r -u "http://localhost:8090"

When asked, type in the username and password you entered in the first step (when you ran the register_new_matrix_user command).

This creates a config file telling Synapse that matrix-puppet-slack is running on the local machine on port 8090.
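 
The generated file follows the standard Matrix application service registration format, so it should look roughly like this – the tokens are random strings generated for you, and the exact values (especially sender_localpart and the user regex) will differ; this sketch is just so you can recognise the file:

id: slack
as_token: "<random token generated for you>"
hs_token: "<another random token>"
url: "http://localhost:8090"
sender_localpart: slack_bot
namespaces:
  users:
    - exclusive: true
      regex: "@slack_.*"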

Now, run it:

node index.js

and edit Synapse’s config file /etc/matrix-synapse/homeserver.yaml to point at the new config file you just made, by editing the line mentioning app_service_config_files so that it looks like this:

app_service_config_files: [
    "/path/to/matrix-puppet-slack/slack-registration.yaml"
]

Make sure you replace “/path/to/matrix-puppet-slack” in the above with the directory where you put matrix-puppet-slack, which is /home/yourusername/matrix-puppet-slack if you followed the instructions exactly.

Restart Synapse, and when you go back into Riot, you should see your Slack channels appear gradually (as people talk in them):

sudo service matrix-synapse restart

If not, check the output from the node index.js command, and the Synapse log file at /var/log/matrix-synapse/homeserver.log.

Install matrix-ircd

To allow connecting to Matrix from an IRC client, install matrix-ircd:

cd
git clone https://github.com/matrix-org/matrix-ircd.git
cd matrix-ircd
cargo build
cargo run -- --url "http://localhost:8008"

(Note this works on Ubuntu 18.04, but operating systems with older versions of Rust may need to get the latest first – see the matrix-ircd page for details.)

Connect from an IRC client

Now use an IRC client to connect to matrix-ircd. I recommend HexChat if you’re looking for one.

matrix-ircd runs on port 5999 by default, so you should be able to connect to it from your IRC client by setting the server to localhost/5999 and using the username and password from the first step (when you ran the register_new_matrix_user command). Use plain username/password authentication.

If it works, you should see your Slack channels appear in your IRC client. If not, check the logs mentioned before, and the error messages and logs from your IRC client.

What works and doesn’t work

  • Messages typed by me and others into IRC, Riot and Slack appear in all the other places, including direct messages.
  • Synapse and Riot are installed fully, so Synapse will be running after a reboot, and Riot is available in the menus, but matrix-puppet-slack and matrix-ircd are just running in a console, and have to be manually started. It should be reasonably simple to make custom systemd units to start them automatically – see the sketch after this list.
  • matrix-puppet-slack crashed once and I had to restart it, so this may not be very reliable. I logged an issue.
  • The Slack channel names look terrible in my IRC client. I think matrix-ircd needs to find the pretty names (which are displayed correctly in Riot) and use them instead of the coded names used under the covers in Matrix.
  • My first ever direct message to someone does not work, even though I can choose them and attempt to send a message. As soon as they say something to me, they appear as a channel in Riot and IRC, and I can talk back and forth no problem from then on.
  • Direct messages look like normal channels in Matrix, which means I can’t use HexChat to pop up notifications for them, so this was all pointless.
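 
For example, a minimal systemd unit for matrix-puppet-slack might look something like this (a sketch only – the user, paths and unit name are assumptions, so adjust them to wherever you cloned the repository):

[Unit]
Description=matrix-puppet-slack bridge
After=network.target matrix-synapse.service

[Service]
# Assumed install location and user - change to match your setup
User=yourusername
WorkingDirectory=/home/yourusername/matrix-puppet-slack
ExecStart=/usr/bin/node index.js
Restart=on-failure

[Install]
WantedBy=multi-user.target

Save it as /etc/systemd/system/matrix-puppet-slack.service and run sudo systemctl enable --now matrix-puppet-slack. A similar unit should work for matrix-ircd, with the built binary (or cargo run -- --url "http://localhost:8008") as the ExecStart.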

Please leave comments below if you know how I can fix any of these problems.

Conclusions

  • Matrix seems really cool
  • It is basically possible to bridge IRC-Matrix-Slack as I wanted
  • But, some remaining bugs mean I have to keep using Slack’s UI for now :-(
  • Please comment telling me how to do this better

Examples of SQL join types (LEFT JOIN, INNER JOIN etc.)

I have 2 tables like this:

> SELECT * FROM table_a;
+------+------+
| id   | name |
+------+------+
|    1 | row1 |
|    2 | row2 |
+------+------+

> SELECT * FROM table_b;
+------+------+------+
| id   | name | aid  |
+------+------+------+
|    3 | row3 |    1 |
|    4 | row4 |    1 |
|    5 | row5 | NULL |
+------+------+------+

INNER JOIN cares about both tables

INNER JOIN cares about both tables, so you only get a row if both tables have one. If there is more than one matching pair, you get multiple rows.

> SELECT * FROM table_a a INNER JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
+------+------+------+------+------+

It makes no difference to INNER JOIN if you reverse the order, because it cares about both tables:

> SELECT * FROM table_b b INNER JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
+------+------+------+------+------+

You get the same rows, but the columns are in a different order because we mentioned the tables in a different order.

LEFT JOIN only cares about the first table

LEFT JOIN cares about the first table you give it, and doesn’t care much about the second, so you always get the rows from the first table, even if there is no corresponding row in the second:

> SELECT * FROM table_a a LEFT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
|    2 | row2 | NULL | NULL | NULL |
+------+------+------+------+------+

Above you can see all rows of table_a, even though some of them do not match anything in table_b, but only those rows of table_b that match something in table_a.

If we reverse the order of the tables, LEFT JOIN behaves differently:

> SELECT * FROM table_b b LEFT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
|    5 | row5 | NULL | NULL | NULL |
+------+------+------+------+------+

Now we get all rows of table_b, but only matching rows of table_a.

RIGHT JOIN only cares about the second table

a RIGHT JOIN b gets you exactly the same rows as b LEFT JOIN a. The only difference is the default order of the columns.

> SELECT * FROM table_a a RIGHT JOIN table_b b ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | id   | name | aid  |
+------+------+------+------+------+
|    1 | row1 |    3 | row3 | 1    |
|    1 | row1 |    4 | row4 | 1    |
| NULL | NULL |    5 | row5 | NULL |
+------+------+------+------+------+

This is the same rows as table_b LEFT JOIN table_a, which we saw in the LEFT JOIN section.

Similarly:

> SELECT * FROM table_b b RIGHT JOIN table_a a ON a.id=b.aid;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    4 | row4 | 1    |    1 | row1 |
| NULL | NULL | NULL |    2 | row2 |
+------+------+------+------+------+

This is the same rows as table_a LEFT JOIN table_b.

No join at all gives you copies of everything

If you write your tables with no JOIN clause at all, just separated by commas, you get every row of the first table written next to every row of the second table, in every possible combination (a cross join):

> SELECT * FROM table_b b, table_a;
+------+------+------+------+------+
| id   | name | aid  | id   | name |
+------+------+------+------+------+
|    3 | row3 | 1    |    1 | row1 |
|    3 | row3 | 1    |    2 | row2 |
|    4 | row4 | 1    |    1 | row1 |
|    4 | row4 | 1    |    2 | row2 |
|    5 | row5 | NULL |    1 | row1 |
|    5 | row5 | NULL |    2 | row2 |
+------+------+------+------+------+

Fixing Slack emojis in HexChat

If, like me, you are this person:

(Source: xkcd.com/1782)

You may want to fix the stupid :slightly_smiling_face: messages you receive from Slack via the IRC gateway. Obviously, I’d prefer they went away entirely, but it’s still better to see a character than to be spammed with colon abominations all over the place.

You’ll need the Python emoji package, and a HexChat plugin like this (HexChat picks up Python plugins placed in ~/.config/hexchat/addons):

# Replace all the horrible :slightly_smiling_face: rubbish that Slack inserts
# into horrible Unicode emoji symbols.
# Author: Andy Balaam
# License: CC0 https://creativecommons.org/publicdomain/zero/1.0/

# Requires https://pypi.python.org/pypi/emoji - I used 0.4.5
# I manually copied the emoji dir into:
# /home/andrebal/.local/lib/python2.7/site-packages
import emoji
import hexchat

__module_name__ = "slack-emojis"
__module_version__ = "1.0"
__module_description__ = "Translate emojis from Slack with colons into emojis"

print "Loading slack-emojis"
chmsg = "Channel Message"
prmsg = "Private Message to Dialog"

def preprint(words, word_eol, userdata):
    txt = word_eol[1]
    replaced = emoji.emojize(txt, use_aliases=True)
    if replaced != txt:
        hexchat.emit_print(
            userdata["msgtype"],
            words[0],
            replaced.encode('utf-8'),
        )
        return hexchat.EAT_HEXCHAT
    else:
        return hexchat.EAT_NONE

hexchat.hook_print(chmsg, preprint, {"msgtype": chmsg})
hexchat.hook_print(prmsg, preprint, {"msgtype": prmsg})

According to the page linked above, Slack are retiring the IRC gateway, which will make me very unhappy.

Update: added support for private messages too.

Ideas on how lexing will work in Pepper3

I am trying to practice documentation-driven development in Pepper3, so every time I start on an area, I will write documentation explaining how it works, and include examples that are automatically verified during the build.

I’ve started work on lexing, since you can’t do much before you do that, but in fact, of course, I need to have a command line interface before I can verify any of the examples, so I’m working on that too.

Lexing is the process that takes a stream of characters (e.g. from a file) and turns it into a stream of “tokens” that are chunks of code like a variable name, a number or a string. (There is more on lexing in my mini programming language, Cell.)
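 
To make that concrete, here is a tiny illustrative lexer in Python – nothing to do with Pepper3’s real implementation, just a sketch of the idea:

import re

# Each token type is tried in order; the first match wins.
TOKEN_PATTERNS = [
    ("number",   r"[0-9]+"),
    ("name",     r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("operator", r"[+\-*/=]"),
    ("space",    r"\s+"),
]

def lex(chars):
    """Turn a string of characters into a stream of (type, value) tokens."""
    pos = 0
    while pos < len(chars):
        for type_, pattern in TOKEN_PATTERNS:
            match = re.match(pattern, chars[pos:])
            if match:
                if type_ != "space":  # whitespace separates tokens but is dropped
                    yield (type_, match.group(0))
                pos += match.end()
                break
        else:
            raise ValueError("Unexpected character: %r" % chars[pos])

print(list(lex("foo = 3 + bar")))
# [('name', 'foo'), ('operator', '='), ('number', '3'),
#  ('operator', '+'), ('name', 'bar')]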

My thoughts so far about lexing are in lexing.md, and current ideas about command line interface are at command_line.md. All very much subject to change.

Headlines:

  • Ordinary programmers can write their own lexing rules.
  • Operators (functions like “+” that find their arguments on their left and right, instead of between brackets like normal functions) are defined at the lexing phase, so any symbol (e.g. “in”) can be an operator if you want.
  • Anything you might want to do with a pepper program, including running it, compiling it, packaging it for a distribution system, should be available as a sub-command of the main pepper3 command line.
  • The command is “pepper3”, never “pepper”. If a new, incompatible version comes out, it will be called “pepper4”, and they will be parallel-installable, with no confusion.

Deleting commits from the git history

Today I wanted to fix a Git repo that contained some bad commits (i.e. git fsck complained about them). [I wanted to do this because GitLab was not allowing me to push the bad commits.]

I wanted the code to look exactly as it did before, but the history to look different, so the bad commits disappeared, and (presumably) the work done in the bad commits to look like it was done in the commits following them.

Here’s what I ran:

git filter-branch -f --commit-filter '

    if [ "${GIT_COMMIT}" = "abdcef012345abcdef012345etcetcetc" ];
    then
        echo "Skipping GIT_COMMIT=${GIT_COMMIT}" >&2;
        skip_commit "$@";
    else
        git commit-tree "$@";
    fi
' --tag-name-filter cat -- --all

(Where abdcef012345abcdef012345etcetcetc was the ID of the commit I wanted to delete.)

Of course, you can make this cleverer to exclude multiple commits at a time, or run this several times, putting in the right commit ID each time.
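 
For example, a case statement handles several commits in one pass (the IDs below are placeholders for your real bad commit IDs):

git filter-branch -f --commit-filter '
    case "${GIT_COMMIT}" in
    badcommitid1|badcommitid2|badcommitid3)
        echo "Skipping GIT_COMMIT=${GIT_COMMIT}" >&2;
        skip_commit "$@";;
    *)
        git commit-tree "$@";;
    esac
' --tag-name-filter cat -- --all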

Questions and answers about Pepper3

Series: Examples, Questions

My last post Examples of Pepper3 code was a reply to my friend’s email asking what it was all about. They replied with some questions, and I thought the questions and answers might shed some more light:

Questions!

Brilliant ones, thanks.

In general though you’ve said a lot about what Pepper can do without giving design decisions.

Yep, total brain dump.

Remind me again who this language is for :)

It’s a multi-paradigm (generic, functional, OO) language aimed at application programmers who want:

  • “native” performance on their chosen platform (definitely including actual native machine code). This is inspired by C++.
  • easy deployment (preferably a single binary containing everything, with an option to link most dependencies statically), including packaging of installers for major OSes. This is inspired by C++, and the pain of C++.
  • perfect flexibility for creating types – “meta-programming” is just programming. Things you would have done using code generation (e.g. generating a class hierarchy from an XSD) are done by running arbitrary code at compile time. The powerful type system is inspired by Haskell and the book “Modern C++ Design”, and the meta-programming is inspired by Lisp.
  • Simple memory management without GC through ownership. This is inspired by modern C++, and then Rust came along and implemented it before I could, thus proving it works. However, I would remove a lot of the functionality in Rust (lifetimes) to make it much simpler.
  • Strong support for functional programming if you want it. This is inspired by Haskell.
  • The simplest possible core language, with application programmers able to expand it by giving them the same tools as the language designers – e.g. “for” is just a function, so you can make your own. I am hoping I can even make “class” a function. This is inspired by Lisp, and oppositely-inspired by Java.
  • Separation between the idea of Interfaces, which I think I will call “type specifiers” (and will allow arbitrary code execution to determine whether a type satisfies the requirements) and structs/classes, allowing us to make new Interfaces and have old code satisfy them, meaning we can do generic stuff with e.g. ints even if no-one declared that “class Int : public Quaternion” or whatever.
  • Lots of “nudges” towards things that are good: by default things will be functional and immutable – you will have to explicitly say if you want to use more dangerous constructs like side effects and mutable values.
  • No implicit conversions, or really anything happening without you saying so.

Can you assign floats to ints or vice versa?

Yes, but you shouldn’t.

If you’re setting types in code at the start of a file, is this only available in the main file? Are there multiple files per program? Can you have libraries? If so, do these decide the functionality of their types in the library or does this only happen in the main file?

I haven’t totally decided – either by being enforced, or as a matter of style, you will generally do this once at the beginning of the program (and choose on the compiler command line to do it e.g. the debug way or the release way) and it will affect all of your code.

Libraries will be packaged as Pepper3 source code, so choices you make of the type of Int etc. will be reflected through the whole dependency tree. Cool, huh?

This is inspired by Python.

Can you group variables together into structs or similar?

Yes – it will be especially easy to make “value types”, and lots of default methods will be provided, that you will be strongly encouraged to use – e.g. copy and move operations. This is inspired by Elm.

Why are variables immutable by default but mutable with a special syntax? It’s the opposite of C++ const, but why that way around?

This is one of the “nudges” – immutable stuff is much easier to think about, and makes parallel stuff easier, and allows optimisations and so on, so turning it on by default means you have to choose to take the bad path, and are inclined to take the virtuous one. This is inspired by Haskell and Rust.

Why only allow assignments, function calls and operators? I’m sure you have good reasons.

To be as simple as possible, so you only have those things to learn and the rest can be understood by just reading the code. This is inspired by Python.

I wrote more of my (earlier) thoughts in this 4-post series, which is better thought through: Goodness in Programming Languages

Examples of Pepper3 code

Series: Examples, Questions

I have restarted my effort to make a new programming language that fits the way I like things. I haven’t pushed any code yet, but I have made a lot of progress in my head to understand what I want.

Here are some random examples that might get across some of the ways I am thinking:


// You code using general types like "Int" but you can set what
// they really are in the code (usually at the beginning), so
// if you plan to use native ints in the production code, it's
// a good idea to use:
Int = CheckedNativeInt;
// while in dev, since it will crash at runtime if you overflow.

// Then, in production when you're sure you have no errors,
// switch to an unchecked one:
Int = NativeInt;

// But, if you prefer correctness over efficiency, you can use
// mathematical integer that never overflows:
Int = ArbitrarySizeInt;


// Variables are immutable by default, so:
Int x = 4;
x = 3;      // this is a compile error


// But this is OK
Mutable(Int) y = 6;
y = y + x;

// Notice that you can call functions that return types that you
// then use, like Mutable(Int) here.

// Generally, code can run at either compile time or run time.
// Code to do with types has to run at compile time.
// By default, other code runs at run time, but you can force
// it to run early if you want to.


// A main method looks like this - you get hold of e.g. stdout through
// a World instance - I try to avoid any global functions like print, or
// global variables like sys.stdout.

Auto main =
{:(World world)->Int
    //...
};

// (Although note that Int, String etc. actually are global variables,
// which is a bit annoying)

// I wish the main method were simpler-looking.  The only saving grace
// is that for simple examples you don't need a main method -
// Pepper3 just calculates the expression you provide in your file and
// prints it out.


// Expressions in curly brackets are lambda functions, so:

{3};

// is a function taking no arguments, returning 3, and:

{:(Int x)
    x * 2
};

// is a function that doubles a value.

// Obviously, we can tie functions to names:

Auto dbl =
    {:(Int x)
        x * 2
    };

// Meaning we can call dbl like this:
dbl(4);

// Auto is a magic word to say ("use type inference"), so
// this is equivalent to the above:

fn([Int]->Int) dbl =
    {:(Int x)
        x * 2
    };


// Because {} makes an anon function, things like "for" can be
// functions instead of keywords.

for(range(3), {:(Int x)
    world.stdout.println(to(String)(x));
});


// As far as possible, Pepper3 will only contain assignment statements:
String s = "xx";

// and expressions containing function calls and operators:
dbl(3) + 6;


// This means we can make our own constructs like a different type of
// for loop, which would need a new keyword in some languages:

Auto parallel_for = import(multiprocess.parallel_for);

Recording gameplay videos on RetroPie

Credits: this is a slightly corrected and shortened version of How To Record A GamePlay Video From A RetroPie by selsine, which is itself based on Recording Live Gameplay in RetroPie’s RetroArch Emulators Natively on the Raspberry Pi by Retro Resolution.

RetroPie is based on RetroArch. RetroArch has a feature to record gameplay videos, but the current version of RetroPie has it disabled, presumably because it was thought to be too intensive to run properly on a Raspberry Pi.

These instructions tell you how to turn the recording feature on, and set it up. This works perfectly on my Raspberry Pi 3, allowing me to record video and sound from games I am playing.

The code for this is here: github.com/andybalaam/retropie-recording – this was code written by Retro Resolution, with small corrections and additions by me.

Before you start, you should have RetroPie working and connected to the Internet, and updated to the latest version.

Note: you should make a backup of your RetroPie before you start, because if something goes wrong with the commands below you could completely break it, meaning you will have to wipe your SD card and start fresh.

Turning on the recording feature

RetroArch uses the ffmpeg program to record video. To turn on recording, we need to log into the Pi using ssh, download and compile ffmpeg, and then recompile RetroArch with recording support turned on.

Log in to the Pi using ssh

Find out the IP address of your Pi by choosing “RetroPie setup” in the RetroPie menu and choosing “Show IP Address”. Write down the IP address (four numbers with dots in between – for example: 192.168.0.3).

On your Linux* computer open a Terminal and type:

ssh pi@192.168.0.3

(put in the IP address you wrote down instead of 192.168.0.3)

When it asks for your password, type: raspberry

If this works right, you should see something like this:
[image: the RetroPie Project joystick logo]

* Note: if you don’t have Linux, this should work OK on a Mac, or on Windows you could try using PuTTY.

Download and compile ffmpeg

Log in to the RetroPie as described above. The commands shown below should all be typed in to the window where you are logged in to the RetroPie.

Download the script ffmpeg-install.sh by typing this:

wget https://github.com/andybalaam/retropie-recording/raw/master/ffmpeg-install.sh

Now run it like this:

bash ffmpeg-install.sh

(Note: DON’T use sudo to run this – just type exactly what is written above.)

Now wait a long time for this to work. If it prints out errors, something went wrong – read what it says, and you may need to edit the ffmpeg-install.sh script to figure out what to do. Leave a comment and include the errors you saw if you need help.

Hopefully it will end successfully and print:

FFmpeg and Codec Installation Complete

If so, you are ready to move on to recompiling RetroArch:

Recompile RetroArch with recording turned on

Download the script build-retroarch-with-ffmpeg.sh by typing this:

wget https://github.com/andybalaam/retropie-recording/raw/master/build-retroarch-with-ffmpeg.sh

Now run it like this:

bash build-retroarch-with-ffmpeg.sh

It should finish in about 10 minutes, and print:

Building RetroArch with ffmpeg enabled complete

If it printed that, your RetroPie now has recording support!

Restart the RetroPie

Restart your RetroPie.

If you want to check that recording support is enabled, look for “Checking FFmpeg Has Been Enabled in RetroArch” in the Retro Resolution guide.

Now you need to set up RetroPie to record your emulator.

Setting up recording for your emulator

To set up an emulator, you need a general recording config file (the same for all emulators), and a launch config for the actual emulator you are using.

Create the recording config file

Log into the RetroPie as described in the first section, and type this to download the recording config file. If you want to change settings like what file format to record in, this is the file you will need to change.

wget https://github.com/andybalaam/retropie-recording/raw/master/recording-config.cfg

Create a launch config for your emulator

Each RetroPie emulator has a config file that describes how to launch it. For example, the NES emulator’s version is in /opt/retropie/configs/nes/emulators.cfg.

To get a list of all the emulators, log into your RetroPie and type:

ls /opt/retropie/configs

In that list you will see, for example, “nes” for the NES emulators, and “gb” for the GameBoy emulators. Find the one you want to edit, and edit it with the nano editor by typing:

nano /opt/retropie/configs/gb/emulators.cfg

(Instead of “gb” type the right name for the emulator you want to use, from the list you got when you typed the “ls” command above.)

Now you need to add a new line in this file. Each line describes how to launch an emulator. You should copy an existing line, and add some more stuff to the end.

For example, my version of this file looks like this:

lr-gambatte = "/opt/retropie/emulators/retroarch/bin/retroarch -L /opt/retropie/libretrocores/lr-gambatte/gambatte_libretro.so --config /opt/retropie/configs/gb/retroarch.cfg %ROM%"
lr-gambatte-record = "/opt/retropie/emulators/retroarch/bin/retroarch -L /opt/retropie/libretrocores/lr-gambatte/gambatte_libretro.so --config /opt/retropie/configs/gb/retroarch.cfg --record /home/pi/recording_GB_$(date +%Y-%m-%d-%H%M%S).mkv --recordconfig /home/pi/recording-config.cfg %ROM%"
default = "lr-gambatte"
lr-tgbdual = "/opt/retropie/emulators/retroarch/bin/retroarch -L /opt/retropie/libretrocores/lr-tgbdual/tgbdual_libretro.so --config /opt/retropie/configs/gb/retroarch.cfg %ROM%"

The line I added is the second one, lr-gambatte-record. Most of it is copied from the lr-gambatte line above it; the new parts are the --record and --recordconfig arguments, which tell the launcher to use the recording config we made in the previous section.

When you’ve made your edits, press Ctrl-X to exit nano, and type “Y” when it asks whether you want to save.

Once you’ve done something similar to this for every emulator you want to record with, you are ready to actually do the recording!

Actually doing a recording

Launching a game with recording turned on

In the normal RetroPie interface, go to your emulator and start it, but press the A button while it’s launching, and choose “Select emulator for ROM”. In the list that comes up, choose the new line you added in emulators.cfg. In our example, that was called “lr-gambatte-record”.

Now play the game, and exit when you are finished. If all goes well, the recording will have been saved!

(Note: doing this means that every time you launch this game it will be recorded. To stop it doing this, press the “A” button while it’s launching, choose “Select emulator for ROM” and choose the normal line – in our example that would be “lr-gambatte”.)

Getting the recorded files

To get your recording off the RetroPie, go back to your computer, open a terminal, and type:

scp pi@192.168.0.3:recording_*.mkv ./

This will copy all recorded videos from your RetroPie onto your computer (into your home directory, unless you did a cd command before you typed the above).

Now you should delete the files from your RetroPie. Log in to the RetroPie as described in the first section, and delete all recording files by typing this:

Note: This deletes all your recordings, and you can’t undo!

rm recording_*.mkv

Safer: recording onto a USB stick

Note: recording directly onto the RetroPie like we described above is dangerous because you could fill up all the disk space or corrupt your SD card, which could make RetroPie stop working, meaning you need to wipe your SD card and set up RetroPie again.

It’s safer to record onto a separate USB disk. To find out how, read “Recording to an External Storage Device” in Retro Resolution’s guide.

Women Who Code workshop on “Write your own programming language”

On Wednesday 28th June 2017 a group of people from OpenMarket went to the Fora office space in Clerkenwell, London to run a workshop with the Women Who Code group, who work to help women achieve their career goals.

OpenMarket provided the workshop “Write your own programming language” and funded the food, and the venue was provided gratis by Fora.

We started the evening with some networking and food:

[photo: networking]

[photo: food]

but most of the time was spent coding:

[photo: coding]

with lots of help from our OpenMarket helpers:

[photo: helpers]

The feedback we got was very positive:

[photo: feedback from attendees]

Everyone seemed to be having fun, so we hope we might get invited back to do more in future.

Why do this?

At OpenMarket we want to improve our diversity, and we have started by looking at gender diversity specifically. By being involved with events like this we hope to learn how we can make our company better at welcoming and supporting employees, encourage people from under-represented groups to apply to work here, and improve the general climate in our industry.

Thank you

A huge thank you to the OpenMarket people (from London and Guadalajara!) who helped out – I think people felt welcome and there was plenty of help available for the attendees – you did a great job.

Thank you also for the great response from everyone in our London office – several people in the office wanted to come but couldn’t make it on the night – I am hoping we will get more opportunities in future.

We’re also really grateful to OpenMarket for funding the food, to Fora for providing the space, and to Women Who Code for doing such great work to improve our industry.

[Photos by David Lawson.]

Running a virtualenv with a custom-built Python

For my attempt to improve the asyncio.as_completed Python standard library function I needed to build a local copy of cpython (the Python interpreter).

To test it, I needed the aiohttp module, which is not part of the standard library, so the easiest way to get it was using virtualenv.

Here is the recipe I used to get a virtualenv and install packages using pip with a custom-built Python:

$ ~/code/public/cpython/python -m venv env
$ . env/bin/activate
(env) $ pip install aiohttp
(env) $ python mycode.py

Adding a concurrency limit to Python’s asyncio.as_completed

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

In the previous post I demonstrated how the limited_as_completed method allows us to run a very large number of tasks using concurrency, but limiting the number of concurrent tasks to a sensible limit to ensure we don’t exhaust resources like memory or operating system file handles.

I think this could be a useful addition to the Python standard library, so I have been working on a modification to the current asyncio.as_completed method. My work so far is here: limited-as_completed.

I ran similar tests to the ones I ran for the last blog post with this code to validate that the modified standard library version achieves the same goals as before.

I used an identical copy of timed from the previous post and updated versions of the other files because I was using a much newer version of aiohttp along with the custom-built python I was running.

server looked like:

#!/usr/bin/env python3

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

app = web.Application()
app.router.add_get('/{name}', handle)

web.run_app(app)

client-async-sem needed me to add a custom TCPConnector to avoid a new limit on the number of concurrent connections that was added to aiohttp in version 2.0. I also needed to move the ClientSession usage inside a coroutine to avoid a warning:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(r):
    with ClientSession(connector=TCPConnector(limit=limit)) as session:
        url = "http://localhost:8080/{}"
        tasks = []
        # create instance of Semaphore
        sem = asyncio.Semaphore(limit)
        for i in range(r):
            # pass Semaphore and session to every GET request
            task = asyncio.ensure_future(
                bound_fetch(sem, url.format(i), session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.ensure_future(run(int(sys.argv[1]))))

My new code that uses my proposed extension to as_completed looked like:

#!/usr/bin/env python3

from aiohttp import ClientSession, TCPConnector
import asyncio
import sys

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done():
    with ClientSession(connector=TCPConnector(limit=limit)) as session:
        tasks = (fetch(url.format(i), session) for i in range(r))
        for res in asyncio.as_completed(tasks, limit=limit):
            await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done())
loop.close()

and with these, we get similar behaviour to the previous post:

$ ./timed ./client-async-sem 10000
Memory usage: 73640KB	Time: 19.18 seconds
$ ./timed ./client-async-stdlib 10000
Memory usage: 49332KB	Time: 18.97 seconds

So the implementation I plan to submit to the Python standard library appears to work well. In fact, I think it is better than the one I presented in the previous post, because it uses completion callbacks (add_done_callback) to notice when futures have completed, which reduces the busy-looping we were doing to check for and yield finished tasks.
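 
To illustrate the idea (a minimal sketch, not the code from the actual patch), a callback-based version can use add_done_callback and a queue, so nothing polls:

import asyncio

def callback_limited_as_completed(coros, limit):
    # Like limited_as_completed, but each future announces its own
    # completion via add_done_callback instead of being polled.
    done = asyncio.Queue()  # completed futures land here
    active = 0

    def schedule_next():
        nonlocal active
        try:
            coro = next(coros)
        except StopIteration:
            return
        active += 1
        asyncio.ensure_future(coro).add_done_callback(on_done)

    def on_done(future):
        nonlocal active
        active -= 1
        done.put_nowait(future)  # wakes the consumer - no busy-loop
        schedule_next()          # keep `limit` tasks in flight

    for _ in range(limit):
        schedule_next()

    async def wait_one():
        return (await done.get()).result()

    while active > 0 or not done.empty():
        yield wait_one()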

The Python issue is bpo-30782 and the pull request is #2424.

Note: at first glance, it looks like aiohttp.ClientSession’s limit on the number of connections (introduced in version 1.0 and then updated in version 2.0) gives us what we want without any of this extra code, but in fact it only limits the number of connections, not the number of futures we are creating, so it has the same problem of unbounded memory use as the semaphore-based implementation.

Making 100 million requests with Python aiohttp

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I’ve been working on how to make a very large number of HTTP requests using Python’s asyncio and aiohttp.

Paweł Miech’s post Making 1 million requests with python-aiohttp taught me how to think about this, and got us a long way, with 1 million requests running in a reasonable time, but I need to go further.

Paweł’s approach limits the number of requests that are in progress, but it uses an unbounded amount of memory to hold the futures that it wants to execute.

We can avoid using unbounded memory by using the limited_as_completed function I outlined in my previous post.

Setup

Server

We have a server program “server”:

(Note it differs from Paweł’s version because I am using an older version of aiohttp which has fewer convenient features.)

#!/usr/bin/env python3.5

from aiohttp import web
import asyncio
import random

async def handle(request):
    await asyncio.sleep(random.randint(0, 3))
    return web.Response(text="Hello, World!")

async def init():
    app = web.Application()
    app.router.add_route('GET', '/{name}', handle)
    return await loop.create_server(
        app.make_handler(), '127.0.0.1', 8080)

loop = asyncio.get_event_loop()
loop.run_until_complete(init())
loop.run_forever()

This just responds “Hello, World!” to every request it receives, but after an artificial delay of 0-3 seconds.

Synchronous client

As a baseline, we have a synchronous client “client-sync”:

#!/usr/bin/env python3.5

import requests
import sys

url = "http://localhost:8080/{}"
for i in range(int(sys.argv[1])):
    requests.get(url.format(i)).text

This waits for each request to complete before making the next one. Like the other clients below, it takes the number of requests to make as a command-line argument.

Async client using semaphores

Copied mostly verbatim from Making 1 million requests with python-aiohttp we have an async client “client-async-sem” that uses a semaphore to restrict the number of requests that are in progress at any time to 1000:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
import sys

limit = 1000

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)

async def run(session, r):
    url = "http://localhost:8080/{}"
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(limit)
    for i in range(r):
        # pass Semaphore and session to every GET request
        task = asyncio.ensure_future(bound_fetch(sem, url.format(i), session))
        tasks.append(task)
    responses = asyncio.gather(*tasks)
    await responses

loop = asyncio.get_event_loop()
with ClientSession() as session:
    loop.run_until_complete(asyncio.ensure_future(run(session, int(sys.argv[1]))))

Async client using limited_as_completed

The new client I am presenting here uses limited_as_completed from the previous post. This means it can make a generator that provides the futures to wait for as they are needed, instead of making them all at the beginning.

It is called “client-async-as-completed”:

#!/usr/bin/env python3.5

from aiohttp import ClientSession
import asyncio
from itertools import islice
import sys

def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

limit = 1000

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, limit):
        await res

r = int(sys.argv[1])
url = "http://localhost:8080/{}"
loop = asyncio.get_event_loop()
with ClientSession() as session:
    coros = (fetch(url.format(i), session) for i in range(r))
    loop.run_until_complete(print_when_done(coros))
loop.close()

Again, this limits the number of requests to 1000.

Test setup

Finally, we have a test runner script called “timed”:

#!/usr/bin/env bash

./server &
sleep 1 # Wait for server to start

/usr/bin/time --format "Memory usage: %MKB\tTime: %e seconds" "$@"

# %e Elapsed real (wall clock) time used by the process, in seconds.
# %M Maximum resident set size of the process in Kilobytes.

kill %1

This runs each process, ensuring the server is restarted each time it runs, and prints out how long it took to run, and how much memory it used.

Results

When making only 10 requests, the async clients worked faster because they launched all the requests simultaneously and only had to wait for the longest one (3 seconds). The memory usage of all three clients was fine:

$ ./timed ./client-sync 10
Memory usage: 20548KB	Time: 15.16 seconds
$ ./timed ./client-async-sem 10
Memory usage: 24996KB	Time: 3.13 seconds
$ ./timed ./client-async-as-completed 10
Memory usage: 23176KB	Time: 3.13 seconds

When making 100 requests, the synchronous client was very slow, but all three clients worked eventually:

$ ./timed ./client-sync 100
Memory usage: 20528KB	Time: 156.63 seconds
$ ./timed ./client-async-sem 100
Memory usage: 24980KB	Time: 3.21 seconds
$ ./timed ./client-async-as-completed 100
Memory usage: 24904KB	Time: 3.21 seconds

At this point let’s agree that life is too short to wait for the synchronous client.

When making 10000 requests, both async clients worked quite quickly, and both had increased memory usage, but the semaphore-based one used almost twice as much memory as the limited_as_completed version:

$ ./timed ./client-async-sem 10000
Memory usage: 77912KB	Time: 18.10 seconds
$ ./timed ./client-async-as-completed 10000
Memory usage: 46780KB	Time: 17.86 seconds

For 1 million requests, the semaphore-based client took 25 minutes on my (32GB RAM) machine. It only used about 10% of my CPU, and it used a lot of memory (over 3GB):

$ ./timed ./client-async-sem 1000000
Memory usage: 3815076KB	Time: 1544.04 seconds

Note: Paweł’s version only took 9 minutes on his laptop and used all his CPU, so I wonder whether I have made a mistake somewhere, or whether my version of Python (3.5.2) is not as good as a later one.

The limited_as_completed version ran in a similar amount of time but used 100% of my CPU, and used a much smaller amount of memory (162MB):

$ ./timed ./client-async-as-completed 1000000
Memory usage: 162168KB	Time: 1505.75 seconds

Now let’s try 100 million requests. The semaphore-based version lasted 10 hours before it was killed by Linux’s OOM Killer, but it didn’t manage to make any requests in this time, because it creates all its futures before it starts making requests:

$ ./timed ./client-async-sem 100000000
Command terminated by signal 9

I left the limited_as_completed version over the weekend and it managed to succeed eventually:

$ ./timed ./client-async-as-completed 100000000
Memory usage: 294304KB	Time: 150213.15 seconds

So its memory usage was still very bounded, and it managed to do about 665 requests/second over an extended period, which is almost identical to the throughput of the previous cases.

Conclusion

Making a million requests is usually enough, but when we really need to do a lot of work while keeping our memory usage bounded, it looks like an approach like limited_as_completed is a good way to go. I also think it’s slightly easier to understand.

In the next post I describe my attempt to get something like this added to the Python standard library.

Python – printing UTC dates in ISO8601 format with time zone

By default, when you make a UTC date from a Unix timestamp in Python and print it in ISO format, it has no time zone:

$ python3
>>> from datetime import datetime
>>> datetime.utcfromtimestamp(1496998804).isoformat()
'2017-06-09T09:00:04'

Whenever you talk about a datetime, I think you should always include a time zone, so I find this problematic.

The solution is to mention the timezone explicitly when you create the datetime:

$ python3
>>> from datetime import datetime, timezone
>>> datetime.fromtimestamp(1496998804, tz=timezone.utc).isoformat()
'2017-06-09T09:00:04+00:00'

Note, including the timezone explicitly works the same way when creating a datetime in other ways:

$ python3
>>> from datetime import datetime, timezone
>>> datetime(2017, 6, 9).isoformat()
'2017-06-09T00:00:00'
>>> datetime(2017, 6, 9, tzinfo=timezone.utc).isoformat()
'2017-06-09T00:00:00+00:00'

Python 3 – large numbers of tasks with limited concurrency

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

I am interested in running large numbers of tasks in parallel, so I need something like asyncio.as_completed, but taking an iterable instead of a list, and with a limited number of tasks running concurrently. First, let’s try to build something pretty much equivalent to asyncio.as_completed. Here is my attempt, but I’d welcome feedback from readers who know better:

# Note this is not a coroutine - it returns
# an iterator - but it crucially depends on
# work being done inside the coroutines it
# yields - those coroutines empty out the
# list of futures it holds, and it will not
# end until that list is empty.
def my_as_completed(coros):

    # Start all the tasks
    futures = [asyncio.ensure_future(c) for c in coros]

    # A coroutine that waits for one of the
    # futures to finish and then returns
    # its result.
    async def first_to_finish():

        # Wait forever - we could add a
        # timeout here instead.
        while True:

            # Give up control to the scheduler
            # - otherwise we will spin here
            # forever!
            await asyncio.sleep(0)

            # Return anything that has finished
            for f in futures:
                if f.done():
                    futures.remove(f)
                    return f.result()

    # Keep yielding a waiting coroutine
    # until all the futures have finished.
    while len(futures) > 0:
        yield first_to_finish()

The above can be substituted for asyncio.as_completed in the code that uses it in the first article, and it seems to work. It also makes a reasonable amount of sense to me, so it may be correct, but I’d welcome comments and corrections.

my_as_completed above accepts an iterable and returns a generator producing results, but inside it starts all tasks concurrently, and stores all the futures in a list. To handle bigger lists we will need to do better, by limiting the number of running tasks to a sensible number.

Let’s start with a test program:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1.0 / number)
    print("Finishing %d" % number)
    return str(number)

async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)

coros = [mycoro(i) for i in range(1, 101)]

loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This uses asyncio.as_completed to run 100 tasks and, because I adjusted the asyncio.sleep command to wait longer for earlier tasks, it prints something like this:

$ time python3 python-async.py
Starting 47
Starting 93
Starting 48
...
Finishing 93
Finishing 94
Finishing 95
...
Result 93
Result 94
Result 95
...
Finishing 46
Finishing 45
Finishing 42
...
Finishing 2
Result 2
Finishing 1
Result 1

real    0m1.590s
user    0m0.600s
sys 0m0.072s

So all 100 tasks were completed in 1.5 seconds, indicating that they really were run in parallel, but all 100 were allowed to run at the same time, with no limit.

We can adjust the test program to run using our customised my_as_completed function, and pass in an iterable of coroutines instead of a list by changing the last part of the program to look like this:

async def print_when_done(tasks):
    for res in my_as_completed(tasks):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

But we get similar output to last time, with all tasks running concurrently.

To limit the number of concurrent tasks, we limit the size of the futures list, and add more as needed:

from itertools import islice
def limited_as_completed(coros, limit):
    futures = [
        asyncio.ensure_future(c)
        for c in islice(coros, 0, limit)
    ]
    async def first_to_finish():
        while True:
            await asyncio.sleep(0)
            for f in futures:
                if f.done():
                    futures.remove(f)
                    try:
                        newf = next(coros)
                        futures.append(
                            asyncio.ensure_future(newf))
                    except StopIteration as e:
                        pass
                    return f.result()
    while len(futures) > 0:
        yield first_to_finish()

We start limit tasks at first, and whenever one ends, we ask for the next coroutine in coros and set it running. This keeps the number of running tasks at or below limit until we start running out of input coroutines (when next throws and we don’t add anything to futures), then futures starts emptying until we eventually stop yielding coroutine objects.

I thought this function might be useful to others, so I started a little repo over here and added it: asyncioplus/limited_as_completed.py. Please provide merge requests and log issues to improve it – maybe it should be part of standard Python?

When we run the same example program, but call limited_as_completed instead of the other versions:

async def print_when_done(tasks):
    for res in limited_as_completed(tasks, 10):
        print("Result %s" % await res)
coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

We see output like this:

$ time python3 python-async.py
Starting 1
Starting 2
...
Starting 9
Starting 10
Finishing 10
Result 10
Starting 11
...
Finishing 100
Result 100
Finishing 1
Result 1

real	0m1.535s
user	0m1.436s
sys	0m0.084s

So we can see that the tasks are still running concurrently, but this time the number of concurrent tasks is limited to 10.

See also

To achieve a similar result using semaphores, see Python asyncio.semaphore in async-await function and Making 1 million requests with python-aiohttp.

It feels like limited_as_completed is more re-usable as an approach but I’d love to hear others’ thoughts on this. E.g. could/should I use a semaphore to implement limited_as_completed instead of manually holding a queue?

Basic ideas of Python 3 asyncio concurrency

Series: asyncio basics, large numbers in parallel, parallel HTTP requests, adding to stdlib

Python 3’s asyncio module and the async and await keywords combine to allow us to do cooperative concurrent programming, where a code path voluntarily yields control to a scheduler, trusting that it will get control back when some resource has become available (or just when the scheduler feels like it). This way of programming can be very confusing, and has been popularised by Twisted in the Python world, and nodejs (among others) in other worlds.

I have been trying to get my head around the basic ideas as they surface in Python 3’s model. Below are some definitions and explanations that have been useful to me as I tried to grasp how it all works.

Futures and coroutines are both things that you can wait for.

You can make a coroutine by declaring it with async def:

import asyncio
async def mycoro(number):
    print("Starting %d" % number)
    await asyncio.sleep(1)
    print("Finishing %d" % number)
    return str(number)

Almost always, a coroutine will await something such as some blocking IO. (Above we just sleep for a second.) When we await, we actually yield control to the scheduler so it can do other work and wake us up later, when something interesting has happened.
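Note that just calling a coroutine function does not run any of its code: it only creates a coroutine object, which does nothing until something waits for it. A quick check (a sketch, using the mycoro above):

c = mycoro(1)  # No "Starting 1" is printed: nothing has run yet
print(c)       # <coroutine object mycoro at 0x...>
c.close()      # Tidy up, so Python doesn't warn it was never awaited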

You can make a future out of a coroutine, but often you don’t need to. Bear in mind that if you do want to make a future, you should use ensure_future, but as well as creating a future this schedules what you pass to it to run on the event loop – it doesn’t just create a passive wrapper:

myfuture1 = asyncio.ensure_future(mycoro(1))
# Schedules mycoro to run!

But, to get its result, you must wait for it – it is only scheduled in the background:

# Assume mycoro is defined as above
myfuture1 = asyncio.ensure_future(mycoro(1))
# We end the program without waiting for the future to finish

So the above fails like this:

$ python3 ./python-async.py
Task was destroyed but it is pending!
task: <Task pending coro=<mycoro() running at ./python-async:10>>
sys:1: RuntimeWarning: coroutine 'mycoro' was never awaited

The right way to block waiting for a future outside of a coroutine is to ask the event loop to do it:

# Keep on assuming mycoro is defined as above for all the examples
myfuture1 = asyncio.ensure_future(mycoro(1))
loop = asyncio.get_event_loop()
loop.run_until_complete(myfuture1)
loop.close()

Now this works properly (although we’re not yet getting any benefit from being asynchronous):

$ python3 python-async.py
Starting 1
Finishing 1

To run several things concurrently, we make a future that is the combination of several other futures. asyncio can make a future like that out of coroutines using asyncio.gather:

several_futures = asyncio.gather(
    mycoro(1), mycoro(2), mycoro(3))
loop = asyncio.get_event_loop()
print(loop.run_until_complete(several_futures))
loop.close()

The three coroutines all run at the same time, so this only takes about 1 second to run, even though we are running 3 tasks, each of which takes 1 second:

$ python3 python-async.py
Starting 3
Starting 1
Starting 2
Finishing 3
Finishing 1
Finishing 2
['1', '2', '3']

asyncio.gather won’t necessarily run your coroutines in order, but it will return a list of results in the same order as its input.
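To emphasise the ordering guarantee, a tiny sketch (again assuming mycoro as above, on a fresh event loop):

loop = asyncio.get_event_loop()
results = loop.run_until_complete(
    asyncio.gather(mycoro(3), mycoro(1), mycoro(2)))
assert results == ['3', '1', '2']  # Input order, not completion order
loop.close()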

Notice also that run_until_complete returns the result of the future created by gather – a list of all the results from the individual coroutines.

To do the next bit we need to know how to call a coroutine from a coroutine. As we’ve already seen, just calling a coroutine in the normal Python way doesn’t run it, but gives you back a “coroutine object”. To actually run the code, we need to wait for it. When we want to block everything until we have a result, we can use something like run_until_complete but in an async context we want to yield control to the scheduler and let it give us back control when the coroutine has finished. We do that by using await:

import asyncio
async def f2():
    print("start f2")
    await asyncio.sleep(1)
    print("stop f2")
async def f1():
    print("start f1")
    await f2()
    print("stop f1")
loop = asyncio.get_event_loop()
loop.run_until_complete(f1())
loop.close()

This prints:

$ python3 python-async.py
start f1
start f2
stop f2
stop f1

Now we know how to call a coroutine from inside a coroutine, we can continue.

We have seen that asyncio.gather takes in some futures/coroutines and returns a future that collects their results (in order).

If, instead, you want to get results as soon as they are available, you need to write a second coroutine that deals with each result by looping through the results of asyncio.as_completed and awaiting each one.

# Keep on assuming mycoro is defined as at the top
async def print_when_done(tasks):
    for res in asyncio.as_completed(tasks):
        print("Result %s" % await res)
coros = [mycoro(1), mycoro(2), mycoro(3)]
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
loop.close()

This prints:

$ python3 python-async.py
Starting 1
Starting 3
Starting 2
Finishing 3
Result 3
Finishing 2
Result 2
Finishing 1
Result 1

Notice that task 3 finishes first and its result is printed, even though tasks 1 and 2 are still running.

asyncio.as_completed returns an iterable sequence of futures, each of which must be awaited, so it must run inside a coroutine, which must be waited for too.

The argument to asyncio.as_completed has to be a list (or some other concrete collection) of coroutines or futures – it rejects a plain generator – so you can’t use it with a lazy sequence of items too large to fit in memory.
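For example (a sketch of the failure on the Python 3 of the time; the exact message may vary across versions), feeding it a generator fails as soon as we iterate:

coros = (mycoro(i) for i in range(1, 101))
loop = asyncio.get_event_loop()
loop.run_until_complete(print_when_done(coros))
# TypeError: expect a list of futures, not generator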

Side note: if we want to work with very large lists, asyncio.wait won’t help us here – it also takes a list of futures and waits for all of them to complete (like gather), or, via its return_when argument, for the first of them to complete or the first of them to fail. It then returns two sets of futures: done and not-done. Each of these must be awaited to get their results, so:

asyncio.gather

# is roughly equivalent to:

async def mygather(*args):
    ret = []
    # wait() gives back (done, pending) sets, so unlike the real
    # asyncio.gather this sketch does not preserve input order
    for r in (await asyncio.wait(args))[0]:
        ret.append(await r)
    return ret

I am interested in running very large numbers of tasks with limited concurrency – see the next article for how I managed it.

C++ iterator wrapping a stream not 1-1

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

Sometimes we want to write an iterator that consumes items from some underlying iterator but produces its own items slower than the items it consumes, like this:

ColonSep items("aa:foo::x");
// Prints "aa, foo, , x"
for(auto s : items)
{
    std::cout << s << ", ";
}

When we pass a 9-character string (i.e. an iterator that yields 9 items) to ColonSep, above, we only repeat 4 times in our loop, because ColonSep provides an iterable range that yields one value for each colon-separated field it finds in the underlying iterator of 9 characters.

To do something like this I'd recommend consuming the items of the underlying iterator eagerly, so the next value is ready when it is requested with operator*. We also need our iterator class to hold on to the end of the underlying iterator as well as the current position.

First we need a small container to hold the next item we will provide:

struct maybestring
{
    std::string value_;
    bool at_end_;

    explicit maybestring(const std::string value)
    : value_(value)
    , at_end_(false)
    {}

    explicit maybestring()
    : value_("--error-past-end--")
    , at_end_(true)
    {}
};

A maybestring either holds the next item we will provide, or at_end_ is true, meaning we have reached the end of the underlying iterator and we will report that we are at the end ourselves when asked.

Like the simpler iterators we have looked at, we still need a little container to return from the postincrement operator:

class stringholder
{
    const std::string value_;
public:
    stringholder(const std::string value) : value_(value) {}
    std::string operator*() const { return value_; }
};

Now we are ready to write our iterator class, which always has the next value ready in its next_ member, and holds on to the current and end positions of the underlying iterator in wrapped_ and wrapped_end_:

class myit
{
private:
    typedef std::string::const_iterator wrapped_t;
    wrapped_t wrapped_;
    wrapped_t wrapped_end_;
    maybestring next_;

The constructor holds on to the underlying iterator positions, and immediately fills next_ with the first value by calling next_item, passing in true to indicate that this is the first item:

public:
    myit(wrapped_t wrapped, wrapped_t wrapped_end)
    : wrapped_(wrapped)
    , wrapped_end_(wrapped_end)
    , next_(next_item(true))
    {
    }

    // Previously provided by std::iterator
    typedef std::string             value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef std::string*            pointer;
    typedef std::string&            reference;
    typedef std::input_iterator_tag iterator_category;

next_item looks like this:

private:
    maybestring next_item(bool first_time)
    {
        if (wrapped_ == wrapped_end_)
        {
            return maybestring();  // We are at the end
        }
        else
        {
            if (!first_time)
            {
                ++wrapped_;
            }
            return read_item();
        }
    }

next_item recognises when we've reached the end of the underlying iterator and returns the at-end maybestring if so. Otherwise, it skips forward once (unless we are on the first element) and then calls read_item:

    maybestring read_item()
    {
        std::string ret = "";
        for (; wrapped_ != wrapped_end_; ++wrapped_)
        {
            char c = *wrapped_;
            if (c == ':')
            {
                break;
            }
            ret += c;
        }
        return maybestring(ret);
    }

read_item implements the real logic of looping through the underlying iterator and combining those values together to create the next item to provide.

The hard part of the iterator class is done, leaving only the more normal functions we must provide:

public:
    value_type operator*() const
    {
        assert(!next_.at_end_);
        return next_.value_;
    }

    bool operator==(const myit& other) const
    {
        // We only care about whether we are at the end
        return next_.at_end_ == other.next_.at_end_;
    }

    bool operator!=(const myit& other) const { return !(*this == other); }

    stringholder operator++(int)
    {
        assert(!next_.at_end_);
        stringholder ret(next_.value_);
        next_ = next_item(false);
        return ret;
    }

    myit& operator++()
    {
        assert(!next_.at_end_);
        next_ = next_item(false);
        return *this;
    }
};

Note that operator== is only concerned with whether or not we are an end iterator. Nothing else matters for providing correct iteration.

Our final bit of bookkeeping is the range class that allows our new iterator to be used in a for loop:

class ColonSep
{
private:
    const std::string str_;
public:
    ColonSep(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_), std::end(str_)); }
    myit end()   { return myit(std::end(str_),   std::end(str_)); }
};

A lot of the code above is needed for any iterator that does this kind of job. Next time we'll look at how to use templates to make it usable in the general case.