Limiting the number of open sockets in a tokio-based TCP listener

I learned quite a bit today about how to think about concurrency in Rust. I was trying to use a Semaphore to limit how many open sockets my TCP listener allowed, and I had real trouble making it work. It either didn’t actually work, allowing any number of clients to connect, or the compiler told me I couldn’t do what I wanted to do, because the lifetime of my Semaphore was not 'static. Here’s the journey I took towards working code that I think is correct (feedback welcome).

Motivation

In the tokio tutorial there is a short section entitled “Backpressure and bounded channels” (partway down the Channels page). It contains this statement:

…take care to ensure total amount of concurrency is bounded. For example, when writing a TCP accept loop, ensure that the total number of open sockets is bounded.

Obviously, when I started work on a TCP accept loop, I wanted to follow this advice.

Like many things in my journey with Rust, it was harder than I expected, and eventually enlightening.

The code

Here is a short Rust program that listens on a TCP port and accepts incoming connections.

Cargo.toml:

[package]
name = "tcp-listener-example"
version = "1.0.0"
edition = "2018"
include = ["src/"]

[dependencies]
tokio = { version = ">=1.0.1", features = ["full"] }

src/main.rs:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        tokio::spawn(async move {
            let mut buf: [u8; 1024] = [0; 1024];
            loop {
                let n = tcp_stream.read(&mut buf).await.unwrap();
                if n == 0 {
                    return;
                }
                print!("{}", String::from_utf8_lossy(&buf[0..n]));
            }
        });
    }
}

This program listens on port 8080, and every time a client connects, it spawns an asynchronous task to deal with it.

If I run it with:

cargo run

It starts, and I can connect to it from multiple other processes like this:

telnet 0.0.0.0 8080

Anything I type into the telnet terminal window gets printed out in the terminal where I ran cargo run. The program works: it listens on TCP port 8080 and prints out all the messages it receives.

So what’s the problem?

The problem is that this program can be overwhelmed: if lots of processes connect to it, it will accept all the connections, and eventually run out of sockets. This might prevent other things working right on the computer, or it might crash our program, or something else. We need some kind of sensible limit, as the tokio tutorial mentions.

So how do we limit the number of people allowed to connect at the same time?

Just use a semaphore, dummy

A semaphore does exactly what we need here – it keeps a count of how many people are doing something, and prevents that number getting too big. So all we need to do is restrict the number of clients that we allow to connect using a semaphore.

Here was my first attempt:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Semaphore::new(2);

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        // Don't copy this code: it doesn't work
        let aq = sem.try_acquire();
        if let Ok(_guard) = aq {
            tokio::spawn(async move {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            });
        } else {
            println!("Rejecting client: too many open sockets");
        }
    }
}

This compiles fine, but it doesn’t do anything! Even though we called Semaphore::new with an argument of 2, intending to allow only 2 clients to connect, in fact I can still connect more times than that. It looks like our code changes had no effect at all.

What we were hoping to happen was that every time a client connected, we created _guard, which is a SemaphoreGuard, that occupies one of the slots in the semaphore. We were expecting that guard to live until the client disconnects, at which point the slot will be released.

Why doesn’t it work? It’s easy to understand when you think about what tokio::spawn does. It creates a task and asks for it to be executed in the future, but it doesn’t actually run it. So tokio::spawn returns immediately, and _guard is dropped, before the code that handles the request is executed. So, obviously, our change doesn’t actually restrict how many requests are being handled because the semaphore slot is freed up before the request is processed.

Just hold the guard for longer, dummy

So, let’s hold on to the SemaphoreGuard for longer:

use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Semaphore::new(2);

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        let aq = sem.try_acquire();
        if let Ok(guard) = aq {
            tokio::spawn(async move {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        drop(guard);
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            });
        } else {
            println!("Rejecting client: too many open sockets");
        }
    }
}

The idea is to pass the SemaphoreGuard object into the code that actually deals with the client request. The way I’ve attempted that is by referring to guard somewhere within the async move closure. What I’ve actually done is tell it to drop guard when we are finished with the request, but actually any mention of that variable within the closure would have been enough to tell the compiler we want to move it in, and only drop it when we are done.

It all sounds reasonable, but actually this code doesn’t compile. Here’s the error I get:

error[E0597]: `sem` does not live long enough
  --> src/main.rs:12:18
   |
12 |         let aq = sem.try_acquire();
   |                  ^^^--------------
   |                  |
   |                  borrowed value does not live long enough
   |                  argument requires that `sem` is borrowed for `'static`
...
29 | }
   | - `sem` dropped here while still borrowed

What the compiler is saying is that our SemaphoreGuard is referring to sem (the Semaphore object), but that the guard might live longer than the semaphore.

Why? Surely sem is held within a scope that includes the whole of the client-handling code, so it should live long enough?

No. Actually, the async move closure that we are passing to tokio::spawn is being added to a list of tasks to run in the future, so it could live much longer. The fact that we are inside an infinite loop confused me further here, but the principle still remains: whenever we make a closure like this and pass something into it, the closure must own it, or if we are borrowing it, it must live forever (which is what a 'static lifetime means).

The code above passes ownership of guard to the closure, but guard itself is referring to (borrowing) sem. This is why the compiler says that “sem is borrowed for 'static“.

Wrong things I tried

Because I didn’t understand what I was doing, I tried various other things like making sem an Arc, making guard an Arc, creating guard inside the closure, and even trying to make sem actually have 'static storage by making it a constant. (That last one didn’t work because only very simple types like numbers and strings can be constants.)

Solution: Share the Semaphore in an Arc

After what felt like too much thrashing around, I found what I think is the right answer:

use std::sync::Arc;
use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let listener = TcpListener::bind("0.0.0.0:8080").await.unwrap();
    let sem = Arc::new(Semaphore::new(2));

    loop {
        let (mut tcp_stream, _) = listener.accept().await.unwrap();
        let sem_clone = Arc::clone(&sem);
        tokio::spawn(async move {
            let aq = sem_clone.try_acquire();
            if let Ok(_guard) = aq {
                let mut buf: [u8; 1024] = [0; 1024];
                loop {
                    let n = tcp_stream.read(&mut buf).await.unwrap();
                    if n == 0 {
                        return;
                    }
                    print!("{}", String::from_utf8_lossy(&buf[0..n]));
                }
            } else {
                println!("Rejecting client: too many open sockets");
            }
        });
    }
}

This code:

  • Creates a Semaphore and stores it inside an Arc, which is a reference-counting pointer that can be shared between tasks. This means it will live as long as someone holds a reference to it.
  • Clones the Arc so we have a copy that can be safely moved into the async move closure. We can’t move sem in to the closure because it’s going to get used again the next time around the loop. We can move sem_clone in to the closure because it’s not used anywhere else. sem and sem_clone both refer to the same Semaphore object, so they agree on the count of clients that are connected, but they are different Arc instances, so one can be moved into the closure.
  • Only aquires the SemaphoreGuard once we’re inside the closure. This way we’re not doing something difficult like borrowing a reference to something that lives outside the closure. Instead, we’re borrowing a reference via sem_clone, which is owned by the closure which we are inside, so we know it will live long enough.

It actually works! After two clients are connected, listener.accept actually opens a socket to any new client, but because we return almost immediately from the closure, we only hold it open very briefly before dropping it. This seemed preferable to refusing to open it at all, which I thought would probably leave clients hanging, waiting for a connection that might never come.

Lifetimes are cool, and tricky

Once again, I have learned a lot about what my code is really doing from the Rust compiler. I find this stuff really confusing, but hopefully by writing down my understanding in this post I have helped my current and future selves, and maybe even you, be clearer about how to share a semaphore between multiple asynchronous tasks.

It’s really fun and empowering to write code that I am reasonably confident is correct, and also works. The sense that “the compiler has my back” is strong, and I like it.

Recommendation against the use of WhatsApp in your company

Here is the email I just sent to the organisation I volunteer for. Feel free to adapt and use in your context.

Dear [organisation leaders],

Much of the tech industry (e.g. [1]) is warning against the use of WhatsApp due to its policy of collecting and sharing user information with third parties and the poor track record of its parent company (Facebook) on ethical issues (see examples [2] and [3], and many more).

The situation was made considerably worse with a recent change to the WhatsApp terms and conditions [4].

So, as your IT person I recommend not using WhatsApp for our work.

We already have an alternative available, and I would be really happy to help anyone who needs help setting it up.

[Details here of the alternative we use (Zulip) and how to use it. The simplest alternative to recommend is Signal.]

Thanks, Andy

[1] What Facebook and WhatsApp’s Data Sharing Plans Really Mean for User Privacy

[2] Facebook experimented with modifying people’s moods

[3] Facebook paid teens for total access to their phone activity

[4] If you’re a WhatsApp user, you’ll have to share your personal data

Streaming video with Owncast on a free Oracle Cloud computer

I just streamed about 40 minutes of me playing Trials Fusion using Owncast. Owncast is a self-hosted alternative to streaming services like Twitch and YouTube live.

Normally, you would need to pay for a computer to self-host it on. Owncast suggest this will cost about $5/month.

But, Oracle Cloud has a “Always Free” tier that includes a “Compute Instance” (a virtual machine running Linux) that is capable of running Owncast.

Here’s how I did it:

Register for Oracle Cloud

This was probably the worst bit.

I went to oraclecloud.com and clicked “Sign up for free cloud tier”. It didn’t work in Firefox(!) so I had to use Chromium.

I had to enter my name, address, email address, phone number and credit card details. The email was verified, the phone number was verified (with a text message), and the credit card was verified (with a real transaction), so there was no getting around any of it.

They promise that they won’t charge my card. I’ll let you know if I discover differently.

Create a Compute Instance

Once I was logged in to the Oracle “console” (web site), I clicked the burger menu in the top left, chose “Compute” and then “Instances” to create a new instance. I followed all the default settings (including using the default “image”, which meant my instance was running Oracle Linux, which I think is similar to Red Hat), and when I got to the ssh keys part, I supplied the public key of my existing SSH key pair. Read the docs there if you don’t have one of these.

As soon as that was done, and I waited for the instance to be created and started, I was able to SSH in to my instance using a username of opc and the Public IP Address listed:

ssh opc@PUBLIC_IP

(Note: here and below, if I say “PUBLIC_IP”, I mean the IP address listed in the information about your compute instance. It should be a list of four numbers separated by dots.)

Allow connecting to the instance on different ports

Owncast listens for HTTP connections on port 8080, and RTMP streams on 1935, so I needed to do two things to make that work.

Modify the Security List to add Ingress Rules

  • On the information about my instance, I clicked on the name of the Subnet (under Primary VNIC).
  • In the subnet, I clicked the name of the Security List (“Default Security List for …”) in the Security Lists list.
  • In the Security List I clicked Add Ingress Rules and entered:
    Stateless: unchecked
    Source Type: CIDR
    Source CIDR: 0.0.0.0/0
    IP Protocol: TCP
    Source Port Range: (blank)
    Destination Port Range: 8080
    Description: (blank)

    and then clicked Add Ingress Rules to create the rule.

  • I then added another Ingress Rule that was identical, except Destination Port Range was 1935.

Allow ports 8080 and 1935 on the instance’s own firewall

It took me a long time to figure out, but it turns out the Oracle Linux running on the Compute Instance has its own firewall. Eventually, thanks to a blog post by meinside: When Oracle Cloud’s Ubuntu instance doesn’t accept connections to ports other than 22, and some Oracle docs on ways to secure resources, I found that I needed to SSH in to the machine (like I showed above) and run these commands:

sudo firewall-cmd --zone=public --permanent --add-port=8080/tcp
sudo firewall-cmd --zone=public --permanent --add-port=1935/tcp
sudo firewall-cmd --reload

Now I was able to connect to the services I ran on the machine on those ports.

Install Owncast

The Owncast install was incredibly easy. I just followed the instructions at Owncast Quickstart. I SSHd in to the instance as before, and ran:

curl -s https://owncast.online/install.sh | bash

and then edited the file owncast/config.yaml to have a custom stream key in it. You can do that by typing:

nano owncast/config.yaml

There is information about this file at: owncast.online/docs/configuration.

Run Owncast

I ran the service like this:

cd owncast
./owncast

In future, if I want to leave it running, I may run it inside screen, or even use systemd or similar.

Open the web site

I could now see the web site by typing this into my browser’s address bar:

http://PUBLIC_IP:8080

(Where PUBLIC_IP is the Public IP copied from the Instance info as before.)

Stream some video

Finally, in OBS‘s Settings I chose the Stream section and entered:

Service: Custom...
Server: rtmp://PUBLIC_IP/live
Stream key: STREAM_KEY

Where “STREAM_KEY” means the stream key I added to config.yaml earlier.

Now, when I clicked “Start Streaming” in OBS, my stream appeared on the web site!

Costs and limits

Oracle stated during sign-up that I would not be charged unless I explicitly chose to use a different tier.

The Compute Instance is part of the “Always Free” tier, so in theory it should stay up and working.

However, if you use lots of resources (which streaming for a long time probably does), I would expect services would be throttled and/or stopped completely. I have no idea whether they will allow enough resources for regular streaming, or whether this is all waste of time. We shall see.

Pinephone update

I got a Pinephone for Christmas!

Here is quick summary of my experience with it. (Originally published on mastodon.)


Update on the pinephone as promised.

I love it, but I would definitely not recommend expecting to use it as your actual phone.

I have the Manjaro Phosh edition. Phosh is GNOME customised for mobile.

It turns on, you can unlock it, and you get a launcher. It has apps, and some of them work.

Firefox works really well. I can use it for Youtube and loads of other sites. I installed uBlock Origin, and it works.

Adding my Nextcloud config to Phosh seamlessly gave me Calendar, Contact and TODO list apps working, with my data in them.

The Maps app found me easily via GPS. I could bring up directions by entering a from and to, but it didn't seem to want to guide me via GPS.

Several apps don't fit properly on screen, and there doesn't seem to be a way to scroll or move the windows.

The camera technically works but the picture looks terrible (squashed, wibbly and blue-coloured).

Scrolling around on the launcher updates at about 5-10 fps, which is fine but would put many people off.

Many of the apps available to install in the Software app don't really work. I assume the list of apps is the standard for GNOME or Manjaro, so many are not adapted for phones.

I _love_ the fact that all the work that has been put into desktop Linux can be re-used on phones. Why wasn't it always this way?

It's great to be able to buy hardware that is specifically designed to run properly free software.

The Terminal app works nicely and presents a keyboard with extra keys that you need in a terminal.

The settings app works nicely.

My biggest frustration was not being able to find software in the Software app that worked nicely.

I was looking for a Youtube app that protected my privacy. On Android I use NewPipe Legacy. On desktop I use Freetube. I couldn't find Freetube in Software. I tried Minitube but it was unusable (window didn't fit).

I haven't tried installing software from the command line. Maybe I can find (or build) Freetube via a Manjaro repo?

Or maybe I should investigate NewPipe Legacy via anbox, although that seems to miss the point a little :-)

Is your program a function or a service?

Maybe everyone knows this already, but for my own clarity, I think there are really two types of computer program:

  • A function: something that you run, and get back a result. Example: a command-line tool like ls
  • A service: something that sits around waiting for things to happen, and responds to them. Example: a web server

How functions work

Programs that are essentially functions should:

  • Validate their input and stop if it is wrong
  • Stop when they have finished their job*
  • Let you know whether they succeeded or failed

*The Halting Problem shows that you can’t prove they stop, so I won’t ask you to do that.

Writing functions is relatively easy.

How services work

Programs that are services should:

  • Start when you tell them to start, even when things are not right
  • Keep running until you tell them to stop, even when bad things happen
  • Tell the user about problems via some communication mechanism

Writing services seems a little harder than writing functions.

What about UIs?

I suggest that programs with UIs are just a special case of services. Do you agree?

What about let-it-crash?

I think that let-it-crash is a good way to build services, but when you build a service that way, I consider the whole system to be the real service: this means the code we are writing, plus the runtime. In this case, the runtime is responsible for keeping the service running (by restarting it), and telling the user about problems.

In effect, let-it-crash allows us to write programs that look like functions (which I claim is easier), and still have them behave like services, because the runtime does the extra work for us. Erlang seems like a good example of this.

What are the implications?

If you are writing a service, your program should start when asked, and keep running until it is asked to stop, even if things are bad.

For example:

  • a service that relies on a data source should keep running when that data source is unavailable, and emit errors saying that it is unable to work. It should start working when the data source becomes available. (Again, if you implement this behaviour by using a runtime that allows you to write in a let-it-crash style, good for you.)
  • a service that relies on the existence of a directory should probably create that directory if it doesn’t exist.
  • a service that needs config might want to start up with sane defaults if the config is not supplied. Or maybe it should complain loudly and poll for the file to be created?

Why not stop when things are wrong?

  • Using this approach, it doesn’t matter the order of starting services. The more services we have, the more painful it is to have an order we must follow.
  • It’s nice when things are predictable. We expect services to keep running under normal circumstances. Using this approach, our expectations are not wrong when things go wrong.

What are the down sides?

  • You must pay attention to the error reporting coming from running services – they may not be working.
  • Services will still stop, due to bugs, or at least due to hardware failures, so you still have to pay attention to whether services are running.

More: 12 Fractured Apps