Shutdown order consistency: how Rust helps

Some Java code with bugs

Here’s my main method (in Java). Can you guess the bug?

Db db = new Db();
Monitoring monitoring = new Monitoring();
Monitoring mon2 = new Monitoring();
Billing billing = new Billing(db, monitoring);
monitoring.setDb(db);

runMainLoop(billing, mon2);

db.stop();
billing.stop();
monitoring.stop();

If you would like to hunt down the 2 bugs manually, try reading the full code here: ShutdownOrder.java

But maybe you have an idea already? Maybe you’ve seen code like this before? If you have, you probably have an instinct that there’s some kind of bug, even if you can’t say for sure what it is. Code like this almost always has bugs!

This code compiles fine, but it contains two bugs.

First, we forgot to setDb() on mon2. This causes a NullPointerException, because Monitoring expects always to have a working Db.

Second, and in general harder to spot, we shut down our services in the wrong order. It turns out that Monitoring uses its Db during shutdown, so we get an exception. Even worse, if some other code needed to run after monitoring.stop(), it won’t, because the exception prevents us getting any further.

Of course, this is toy code, but this kind of problem is common (and much harder to spot) in real-life code. In fact, my team dealt with a similar bug this week.

It’s fundamentally hard to figure out your shutdown order. It’s complicated further if classes have start() methods too, which I have seen in lots of Java code.

Given that this is just a hard problem, maybe there’s no point looking for tools to make it easier?

Some Rust code without those bugs

Let’s try writing this code in Rust. Here’s the main method:

let db = Db::new();
let monitoring = Monitoring::new(&db);
let mon2 = Monitoring::new(&db);
let billing = Billing::new(&db, &monitoring);

run_main_loop(&billing, &mon2);

// drop() is called automatically on all objects here

Here’s the full code: shutdown_order.rs

This code shuts down all the services automatically at the end, and any mistakes we make in the order are compile errors, not things we find later when our code is running.

The code to shut down each service looks like this:

impl Drop for Monitoring<'_> {
    fn drop(&mut self) {
        // [Disconnect from monitoring API]
        self.db.add_record("MonitorShutDown");
    }
}

This is us implementing the Drop trait for the struct Monitoring (traits are a bit like Java Interfaces). The Drop trait is special: it indicates what to do when an instance of this struct is dropped. In Rust, this is guaranteed to happen when the instance goes out of scope, which is why our comment at the end of the main method sounds so confident.

Furthermore, Rust’s compiler shuts down everything in the reverse order in which it was created, and guarantees that nothing gets used after it has been dropped.

Rust’s lovely world gives us two relevant treats: no unexpected nulls, and lifetimes.

Treat number 1: no unexpected nulls

First, in Rust, like in other modern languages like Kotlin, we have to be explicit about items that could be missing. In our example, we were able to re-arrange the code so that db can never be missing (or null), and the compiler encouraged us to do so. If we really needed it to be missing some of the time, we could have used the Option type, and the compiler would have forced us to handle the case when it was missing, instead of unexpectedly getting a NullPointerException like we did in Java. (In fact, if we’d structured our code to use final in as many places as possible, we could have been encouraged towards basically the same solution in Java too.)

Treat number 2: lifetimes

Second, if you look a bit more closely at the full code of shutdown_order.rs you’ll see lots of confusing-looking annotations like <'a> and &'a:

struct Monitoring<'a> {
    db: &'a Db,
}

The approximate meaning of those annotations is: a Monitoring holds a reference to a Db, and that Db must last longer than the Monitoring.

This “lasts longer than” wording is what Rust Lifetimes are for. Lifetimes are a way of saying how long something lasts.

Lifetimes are really confusing when you start with Rust, and have caused me a lot of pain. Code like this is where they are both most painful and most helpful. As I mentioned earlier, the problem of shutdown order is fundamentally hard. Rust gives you that pain at the beginning, and until you understand what’s going on, the pain is very confusing and acute. But, once your code compiles, it is correct, at least as far as problems like this are concerned.

I love the sense of security it gives me to write Rust code and know the compiler has checked my code for this kind of problem, meaning it can’t crop up at 3am on Christmas Day…

Final note/caveat

This Rust code is probably over-simplified, because all the references are immutable (you can’t change the objects they point to). In practice, we may well have mutable references, and if we do we’re going have to deal with the further difficulty that Rust won’t allow two different objects to hold references to an object if any of those references are mutable. So it would object to Billing and Monitoring using the Db object at the same time. We’d need to make it immutable (as we have here), or find a different way of structuring the code: for example, we could hold the Db instance only within the run_main_loop code, and pass it in temporarily to the Billing and Monitoring objects when we called their methods. A large part of the art, fun and pain of learning Rust is finding new patterns for your code that do what you need to do and also keep the compiler happy. When you manage it, you get amazing benefits!

Edge computing providers

I’m looking into Edge computing at work. By Edge computing I mean running WASM programs in lots and lots of smallish computers in places near to actual people (rather than in huge cloud data centres). I think it’s cool because I love Rust, and Rust is the leading language to compile to WASM.

Here are some companies providing Edge computing services:

  • Fastly – good links with WASM community (hired Mozilla devs), and early adopters – custom WASM engine wasmtime.
  • Cloudflare – huge, and early adopters – WASM engine is Google V8.
  • AWS Lambda@Edge – docs are light on detail, but it looks like a real offering, probably.

Also-rans:

Who did I miss?

Struggling with Rust to figure out the right types for a function signature

I am loving writing code in Rust. So many things about the language and its ecosystem feel so right*.

* For example: ownership of objects, expressive type system, compile to native, offline API docs, immutability, high quality libraries.

One of the things I like about it is that I don’t feel like I need to use an IDE, so I can happily code in Vim with no clever plugins.

One thing an IDE might give me would be an “extract function” refactoring. In most languages I am happy to do that manually, because I can let the compile errors guide me on what my function should look like.

However, in Rust I sometimes find it’s hard to find the right signature for a function I want to extract, and I am struggling to persuade the compiler to help me.

Here is an example from my new listsync project, in listsync-client-rust.rs:

use actix_web::{middleware, App, HttpServer};
use listsync_client_rust;
// ...
#[actix_rt::main]
async fn main() -> std::io::Result<()> {
//...
    HttpServer::new(|| {
        App::new()
            .wrap(listsync_client_rust::cookie_session::new_session())
            .wrap(middleware::Logger::default())
            .configure(listsync_client_rust::config)
    })
//...

I would like to extract the code highlighted above, the creation of an App, into a separate function, like this:

fn new_app() -> ??? {
    App::new()
        .wrap(listsync_client_rust::cookie_session::new_session())
        .wrap(middleware::Logger::default())
        .configure(listsync_client_rust::config)
}
//...
    HttpServer::new(|| {
        new_app()
    })

Simple, right? To find out what the return type of the function should be, I can just make a bad guess, and get the compiler to tell me what I did wrong. In this case, I will guess by changing the question marks above into i32, and run cargo test. I get quite a few errors, one of which is:

error[E0277]: the trait bound `i32: actix_service::IntoServiceFactory<_>` is not satisfied
  --> src/bin/listsync-client-rust.rs:27:5
   |
27 | /     HttpServer::new(|| {
28 | |         new_app()
29 | |     })
   | |______^ the trait `actix_service::IntoServiceFactory<_>` is not implemented for `i32`
   |
   = note: required by `actix_web::server::HttpServer`

So the first problem I see is that the error message I am seeing is about the later code, and there are no errors about my new function.

I obviously went a little too fast. Let’s change the HttpServer::new code back to how it was before, and only make a new function new_app. Now I get an error that should help me:

error[E0308]: mismatched types
  --> src/bin/listsync-client-rust.rs:12:5
   |
11 |   fn new_app() -> i32 {
   |                   --- expected `i32` because of return type
12 | /     App::new()
13 | |         .wrap(listsync_client_rust::cookie_session::new_session())
14 | |         .wrap(middleware::Logger::default())
15 | |         .configure(listsync_client_rust::config)
   | |________________________________________________^ expected i32, found struct `actix_web::app::App`
   |
   = note: expected type `i32`
              found type `actix_web::app::App<impl actix_service::ServiceFactory, actix_web::middleware::logger::StreamLog<actix_http::body::Body>>`

So the compiler has told us what type we are returning! Let’s copy that into the type signature of the function:

use actix_service::ServiceFactory;
use actix_http::body::Body;
// ...
fn new_app() -> App<impl ServiceFactory, middleware::logger::StreamLog<Body>> {
// ...

The first error I get from the compiler is a distraction:

error[E0432]: unresolved import `actix_service`
 --> src/bin/listsync-client-rust.rs:1:5
  |
1 | use actix_service::ServiceFactory;
  |     ^^^^^^^^^^^^^ use of undeclared type or module `actix_service`

I can fix it by adding actix-service = "1.0.5" to Cargo.toml. (I found the version by looking in Cargo.lock, since this dependency was already implicitly used – I just need to make it explicit if I am going to use it directly.)

Once I do that I get the next error:

error[E0603]: module `logger` is private
  --> src/bin/listsync-client-rust.rs:13:54
   |
13 | fn new_app() -> App<impl ServiceFactory, middleware::logger::StreamLog<Body>> {
   |                                                      ^^^^^^

This leaves me a bit stuck: I can’t use StreamLog because it’s in a private module.

More importantly, it makes the point that I don’t actually want to be as specific as I am being: I don’t care what the exact type parameters for App are – I just want to return an App of some kind and have the compiler fill in the blanks. Ideally, if I change the body of new_app later, for example to add another wrap call that changes the type of App we are returning, I’d like to leave the return type the same and have it just work.

With that in mind, I took a look at the type that HttpServer::new takes in. Here is impl HttpServer:

impl<F, I, S, B> HttpServer<F, I, S, B> where
    F: Fn() -> I + Send + Clone + 'static,
    I: IntoServiceFactory<S>,
    S: ServiceFactory<Config = AppConfig, Request = Request>,
    S::Error: Into<Error> + 'static,
    S::InitError: Debug,
    S::Response: Into<Response<B>> + 'static,
    <S::Service as Service>::Future: 'static,
    B: MessageBody + 'static, 

and HttpServer::new looks like:

pub fn new(factory: F) -> Self

So it takes in a function which actually makes the App, and the type of that function is F, which is a Fn which returns a I + Send + Clone + 'static. From the declaration of impl HttpServer we can see that the type of I depends on S and B, which have quite complex types. Let’s paste the whole thing in:

use actix_http::{Error, Request, Response};
use actix_service::IntoServiceFactory;
use actix_web::body::MessageBody;
use actix_web::dev::{AppConfig, Service};
use core::fmt::Debug;
// ...
fn new_app<I, S, B>() -> I
where
    I: IntoServiceFactory<S> + Send + Clone + 'static,
    S: ServiceFactory<Config = AppConfig, Request = Request>,
    S::Error: Into<Error> + 'static,
    S::InitError: Debug,
    S::Response: Into<Response<B>> + 'static,
    <S::Service as Service>::Future: 'static,
    B: MessageBody + 'static,
{
    App::new()
        .wrap(listsync_client_rust::cookie_session::new_session())
        .wrap(middleware::Logger::default())
        .configure(listsync_client_rust::config)
}

Note that I had to modify I to include the extra requirements on the return type of F from the definition of HttpServer. (I think I did the right thing, but I’m not sure. If I just remove the + Send + Clone + 'static it seems to behave similarly.)

Now I get this error from the compiler:

error[E0308]: mismatched types
  --> src/bin/listsync-client-rust.rs:27:5
   |
17 |   fn new_app<I, S, B>() -> I
   |                            - expected `I` because of return type
...
27 | /     App::new()
28 | |         .wrap(listsync_client_rust::cookie_session::new_session())
29 | |         .wrap(middleware::Logger::default())
30 | |         .configure(listsync_client_rust::config)
   | |________________________________________________^ expected type parameter, found struct `actix_web::app::App`
   |
   = note: expected type `I`
              found type `actix_web::app::App<impl actix_service::ServiceFactory, actix_web::middleware::logger::StreamLog<actix_http::body::Body>>`
   = help: type parameters must be constrained to match other types
   = note: for more information, visit https://doc.rust-lang.org/book/ch10-02-traits.html#traits-as-parameters

The compiler really tries to help here, suggesting I read a chapter of the Rust Book, but even after reading it I could not figure out how to do what I am trying to do.

Can anyone help me?

Wouldn’t it be amazing if there were some way the compiler could give me easier-to-understand help to figure this out?