Maybe everyone knows this already, but for my own clarity, I think there are really two types of computer program:
- A function: something that you run, and get back a result. Example: a command-line tool like ls
- A service: something that sits around waiting for things to happen, and responds to them. Example: a web server
How functions work
Programs that are essentially functions should:
- Validate their input and stop if it is wrong
- Stop when they have finished their job*
- Let you know whether they succeeded or failed
*The Halting Problem shows that there is no general way to prove that an arbitrary program stops, so I won’t ask you to do that.
Writing functions is relatively easy.
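To make the three rules concrete, here is a minimal sketch of a function-style program in Python. The tool itself (a line counter) and the names count_lines and main are made up for illustration; the point is only the shape: check the input, do the job, report success or failure through the return value.

```python
import sys


def count_lines(path: str) -> int:
    """Return the number of lines in the file at `path`."""
    # Binary mode, so odd text encodings cannot make this fail.
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def main(argv: list[str]) -> int:
    """Hypothetical entry point: returns the process exit status."""
    # Rule 1: validate the input and stop if it is wrong.
    if len(argv) != 2:
        print("usage: count_lines FILE", file=sys.stderr)
        return 2
    try:
        print(count_lines(argv[1]))
    except OSError as e:
        # Rule 3: report failure, on stderr and via a non-zero status.
        print(f"count_lines: {e}", file=sys.stderr)
        return 1
    # Rule 2: the job is done, so stop, and report success.
    return 0
```

Calling sys.exit(main(sys.argv)) at the bottom of a script would pass that status on to the shell, which is how a tool like ls lets you know whether it succeeded.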
How services work
Programs that are services should:
- Start when you tell them to start, even when things are not right
- Keep running until you tell them to stop, even when bad things happen
- Tell the user about problems via some communication mechanism
Writing services seems a little harder than writing functions.
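For comparison, here is a rough sketch of the service shape in Python. It is one possible way to do it, not the only one: run_service, handle_once and retry_delay are names I invented, and I am assuming that "telling the user about problems" means writing to a log.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger("service")


def run_service(
    handle_once: Callable[[], None],
    stop: threading.Event,
    retry_delay: float = 1.0,
) -> None:
    """Call handle_once over and over until `stop` is set."""
    # Start when told to, without checking that the world is in order.
    log.info("service started")
    while not stop.is_set():
        try:
            handle_once()
        except Exception:
            # Something bad happened: report it, then keep running.
            log.exception("work failed; retrying in %.1fs", retry_delay)
            # Wait before retrying, but wake early if asked to stop.
            stop.wait(retry_delay)
    log.info("service stopped on request")
```

Even in this tiny version there is more to think about than in a function: the loop, the stop signal, the delay between retries, and where the error reports go.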
What about UIs?
I suggest that programs with UIs are just a special case of services. Do you agree?
What about let-it-crash?
I think that let-it-crash is a good way to build services, but when you build a service that way, I consider the whole system to be the real service: this means the code we are writing, plus the runtime. In this case, the runtime is responsible for keeping the service running (by restarting it), and telling the user about problems.
In effect, let-it-crash allows us to write programs that look like functions (which I claim is easier), and still have them behave like services, because the runtime does the extra work for us. Erlang seems like a good example of this.
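A toy illustration of that split, in Python rather than Erlang. This is not how Erlang's supervisors are implemented; it only imitates the idea inside a single process, and supervise and max_restarts are names I made up. The worker is written like a function (it just raises when something is wrong), and the surrounding loop plays the part of the runtime.

```python
import logging
from typing import Callable

log = logging.getLogger("supervisor")


def supervise(worker: Callable[[], None], max_restarts: int) -> int:
    """Run `worker`, restarting it when it crashes.

    Returns the number of restarts that were needed. Re-raises the
    last error once max_restarts has been used up.
    """
    restarts = 0
    while True:
        try:
            worker()
            return restarts  # the worker finished normally
        except Exception:
            # The "runtime" tells the user about the problem...
            log.exception("worker crashed (restarts so far: %d)", restarts)
            if restarts >= max_restarts:
                raise
            # ...and keeps the service going by starting it again.
            restarts += 1
```

The worker contains no retry logic and no error reporting of its own, which is what makes it feel like writing a function.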
What are the implications?
If you are writing a service, your program should start when asked, and keep running until it is asked to stop, even if things are bad.
For example:
- a service that relies on a data source should keep running when that data source is unavailable, and emit errors saying that it is unable to work. It should start working when the data source becomes available. (Again, if you implement this behaviour by using a runtime that allows you to write in a let-it-crash style, good for you.)
- a service that relies on the existence of a directory should probably create that directory if it doesn’t exist.
- a service that needs config might want to start up with sane defaults if the config is not supplied. Or maybe it should complain loudly and poll for the file to be created?
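The directory and config examples could look something like this in Python. It is a sketch under my own assumptions: the config is a JSON file, the settings in DEFAULTS are invented, and I picked the "start with defaults and complain" option rather than polling for the file.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("service")

# Hypothetical settings, only here to have something to fall back on.
DEFAULTS = {"poll_seconds": 5, "data_dir": "data"}


def load_config(path: Path) -> dict:
    """Return settings from `path`, falling back to DEFAULTS."""
    try:
        supplied = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        log.warning("no config at %s; using defaults", path)
        return dict(DEFAULTS)
    except (OSError, ValueError) as e:
        # ValueError covers malformed JSON and undecodable text.
        log.error("unreadable config at %s (%s); using defaults", path, e)
        return dict(DEFAULTS)
    if not isinstance(supplied, dict):
        log.error("config at %s is not a JSON object; using defaults", path)
        return dict(DEFAULTS)
    # Supplied values win; anything missing comes from the defaults.
    return {**DEFAULTS, **supplied}


def ensure_dir(path: Path) -> Path:
    """Create the directory (and any parents) if it does not exist yet."""
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Neither function stops the service from starting: a missing or broken config is logged and replaced by defaults, and a missing directory is simply created.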
Why not stop when things are wrong?
- Using this approach, the order in which services are started doesn’t matter. The more services we have, the more painful it is to have an order we must follow.
- It’s nice when things are predictable. We expect services to keep running under normal circumstances. Using this approach, that expectation still holds when things go wrong.
What are the downsides?
- You must pay attention to the error reporting coming from running services: a service that is running may still not be working.
- Services will still stop, due to bugs, or at least due to hardware failures, so you still have to pay attention to whether services are running.
More: 12 Fractured Apps