Goodness in programming languages, part 4 – Ownership & Memory

Posts in this series: Syntax, Deployment, Metaprogramming, Ownership

There is often a trade-off between programming language features and how fast (and predictably) the programs run. From web sites that serve millions of visitors to programs running on small devices, we need to be able to make our programs run quickly.

One trade-off that is made in many modern programming languages (including Python, Ruby, C#, Java and JVM-based languages) is that the system owns all the memory. This avoids the need for the programmer to think about how long pieces of memory need to live, but it means a lot of memory can hang around a lot longer than it really needs to. In addition, it can mean the CPU has to jump around between lots of different locations to find pieces of dynamically allocated memory. When this jumping around causes cache misses, it can really slow things down.

While these garbage collection-based languages have been evolving, C++ has been developing along a different track. C++ allows the programmer to allocate and free up memory manually (as in C), but over time the community of C++ programmers has been developing a new way of thinking about memory, and developing tools in the C++ language to make it easier to work in this way.

Modern C++ code rarely or never uses “delete” or “free” to deallocate memory, but instead defines clearly which object owns which. When the owning object is no longer needed, everything it owns can be deleted, immediately freeing its memory. The top-level objects are owned by the current scope, so when the function or block of code we are in ends, the system knows these objects and the ones they own can be deleted. Objects that last for the whole life of the program are owned by the scope of the main function or equivalent.
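To make that concrete, here is a minimal sketch of one way to express ownership, using std::unique_ptr from the standard library. The Library and Book types and the live_books counter are made up for illustration; the counter is only there so we can watch objects being created and destroyed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

int live_books = 0;  // hypothetical counter: how many Books currently exist

// Hypothetical type for illustration.
struct Book {
    std::string title;
    explicit Book(std::string t) : title(std::move(t)) { ++live_books; }
    ~Book() { --live_books; }
    Book(const Book&) = delete;             // a Book is never copied,
    Book& operator=(const Book&) = delete;  // so the count stays honest
};

// A Library owns its Books: unique_ptr says "exactly one owner".
struct Library {
    std::vector<std::unique_ptr<Book>> books;

    void add(std::string title) {
        books.push_back(std::make_unique<Book>(std::move(title)));
    }
};

// Returns how many Books exist while the Library is still in scope.
int books_alive_inside_scope() {
    Library library;  // owned by this function's scope
    library.add("First");
    library.add("Second");
    return live_books;
}  // library goes out of scope here, and takes its Books with it
```

Calling books_alive_inside_scope() returns 2, and once it has returned live_books is back to 0, even though the code never mentions delete.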

One advantage of explicit ownership is that the right thing happens automatically when something unexpected occurs (e.g. an exception is thrown, or we return early from a function). Because the objects are owned by a scope, as soon as we exit that scope they are automatically deleted, and no memory is “leaked”.
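Here is a small sketch of both cases: an exception and an early return. Resource and the open_resources counter are hypothetical names, invented so the cleanup is observable.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

int open_resources = 0;  // hypothetical counter so we can observe cleanup

// Hypothetical type for illustration.
struct Resource {
    Resource() { ++open_resources; }
    ~Resource() { --open_resources; }
};

// Throws part-way through; there is no cleanup code to forget.
void work_that_throws() {
    auto resource = std::make_unique<Resource>();
    throw std::runtime_error("something unexpected");
}

// Returns early on bad input; again the Resource is released for us.
int work_that_returns_early(int input) {
    auto resource = std::make_unique<Resource>();
    if (input < 0) {
        return -1;
    }
    return input * 2;
}

// True if neither route out of the functions above left a Resource behind.
bool nothing_leaked() {
    try {
        work_that_throws();
    } catch (const std::runtime_error&) {
        // By the time we get here, unwinding has destroyed the Resource.
    }
    work_that_returns_early(-5);
    return open_resources == 0;
}
```

Neither function contains any explicit cleanup, yet nothing_leaked() returns true, because leaving the scope is what triggers the deletion.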

Because ownership is explicit, we can often group owned objects in memory immediately next to the objects that own them. This means we jump around to different memory locations less often, and we have to do less work to find and delete regions of memory. This makes our programs faster.
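As a rough illustration of that grouping, the sketch below has an owner hold its objects by value in a std::vector, which the standard guarantees is stored as one contiguous block. Swarm and Particle are hypothetical names, and I am not claiming any particular speed-up here: that depends on the program and the machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types, invented for this sketch.
struct Particle {
    double x = 0.0;
    double y = 0.0;
};

struct Swarm {
    // The Swarm owns its Particles by value: the vector keeps them in one
    // contiguous block rather than as separately allocated objects.
    std::vector<Particle> particles;
};

// Returns true if each Particle sits directly after the previous one.
bool stored_side_by_side(const Swarm& swarm) {
    const std::vector<Particle>& ps = swarm.particles;
    for (std::size_t i = 1; i < ps.size(); ++i) {
        if (&ps[i] != &ps[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

// Walks the block from front to back, touching neighbouring memory each step.
double sum_of_x(const Swarm& swarm) {
    double total = 0.0;
    for (const Particle& p : swarm.particles) {
        total += p.x;
    }
    return total;
}
```

When the Swarm goes away, the whole block is released in one go, instead of one deallocation per Particle.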

Here are some things I like:

  • Modern C++’s clarity about who owns what. By expressing ownership explicitly we make clear our intentions, and avoid memory leaks.
  • Modern C++’s fast and cache-friendly memory handling. Allocating memory for several objects together reduces time spent looking for space, and means the data we need is more likely to be in the cache already.

In my experience, the most frequent performance problems I have had to solve have really been memory problems. Explicit ownership can reduce unnecessary memory management overhead by taking back the work from the system (the garbage collector) and allowing programmers to be explicit about who owns what.

How to use git (the basics)

Series: Why git?, Basics, Branches, Merging, Remotes

Git is a very powerful tool, but somewhat intimidating at first. I will be making some videos working through how to use it step by step.

First, we look at how to track your own code on your own computer, and then take a brief look at a killer feature: stash, which lets you pause what you were doing and come back to it later.

Slides: How to use git (the basics) slides.

setUp and tearDown considered harmful

Some unit test frameworks provide methods (often called setUp and tearDown, or annotated with @Before and @After) that are called automatically before a unit test executes, and afterwards.

This structure is presumably intended to avoid repetition of code that is identical in all the tests within one test file.

I have always instinctively avoided using these methods, and when a colleague used them recently I thought I should try to write up why I feel negative about them. Here it is:

Update: note I am talking about unit tests here. I know these frameworks can be used for other types of test, and maybe in that context these methods could be useful.

1. It’s action at a distance

setUp and tearDown are called automatically, with no indication in your code that you use them, or don’t use them. They are “magic”, and everyone hates magic.

If someone is reading your test (because it broke, probably) they can’t tell whether a setUp will be called without manually scanning your code to find out whether one exists. Do you hate them?

2. setUp contains useless stuff

How many tests do you have in one file? When you first write it, maybe, just maybe, all the tests need the exact same setup. Later, you’ll write new tests that only use part of it.

Very soon, you grow an uber-setUp that does all the setup for various different tests, creating objects you don’t need. This adds complexity for everyone who has to read your tests – they don’t know which bits of setUp are used in this test, and which are cruft for something else.

3. They require member variables

The only useful work you can do inside setUp and tearDown is creating and modifying member variables.

Now your tests aren’t self-contained – they use these member variables, and you must make absolutely sure that your test works no matter what state they are in. These member variables are not useful for anything else – they are purely an artifact of the choice to use setUp and tearDown.

4. A named function is better

When you have setup code to share, write a function or method. Give it a name, and make it return the thing it creates. By giving it a name you make your test easier to read. By returning what it creates, you avoid the use of member variables. By avoiding the magic setUp method, you give yourself the option of calling more than one setup function, making code re-use more granular (if you want).
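Here is a minimal sketch of what I mean, written as plain C++ functions with assert rather than in any particular test framework. ShoppingCart and cart_with_two_items are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class under test.
class ShoppingCart {
public:
    void add(std::string name, int price_pence) {
        names_.push_back(std::move(name));
        total_pence_ += price_pence;
    }
    int total() const { return total_pence_; }

private:
    std::vector<std::string> names_;
    int total_pence_ = 0;
};

// Named setup function: it says what it builds, and returns it,
// so the test needs no member variables.
ShoppingCart cart_with_two_items() {
    ShoppingCart cart;
    cart.add("tea", 250);
    cart.add("milk", 120);
    return cart;
}

// This test needs the shared setup, and says so on its first line.
void test_total_sums_item_prices() {
    ShoppingCart cart = cart_with_two_items();
    assert(cart.total() == 370);
}

// This test does not need it, so it simply does not call it.
void test_empty_cart_has_zero_total() {
    ShoppingCart cart;
    assert(cart.total() == 0);
}
```

Each test reads top to bottom with nothing hidden, and a test that needs different setup can call a different named function.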

5. What goes in tearDown?

If you’re using tearDown, what are you doing?

Are you tearing down some global state? I thought this was a unit test?

Are you ensuring nothing is left in an unpredictable state for future tests? Surely those tests guarantee their state at the start?

What possible use is there for this function?

Conclusion

A unit test should be a self-contained pure function, with no dependencies on other state. setUp and tearDown force you to depend on member variables of your test class, with no benefit over named functions, except that you don’t have to type their names. I consider typing the name of a properly-named function to be a benefit.

Why use git for source control?

Series: Why git?, Basics, Branches, Merging, Remotes

Putting your code in git is fast, flexible and powerful. You can track versions on a single machine, or scale up to thousands of people working together, with sub-teams, reviews and cherry-picking of changes. Don’t fear branching any more.

Slides: Why use git for source control? slides.