Branches in git

Series: Why git?, Basics, Branches, Merging, Remotes

Last time we discussed starting a project and committing changes.

Now we look at how to create branches, which are one of the main reasons for having source control. We’ll cover creating branches, switching between them, and the simplest parts of merging from one to another.
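
As a taster, the core commands look something like this (a minimal sketch; the branch name here is invented for illustration):

$ git branch mybranch     # create a new branch called mybranch
$ git checkout mybranch   # switch to it, then make some commits
$ git checkout master     # switch back to the original branch
$ git merge mybranch      # bring mybranch's commits into master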

Slides: Branches in git.

Passing several values through a pipe in bash

I have been fiddling with some git-related shell scripts, and decided to try to follow the same approach as git in their structure. This means following the Unix philosophy, where each piece of functionality is a separate script (or executable) that communicates via command-line arguments, the standard input stream, and the standard output stream.

This allows each piece of functionality to be written in any programming or scripting language. In git’s case this has allowed initial versions to be written in bash or perl, and later optimised versions (sometimes written in C) to be dropped in, piece by piece. It’s an incredibly flexible way of working and can also be very efficient.

Most of my prototyping has been in bash, and I’ve found sometimes I need to write out multiple values from a script and collect them as input in another script.

Writing the output is simple:

#!/bin/bash

# outputter.bash

# Imagine A, B and C have been created by some complex process:
A="foo bar"
B="  bar"
C="baz   "

# At the end of our script we simply write them out on separate lines in a known order
echo "${A}"
echo "${B}"
echo "${C}"

But reading them in somewhere else gave me some trouble until I learned this recipe:

#!/bin/bash

# inputter.bash

# Read in the values one per line:
IFS=$'\n' read A
IFS=$'\n' read B
IFS=$'\n' read C

# Now we can use them.
echo "A='${A}'"
echo "B='${B}'"
echo "C='${C}'"

And now the values transfer successfully, preserving whitespace:

$ ./outputter.bash | ./inputter.bash 
A='foo bar'
B='  bar'
C='baz   '

The recipe uses bash’s built-in read command to populate the variables, but sets the IFS variable (Internal Field Separator) to just a newline, so that spaces and tabs in the line are treated as part of the value to be read rather than as separators. The $'\n' syntax is a literal newline.
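
To see why setting IFS matters here, compare the default behaviour, where read treats spaces and tabs as separators and strips them from the ends of the value:

$ echo "  bar  " | { read X; echo "X='${X}'"; }
X='bar'
$ echo "  bar  " | { IFS=$'\n' read X; echo "X='${X}'"; }
X='  bar  '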

Goodness in programming languages, part 4 – Ownership & Memory

Posts in this series: Syntax, Deployment, Metaprogramming, Ownership

There is often a trade-off between programming language features and how fast (and predictably) the programs run. From web sites that serve millions of visitors to programs running on small devices, we need to be able to make our programs run quickly.

One trade-off that is made in many modern programming languages (including Python, Ruby, C#, Java and other JVM-based languages) is that the system owns all the memory. This avoids the need for the programmer to think about how long pieces of memory need to live, but it means a lot of memory can hang around much longer than it really needs to. It can also mean the CPU has to jump around to many different memory locations to find pieces of dynamically-allocated memory. Where this jumping around invalidates CPU caches, it can really slow things down.

While these garbage collection-based languages have been evolving, C++ has been developing along a different track. C++ allows the programmer to allocate and free up memory manually (as in C), but over time the community of C++ programmers has been developing a new way of thinking about memory, and developing tools in the C++ language to make it easier to work in this way.

Modern C++ code rarely, if ever, uses “delete” or “free” to deallocate memory; instead it defines clearly which object owns which other objects. When the owning object is no longer needed, everything it owns can be deleted too, immediately freeing the memory. The top-level objects are owned by the current scope, so when the function or block of code we are in ends, the system knows these objects, and the ones they own, can be deleted. Objects that last for the whole life of the program are owned by the scope of the main function or equivalent.
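
A minimal sketch of what this looks like in code (the classes here are invented for illustration; std::unique_ptr is part of C++11):

#include <memory>
#include <vector>

class Wheel {};

class Car
{
public:
    Car() : wheels_(4) {}                // a Car directly owns four Wheels
private:
    std::vector<Wheel> wheels_;
};

void drive()
{
    std::unique_ptr<Car> car(new Car()); // this scope owns the Car
    // ... use the car ...
}                                        // scope ends: the Car and its Wheels
                                         // are deleted automatically - no
                                         // "delete" needed anywhere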

One advantage of explicit ownership is that the right thing happens automatically when something unexpected happens (e.g. an exception is thrown, or we return early from a function). Because the objects are owned by a scope, as soon as we exit that scope they are automatically deleted, and no memory is “leaked”.

Because ownership is explicit, we can often group owned objects in memory immediately next to the objects that own them. This means we jump around to different memory locations less often, and we have to do less work to find and delete regions of memory. This makes our programs faster.

Here are some things I like:

  • Modern C++’s clarity about who owns what. By expressing ownership explicitly we make clear our intentions, and avoid memory leaks.
  • Modern C++’s fast and cache-friendly memory handling. Allocating memory for several objects together reduces time spent looking for space, and means caches are more likely to be used.

In my experience, the most frequent performance problems I have had to solve have really been memory problems. Explicit ownership can reduce unnecessary memory management overhead by taking that work back from the system (the garbage collector) and letting the programmer state who owns what.

How to use git (the basics)

Series: Why git?, Basics, Branches, Merging, Remotes

Git is a very powerful tool, but somewhat intimidating at first. I will be making some videos working through how to use it step by step.

First, we look at how to track your own code on your own computer, and then take a brief look at a killer feature: stash, which lets you pause what you were doing and come back to it later.
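
In outline, the commands covered look something like this (the file name and message are invented for illustration):

$ git init                       # start tracking the current directory
$ git add myfile.txt             # tell git which files to record
$ git commit -m "First version"  # save a snapshot of the added files
$ git stash                      # put half-finished changes aside...
$ git stash pop                  # ...and bring them back later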

Slides: How to use git (the basics).

setUp and tearDown considered harmful

Some unit test frameworks provide methods (often called setUp and tearDown, or annotated with @Before and @After) that are called automatically before a unit test executes, and afterwards.

This structure is presumably intended to avoid repetition of code that is identical in all the tests within one test file.

I have always instinctively avoided using these methods, and when a colleague used them recently I thought I should try to write up why I feel negative about them. Here it is:

Update: note I am talking about unit tests here. I know these frameworks can be used for other types of test, and maybe in that context these methods could be useful.

1. It’s action at a distance

setUp and tearDown are called automatically, with no indication in the test itself of whether they are being used. They are “magic”, and everyone hates magic.

If someone is reading your test (probably because it broke), they can’t know whether some setUp will be called without scanning your code to find out whether it exists. Do you hate them?

2. setUp contains useless stuff

How many tests do you have in one file? When you first write it, maybe, just maybe, all the tests need the exact same setup. Later, you’ll write new tests that only use part of it.

Very soon, you grow an uber-setUp that does all the setup for various different tests, creating objects you don’t need. This adds complexity for everyone who has to read your tests – they don’t know which bits of setUp are used in this test, and which are cruft for something else.

3. They require member variables

The only useful work you can do inside setUp and tearDown is creating and modifying member variables.

Now your tests aren’t self-contained – they use these member variables, and you must make absolutely sure that your test works no matter what state they are in. These member variables are not useful for anything else – they are purely an artifact of the choice to use setUp and tearDown.

4. A named function is better

When you have setup code to share, write a function or method. Give it a name, make it return the thing it creates. By giving it a name you make your test easier to read. By returning what it creates, you avoid the use of member variables. By avoiding the magic setUp method, you give yourself the option of calling more than one setup function, making code re-use more granular (if you want).
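
Here is a sketch of that style (not tied to any particular test framework; the class and function names are invented for illustration):

#include <cassert>

// Imagine this is the class under test:
struct Account
{
    int balance = 0;
    void deposit(int amount) { balance += amount; }
};

// A named setup function: it returns what it creates,
// so the test needs no member variables at all
Account make_account_with_balance(int initial_balance)
{
    Account account;
    account.deposit(initial_balance);
    return account;
}

void test_deposit_increases_balance()
{
    Account account = make_account_with_balance(100); // setup is explicit
    account.deposit(50);
    assert(account.balance == 150);
}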

5. What goes in tearDown?

If you’re using tearDown, what are you doing?

Are you tearing down some global state? I thought this was a unit test?

Are you ensuring nothing is left in an unpredictable state for future tests? Surely those tests guarantee their state at the start?

What possible use is there for this function?

Conclusion

A unit test should be a self-contained pure function, with no dependencies on other state. setUp and tearDown force you to depend on member variables of your test class, for no benefits over named functions, except that you don’t have to type their names. I consider typing the name of a properly-named function to be a benefit.