Widely used programming languages: past, present, and future

Programming languages are like pop groups in that they have followers, fans and supporters; new ones are constantly being created and some eventually become widely popular, while those that were once popular slowly fade away or mutate into something else.

Creating a language is a relatively popular activity. Science fiction and fantasy authors have been doing it since before computers existed, e.g., the Elf language Quenya devised by Tolkien, and in the computer age Star Trek’s Klingon. Some very good how-to books have been written on the subject.

As soon as computers became available, people started inventing programming languages.

What have been the major factors influencing the growth to widespread use of new programming languages (I’m ignoring languages that only become widespread within application niches)?

Cobol and Fortran became widely used because there was widespread implementation support for them across computer manufacturers, and they did not have to compete with any existing widely used languages. Various niches had one or more languages that were widely used in that niche, e.g., Algol 60 in academia.

To become widely used during the mainframe/minicomputer age, a new language first had to be ported to the major computers of the day, some of which supported multiple, incompatible operating systems. No new languages became widely used, in the sense of being used across computer vendors. Some new languages were widely used by developers because they were available on IBM computers; for several decades a large percentage of developers used IBM computers. Based on job adverts, RPG was widely used, but PL/1 was not. The use of RPG declined with the decline of IBM.

The introduction of microcomputers (originally 8-bit, then 16, then 32, and finally 64-bit) opened up an opportunity for new languages to become widely used in that niche (which would eventually grow to be the primary computing platform of its day). This opportunity occurred because compiler vendors for the major languages of the day did not want to cannibalize their existing market (i.e., selling compilers for a lot more than the price of a microcomputer) by selling a much lower priced product on microcomputers.

BASIC became available on practically all microcomputers, or rather some dialect of BASIC that was incompatible with all the other dialects. The availability of BASIC on a vendor’s computer promoted sales of the hardware, and it was not worthwhile for the major vendors to create a version of BASIC that reduced portability costs; the profit was in games.

The dominance of the Microsoft/Intel partnership removed the high cost of porting to lots of platforms (by driving those platforms out of business), but created a major new obstacle to the wide adoption of new languages: developer choice. There had always been lots of new languages floating around, but people only got to see the subset that was available on the particular hardware they targeted. Once the CPU/OS combination (essentially) became a monoculture, most new languages had to compete for developer attention in one ecosystem.

Pascal was in widespread use for a few years on micros (in the form of Turbo Pascal) and university computers (the source of Wirth’s ETH compiler was freely available for porting), but eventually C won developer mindshare and became the most widely used language. In the early 1990s C++ compiler sales took off, but many developers were writing C with a few C++ constructs scattered about the code (e.g., use of new, rather than malloc/free).

Next, the Internet took off, and opened up an opportunity for new languages to become dominant. This opportunity occurred because Internet related software was being made freely available, and established compiler vendors were not interested in making their products freely available.

There were people willing to invest in creating a good-enough implementation of the language they had invented, and giving it away for free. Luck, plus being in the right place at the right time, resulted in PHP and Javascript becoming widely used. Network effects have prevented any other language from becoming widely used. Compatible dialects of PHP and Javascript may migrate widespread usage to quite different languages over time, e.g., Facebook’s Hack.

Java rode to popularity on the coat-tails of the Internet, and when it looked like security issues would reduce it to niche status, it became the vendor supported language for one of the major smart-phone OSs.

Next, smart-phones took off, but the availability of Open Source compilers closed the opportunity window that a lack of interest from existing compiler vendors had previously created for new languages to become dominant. Smart-phone vendors wanted to quickly attract developers, which meant throwing their weight behind a language that many developers were already familiar with; Apple went with Objective-C (which evolved to Swift), Google with Java (which evolved to Kotlin, because of the Oracle lawsuit).

Where does Python fit in this grand scheme? I don’t yet have an answer; or is my world-view wrong to treat Python usage as being as widespread as that of C/C++/Java?

New programming languages continue to be implemented; I don’t see this ever stopping. Most don’t attract more users than their implementer, but a few become fashionable amongst the young, who are always looking to attach themselves to something new and shiny.

Will a new programming language ever again become widely used?

Like human languages, programming languages experience strong network effects. Widely used languages continue to be widely used because many companies depend on code written in them, and many developers who know them can obtain jobs; what company wants to risk using a new language, only to find they cannot hire staff who know it? And not many people are willing to invest in becoming fluent in a language with no immediate job prospects.

Today’s widely used programming languages succeeded in a niche that eventually grew larger than all the other computing ecosystems. The Internet and smart-phones are used by everybody on the planet; there are no bigger ecosystems to provide new languages with a possible route to widespread use. To be widely used a language first has to become fashionable, but from now on, new programming languages that don’t evolve from (i.e., remain compatible with) currently widely used languages are very unlikely to migrate from fashionable to widely used.

It has always been possible for a proficient developer to dedicate a year+ of effort to creating a new language implementation. Adding the polish needed to make it production ready used to take much longer, but these days tool chains such as LLVM supply a lot of the heavy lifting. The problem for almost all language creators/implementers is community building; they are terrible at dealing with other developers.

It’s no surprise that nearly all the new languages that become fashionable originate with language creators who work for a company that happens to feel a need for a new language. Examples include:

  • Go, created by Google for internal use, attracted an outside fan base. Company languages are not new, with IBM’s PL/1 being the poster child (or is there a more modern poster child?). At the moment Go is a trendy language, and this feeds a supply of young developers willing to invest in learning it. Once the trendiness wears off, Google will start to have problems recruiting developers, the reason being that a label as a Go developer limits job prospects when few other companies use the language. Talk to a manager who has tried to recruit developers to work on applications written in Fortran, Pascal and other once-widely used languages (and even wannabe widely used languages, such as Ada),
  • Rust, a vanity project from Mozilla, which they have now abandoned. Did Rust become fashionable because it arrived at the right time to become the not-Google language? I await a PhD thesis on the topic of the rise and fall of Rust,
  • Microsoft’s C# ceased being trendy some years ago. These days I don’t have much contact with developers working in the Microsoft ecosystem, so I don’t know anything about the state of the C# job market.

Every now and again a language creator has the social skills needed to start an active community. Zig caught my attention when I read that its creator, Andrew Kelley, had quit his job to work full-time on Zig. Two and a half years later Zig has its own track at FOSDEM’21.

Will Zig become the next fashionable language, as Rust/Go popularity fades? I’m rooting for Zig because of its name; there are relatively few languages whose names start with Z, while the start of the alphabet is over-represented in language names. It would be foolish to root for a language because of a belief that it has magical properties (e.g., powerful, readable, maintainable), but the young are foolish.

Impact of function size on number of reported faults

Are longer functions more likely to contain more coding mistakes than shorter functions?

Well, yes. Longer functions contain more code, and the more code developers write the more mistakes they are likely to make.

But wait, the evidence shows that most reported faults occur in short functions.

This is true, at least in Java. It is also true that most of a Java program’s code appears in short methods (in C 50% of the code is contained in functions containing 114 or fewer lines, while in Java 50% of code is contained in methods containing 4 or fewer lines). It is to be expected that most reported faults appear in short functions. The plot below shows, left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines (code+data):

left: the percentage of code contained in functions/methods containing a given number of lines, and right: the cumulative percentage of lines contained in functions/methods containing less than a given number of lines.

Does percentage of program source really explain all those reported faults in short methods/functions? Or are shorter functions more likely to contain more coding mistakes per line of code, than longer functions?

Reported faults per line of code is often referred to as: defect density.

If defect density were independent of function length, the plot of reported faults against function length (in lines of code) would be horizontal; red line below. If every function contained the same number of reported faults, the plotted line would have the form of the blue line below.

Number of reported faults in C++ classes (not methods) containing a given number of lines.

Two things need to occur for a fault to be experienced. A mistake has to appear in the code, and the code has to be executed with the ‘right’ input values.

Code that is never executed will never result in any fault reports.

If a function containing 100 lines of executable source code has, say, 30 lines that are rarely executed, those 30 lines will not contribute as much to the final total number of reported faults as the other 70 lines.

How does the average percentage of executed LOC, in a function, vary with its length? I have been rummaging around looking for data to help answer this question, but so far without any luck (the llvm code coverage report is over all tests, rather than per test case). Pointers to such data very welcome.

Statement execution is controlled by if-statements, and around 17% of C source statements are if-statements. For functions containing between 1 and 10 executable statements, the percentage that don’t contain an if-statement is expected to be, respectively: 83, 69, 57, 47, 39, 33, 27, 23, 19, 16. Statements contained in shorter functions are more likely to be executed, providing more opportunities for any mistakes they contain to be triggered, generating a fault experience.
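
These percentages are just 100*0.83^n, the probability that none of the n statements is an if-statement; a couple of lines of Python reproduce them (a minimal sketch, assuming that whether a statement is an if-statement is independent of the other statements in the function):

# expected percentage of functions containing no if-statement, for functions
# containing n executable statements, given that 17% of statements are if-statements
for n in range(1, 11):
    print(n, round(100 * (1 - 0.17) ** n))
# prints: 1 83, 2 69, 3 57, 4 47, 5 39, 6 33, 7 27, 8 23, 9 19, 10 16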

Longer functions contain more dependencies between the statements within their body than shorter functions (I don’t have any data showing how much more). Dependencies create opportunities for making mistakes (there is data showing that dependencies between files and between classes are a source of mistakes).

The previous analysis makes a large assumption, that the mistake generating a fault experience is contained in one function. This is true for 70% of reported faults (in AspectJ).

What is the distribution of reported faults against function/method size? I don’t have this data (pointers to such data very welcome).

The plot below shows number of reported faults in C++ classes (not methods) containing a given number of lines (from a paper by Koru, Eman and Mathew; code+data):

Number of reported faults in C++ classes (not methods) containing a given number of lines.

It’s tempting to think that those three curved lines are each classes containing the same number of methods.

What is the conclusion? There is one good reason why shorter functions should have more reported faults, and another good’ish reason why longer functions should have more reported faults. Perhaps length is not important. We need more data before an answer is possible.

Extreme value theory in software engineering

As its name suggests, extreme value theory deals with extreme deviations from the average, e.g., how often will rainfall be heavy enough to cause a river to overflow its banks.

The initial list of statistical topics that I thought ought to be covered in my evidence-based software engineering book included extreme value theory. At the time, and even today, there were/are no books covering “Statistics for software engineering”, so I had no prior work to guide my selection of topics. I was keen to cover all the important topics, had heard of extreme value theory in several (non-software) contexts, and jumped to the conclusion that it must be applicable to software engineering.

Years pass: the draft accumulates a wide variety of analysis techniques applied to software engineering data, but no use of extreme value theory.

Something else does not happen: I don’t find any ‘Using extreme value theory to analyse data’ books. Yes, there are some really heavy-duty maths books available, but nothing of a practical persuasion.

The book’s Extreme value section becomes a subsection, then a subsubsection, and ends up inside a comment (I cannot bring myself to delete it).

It appears that extreme value theory is more talked about than used. I can understand why. Extreme events are newsworthy; rivers that don’t overflow their banks are not news.

Just over a month ago a discussion cropped up on the UK’s C++ standards’ panel mailing list: was email traffic down because of COVID-19? The panel’s convenor, Roger Orr, posted some data on monthly volumes. Oh, data :-)

Monthly data is a bit too granular for detailed analysis over relatively short periods. After some poking around, Roger was able to send me the date&time of every post to WG21’s Core and Lib reflectors since February 2016 (there have been various changes of hosts and configurations over the years, and the dates of posts back to 2016 were straightforward to obtain).

During our email exchanges, Roger had mentioned that every now and again a huge discussion thread comes out of nowhere. Woah, sounds like WG21 could do with some extreme value theory. How often are huge discussion threads likely to occur, and how huge is a once in 10-years thread that they might have to deal with?

There are two techniques for analysing the distribution of extreme values present in a sample (both based around the generalized extreme value distribution; a minimal fitting sketch follows the list):

  • Generalized Extreme Value (GEV) uses block maxima, e.g., maximum number of daily emails sent in each month,
  • Generalized Pareto (GP) uses peak over threshold: pick a threshold and extract day values for when more than this threshold number of emails was sent.
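
Neither fit needs much code. The sketch below uses Python’s scipy (the analysis behind the plots below was done in R; the file name and column names here are made up for illustration): fit a GEV to the monthly block maxima of daily email counts, then read off an approximate 10-year return level.

import pandas as pd
from scipy.stats import genextreme

# daily email counts for one reflector; 'date' and 'emails' are assumed column names
daily = pd.read_csv("core_daily_counts.csv", parse_dates=["date"])

# block maxima: the largest daily count within each month
monthly_max = daily.set_index("date")["emails"].resample("M").max().dropna()

# fit the generalized extreme value distribution to the block maxima
shape, loc, scale = genextreme.fit(monthly_max)

# 10-year return level: the daily maximum exceeded, on average, once in 120 monthly blocks
print(genextreme.ppf(1 - 1/120, shape, loc=loc, scale=scale))

The peaks-over-threshold approach is the same idea, fitting scipy.stats.genpareto to the daily counts that exceed a chosen threshold.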

The plots below show the maximum number of monthly emails that are expected to occur (y-axis) within a given number of months (x-axis), for WG21’s Core and Lib email lists. The circles are actual occurrences, and the dashed lines are 95% confidence intervals; GEV was used for these fits (code+data):

Expected maximum for emails appearing on C++'s core and lib reflectors within a given period

The 10-year return value for Core is around a daily maximum of 70 +-30, and closer to 200 +-100 for Lib.

The model used is very simplistic, and fails to take into account the growth in members joining these lists and traffic lost when a new mailing list is created for a new committee subgroup.

If any readers have suggestions for uses of extreme value theory in software engineering, please let me know.

Postlude. This discussion has reordered events. My original interest in the mailing list data was the desire to find some evidence for the hypothesis that the volume of email increased as the date of the next WG21 meeting approached. For both Core and Lib, the volume actually decreases slightly as the date of the next meeting approaches; see code for details. Also, the volume of email at the weekend is around 60% lower than during weekdays.

C++ template usage

Generics are a programming construct that allow an algorithm to be coded without specifying the types of some variables, which are supplied later when a specific instance (for some type(s)) is instantiated. Generics sound like a great idea; who hasn’t had to write the same function twice, with the only difference being the types of the parameters?

All of today’s major programming languages support some form of generic construct, and developers have had the opportunity to use them for many years. So, how often are generics used in practice?

In C++, templates are the language feature supporting generics.

The paper: How C++ Templates Are Used for Generic Programming: An Empirical Study on 50 Open Source Systems contains lots of interesting data :-) The following analysis applies to the five largest projects analysed: Chromium, Haiku, Blender, LibreOffice and Monero.

As its name suggests, the Standard Template Library (STL) is a collection of templates implementing commonly used algorithms+other stuff (some algorithms were commonly used before the STL was created, and perhaps some are now commonly used because they are in the STL).

It is to be expected that most uses of templates will involve those defined in the STL, because these implement commonly used functionality, are documented and generally known about (code can only be reused when its existence is known about, and it has been written with reuse in mind).

The template instantiation measurements show a 17:1 ratio for STL vs. developer-defined templates (i.e., 149,591 vs. 8,887).

What are the usage characteristics of developer defined templates?

Around 25% of developer defined function templates are only instantiated once, while 15% of class templates are instantiated once.

Most templates are defined by a small number of developers. This is not surprising given that most of the code on a project is written by a small number of developers.

The plot below shows, for each developer defined function template, the percentage of all developer defined function template instantiations that it accounts for, in rank order (code+data):

Percentage of instantiations accounted for by each developer defined function template, in rank order.

Lines are each a fitted power law, whose exponents vary between -1.5 and -2. Is it just me, or are these exponents surprisingly close?
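
Fitting a power law to this kind of rank-order data amounts to fitting a straight line on log-log axes; a minimal sketch (the input file is a stand-in for the paper’s data, which is available via the code+data link):

import numpy as np

# per-template instantiation counts, one value per developer defined function template
counts = np.loadtxt("function_template_instantiations.txt")

percent = 100 * np.sort(counts)[::-1] / counts.sum()   # rank order, as percentages
rank = np.arange(1, len(percent) + 1)

# a power law y = a*x^b is a straight line in log-log space; the slope is the exponent
exponent, log_a = np.polyfit(np.log(rank), np.log(percent), 1)
print("fitted exponent:", exponent)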

The following is for developer defined class templates. Lines are fitted power laws, whose exponents vary between -1.3 and -2.6. Not so close here.

Percentage of instantiations accounted for by each developer defined class template, in rank order.

What processes are driving use of developer defined templates?

Every project has its own specific few templates that get used everywhere, by all developers. I imagine these are tailored to the project, and are widely advertised to developers who work on the project.

Perhaps some developers don’t define templates, because that’s not what they do. Is this because they work on stuff where templates don’t offer much benefit, or is it because these developers are stuck in their ways (if so, is it really worth trying to change them?)

How are C functions different from Java methods?

According to the right plot below, most of the code in a C program resides in functions containing between 5-25 lines, while most of the code in Java programs resides in methods containing one line (code+data; data kindly supplied by Davy Landman):

Number of C/Java functions of a given length and percentage of code in these functions.

The left plot shows the number of functions/methods containing a given number of lines, the right plot shows the total number of lines (as a percentage of all lines measured) contained in functions/methods of a given length (6.3 million functions and 17.6 million methods).

Perhaps all those 1-line Java methods are really complicated. In C, most lines contain a few tokens, as seen below (code+data):

Number of lines containing a given number of C tokens.

I don’t have any characters/tokens per line data for Java.

Is Java code mostly getters and setters?

I wonder what pattern C++ will follow, i.e., C-like, Java-like, or something else? If you have data for other languages, please send me a copy.

for-loop usage at different nesting levels

When reading code, starting at the first line of a function/method, the probability of the next statement read being a for-loop is around 1.5% (at least in C, I don’t have decent data on other languages). Let’s say you have been reading the code a line at a time, and you are now reading lines nested within various if/while/for statements, you are at nesting depth d. What is the probability of the statement on the next line being a for-loop?

Does the probability of encountering a for-loop remain unchanged with nesting depth (i.e., developer habits are not affected by nesting depth), or does it decrease (aren’t developers supposed to be using functions/methods rather than nesting? I have never heard anybody suggest that it increases)?

If you think the for-loop use probability is not affected by nesting depth, you are going to argue for the plot on the left (below, showing the number of for-loops appearing in C source whose compound-statement contains basic blocks nested to a given depth), with the regression model fitting really well after 3-levels of nesting. If you think the probability decreases with nesting depth, you are likely to argue for the plot on the right, with the model fitting really well down to around 10-levels of nesting (code+data).

Number of C for-loops whose enclosed compound-statement contains basic blocks nested to a given depth.

Both plots use the same data, but different scales are used for the x-axis.

If the probability of use is independent of nesting depth, an exponential equation should fit the data (i.e., the left plot); decreasing probability is supported by a power-law (i.e., the right plot; plus other forms of equation, but let’s keep things simple).
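
One way of checking a belief against the data is to fit both functional forms and see where each breaks down; a minimal sketch (the file and column layout are assumptions, the actual counts are in the code+data link):

import numpy as np
from scipy.optimize import curve_fit

# nesting depth (starting at 1) and number of for-loops whose compound-statement
# contains basic blocks nested to that depth
depth, loops = np.loadtxt("for_loop_depths.txt", unpack=True)

def exponential(x, a, b):      # constant use probability at every depth
    return a * np.exp(b * x)

def power_law(x, a, b):        # use probability decreasing with depth
    return a * x ** b

e_fit, _ = curve_fit(exponential, depth, loops, p0=(loops[0], -0.5))
p_fit, _ = curve_fit(power_law, depth, loops, p0=(loops[0], -2.0))
print("exponential:", e_fit, "power law:", p_fit)

Comparing the residuals of each fit over sub-ranges of nesting depth shows where each form goes wrong.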

The two cases are very wrong over different ranges of the data. What is your explanation for reality failing to follow your beliefs in for-loop occurrence probability?

Is the mismatch between belief and reality caused by the small size of the data set (a few million lines were measured, which was once considered to be a lot), or perhaps your beliefs are based on other languages which will behave as claimed (appropriate measurements on other languages most welcome).

The nesting depth dependent use probability plot shows a sudden change in the rate of decrease in for-loop probability; perhaps this is caused by the maximum number of characters that can appear on a typical editor line (within a window). The left plot (below) shows the number of lines (of C source) containing a given number of characters; the right plot counts tokens per line and the length effect is much less pronounced (perhaps developers use shorter identifiers in nested code). Note: different scales used for the x-axis (code+data).

Number of lines containing a given number of characters (left) and C tokens (right).

I don’t have any believable ideas for why the exponential fit only works if the first few nesting depths are ignored. What could be so special about early nesting depths?

What about fitting the data with other equations?

A bi-exponential springs to mind, with one exponential driven by application requirements and the other by algorithm selection; but reality is not on-board with this idea.

Ideas, suggestions, and data for other languages, most welcome.

Comparing expression usage in mathematics and C source

Why does a particular expression appear in source code?

One reason is that the expression is the coded form of a formula from the application domain, e.g., E=mc^2.

Another reason is that the expression calculates an algorithm/housekeeping related address, or offset, to where a value of interest is held.

Most people (including me, many years ago) think that the majority of source code expressions relate to the application domain, in one way or another.

Work on a compiler optimizer, and you will soon learn the truth: most expressions are simple and calculate addresses/offsets. Optimizing compilers would not have much to do if they only relied on expressions from the application domain (my numbers tool throws something up every now and again).

What are the characteristics of application domain expressions?

I like to think of them as being complicated, but that’s because it used to be in my interest for them to be complicated (I used to work on optimizers, which have the potential to make big savings if things are complicated).

Measurements of expressions in scientific papers are needed, but who is going to be interested in measuring the characteristics of mathematical expressions appearing in papers? I’m interested, but not enough to do the work. Then, a few weeks ago, I discovered: An Analysis of Mathematical Expressions Used in Practice, by Clare So; an analysis of 20,000 mathematical papers submitted to arXiv between 2000 and 2004.

The following discussion uses the measurements made for my C book as the representative source code (I keep suggesting that detailed measurements of other languages are needed, but nobody has jumped in and made them, yet).

The table below shows percentage occurrence of operators in expressions. Minus is much more common than plus in mathematical expressions, the opposite of C source; the ‘popularity’ of the relational operators is also reversed.

Operator  Mathematics   C source
=         0.39          3.08
-         0.35          0.19 
+         0.24          0.38
<=        0.06          0.04
>         0.041         0.11
<         0.037         0.22

The most common single binary operator expression in mathematics is n-1 (the data counts expressions using different variable names as different expressions; yes, n is the most popular variable name, and adding up other uses does not change relative frequency by much). In C source, var+int_constant is around twice as common as var-int_constant.

The plot below shows the percentage of expressions containing a given number of operators (I've made a big assumption about exactly what Clare So is counting; code+data). The operator count starts at two because that is where the count starts for the mathematics data. In C source, around 99% of expressions contain fewer than two operators, so the simple case completely dominates.

Percentage of expressions containing a given number of operators.

For expressions containing between two and five operators, frequency of occurrence is sort of about the same in mathematics and C, with C frequency decreasing more rapidly. The data disagrees with me again...

2019 in the programming language standards’ world

Last Tuesday I was at the British Standards Institute for a meeting of IST/5, the committee responsible for programming language standards in the UK.

There has been progress on a few issues discussed last year, and one interesting point came up.

It is starting to look as if there might be another iteration of the Cobol Standard. A handful of people, in various countries, have started to nibble around the edges of various new (in the Cobol sense) features. No, the INCITS Cobol committee (the people who used to do all the heavy lifting) has not been reformed; the work now appears to be driven by people who cannot let go of their involvement in Cobol standards.

ISO/IEC 23360-1:2006, the ISO version of the Linux Base Standard, has been updated and we were asked for a UK position on the document being published. Abstain seemed to be the only sensible option.

Our WG20 representative reported that the ongoing debate over pile of poo emoji has crossed the chasm (he did not exactly phrase it like that). Vendors want to have the freedom to specify code-points for use with their own emoji, e.g., pineapple emoji. The heady days, of a few short years ago, when an encoding for all the world’s character symbols seemed possible, have become a distant memory (the number of unhandled logographs on ancient pots and clay tablets was declining rapidly). Who could have predicted that the dream of a complete encoding of the symbols used by all the world’s languages would be dashed by pile of poo emoji?

The interesting news is from WG9. The document intended to become the Ada20 standard was due to enter the voting process in June, i.e., the committee considered it done. At the end of April the main Ada compiler vendor asked for the schedule to be slipped by a year or two, to enable them to get some implementation experience with the new features; oops. I have been predicting that in the future language ‘standards’ will be decided by the main compiler vendors, and the future is finally starting to arrive. What is the incentive for the GNAT compiler people to pay any attention to proposals written by a bunch of non-customers (ok, some of them might work for customers)? One answer is that Ada users tend to be large bureaucratic organizations (e.g., the DOD), who like to follow standards, and might fund GNAT to implement the new document (perhaps this delay by GNAT is all about funding, or lack thereof).

Right on cue, C++ users have started to notice that C++20 added support for a system header named version, which conflicts with the common existing practice of using a file called version to hold versioning information; a problem if the header search path used by the compiler includes a project’s top-level directory (which is where the versioning file version often sits). So the WG21 committee decides on what it thinks is a good idea, implementors implement it, and users complain; implementors now have a good reason to not follow a requirement in the standard, to keep users happy. Will WG21 be apologetic, or get all high and mighty? We will have to wait and see.

The Perils of Multi-Phase Construction

I’ve never really been a fan of C#’s object initializer syntax. Yes, it’s a little more convenient to write, but it has a big downside: it forces you to make your types mutable by default. Okay, that’s a bit strong; it doesn’t force you to do anything, but it does promote that way of thinking and allows people to take advantage of mutability outside the initialisation block [1].

This post is inspired by some buggy code I encountered where my suspicion is that the subtleties of the object initialisation syntax got lost along the way and partially constructed objects eventually found their way into the wild.

No Dragons Yet

The method, which was to get the next message from a message queue, was originally written something like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  result = new Message
  {
    Priority = message.Priority,
    Type = GetHeader(message, "MessageType"),
    Body = message.Body, 
  };
}

return result;

This was effectively correct. I say “effectively correct” because it doesn’t contain the bug which came later but still relies on mutability which we know can be dangerous.

For example, what would happen if the GetHeader() method threw an exception? At the moment there is no error handling, and so the exception propagates out of the method and back up the stack. Because we make no effort to recover, we let the caller decide what happens when a duff message comes in.

The Dragons Begin Circling

Presumably the behaviour when a malformed message arrived was undesirable because the method was changed slightly to include some recovery fairly soon after:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, "MessageType"),
      Body = message.Body,  
    };
  }
  catch (Exception e)
  {
    Log.Error("Invalid message. Skipping.");
  }
}

return result;

Still no bug yet, but that catch handler falling through to the return at the bottom is somewhat questionable; we are making the reader work hard to track what happens to result under the happy / sad paths to ensure it remains correct under further change.

Object Initialisation Syntax

Before showing the bug, here’s a brief refresher on how the object initialisation syntax works under the covers [2] in the context of our example code. Essentially it invokes the default constructor first and then performs assignments on the various other properties, e.g.

var __m = new Message();
__m.Priority = message.Priority;
__m.Type = GetHeader(message, "MessageType");
__m.Body = message.Body;
result = __m;

Notice how the compiler introduces a hidden temporary variable during the construction which it then assigns to the target at the end? This ensures that any exceptions during construction won’t create partially constructed objects that are bound to variables by accident. (This assumes you don’t use the constructor or a property setter to attach the object being constructed to any global variables either.)

Hence, with respect to our example, if any part of the initialization fails then result will be left as null and therefore the message is indeed discarded and the caller gets a null reference back.

The Dragons Surface

Time passes and the code is then updated to support a new property which is also passed via a header. And then another, and another. However, being more complicated than a simple string value the logic to parse it is placed outside the object initialisation block, like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, "MessageType"),
      Body = message.Body,  
    };

    var str = GetHeader(message, "SomeIntValue");
    if (str != null && TryParseInt(str, out var value))
      result.IntValue = value;

    // ... more of the same ...
  }
  catch (Exception e)
  {
    Log.Error("Invalid message. Skipping.");
  }
}

return result;

Now the problems start. With the latter header parsing code outside the initialisation block result is assigned a partially constructed object while the remaining parsing code runs. Any exceptions that occur [3] mean that result will be left only partially constructed and the caller will be returned the duff object because the exception handler falls out the bottom.

+1 for Tests

The reason I spotted the bug was because I was writing some tests around the code for a new header which also temporarily needed to be optional, like the others, to decouple the deployments. When running the tests there was an error displayed on the console output [4] telling me the message was being discarded, which I didn’t twig at first. It was when I added a retrospective test for the previous optional fields, and found my new one wasn’t being parsed correctly, that I realised something funky was going on.

Alternatives

So, what’s the answer? Well, I can think of a number of approaches that would fix this particular code, ranging from small to large in terms of the amount of code that needs changing and our appetite for it.

Firstly we could avoid falling through in the exception handler and make it easier on the reader to comprehend what would be returned in the face of a parsing error:

catch (Exception e)  
{  
  Log.Error("Invalid message. Skipping.");
  return null;
}

Secondly we could reduce the scope of the result variable and return that at the end of the parsing block so it’s also clearer about what the happy path returns:

var result = new Message  
{  
  // . . .  
};

var str = GetHeader(message, "SomeIntValue");
if (str != null && TryParseInt(str, out var value))
  result.IntValue = value;

return result;

We could also short circuit the original check too and remove the longer lived result variable altogether with:

RawMessage message = queue.Receive();

if (message == null)
    return null;

These are all quite simple changes which are also safe going forward should someone add more header values in the same way. Of course, if we were truly perverse and wanted to show how clever we were, we could fold the extra values back into the initialisation block by doing an Extract Function on the logic instead and leave the original dragons in place, e.g.

try
{  
  result = new Message  
  {  
    Priority = message.Priority,  
    Type = GetHeader(message, "MessageType"),
    Body = message.Body,
    IntValue = GetIntHeader(message, "SomeIntValue"),
    // ... more of the same ...  
  };
}  
catch (Exception e)  
{  
  Log.Error("Invalid message. Skipping.");
}

But we would never do that because the aim is to write code that helps stop people making these kinds of mistakes in the first place. If we want to be clever we should make it easier for the maintainers to fall into The Pit of Success.

Other Alternatives 

I said at the beginning that I was not a fan of mutability by default and therefore it would be remiss of me not to suggest that the entire Message type be made immutable and all properties set via the constructor instead:

result = new Message  
(  
  priority: message.Priority,
  type: GetHeader(message, "MessageType"),
  body: message.Body,
  intValue: GetIntHeader(message, "SomeIntValue"),
  // ... more of the same ...  
);

Yes, adding a new property is a little more work but, as always, writing the tests to make sure it all works correctly will dominate here.

I would also prefer to see use of an Optional<> type instead of a null reference for signalling “no message” but that’s a different discussion.

Epilogue

While this bug was merely “theoretical” at the time I discovered it [5], it quickly came back to bite.

Although the system appeared to be functioning correctly, it had slowed down noticeably, which we quickly discovered was down to the receiving process continually restarting. What I hadn’t twigged just from reading this nugget of code was that, due to the catch handler falling through and passing the message on, it was being acknowledged on the queue twice: once in that catch handler, and again after processing it. This second acknowledgment attempt generated a fatal error that caused the process to restart. Deploying the fixed receiver code as well sorted the issue out.

Ironically the impetus for my blog post “Black Hole - The Fail Fast Anti-Pattern” way back in 2012 was also triggered by two-phase construction problems that caused a process to go into a nasty failure mode, but that time it processed messages much too quickly and stayed alive failing them all.

 

[1] Generally speaking, the setting of multiple properties implies multi-phase construction. The more common term Two-Phase Construction comes (I presume) from explicit constructor method names like Initialise() or Create() which take multiple arguments, like the constructor, rather than setting properties one-by-one.

[2] This is based on my copy of The C# Programming Language: The Annotated Edition.

[3] When the header was missing it was passing a null byte[] reference into a UTF8 decoder which caused it to throw an ArgumentNullException.

[4] Internally it created a logger on-the-fly so it wasn’t an obvious dependency that initially needed mocking.

[5] It’s old, so possibly it did bite in the past but nobody knew why, or it magically fixed itself when both ends were upgraded close enough together.

Modeling visual studio C++ compile times

Last week I spotted an interesting article on the compile-time performance of C++ compilers running under Microsoft Windows. The author had obviously put a lot of work into gathering the data, and had taken care to have multiple runs to reduce the impact of random effects (128 runs to be exact); but, as is often the case, the analysis of the data was lackluster. I posted a comment asking for the data, and a link was posted the next day :-)

The compilers benchmarked were: Visual Studio 2015, Visual Studio 2017 and clang 7.0.1; the compilers were configured to target: C++20, C++17, C++14, C++11, C++03, or C++98. The source code used was 100 system headers.

If we are interested in understanding the contribution of each component to overall compile-time, the obvious first regression model to build is:

compile_time = header_x+compiler_y+language_z

where: header_x are the different headers, compiler_y the different compilers and language_z the different target languages. There might be some interaction between variables, so something more complicated was tried first; the final fitted model was (code+data):

compile_time = k+header_x+compiler_y+language_z+compiler_y*language_z

where k is a constant (the Intercept in R’s summary output). The following is a list of normalised numbers to plug into the equation (clang is the default compiler and C++03 the default language, and so do not appear in the list; the : symbol denotes an interaction term, i.e., a multiplication; only a few of the 100 headers are listed, details are available):

               (Intercept)                  headerany 
               1.000000000                0.051100398 
               headerarray             headerassert.h 
               0.522336397               -0.654056185 
...
            headerwctype.h            headerwindows.h 
              -0.648095154                1.304270250 
              compilerVS15               compilerVS17 
              -0.185795534               -0.114590143 
             languagec++11              languagec++14 
               0.032930014                0.156363433 
             languagec++17              languagec++20 
               0.192301727                0.184274629 
             languagec++98 compilerVS15:languagec++11 
               0.001149643               -0.058735591 
compilerVS17:languagec++11 compilerVS15:languagec++14 
              -0.038582437               -0.183708714 
compilerVS17:languagec++14 compilerVS15:languagec++17 
              -0.164031495                         NA 
compilerVS17:languagec++17 compilerVS15:languagec++20 
              -0.181591418                         NA 
compilerVS17:languagec++20 compilerVS15:languagec++98 
              -0.193587045                0.062414667 
compilerVS17:languagec++98 
               0.014558295 

As an example, the (normalised) time to compile wchar.h using VS15 with languagec++11 is:
1-0.514807638-0.183862162+0.033951731-0.059720131

Each component adds/subtracts to/from the normalised mean.

Building this model didn’t take long. While waiting for the kettle to boil, I suddenly realised that an additive model was probably inappropriate for this problem; oops. Surely the contribution of each component was multiplicative, i.e., components have a percentage impact on performance.

A quick change to the form of the fitted model:

log(compile_time) = k+header_x+compiler_y+language_z+compiler_y*language_z

Taking the exponential of both sides, the fitted equation becomes:

compile_time = e^{k}e^{header_x}e^{compiler_y}e^{language_z}e^{compiler_y*language_z}

The numbers, after taking the exponent, are:

               (Intercept)                  headerany 
              9.724619e+08               1.051756e+00 
...
            headerwctype.h            headerwindows.h 
              3.138361e-01               2.288970e+00 
              compilerVS15               compilerVS17 
              7.286951e-01               7.772886e-01 
             languagec++11              languagec++14 
              1.011743e+00               1.049049e+00 
             languagec++17              languagec++20 
              1.067557e+00               1.056677e+00 
             languagec++98 compilerVS15:languagec++11 
              1.003249e+00               9.735327e-01 
compilerVS17:languagec++11 compilerVS15:languagec++14 
              9.880285e-01               9.351416e-01 
compilerVS17:languagec++14 compilerVS15:languagec++17 
              9.501834e-01                         NA 
compilerVS17:languagec++17 compilerVS15:languagec++20 
              9.480678e-01                         NA 
compilerVS17:languagec++20 compilerVS15:languagec++98 
              9.402461e-01               1.058305e+00 
compilerVS17:languagec++98 
              1.001267e+00 

Taking the same example as above: wchar.h using VS15 with c++11. The compile-time (in cpu clock cycles) is:
9.724619e+08*3.138361e-01*7.286951e-01*1.011743e+00*9.735327e-01

Now each component causes a percentage change in the (mean) base value.
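
The fitted models above were built in R (see the code+data link); a rough equivalent of both fits in Python, using the statsmodels formula interface and assumed column names, looks something like:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# one row per measurement: header compiled, compiler used, language standard, compile time
times = pd.read_csv("compile_times.csv")   # columns: header, compiler, language, compile_time

# additive model: each component adds/subtracts a fixed amount
additive = smf.ols("compile_time ~ header + compiler * language", data=times).fit()

# multiplicative model: each component scales compile time by a percentage
multiplicative = smf.ols("np.log(compile_time) ~ header + compiler * language",
                         data=times).fit()

print(additive.params)
print(np.exp(multiplicative.params))   # back on the original scale, as in the listing above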

Both of these models explain over 90% of the variance in the data, but this is hardly surprising given they include so much detail.

In reality compile-time is driven by some combination of additive and multiplicative factors. Building a combined additive and multiplicative model is going to be like wrestling an octopus, and is left as an exercise for the reader :-)

Given a choice between these two models, I think the multiplicative model is probably closest to reality.

Guaranteed Copy Elision Does Not Elide Copies

This post is also available at the Microsoft Visual C++ Team Blog

C++17 merged in a paper called Guaranteed copy elision through simplified value categories. The changes mandate that no copies or moves take place in some situations where they were previously allowed, e.g.:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
non_moveable x = make(); //compiles in C++17, error in C++11/14

You can see this behavior in compiler versions Visual Studio 2017 15.6, Clang 4, GCC 7, and above.

Despite the name of the paper and what you might read on the Internet, the new rules do not guarantee copy elision. Instead, the new value category rules are defined such that no copy exists in the first place. Understanding this nuance gives a deeper understanding of the current C++ object model, so I will explain the pre-C++17 rules, what changes were made, and how they solve real-world problems.

Value Categories

To understand the before-and-after, we first need to understand what value categories are (I’ll explain copy elision in the next section). Continuing the theme of C++ misnomers, value categories are not categories of values. They are characteristics of expressions. Every expression in C++ has one of three value categories: lvalue, prvalue (pure rvalue), or xvalue (eXpiring value). There are then two parent categories: all lvalues and xvalues are glvalues, and all prvalues and xvalues are rvalues.

diagram expressing the taxonomy described above

For an explanation of what these are, we can look at the standard (C++17 [basic.lval]/1):

  • A glvalue ([generalised lvalue]) is an expression whose evaluation determines the identity of an object, bit-field, or function.
  • A prvalue is an expression whose evaluation initializes an object or a bit-field, or computes the value of an operand of an operator, as specified by the context in which it appears.
  • An xvalue is a glvalue that denotes an object or bit-field whose resources can be reused (usually because it is near the end of its lifetime).
  • An lvalue is a glvalue that is not an xvalue.
  • An rvalue is a prvalue or an xvalue.

Some examples:

std::string s; 
s //lvalue: identity of an object 
s + " cake" //prvalue: could perform initialization/compute a value 

std::string f(); 
std::string& g(); 
std::string&& h(); 

f() //prvalue: could perform initialization/compute a value 
g() //lvalue: identity of an object 
h() //xvalue: denotes an object whose resources can be reused 

struct foo { 
    std::string s; 
}; 

foo{}.s //xvalue: denotes an object whose resources can be reused

C++11

What are the properties of the expression std::string{"a pony"}?

It’s a prvalue. Its type is std::string. It has the value "a pony". It names a temporary.

That last one is the key point I want to talk about, and it’s the real difference between the C++11 rules and C++17. In C++11, std::string{"a pony"} does indeed name a temporary. From C++11 [class.temporary]/1:

Temporaries of class type are created in various contexts: binding a reference to a prvalue, returning a prvalue, a conversion that creates a prvalue, throwing an exception, entering a handler, and in some initializations.

Let’s look at how this interacts with this code:

struct copyable {
    copyable() = default;
    copyable(copyable const&) { /*...*/ }
};
copyable make() { return {}; }
copyable x = make();

make() results in a temporary. This temporary will be moved into x. Since copyable has no move constructor, this calls the copy constructor. However, this copy is unnecessary since the object constructed on the way out of make will never be used for anything else. The standard allows this copy to be elided by constructing the return value at the call-site rather than in make (C++11 [class.copy]). This is called copy elision.

The unfortunate part is this: even if all copies of the type are elided, the constructor still must exist.

This means that if we instead have:

struct non_moveable {
    non_moveable() = default;
    non_moveable(non_moveable&&) = delete;
};
non_moveable make() { return {}; }
auto x = make();

then we get a compiler error:

<source>(7): error C2280: 'non_moveable::non_moveable(non_moveable &&)': attempting to reference a deleted function
<source>(3): note: see declaration of 'non_moveable::non_moveable'
<source>(3): note: 'non_moveable::non_moveable(non_moveable &&)': function was explicitly deleted

Aside from returning non-moveable types by value, this presents other issues:

  • auto x = non_moveable{}; //compiler error
  • The language makes no guarantees that the constructors won’t be called (in practice this isn’t too much of a worry, but guarantees are more convincing than optional optimizations).
  • If we want to support some of these use-cases, we need to write copy/move constructors for types which they don’t make sense for (and do what? Throw? Abort? Linker error?)
  • You can’t pass non-moveable types to functions by value, in case you have some use-case which that would help with.

What’s the solution? Should the standard just say “oh, if you elide all copies, you don’t need those constructors”? Maybe, but then all this language about constructing temporaries is really a lie and building an intuition about the object model becomes even harder.

C++17

C++17 takes a different approach. Instead of guaranteeing that copies will be elided in these cases, it changes the rules such that the copies were never there in the first place. This is achieved through redefining when temporaries are created.

As noted in the value category descriptions earlier, prvalues exist for purposes of initialization. C++11 creates temporaries eagerly, eventually using them in an initialization and cleaning up copies after the fact. In C++17, the materialization of temporaries is deferred until the initialization is performed.

That’s a better name for this feature. Not guaranteed copy elision. Deferred temporary materialization.

Temporary materialization creates a temporary object from a prvalue, resulting in an xvalue. The most common places it occurs are when binding a reference to or performing member access on a prvalue. If a reference is bound to the prvalue, the materialized temporary’s lifetime is extended to that of the reference (this is unchanged from C++11, but worth repeating). If a prvalue initializes a class type of the same type as the prvalue, then the destination object is initialized directly; no temporary required.

Some examples:

struct foo {
    int i;
};

foo make();
const auto& a = make(); //temporary materialized and lifetime-extended
auto&& b = make(); //ditto

foo{}.i //temporary materialized

auto c = make(); //no temporary materialized

That covers the most important points of the new rules. Now on to why this is actually useful past terminology bikeshedding and trivia to impress your friends.

Who cares?

I said at the start that understanding the new rules would grant a deeper understanding of the C++17 object model. I’d like to expand on that a bit.

The key point is that in C++11, prvalues are not “pure” in a sense. That is, the expression std::string{"a pony"} names some temporary std::string object with the contents "a pony". It’s not the pure notion of the list of characters “a pony”. It’s not the Platonic ideal of “a pony”.

In C++17, however, std::string{"a pony"} is the Platonic ideal of “a pony”. It’s not a real object in C++’s object model, it’s some elusive, amorphous idea which can be passed around your program, only being given form when initializing some result object, or materializing a temporary. C++17’s prvalues are purer prvalues.

If this all sounds a bit abstract, that’s okay, but internalising this idea will make it easier to reason about aspects of your program. Consider a simple example:

struct foo {};
auto x = foo{};

In the C++11 model, the prvalue foo{} creates a temporary which is used to move-construct x, but the move is likely elided by the compiler.

In the C++17 model, the prvalue foo{} initializes x.

A more complex example:

std::string a() {
    return "a pony";
}

std::string b() {
    return a();
}

int main() {
    auto x = b();
}

In the C++11 model, return "a pony"; initializes the temporary return object of a(), which move-constructs the temporary return object of b(), which move-constructs x. All the moves are likely elided by the compiler.

In the C++17 model, return "a pony"; initializes the result object of a(), which is the result object of b(), which is x.

In essence, rather than an initializer creating a series of temporaries which in theory move-construct a chain of return objects, the initializer is teleported to the eventual result object. In C++17, the code:

T a() { return /* expression */ ; }
auto x = a();

is identical to auto x = /* expression */;. For any T.

Closing

The “guaranteed copy elision” rules do not guarantee copy elision; instead they purify prvalues such that the copy doesn’t exist in the first place. Next time you hear or read about “guaranteed copy elision”, think instead about deferred temporary materialization.

The first commercially available (claimed) verified compiler

Yesterday, I read a paper containing a new claim by some of those involved with CompCert (yes, they of soap powder advertising fame): “CompCert is the first commercially available optimizing compiler that is formally verified, using machine assisted mathematical proofs, to be exempt from miscompilation”.

First commercially available; really? Surely there are earlier claims of verified compilers being commercially available. Note, I’m saying claims; bits of the CompCert compiler have involved mathematical proofs (i.e., code generation), so I’m considering earlier claims as having at least the level of intellectual honesty used in some CompCert papers (a very low bar).

What does commercially available mean? The CompCert system is open source (but is not free software), so I guess it’s commercially available via free download, with licensing from AbsInt (the paper does not define the term).

Computational Logic, Inc is the name that springs to mind when thinking of commercial and formal verification. They were active from 1983 to 1997, and published some very interesting technical reports about their work (sadly there are gaps in the archive). One project was A Mechanically Verified Code Generator (in 1989) and their Gypsy system (a Pascal-like language+IDE) provided an environment for doing proofs of programs (I cannot find any reports online). Piton was a high-level assembler and there was a mechanically verified implementation (in 1988).

There is the Danish work on the formal specification of the code generators for their Ada compiler (while there was a formal specification of the Ada semantics in VDM, code generators tend to be much simpler beasts, i.e., a lot less work is needed in formal verification). The paper I have is: “Retargeting and rehosting the DDC Ada compiler system: A case study – the Honeywell DPS 6” by Clemmensen, from 1986 (cannot find an online copy). This Ada compiler was used by various hardware manufacturers, so it was definitely commercially available for (lots of) money.

Are there, then, any earlier verified compilers with a commercial connection? There is A PRACTICAL FORMAL SEMANTIC DEFINITION AND VERIFICATION SYSTEM FOR TYPED LISP, from 1976, which has “… has proved a number of interesting, non-trivial theorems including the total correctness of an algorithm which sorts by successive merging, the total correctness of the McCarthy-Painter compiler for expressions, …” (which sounds like a code generator, or part of one, to me).

Francis Morris’s thesis, from 1972, proves the correctness of compilers for three languages (each language contained a single feature) and discusses how these features may be combined into a more “realistic” language. No mention of commercial availability, but I cannot see the demand being that great.

The definition of PL/1 was written in VDM, a formal language. PL/1 is a huge language and there were lots of subsets. Were there any claims of formal verification of a subset compiler for PL/1? I have had little contact with the PL/1 world, so am not in a good position to know. Anybody?

Over to you dear reader. Are there any earlier claims of verified compilers and commercial availability?

Adding a new scalar type to C

I think the time has arrived for a new scalar type in C, which for want of a better name I shall call the compendium type.

On today’s processors a compendium type behaves a lot like an integer type, except that nobody really wants to include it in the list of supported integer types, e.g., 128-bit scalars.

Why is a new scalar type needed? The Standard supports extended integer types, why not treat a scalar object that supports integer arithmetic as an integer type?

The C Standard says (section 6.2.5 Types):
“There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.) There may also be implementation-defined extended signed integer types.38) The standard and extended signed integer types are collectively called signed integer types.39)”

There is corresponding wording for unsigned integer types.

The standard header <stdint.h> allows implementations to define a whole menagerie of integer types: section 7.20.1.1 Exact-width integer types
“The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.”

This all sounds very feasible, but there is a catch. The Standard defines a greatest-width integer type, section 7.20.1.5 Greatest-width integer types
“The following type designates a signed integer type capable of representing any value of any signed integer type:
intmax_t”

and various library functions have an argument type intmax_t (there is also an uintmax_t).

An ‘extra-large’ integer type is not something that can just sit there, in the list of available integer types, waiting to be used. Preprocessor arithmetic and a variety of library functions are based around the type intmax_t. An extra-large integer type would have a very visible impact on all developers, many of whom would want to ignore it.
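For example (my illustration, not from the post), the printf length modifier j and the corresponding limit macros are specified in terms of intmax_t, so the meaning of code like the following is tied to whatever the widest integer type is:

#include <inttypes.h>
#include <stdio.h>

int main(void) {
    intmax_t big = INTMAX_MAX;   /* the widest value the implementation promises to handle */
    printf("%jd\n", big);        /* %jd is defined as printing an intmax_t */
    return 0;
}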

GCC supports 128-bit integers, e.g., __int128. But some magic pixie dust is involved, this type has no connection with intmax_t.
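A quick illustration of the disconnect (my example, assuming GCC or Clang on a 64-bit target):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    __int128 x = (__int128)1 << 100;   /* 128-bit arithmetic works */
    printf("%zu %zu\n", sizeof(__int128), sizeof(intmax_t));
    /* typically prints "16 8": __int128 is wider than the 'greatest-width' type */
    return 0;
}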

What do developers do with these 128- and 256-bit scalar objects? Evaluating graphics algorithms, hashes and cryptographic calculations are obvious candidates; yes, perhaps even calculations involving integers that require this many bits. I have not seen any analysis of the uses of this kind of wide-integer-like type.

Extra-wide scalar types have a variety of uses, and the term compendium type captures this. Hardware support for such extra-width types is growing, with vendors looking to fill major niches.

Contorting existing wording in the Standard, to accommodate these extra-wide types within the existing integer type machinery, is a short term solution. Work on the upcoming revision of the C Standard should either do nothing and allow vendors to take the approach currently used by GCC, or create a new scalar type (perhaps using a TR).

Spaceship Operator

You write a class. It has a bunch of member data. At some point, you realise that you need to be able to compare objects of this type. You sigh and resign yourself to writing six operator overloads for every type of comparison you need to make. Afterwards your fingers ache and your previously clean code is lost in a sea of functions which do essentially the same thing. If this sounds familiar, then C++20’s spaceship operator is for you. This post will look at how the spaceship operator allows you to describe the strength of relations, write your own overloads, have them be automatically generated, and how correct, efficient two-way comparisons are automatically rewritten to use them.

Relation strength

The spaceship operator looks like <=> and its official C++ name is the “three-way comparison operator”. It is so-called, because it is used by comparing two objects, then comparing that result to 0, like so:

(a <=> b) < 0  //true if a < b
(a <=> b) > 0  //true if a > b
(a <=> b) == 0 //true if a is equal/equivalent to b

One might think that <=> will simply return -1, 0, or 1, similar to strcmp. This being C++, the reality is a fair bit more complex, but it’s also substantially more powerful.

Not all equality relations were created equal. C++’s spaceship operator doesn’t just let you express orderings and equality between objects, but also the characteristics of those relations. Let’s look at two examples.

Example 1: Ordering rectangles

We have a rectangle class and want to define comparison operators for it based on its size. But what does it mean for one rectangle to be smaller than another? Clearly if one rectangle is 1cm by 1cm, it is smaller than one which is 10cm by 10cm. But what about one rectangle which is 1cm by 5cm and another which is 5cm by 1cm? These are clearly not equal, but they’re also not less than or greater than each other. But speaking practically, having all of <, <=, ==, >, and >= return false for this case is not particularly useful, and it breaks some common assumptions about these operators, such as (a == b || a < b || a > b) == true. Instead, we can say that == in this case models equivalence rather than true equality. This is known as a weak ordering.

Example 2: Ordering squares

Similar to above, we have a square type which we want to define comparison operators for with respect to size. In this case, we don’t have the issue of two objects being equivalent, but not equal: if two squares have the same area, then they are equal. This means that == models equality rather than equivalence. This is known as a strong ordering.
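Jumping ahead a little to the syntax covered below, here is a sketch (mine; the types and members are invented) of how these two relations might be written:

#include <compare>

struct rect {
    int w, h;
    // Two rects with the same area compare equivalent even if their
    // dimensions differ, so this models a weak ordering.
    std::weak_ordering operator<=>(rect const& rhs) const {
        return w * h <=> rhs.w * rhs.h;
    }
};

struct square {
    int side;
    // Same side means same area means equal squares: a strong ordering.
    std::strong_ordering operator<=>(square const& rhs) const {
        return side <=> rhs.side;
    }
};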

Describing relations

Three-way comparisons allow you to express the strength of your relation, and whether it allows just equality or also ordering. This is achieved through the return type of operator<=>. Five types are provided, and stronger relations can implicitly convert to weaker ones, like so:

[diagram: the five comparison category types, with implicit conversions from stronger relations to weaker ones]

Strong and weak orderings are as described in the above examples. Strong and weak equality means that only == and != are valid. Partial ordering is weaker than weak_ordering in that it also allows unordered values, like NaN for floats.

These types have a number of uses. Firstly they indicate to users of the types what kind of relation is modelled by the comparisons, so that their behaviour is more clear. Secondly, algorithms could potentially have more efficient specializations for when the orderings are stronger, e.g. if two strongly-ordered objects are equal, calling a function on one of them will give the same result as the other, so it would only need to be carried out once. Finally, they can be used in defining language features, such as class type non-type template parameters, which require the type to have strong equality so that equal values always name the same template instantiation.

Writing your own three-way comparison

You can provide a three-way comparison operator for your type in the usual way, by writing an operator overload:

struct foo {
  int i;

  std::strong_ordering operator<=> (foo const& rhs) const {
    return i <=> rhs.i;
  }
};

Note that whereas two-way comparisons should be non-member functions so that implicit conversions are done on both sides of the operator, this is not necessary for operator<=>; we can make it a member and it’ll do the right thing.

Sometimes you may find that you want to do <=> on types which don’t have an operator<=> defined. The compiler won’t try to work out a definition for <=> based on the other operators, but you can use std::compare_3way instead, which will fall back on two-way comparisons if there is no three-way version available. For example, we could write a three-way comparison operator for a pair type like so:

template<class T, class U>
struct pair {
  T t;
  U u;

  auto operator<=> (pair const& rhs) const
    -> std::common_comparison_category_t<
         decltype(std::compare_3way(t, rhs.t)),
         decltype(std::compare_3way(u, rhs.u))> {
    if (auto cmp = std::compare_3way(t, rhs.t); cmp != 0) return cmp;
    return std::compare_3way(u, rhs.u);
  }
};

std::common_comparison_category_t there determines the weakest relation given a number of them. E.g. std::common_comparison_category_t<std::strong_ordering, std::partial_ordering> is std::partial_ordering.

If you found the previous example a bit too complex, you might be glad to know that C++20 will support automatic generation of comparison operators. All we need to do is =default our operator<=>:

auto operator<=>(x const&) = default;

Simple! This will carry out a lexicographic comparison for each base, then member of the type, in order of declaration.
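For example (a sketch of mine, not from the post), a simple record type gets all of its comparisons from a single defaulted declaration:

#include <compare>
#include <string>

struct person {
    std::string name;
    int age;
    auto operator<=>(person const&) const = default; // compares name, then age
};

// p1 < p2, p1 == p2, p1 >= p2, and so on all work, member-wise in declaration order.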

Automatically-rewritten two-way comparisons

Although <=> is very powerful, most of the time we just want to know if some object is less than another, or equal to another. To facilitate this, two-way comparisons like < can be rewritten by the compiler to use <=> if a better match is not found.

The basic idea is that for some operator @, an expression a @ b can be rewritten as (a <=> b) @ 0. It can even be rewritten as 0 @ (b <=> a) if that is a better match, which means we get symmetry for free.
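To make the symmetry concrete, here is a small sketch (mine) of a comparison against another type:

#include <compare>

struct meters {
    double value;
    std::partial_ordering operator<=>(double rhs) const {
        return value <=> rhs;
    }
};

bool f(meters m) {
    return m < 3.0      // rewritten as (m <=> 3.0) < 0
        && 1.0 < m;     // rewritten as 0 < (m <=> 1.0), reusing the same operator
}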

In some cases, this can actually provide a performance benefit. Quite often comparison operators are implemented by writing == and <, then writing the other operators in terms of those rather than duplicating the code. This can lead to situations where we want to check <= and end up doing an expensive comparison twice. This automatic rewriting can avoid that cost, since it will only call the one operator<=> rather than both operator< and operator==.

Conclusion

The spaceship operator is a very welcome addition to C++. It gives us more expressiveness in how we define our relations, lets us write less code to define them (sometimes even just a defaulted declaration), and avoids some of the performance pitfalls of manually implementing some comparison operators in terms of others.

For more details, have a look at the cppreference articles for comparison operators and the compare header, and Herb Sutter’s standards proposal. For a good example on how to write a spaceship operator, see Barry Revzin’s post on implementing it for std::optional. For more information on the mathematics of ordering with a C++ slant, see Jonathan Müller’s blog post series on the mathematics behind comparison.

Evolutionary pressures on C++, Java and Python

The future evolution of C++, Java and Python is being driven by very different interested parties, and it’s going to be interesting watching events unfold over the next 5-10 years.

I have previously written about how the C++ Standard’s committee is past its sell-by date, has taken off its ball and chain and is now in the hands of bored consultants.

Bjarne Stroustrup was once effectively treated as C++’s Benevolent Dictator For Life (during the production of the first C++ Standard some people were labeled as Bjarne groupies); things have moved on since then, but the ‘old-guard’ are trying to make a comeback. Suggesting that people ought to base their thinking on a book published almost 25 years ago (Stroustrup’s “The Design and Evolution of C++”; a very interesting book that is well worth reading) creates a rather backward looking image. Bored consultants are looking to work on exciting new ideas. The old-guard need to appear modern to attract followers (even if the ideas are old ideas with a fresh coat of paint).

The threat to C++ is from bored consultants, each adding their own pet idea to the language standard; a situation that Stroustrup thinks is starting to happen.

Java, the language, is owned by Oracle, the company (let’s not get too involved in exactly what they own, have copyright on, etc). Oracle are not shy about asking people for licensing fees. Java is now on a 6-month release cycle (at least the Oracle version, there are Open Source implementations) and the free support only applies to the current release; paying a license fee buys support for versions older than 6-months. In the short term, the cheapest solution is for companies to pay for support.

Oracle are always happy to send in the lawyers and if too many customers switch to non-Oracle implementations, I’m sure something can be found to introduce enough uncertainty to discourage work/distribution involving Open Source Java implementations.

Will Java survive Oracle’s licensing? It is not in their interest for Java to die; Oracle will adjust their terms to keep the money flowing in, but over the longer term I think willing Java developers are going to be hard to find.

Guido van Rossum recently removed himself from the post of Python’s Benevolent Dictator For Life. One of the jobs of a benevolent dictator is maintaining some degree of language coherence, which involves preventing people’s pet ideas from being added to the language. Does this mean that Python is slowly going to become more and more bloated? Perhaps, but I think a more likely problem is a language fork, multiple implementations of slightly different (at first) languages all claiming to be Python.

These days, the strength of Python is its large collection of very useful, commercial grade, packages, and future language details may turn out to be irrelevant. There is a lot to learn from the Python 2/3 transition, but true believers like to think that things will turn out differently for them.

Super Simple Named Boolean Parameters

Quite often we come across interfaces with multiple boolean parameters, like this:

cake make_cake (bool with_dairy, bool chocolate_sauce, bool poison);

A call to this function might look like:

auto c = make_cake(true, true, false);

Unfortunately, this is not very descriptive. We need to look at the declaration of the function to know what each of those bools mean, and it would be way too easy to accidentally flip the wrong argument during maintenance and end up poisoning ourselves rather than having a chocolatey treat.

There are many solutions which people use, from just adding comments to each argument, to creating tagged types. I recently came across a very simple trick which I had not seen before:

#define K(name) true

Yep, it’s a macro which takes an argument and gets replaced by true. This may look strange, but it enables us to write this:

auto c = make_cake(K(with_dairy), !K(chocolate_sauce), !K(poison));

// or using `not` if you prefer
auto c = make_cake(K(with_dairy), not K(chocolate_sauce), not K(poison));

Of course, it’s up to you whether you think using a macro for this kind of thing is worth it, or if you think that macro should have a name which is less likely to collide. There’s also the issue that this looks a bit like Python named parameters, where you can pass them in any order, but the order matters here. There are much better solutions when you own the interface, but for the times when you don’t, I think this is a neat trick to solve the issue in a low-overhead manner.
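For completeness, one of those better solutions when you do own the interface (a sketch of mine, not from the post) is a scoped enum per parameter:

struct cake {};   // stand-in for the post's cake type

enum class dairy { no, yes };
enum class chocolate_sauce { no, yes };
enum class poison { no, yes };

cake make_cake(dairy, chocolate_sauce, poison) { return {}; }

auto c = make_cake(dairy::yes, chocolate_sauce::yes, poison::no);
// make_cake(poison::no, dairy::yes, chocolate_sauce::yes); // won't compile: arguments can't be swapped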

I guess I need to come up with a marketing name for this. Since we already have X Macros, I hereby dub this trick “K Macros”.

Function Template Partial Ordering: Worked Examples

C++ function overloading rules are complex. C++ template rules are complex. Put the two together, and you unfortunately do not get something simple; you get a hideous monster of standardese which requires great patience and knowledge to overcome. However, since C++ is mostly corner-cases, it can pay to understand how the rules apply for those times where you just can’t work out why your code won’t compile. This post will present a few step-by-step examples of how partial ordering of function templates works in order to arm you for these times of need.

Partial ordering of function templates is a step of overload resolution. It occurs when you call a function template which is overloaded and the compiler needs to decide which one is more specialized than the other. Consider this code:

template<class T> void f(T);        //(1)
template<class T> void f(T const*); //(2)

int const* p = nullptr;
f(p);

We expect f(p) to call (2), because p is a int const*. In order to decide that (2) is more specialized than (1), the compiler needs to follow the function template partial ordering rules. Let’s see what the standard has to say.

Partial ordering selects which of two function templates is more specialized than the other by transforming each template in turn (see next paragraph) and performing template argument deduction using the function type.

This is not so complicated, even if the terms may be unfamiliar. Unfortunately, the “next paragraph” and the sections which it references are almost impossible to parse without a lot of background knowledge and re-readings, so I shall step through the algorithm rather than pasting the rules for you to cry over.

There are four steps which we have to work through:

  1. Transforming (1).
  2. Performing deduction on (2) with the transformed template from step 1.
  3. Transforming (2).
  4. Performing deduction on (1) with the transformed template from step 3.

If one and only one of the deductions succeeds, then the template whose transformed function type supplied the arguments for that successful deduction is the more specialized of the two.

Step 1: The rules state that for each template parameter, we create some unique type to use in its stead [1]. Let’s call this unique type type_0. You can pretend that this was defined somewhere like class type_0{};. Now we take our function template template <class T> void f(T) and substitute in type_0 for T. This gives us void f(type_0). The transformation is complete.

Step 2: Now that we have transformed template <class T> void f(T) into void f(type_0), we will perform deduction on (2) using the transformed function type. To do this, we imagine a call to (2) where the arguments have the type of the parameters for (1). Concretely, it would look like this:

template <class T> void func_2(T const*);
func_2(type_0{}); //derived from void f(type_0)

Would this call succeed? We can put it into our compiler to find out. GCC 8.1 says:

<source>: In function 'int main()':
<source>:4:18: error: no matching function for call to 'func_2(type_0)'
   func_2(type_0{});
                  ^
<source>:1:25: note: candidate: 'template<class T> void func_2(const T*)'
 template <class T> void func_2(T const*);
                         ^~~~~~
<source>:1:25: note:   template argument deduction/substitution failed:
<source>:4:18: note:   mismatched types 'const T*' and 'type_0'
   func_2(type_0{});
                  ^   

So deduction from (1) to (2) fails, because the invented type type_0 cannot be used to deduce const T*.

Step 3: Let’s try from (2) to (1). Again, we’ll transform (2) from template <class T> void f(T const*) to void f(type_0 const*).

Step 4: Now we attempt deduction:

template <class T> void func_1(T);
type_0 const* arg = nullptr;
func_1(arg);

This succeeds because a type_0 const* can be used to deduce T. Since deduction from (1) to (2) fails, but deduction from (2) to (1) succeeds, (2) is more specialized than (1) and will be chosen by overload resolution.


Let’s try a different example. How about:

template<class T> void g(T);  //(1)
template<class T> void g(T&); //(2)
int i = 0;
g(i);

(1) transforms to void g(type_0). Before we try deduction, we need to apply one of the numerous additional rules from the standard, which says we need to replace references with the type being referred to. So template <class T> void g(T&) becomes template <class T> void g(T). Deduction time:

template<class T> void func_2(T);
func_2(type_0{});

This succeeds.

Now the other direction. template<class T> void g(T&) transforms to void g(type_0&), then we remove the reference to get void g(type_0). Our second deduction:

template<class T> void func_1(T);
func_1(type_0{});

This is effectively identical to the previous one, so of course it succeeds.

Since deduction succeeded in both directions, the call is ambiguous. Sure enough, GCC diagnoses:

<source>: In function 'int main()':
<source>:5:8: error: call of overloaded 'g(int&)' is ambiguous
     g(i);
        ^
<source>:1:24: note: candidate: 'void g(T) [with T = int]'
 template<class T> void g(T);  //(1)
                        ^
<source>:2:24: note: candidate: 'void g(T&) [with T = int]'
 template<class T> void g(T&); //(2)
                        ^ 

This is why the algorithm is a partial ordering: sometimes two function templates are not ordered.


I’ll give one more example. This one has multiple parameters and is a bit more subtle.

template<class T>struct identity { using type = T; };
template<class T>struct A{};

template<class T, class U> void h(typename identity<T>::type, U); //(1)
template<class T, class U> void h(T, A<U>);                       //(2)
h<int>(0,A<void>{});

identity here just evaluates to its template argument, but the important thing to note is that typename identity<T>::type is a non-deduced context, so T cannot be deduced from the argument for that parameter.

(1) transforms to void h(typename identity<type_0>::type, type_0), which is void h(type_0, type_0). Attempt deduction on (2):

template<class T, class U> void func_2(T, A<U>);
func_2(type_0{}, type_0{});

This fails because we can’t match type_0 against A<U>.

(2) transforms to void h(type_0, A<type_1>). Try deduction against (1):

template<class T, class U> void func_1(typename identity<T>::type, U);
func_1(type_0{}, A<type_1>{});

This fails because typename identity<T>::type is a non-deduced context, so we can’t deduce T.

In the example from the last section deduction succeeded both ways so the call was ambiguous. In this example, deduction fails both ways, which is also an ambiguous call.


That’s the last of the examples. Of course, there are a bunch of rules which I didn’t cover here, like Concepts, parameter packs, non-type/template template parameters, and cases where both the argument and parameter types are references. Hopefully you now have enough of an intuition that you can understand what the standard says when you inevitably hit those corner cases. If you have any partial ordering conundrums, drop them down in the comments below or send them to me on Twitter.


  1. I’ll ignore non-type template parameters and template template parameters for simplicity, but the rules are essentially the same. 

std::accumulate vs. std::reduce

std::accumulate has been a part of the standard library since C++98. It provides a way to fold a binary operation (such as addition) over an iterator range, resulting in a single value. std::reduce was added in C++17 and looks remarkably similar. This post will explain the difference between the two and when to use one or the other.

Let’s start by looking at their interfaces, beginning with std::accumulate.

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

std::accumulate takes an iterator range and an initial value for the accumulation. You can optionally give it a binary operation to do the reduction, which will default to addition. It will call this operation on the initial value and the first element of the range, then on the result and the second element of the range, etc. Here are two equivalent calls:

auto sum1 = std::accumulate(begin(vec), end(vec), 0);
auto sum2 = std::accumulate(begin(vec), end(vec), 0, std::plus<>{});

std::reduce has a fair few more overloads to get your head round, but has a very similar interface once you understand them:

template<class InputIt>
typename std::iterator_traits<InputIt>::value_type reduce(
    InputIt first, InputIt last);

template<class ExecutionPolicy, class ForwardIt>
typename std::iterator_traits<ForwardIt>::value_type reduce(
    ExecutionPolicy&& policy,
    ForwardIt first, ForwardIt last);

template<class InputIt, class T>
T reduce(InputIt first, InputIt last, T init);

template<class ExecutionPolicy, class ForwardIt, class T>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init);

template<class InputIt, class T, class BinaryOp>
T reduce(InputIt first, InputIt last, T init, BinaryOp binary_op);

template<class ExecutionPolicy, class ForwardIt, class T, class BinaryOp>
T reduce(ExecutionPolicy&& policy,
         ForwardIt first, ForwardIt last, T init, BinaryOp binary_op);

The differences here are:

  • you can optionally pass an execution policy as the first argument;
  • the execution policy overloads take forward iterators rather than input iterators;
  • the initial value, and even the binary operation, can be omitted.

I’ll talk about these points in turn.

Execution policies

Execution policies are a C++17 feature which allows programmers to ask for algorithms to be parallelised. There are three execution policies in C++17:

  • std::execution::seq – do not parallelise
  • std::execution::par – parallelise
  • std::execution::par_unseq – parallelise and vectorise (requires that the operation can be interleaved, so no acquiring mutexes and such)

The idea behind execution policies is that you can change a serial algorithm to a parallel algorithm simply by passing an additional argument to the function:

auto sum1 = std::reduce(begin(vec), end(vec));                      //sequential
auto sum2 = std::reduce(std::execution::seq, begin(vec), end(vec)); //sequential
auto sum3 = std::reduce(std::execution::par, begin(vec), end(vec)); //parallel

Allowing parallelisation is the main reason for the addition of std::reduce. Let’s look at an example where we want to sum up all the elements in an array. With std::accumulate it looks like this:

[diagram: std::accumulate with std::plus – a single serial chain of additions, each one feeding the next]

Note that each step of the computation relies on the previous computation, i.e. this algorithm will execute serially and we make no use of hardware parallel processing capabilities. If we use std::reduce with the std::execution::par policy then it could look like this:

[diagram: std::reduce with std::plus – the additions grouped into a tree, so independent pairs can be summed in parallel]

This is a trivial amount of data for processing in parallel, but the benefit gained when the data size is scaled up should be clear: some of the operations can be executed independently of others, so they can be done in parallel.

A common question is: why do we need an entirely new algorithm for this? Why can’t we just overload std::accumulate? For an example of why we can’t do this, let’s use std::minus<> instead of std::plus<> as our reduction operation. With std::accumulate we get this:

[diagram: std::accumulate with std::minus – the same serial chain, subtracting each element in turn]

However, if we try to use std::reduce, we could get something like:

[diagram: std::reduce with std::minus – the regrouped tree now computes a different, wrong result]

Uh oh. We just broke our code.
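Concretely (my example, not the post’s diagrams), the two calls look like this:

#include <execution>
#include <functional>
#include <numeric>
#include <vector>

void subtraction_example() {
    std::vector<int> v {1, 2, 3, 4};
    // Strict left fold: ((((0 - 1) - 2) - 3) - 4) == -10, every time.
    auto a = std::accumulate(begin(v), end(v), 0, std::minus<>{});
    // std::reduce may regroup and reorder the operands, so this result is
    // unspecified: subtraction is neither associative nor commutative.
    auto r = std::reduce(std::execution::par, begin(v), end(v), 0, std::minus<>{});
}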

We got the wrong answer because of the mathematical properties of subtraction. You can’t arbitrarily reorder the operands, or compute the operations out of order when doing subtraction. This is formalised in the properties of commutativity and associativity.

A binary operation ∗ on a set S is associative if the following equation holds for all x, y, and z in S:

(x ∗ y) ∗ z = x ∗ (y ∗ z)

An operation is commutative if:

x ∗ y = y ∗ x

Associativity lets the algorithm compute reduction steps on arbitrary adjacent pairs of elements. Commutativity allows carrying out the operation on intermediate results in whatever order they are produced in, rather than having to preserve the original ordering. Some people (e.g. here and here) find that commutativity is too strong a requirement on std::reduce, because it denies use of common fold operations like string concatenation, which is associative but not commutative. I agree that it’s a shame we don’t have a step between std::accumulate and std::reduce which only requires associativity, but maybe in the future!
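As an illustration (my example), string concatenation is exactly such an associative-but-not-commutative fold, which is why it stays with std::accumulate:

#include <numeric>
#include <string>
#include <vector>

void concat_example() {
    std::vector<std::string> words {"a", " ", "pony"};
    // Fine: std::accumulate applies operator+ strictly left to right.
    auto sentence = std::accumulate(begin(words), end(words), std::string{});
    // std::reduce would be allowed to commute the operands, scrambling the
    // words, so it is the wrong tool here even though + on strings is associative.
}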

We can understand how associativity and commutativity affect our algorithm as much as we want, but there’s no way for the compiler to reliably check this. As such, we’re stuck with having std::reduce as a separate algorithm. In concepts speak, these are called axioms [1]: requirements imposed on semantics which cannot generally be statically verified.

Input vs. Forward Iterators

The forward iterator overloads allow the implementation to chunk up the data and dispatch these subranges to different threads. The idea is that input iterators are single-pass, whereas forward iterators can be iterated through multiple times. An algorithm couldn’t chunk up data indexed by input iterators, because by the time it had gone through the range to work out the sub-range boundaries, the iterators would have been invalidated and couldn’t be passed on.

For more information on iterator types and parallel algorithms, see p0467r2.

Optional Initial Element

std::reduce lets you not bother passing an initial element, in which case it will default-construct one using typename std::iterator_traits<InputIt>::value_type{}. I think this was a mistake. A default-constructed value is not always the identity element (such as in multiplication), and it’s very easy for a programmer to miss out the initial element. The code will still compile, it will just give the wrong answer. I suspect that this choice will result in some hard-to-find bugs when this interface comes into heavier use.
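A small illustration of why the default matters (my example):

#include <functional>
#include <numeric>
#include <vector>

void initial_value_example() {
    std::vector<int> v {1, 2, 3, 4};
    auto sum     = std::reduce(begin(v), end(v));                          // 10: int{} happens to be the additive identity
    auto product = std::reduce(begin(v), end(v), 1, std::multiplies<>{});  // 24
    // Passing int{} (i.e., 0) as the initial value in the line above would
    // silently turn the product into 0.
}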

Finishing Up

That covers the differences between std::reduce and std::accumulate. My three point guide to std::reduce is:

  • Use std::reduce when you want your accumulation to run in parallel
  • Ensure that the operation you want to use is both associative and commutative
  • Remember that the default initial value is produced by default construction, and that this may not be correct for your operation

Now you know how and when to use std::reduce over std::accumulate. More generally, the differences show some of the technical aspects you need to consider when parallelising any kind of algorithm. Keep in mind how your operations act with respect to common mathematical properties and you might save yourself some debugging down the line.

Acknowledgements

Thanks to Christopher Di Bella for reviewing this post and linking me to p0467r2. Thanks to Ben Steffan, Ben Deane, and TemplateRex for discussion about commutativity.


  1. For more about axioms and algorithms, see this post by Christopher Di Bella

The C++ committee has taken off its ball and chain

A step change in the approach to updates and additions to the C++ Standard occurred at the recent WG21 meeting, or rather a change that has been kind of going on for a few meetings has been documented and discussed. Two bullet points at the start of “C++ Stability, Velocity, and Deployment Plans [R2]” grab the reader’s attention:

● Is C++ a language of exciting new features?
● Is C++ a language known for great stability over a long period?

followed by the proposal (which was agreed at the meeting): “The Committee should be willing to consider the design / quality of proposals even if they may cause a change in behavior or failure to compile for existing code.”

We have had 30 years of C++/C compatibility (ok, there have been some nibbling around the edges over the last 15 years). A remarkable achievement, thanks to Bjarne Stroustrup over 30+ years and 64 full-week standards’ meetings (also, Tom Plum and Bill Plauger were engaged in shuttle diplomacy between WG14 and WG21).

The C/C++ superset/different issue has a long history.

In the late 1980s SC22 (the top-level ISO committee for programming languages) asked WG14 (the C committee) whether a standard should be created for C++, and if so did WG14 want to create it. WG14 considered the matter at its April 1989 meeting, and replied that in its view a standard for C++ was worth considering, but that the C committee were not the people to do it.

In 1990, SC22 started a study group to look into whether a working group for C++ should be created and in the U.S. X3 (the ANSI committee responsible for Information processing systems) set up X3J16. The showdown meeting of what would become WG21 was held in London, March 1992 (the only ISO C++ meeting I have attended).

The X3J16 people were in London for the ISO meeting, which was heated at times. The two public positions were: 1) work should start on a standard for C++, 2) C++ was not yet mature enough for work to start on a standard.

The, not so public, reason given for wanting to start work on a standard was to stop, or at least slow down, changes to the language. New releases, rumored and/or actual, of Cfront were frequent (in a pre-Internet time sense). Writing large applications in a version of C++ that was replaced with something slightly different six months later had developers in large companies pulling their hair out.

You might have thought that compiler vendors would be happy for the language to be changing on a regular basis; changes provide an incentive for users to pay for compiler upgrades. In practice the changes were so significant that major rework was needed by somebody who knew what they were doing, i.e., expensive people had to be paid; vendors were more used to putting effort into marketing minor updates. It was claimed that implementing a C++ compiler required seven times the effort of implementing a C compiler. I have no idea how true this claim might have been (it might have been one vendor’s approximate experience). In the 1980s everybody and his dog had their own C compiler and most of those who had tried, had run into a brick wall trying to implement a C++ compiler.

The stop/slow down changing C++ vs. let C++ “fulfill its destiny” (a rallying call from the AT&T rep, which the whole room cheered) finally got voted on; the study group became a WG (I cannot tell you the numbers; the meeting minutes are not online and I cannot find a paper copy {we had those until the mid/late-90s}).

The creation of WG21 did not have the intended effect (slowing down changes to the language); Stroustrup joined the committee and C++ evolution continued apace. However, from the developers’ perspective language change did slow down; Cfront changes stopped because its code was collapsing under its own evolutionary weight and usable C++ compilers became available from other vendors (in the early days, Zortech C++ was a major boost to the spread of usage).

The last WG21 meeting had 140 people on the attendance list; they were not all bored consultants looking for a creative outlet (i.e., exciting new features), but I’m sure many would be happy to drop the ball-and-chain (otherwise known as C compatibility).

I think there will be lots of proposals that will break C compatibility in one way or another, and some will make it into a published standard. The claim will be that the changes will make life easier for future C++ developers (a claim made by proponents of every language, for which there is zero empirical evidence). The only way of finding out whether a change has long term benefit is to wait a long time and see what happens.

The interesting question is how C++ compiler vendors will react to breaking changes in the language standard. There are not many production compilers out there these days, i.e., not a lot of competition. What incentive does a compiler vendor have to release a version of their compiler that will likely break existing code? Compiler validation, against a standard, is now history.

If WG21 make too many breaking changes, they could find C++ vendors ignoring them and developers asking whether the ISO C++ standards’ committee is past its sell by date.

East End Functions

There has been a recent stirring of attention, in the C++ community, for the practice of always placing the const modifier to the right of the thing it modifies. The practice has even been gifted a catchy name: East Const (which, I think, is what has stirred up the interest).

As purely a matter of style it's fascinating that it seems to have split the community so strongly! There are cases for and against, but both sides seem to revolve around the idea of "consistency". For the East Const believers the consistency is in the sense that you can always apply one, simple, rule about what const means and where it goes. For the West Consters the consistency is with the majority of existing code out there - as well as the Core Guidelines recommendation!

Personally I've been an East Const advocate for many years (although not by that name, of course) - and converted the entire Catch codebase over to East Const quite early on.

But there's another style choice that I've not seen discussed quite as much, but has a number of parallels.

As with East vs West Const this is purely a matter of style (it doesn't change what the compiler generates), and one of the arguments in favour is consistency across application (there are some cases where you must do it this way) - but the main argument against is also consistency - with most existing code. Sound familiar? But what is it?

The issue is about where to specify return types on function signatures. For most of C++'s history the only choice has been to write the type before the name of the function (along with any modifiers). But since C++11 we've been able to write the type at the end of the function signature (after a -> - and the function must be prefixed with the keyword auto).

auto someFunc( int i ) -> std::string;
// instead of
std::string someFunc( int i );

So why would you prefer this style? Well, first there's that consistency argument. It's the only way to specify return types for lambdas. You're also required to use trailing return types if the type is a decltype that is dependent on the name of one of the function's arguments. Indeed, that's the motivating case for adding the syntax in the first place. e.g.:

template <typename Lhs, typename Rhs>
auto add( Lhs const& lhs, Rhs const& rhs ) -> decltype( lhs + rhs ) {
    return lhs + rhs;
}

A Foolish Consistency?

Given those cases where it is required, using the same syntax in all other cases would seem to be more consistent.

I'm not sure the consistency argument is as strong here as it is with East Const - there was never much confusion over what the return applied to, after all. But I think it's worth keeping in mind.

The next reason for is consistency with other languages. Many languages, especially functional programming languages, exclusively use the trailing syntax for return types. Quite a few, e.g. Swift, use the same -> syntax.

It's not a strong reason on its own, but combined with the internal consistency argument I think there's something there.

However, for me at least, the most compelling rationale is for readability. Why do I think it's more readable? There are actually two parts to this:

  1. Function declarations tend to line up. Certain qualifiers might spoil this effect, although one approach might be to group similarly qualified functions (e.g. all virtuals) together. This makes glancing through the list of function names much easier.

  2. The name of the function is usually the most important thing when you're browsing the code. If you're more interested in the return type it's usually because you already know which function you're interested in. So making the name the first thing you read (after the auto introducer) seems fitting.

auto doesItBlend() -> bool;
auto whatsYourFavouriteNumber() -> int;
auto add( double a, double b ) -> double;
void setTheControls();

(note that many who prefer this form, including myself, tend to still put void first)

For me the arguments for are compelling. The arguments against really boil down to the same argument against East Const - inconsistency with older code. As Jon Kalb deliberated on in A Foolish Consistency, this sort of thinking can hold us back.

I've been favouring this style for more than a couple of years now. In fact I tracked down a post to the ACCU mailing list (linked here, but I believe you have to be a subscriber to read it) where I talked about it - and made all the same points I'm making here. My opinion since then has not changed much. Other than feeling more confident that it's The Right Thing.

So I think it's time we gave it a catchy name. Unlike East Const it already has a name, "trailing return types". It's not especially galvanising, though. Given the parallels to East vs West Const - and the fact that it, also, relates to the thing in question being placed to the left or the right, I propose East End Functions (vs West End Functions).

What about the redundant auto keyword?

Think of auto, here, as the "function introducer". In other languages it might be spelt fun or func. If it makes you feel better you could always:

#define func auto

... actually don't. The point is, in languages that introduce a function with func, then have a trailing return type, nobody gives it a second thought. auto is the same number of characters as func. It's a shame it's not quite as expressive - but that's the price of legacy. It shouldn't mean we "can't have nice things".

No more pointers

One of the major changes at the most recent C++ standards meeting in Jacksonville was the decision to deprecate raw pointers in C++20, moving to remove them completely in C++23. This came as a surprise to many, with a lot of discussion as to how we’ll get by without this fundamental utility available any more. In this post I’ll look at how we can replace some of the main use-cases of raw pointers in C++20.

Three of the main reasons people use raw pointers are:

  • Dynamic allocation & runtime polymorphism
  • Nullable references
  • Avoiding copies

I’ll deal with these points in turn, but first, an answer to the main question people ask about this change.

The elephant in the room

What about legacy code? Don’t worry, the committee have come up with a way to boldly move the language forward without breaking all the millions of lines of C++ which people have written over the years: opt-in extensions.

If you want to opt-in to C++20’s no-pointers feature, you use #feature.

#feature <no_pointers> //opt-in to no pointers
#feature <cpp20>       //opt-in to all C++20 features

This is a really cool new direction for the language. Hopefully with this we can slowly remove features like std::initializer_list so that new code isn’t bogged down with legacy as much as it is today.

Dynamic allocation & runtime polymorphism

I’m sure most of you already know the answer to this one: smart pointers. If you need to dynamically allocate some resource, that resource’s lifetime should be managed by a smart pointer, such as std::unique_ptr or std::shared_ptr. These types are now special compiler-built-in types rather than normal standard library types. In fact, std::is_fundamental<std::unique_ptr<int>>::value now evaluates to true!

Nullable references

Since references cannot be rebound and cannot be null, pointers are often used to fulfil this purpose. However, with C++20, we have a new type to fulfil this purpose: std::optional<T&>. std::optional was first introduced in C++17, but was plagued with no support for references and no monadic interface. C++20 has fixed both of these, so now we have a much more usable std::optional type which can fill the gap that raw pointers have left behind.

Avoiding copies

Some people like to use raw pointers to avoid copies at interface boundaries, such as returning some resource from a function. Fortunately, we have much better options, such as (Named) Return Value Optimization. C++17 made some forms of copy elision mandatory, which gives us even more guarantees for the performance of our code.

Wrapping up

Of course there are more use-cases for raw pointers, but this covers three of the most common ones. Personally, I think this is a great direction to see the language going in, and I look forward to seeing other ways we can slowly make C++ into a simpler, better language.

emBO++ 2018 Trip Report

emBO++ is a conference focused on C++ on embedded systems in Bochum, Germany. This was its second year of operation, but the first that I’ve been along to. It was a great conference, so I’m writing a short report to hopefully convince more of you to attend next year!

Format

The conference took place over four days: an evening of lightning talks and burgers, a workshop day, a day of talks (what you might call the conference proper), and finally an unofficial standards meeting for those interested in SG14. This made for a lot of variety, and each day was valuable.

Venue

One thing I really enjoyed about emBO++ was that the different tech and social events were dotted around the city. This meant that I actually got to see some of Bochum, get lost navigating the train system, walk around town at night, etc., which made a nice change from being cooped up in a hotel for a few days.

The main conference venue was at the Zentrum für IT-Sicherheit (Centre for IT Security). It was a spacious building with a lot of light and large social areas, so it suited the conference environment well. The only problem was that it used to be a military building and was lined with copper, making the thing into one huge Faraday cage. This meant that WiFi was an issue for the first few hours of the day, but it got sorted eventually.


Food and Drink

The catering at the main conference location was really excellent: a variety of tasty food with healthy options and large quantities. Even better was the selection of drinks available, which mostly consisted of interesting soft drinks which I’d never seen before, such as bottled Matcha with lime and a number of varieties of Mate. All the locations we went to for food and drinks were great – especially the speakers’ dinner. A lot of thought was obviously put into this area of the conference, and it showed.

Workshops

There were four workshops on the first day of the conference with two running in parallel. The two which I attended were very interesting and instructive, but I wish that they had been more hands-on.

Jörn Seger – Getting Started with Yocto

I was in two minds about attending this workshop. We need to use Yocto a little bit in my current project, so I could attend the workshop in order to gain more knowledge about it. On the other hand, I’d then be the most experienced member of my team in Yocto and would be forced to fix all the issues!

In the end I decided to go along, and it was definitely worthwhile. Previously I’d mostly muddled along without an understanding of the fundamentals of the system; this workshop provided those.

Kris Jusiak – Embedding a Compile-Time-State-Machine

Kris gave a workshop on Boost.SML, which is an embedded domain specific language (EDSL) for encoding expressive, high-performance state machines in C++. The library is very impressive, and it was valuable to see all the different use-cases it supports and how it supports switching out the frontend and backend of the system. I was particularly interested in this session as my talk the next day was on EDSLs, so it was an opportunity to steal some things to mention in my talk.

You can find Boost.SML here.

Talks

There were two tracks for most of the day, with the first and final ones being plenary sessions. There was a strong variety of talks, and I felt that my time was well-spent at all of them.

Simon Brand – Embedded DSLs for Embedded Programming

My talk! I think it went down well. I got some good engagement and questions from the audience, although not much feedback from the attendees later on in the day. I guess I’ll need to wait for it to get torn apart on YouTube.


Klemens Morgenstern – Developing high-performance Coroutines for ARMs

Klemens gave an excellent talk about an ARM coroutine library which he implemented. This talk has nothing to do with the C++ Coroutines TS, instead focusing on how coroutines can be implemented in a very low-overhead manner. In Klemens’ library, the user provides some memory to be used as the stack for the coroutine, then there are small pieces of ARM assembly which perform the context switch when you suspend or resume that coroutine. The talk went into the performance considerations, implementation, and use in just the right amount of detail, so I would definitely recommend watching if you want an overview of the ideas.

The library and presentation can be found here.

Emil Fresk – The compile-time, reactive scheduler: CRECT

CRECT is a task scheduler which carries out its work at compile time, therefore almost entirely disappearing from the generated assembly. Emil’s lofty goal for the talk was to present all of the necessary concepts such that those viewing the talk would feel like they could go off and implement something similar afterwards. I think he mostly succeeded in this – although a fair amount of metaprogramming skills would be required! He showed how to use the library to specify the jobs which need to be executed, the resources which they require, and when they should be scheduled to run. After we understood the fundamentals of the library, we learned how this actually gets executed at compile-time in order to produce the final scheduled output. Highly recommended for those who work with embedded systems and want a better way of scheduling their work.

You can find CRECT here.

Ben Craig – Standardizing an OS-less subset of C++

If you watch one talk from the conference it should be this one. C++ has had a “freestanding” variant for a long time, and it’s been neglected for the same amount of time. Ben talked about all the things which should not be available in freestanding mode but are, and those which should be but are not. He presented his vision for what should be standards-mandated facilities available in freestanding C++ implementations, and a tentative path to making this a reality. Particularly of interest were the odd edge cases which I hadn’t considered. For example, it turns out that std::array has to #include <string> somewhere down the line, because my_array.at(n) can throw an exception (std::out_of_range), and that exception has a constructor which takes std::string as an argument. These tiny issues will make getting a solid standard for freestanding difficult to pin down and agree on, but I think it’s a worthy cause.

Ben’s ISO C++ paper on a freestanding standard library can be found here.

Jacek Galowicz – Scalable test infrastructure for advanced bare-metal software development

In Jacek’s team, they have many different hardware versions to support. This raises the problem of a change introducing regressions in some versions and not others. This talk showed how they developed the testing infrastructure to enable them to test all hardware versions they needed on each merge request, to ensure that bad commits aren’t merged into the master branch. They wrote a simple testing framework in Haskell which was fine-tuned to their use case rather than using an existing solution like Jenkins (that’s what we use at Codeplay for solving the same problem). Jacek spoke about issues they faced and the creative solutions they put in place, such as putting a light detector over the CAPS LOCK button of a keyboard and making it blink in Morse code in order to communicate out from machines with no usable ports.

Odin Holmes – Bare-Metal-Roadmap

Odin’s talk summed up some current major problems that are facing the embedded community, and roped together all of the talks which had come before. It was cool to see the overlap in all the talks in areas of abstraction, EDSLs, making choices at compile time, etc.

Closing

I had a great time at emBO++ and would whole-heartedly recommend attending next year. The talks should be online in the next few months, so I look forward to watching those which I didn’t attend. The conference is mostly directed at embedded C++ developers, but I think it would be valuable to anyone doing low-latency programming on non-embedded systems, or those writing C/Rust/whatever for embedded platforms.

Thank you to Marie, Odin, Paul, Stephan, and Tabea for inviting me to talk and organising these great few days!


Do compilers take inline as a hint?

If you’ve spent any time in C or C++ communities online, you’ve probably seen someone say this:

inline used to be a hint for compilers to inline the definition, but no compilers actually take that into account any more.

You shouldn’t believe everything you see on the internet.

Of course, inline has mandatory effects on linkage and whatnot, but this post is only interested in whether or not compilers might change the decision to inline a function or not based on whether you write inline in the declaration.

However, the point of this article isn’t to give you good rules for when to use inline, because as much as I like to debate technical minutiae, your compiler will likely be a better judge than you anyway. Instead I want to show that if you want to know how compilers or standard libraries implement things, you can just go look! It might be daunting the first time, but getting a knack for finding things in large code bases and understanding your tools can pay off. I’ll be diving into the codebases for Clang and GCC to see how they handle inlining hints and hopefully convince you that you can do the same for questions which you have about your tools.


Clang

Let’s do this bottom-up. Clang is a compiler frontend for LLVM, which means that it takes C-family languages, translates them to LLVM Intermediate Representation, and LLVM handles generating assembly/binaries from that. So we can dig around in LLVM to see if we can find code which deals with inlining. I armed myself with a text editor and grep and found the following code in llvm/lib/Analysis/InlineCost.cpp:

// Adjust the threshold based on inlinehint attribute and profile based
// hotness information if the caller does not have MinSize attribute.
if (!Caller->optForMinSize()) {
  if (Callee.hasFnAttribute(Attribute::InlineHint))
    Threshold = MaxIfValid(Threshold, Params.HintThreshold);
  if (PSI) {
    uint64_t TotalWeight;
    if (CS.getInstruction()->extractProfTotalWeight(TotalWeight) &&
        PSI->isHotCount(TotalWeight)) {
      Threshold = MaxIfValid(Threshold, Params.HotCallSiteThreshold);
    } else if (PSI->isFunctionEntryHot(&Callee)) {
      // If callsite hotness can not be determined, we may still know
      // that the callee is hot and treat it as a weaker hint for threshold
      // increase.
      Threshold = MaxIfValid(Threshold, Params.HintThreshold);
    } else if (PSI->isFunctionEntryCold(&Callee)) {
      Threshold = MinIfValid(Threshold, Params.ColdThreshold);
    }
  }
}

By just looking at this code without knowledge of all the other functions, we won’t be able to totally understand what it’s doing. But there’s certainly some information we can glean here:

if (Callee.hasFnAttribute(Attribute::InlineHint))
  Threshold = MaxIfValid(Threshold, Params.HintThreshold);

So now we know that LLVM takes the presence of an inlinehint attribute into account in its inlining cost model. But does Clang produce that attribute? Let’s look for that attribute in the Clang sources:

// Otherwise, propagate the inline hint attribute and potentially use its
// absence to mark things as noinline.
if (auto *FD = dyn_cast<FunctionDecl>(D)) {
  if (any_of(FD->redecls(), [&](const FunctionDecl *Redecl) {
        return Redecl->isInlineSpecified();
      })) {
    B.addAttribute(llvm::Attribute::InlineHint);
  } else if (CodeGenOpts.getInlining() ==
                 CodeGenOptions::OnlyHintInlining &&
             !FD->isInlined() &&
             !F->hasFnAttribute(llvm::Attribute::AlwaysInline)) {
    B.addAttribute(llvm::Attribute::NoInline);
  }
}

The above code from clang/lib/CodeGen/CodeGenModule.cpp shows Clang setting the inlinehint attribute. It will also possibly mark declarations as noinline if inline was not supplied (depending on compiler flags). Now we can look for code which affects isInlineSpecified:

// clang/include/clang/Sema/DeclSpec.h
bool isInlineSpecified() const {
  return FS_inline_specified | FS_forceinline_specified;
}

// clang/lib/Sema/DeclSpec.cpp
bool DeclSpec::setFunctionSpecInline(SourceLocation Loc, const char *&PrevSpec,
                                     unsigned &DiagID) {
  // 'inline inline' is ok. However, since this is likely not what the user
  // intended, we will always warn, similar to duplicates of type qualifiers.
  if (FS_inline_specified) {
    DiagID = diag::warn_duplicate_declspec;
    PrevSpec = "inline";
    return true;
  }
  FS_inline_specified = true;
  FS_inlineLoc = Loc;
  return false;
}

// Parse/ParseDecl.cpp:ParseDeclarationSpecifiers
case tok::kw_inline:
  isInvalid = DS.setFunctionSpecInline(Loc, PrevSpec, DiagID);
  break;

The above snippets show the functions which set and retrieve whether something was specified as inline, and the code in the parser which calls the setter.

So now we know that Clang propagates the presence of the inline keyword through to LLVM using the inlinehint attribute, and that LLVM takes this into account in its cost model for inlining. Our detective work has been successful!
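
If you’d rather not read compiler sources at all, you can also just ask Clang to show you the IR it hands to LLVM. Here’s a rough sketch of how I’d check it (treat the exact flags and output as something to verify for your Clang version):

// foo.cpp
// Compile with something like:
//   clang++ -O2 -Xclang -disable-llvm-passes -S -emit-llvm foo.cpp -o -
// In the emitted IR, the definition of bar should carry the inlinehint
// attribute in its attribute group, while baz should not.
inline int bar(int x) { return x * 2; }
int baz(int x) { return bar(x) + 1; }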


GCC

How about GCC? The source for GCC is a fair bit less accessible than that of LLVM, but we can still make an attempt. After a bit of searching, I found that the DECL_DECLARED_INLINE_P macro seemed to be used in lots of relevant places. So we can look for where that’s set:

// c/c-decl.c:grokdeclarator

/* Record presence of `inline' and `_Noreturn', if it is
   reasonable.  */

if (flag_hosted && MAIN_NAME_P (declarator->u.id))
  {
    if (declspecs->inline_p)
      pedwarn (loc, 0, "cannot inline function %<main%>");
    if (declspecs->noreturn_p)
      pedwarn (loc, 0, "%<main%> declared %<_Noreturn%>");
  }
else
  {
    if (declspecs->inline_p)
      /* Record that the function is declared `inline'.  */
      DECL_DECLARED_INLINE_P (decl) = 1;
    if (declspecs->noreturn_p)
      {
        if (flag_isoc99)
          pedwarn_c99 (loc, OPT_Wpedantic,
                       "ISO C99 does not support %<_Noreturn%>");
        else
          pedwarn_c99 (loc, OPT_Wpedantic,
                       "ISO C90 does not support %<_Noreturn%>");
        TREE_THIS_VOLATILE (decl) = 1;
      }
  }

We can then look for uses of this macro which seem to affect the behaviour of the inliner. Here are a few:

// ipa-inline.c:want_inline_small_function_p

/* Do fast and conservative check if the function can be good
   inline candidate. At the moment we allow inline hints to
   promote non-inline functions to inline and we increase
   MAX_INLINE_INSNS_SINGLE 16-fold for inline functions. */

else if ((!DECL_DECLARED_INLINE_P (callee->decl)
          && (!e->count.ipa ().initialized_p () || !e->maybe_hot_p ()))
         && ipa_fn_summaries->get (callee)->min_size
            - ipa_call_summaries->get (e)->call_stmt_size
            > MAX (MAX_INLINE_INSNS_SINGLE, MAX_INLINE_INSNS_AUTO))
  {
    e->inline_failed = CIF_MAX_INLINE_INSNS_AUTO_LIMIT;
    want_inline = false;
  }


// ipa-inline.c:edge_badness

/* ... and do not overwrite user specified hints. */
&& (!DECL_DECLARED_INLINE_P (edge->callee->decl)
    || DECL_DECLARED_INLINE_P (caller->decl)))))


// ipa-inline.c:early_inline_small_functions

/* Do not consider functions not declared inline. */
if (!DECL_DECLARED_INLINE_P (callee->decl)
    && !opt_for_fn (node->decl, flag_inline_small_functions)
    && !opt_for_fn (node->decl, flag_inline_functions))
  continue;

That’s some evidence that the keyword is being taken into account. If we want to get a better idea of all of the effects, then we now have a starting point from which to conduct our investigation.
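
GCC can also be persuaded to show its working. As a rough sketch (the dump flag exists, but the name and format of the dump file vary between versions, so treat the details as an assumption to check):

// twice.cpp
// Compile with something like:
//   g++ -O2 -c -fdump-ipa-inline twice.cpp
// and look at the generated ipa-inline dump file, which records which calls
// were considered for inlining and why they were or were not inlined.
inline int twice(int x) { return x * 2; }
int use(int x) { return twice(x) + 1; }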


Conclusion

I hope I’ve managed to convince you of two things:

  1. Some modern compilers still take inline hints into account.
  2. If you want to understand how your compiler works and it’s open source, you can just go and look.

If you’re actually going to try to optimize your code using inline then Do The Right Thing And Measure It1. See what your compiler actually generates. Profile your code to make sure you’re optimizing something that needs optimizing. Don’t guess.


  1. DTRTAMI (dee-tee-arr-tamee) kinda has a ring to it. 

Passing overload sets to functions

Passing functions to functions is becoming increasingly prevalent in C++. With common advice being to prefer algorithms to loops, new library features like std::visit, lambdas being incrementally beefed up12 and C++ functional programming talks consistently being given at conferences, it’s something that almost all C++ programmers will need to do at some point. Unfortunately, passing overload sets or function templates to functions is not very well supported by the language. In this post I’ll discuss a few solutions and show how C++ still has a way to go in supporting this well.

An example

We have some generic operation called foo. We want a way of specifying this function which fulfils two key usability requirements.

1- It should be callable directly without requiring manually specifying template arguments:

auto a = foo(42);           //good
auto b = foo("hello"); //good
auto c = foo<double>(42.0); //bad
auto d = foo{}(42.0); //bad

2- Passing it to a higher-order function should not require manually specifying template arguments:

std::transform(first, last, target, foo);      //good
std::transform(first, last, target, foo<int>); //bad
std::transform(first, last, target, foo{}); //okay I guess

A simple first choice would be to make it a function template:

template <class T>
T foo(T t) { /*...*/ }

This fulfils the first requirement, but not the second:

//compiles, but not what we want
std::transform(first, last, target, foo<int>);

//uh oh
std::transform(first, last, target, foo);

7 : <source>:7:5: error: no matching function for call to 'transform'
std::transform(first, last, target, foo);
^~~~~~~~~~~~~~
/opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_algo.h:4295:5: note: candidate template ignored: couldn't infer template argument '_UnaryOperation'
transform(_InputIterator __first, _InputIterator __last,
^
/opt/compiler-explorer/gcc-7.2.0/lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_algo.h:4332:5: note: candidate function template not viable: requires 5 arguments, but 4 were provided
transform(_InputIterator1 __first1, _InputIterator1 __last1,
^
1 error generated.

That’s no good.

A second option is to write foo as a function object with a call operator template:

struct foo {
    template<class T>
    T operator()(T t) { /*...*/ }
};

We are now required to create an instance of this type whenever we want to use the function, which is okay for passing to other functions, but not great if we want to call it directly:

//this looks okay
std::transform(first, last, target, foo{});

//this looks strange
auto x = foo{}(42.0);
auto x = foo()(42.0);

We have similar problems when we have multiple overloads, even when we’re not using templates:

int foo (int);
float foo (float);

std::transform(first, last, target, foo); //doesn't compile
// ew ew ew ew ew ew ew
std::transform(first, last, target, static_cast<int(*)(int)>(foo));

We’re going to need a different solution.


Lambdas and LIFT

As an intermediate step, we could use the normal function template approach, but wrap it in a lambda whenever we want to pass it to another function:

std::transform(first, last, target,
               [](const auto&... xs) { return foo(xs...); });

That’s not great. It’ll work in some contexts where we don’t know what template arguments to supply, but it’s not yet suitable for all cases. One improvement would be to add perfect forwarding:

[](auto&&... xs) { return foo(std::forward<decltype(xs)>(xs)...); }

But wait, we want to be SFINAE friendly, so we’ll add a trailing return type:

[](auto&&... xs) -> decltype(foo(std::forward<decltype(xs)>(xs)...)) {
    return foo(std::forward<decltype(xs)>(xs)...);
}

Okay, it’s getting pretty crazy and expert-only at this point. And we’re not even done! Some contexts will care about noexcept:

[](auto&&... xs)
    noexcept(noexcept(foo(std::forward<decltype(xs)>(xs)...)))
    -> decltype(foo(std::forward<decltype(xs)>(xs)...)) {
    return foo(std::forward<decltype(xs)>(xs)...);
}

So the solution is to write this every time we want to pass an overloaded function to another function. That’s probably a good way to make your code reviewer cry.

What would be nice is if P0573: Abbreviated Lambdas for Fun and Profit and P0644: Forward without forward were accepted into the language. That’d let us write this:

[](xs...) => foo(>>xs...)

The above is functionally equivalent to the triplicated monstrosity in the example before. Even better, if P0834: Lifting overload sets into objects was accepted, we could write:

[]foo

That lifts the overload set into a single function object which we can pass around. Unfortunately, all of those proposals have been rejected. Maybe they can be renewed at some point, but for now we need to make do with other solutions. One such solution is to approximate []foo with a macro (I know, I know).

#define FWD(...) std::forward<decltype(__VA_ARGS__)>(__VA_ARGS__)

#define LIFT(X) [](auto &&... args) \
    noexcept(noexcept(X(FWD(args)...))) \
    -> decltype(X(FWD(args)...)) \
{ \
    return X(FWD(args)...); \
}

Now our higher-order function call becomes:

std::transform(first, last, target, LIFT(foo));

Okay, so there’s a macro in there, but it’s not too bad (you know we’re in trouble when I start trying to justify the use of macros for this kind of thing). So LIFT is at least some solution.


Making function objects work for us

You might recall from a number of examples ago that the problem with using function object types was the need to construct an instance whenever we needed to call the function. What if we make a global instance of the function object?

struct foo_impl {
    //template
    template<class T>
    T operator()(T t) const { /*...*/ }

    //overloads
    int operator()(int) const { /*...*/ }
    float operator()(float) const { /*...*/ }
};

extern const foo_impl foo;

// in some .cpp file
const foo_impl foo;

This works if you’re able to have a single translation unit with the definition of the global object. If you’re writing a header-only library then you don’t have that luxury, so you need to do something different.

struct foo_impl {
    template<class T>
    T operator()(T t) const { /*...*/ }
};

static constexpr foo_impl foo;

This might look innocent, but it can lead to One-Definition Rule (ODR) violations3:

//test.h header
struct foo_impl {
    template<class T>
    T operator()(T t) const { return t; }
};

static constexpr foo_impl foo;

template <class T>
int oh_no(T t) {
    auto* foop = &foo;
    return (*foop)(t);
}

//cpp1
#include "test.h"
int sad() {
    return oh_no(42);
}

//cpp2
#include "test.h"
int also_sad() {
    return oh_no(24);
}

Since foo is declared static, each Translation Unit (TU) will get its own definition of the variable. However, sad and also_sad will each instantiate oh_no, and those instantiations will refer to different definitions of foo when taking &foo. This is undefined behaviour by [basic.def.odr]/12.2.

In C++17 the solution is simple:

inline constexpr foo_impl foo{};

The inline allows the variable to be multiply-defined, and the linker will throw away all but one of the definitions.

If you can’t use C++17, there are a few solutions given in N4424: Inline Variables. The Ranges V3 library uses a reference to a static member of a template class:

template<class T>
struct static_const {
    static constexpr T value{};
};

template <class T>
constexpr T static_const<T>::value;

constexpr auto& foo = static_const<foo_impl>::value;

An advantage of the function object approach is that function objects designed carefully make for much better customisation points than the traditional techniques used in the standard library. See Eric Niebler’s blog post and standards paper for more information.

A disadvantage is that now we need to write all of the functions we want to use this way as function objects, which is not great at the best of times, and even worse if we want to use external libraries. One possible solution would be to combine the two techniques we’ve already seen:

// This could be in an external library
namespace lib {
    template <class T>
    T foo(T t) { /*...*/ }
}

namespace lift {
    inline constexpr auto foo = LIFT(lib::foo);
}

Now we can use lift::foo instead of lib::foo and it’ll fit the requirements I laid out at the start of the post. Unfortunately, I think it’s possible to hit ODR-violations with this due to possible differences in closure types across translation units. I’m not sure what the best workaround for this is, so input is appreciated.


Conclusion

I’ve given you a few solutions to the problem I showed at the start, so what’s my conclusion? C++ still has a way to go to support this paradigm of programming, and teaching these ideas is a nightmare. If a beginner or even intermediate programmer asks how to pass overloaded functions around – something which sounds like it should be fairly easy – it’s a real shame that the best answers I can come up with are “Copy this macro which you have no chance of understanding”, or “Make function objects, but make sure you do it this way for reasons which I can’t explain unless you understand the subtleties of ODR4”. I feel like the language could be doing more to support these use cases.

Maybe for some people “Do it this way and don’t ask why” is an okay answer, but that’s not very satisfactory to me. Maybe I lack imagination and there’s a better way to do this with what’s already available in the language. Send me your suggestions or heckles on Twitter @TartanLlama.


Thanks to Michael Maier for the motivation to write this post; Jayesh Badwaik, Ben Craig, Michał Dominiak and Kévin Boissonneault for discussion on ODR violations; and Eric Niebler, Barry Revzin, Louis Dionne, and Michał Dominiak (again) for their work on the libraries and standards papers I referenced.


  1. P0315: Lambdas in unevaluated contexts 

  2. P0624: Default constructible and assignable stateless lambdas 

  3. Example lovingly stolen from n4381

  4. Disclaimer: I don’t understand all the subtleties of ODR. 

A call for data on exceptions

Last week I released an article entitled “Functional exceptionless error-handling with optional and expected”. The idea was to talk about some good ways to handle errors for those cases where you don’t want to use exceptions, but without starting a huge argument about whether exceptions are good or bad. That post has since spawned arguments about exceptions on Hacker News, two other blog posts, and three Reddit threads. Ten points for effort, zero for execution. Since the argument has now started, I feel inclined to give my own view on the matter.

I know people who think exceptions are dangerous. I don’t personally know anyone who thinks ADT-based error handling is dangerous, but I’m sure the Internet is full of them. What I think is dangerous is cargo cult software engineering. Many people take a stance on either side of this divide for good reasons based on the constraints of their platform, but an unfortunately large number just hear a respected games programmer say “exceptions are evil!”, believe that this applies to all cases and assert it unconditionally. How do we fight this? With data.

I believe that as a community we need to be able to answer these questions:

  1. What do the different kinds of error handling cost when errors occur?
  2. What do they cost when no errors occur?
  3. How consistent are the costs, and what causes deviations?
  4. What is the cognitive overhead of the different techniques?
  5. How do they affect the design of our interfaces?

I have an intuition about all of these, backed by some understanding of the actual code which is generated and how it performs, as I’m sure many of you do. But we hit a problem when it comes to teaching. I don’t believe that teaching an intuition can build an intuition. Such a thing can only be built by experimentation or the reduction of hard data into a broad understanding of what that data points towards. If we answer all of the questions above with hard data, then we can come up with a set of strong, measurable guidelines for when to use different kinds of error handling, and we can have something concrete to point people towards to help build that intuition. The C++ Core Guidelines provides the former, but not the latter. Furthermore, when I talk about “hard data”, I don’t just mean microbenchmarks. Microbenchmarks can help, sure, but writing them such that they exhibit realistic error patterns, cache behaviour, branch prediction, etc. is hard.

Of course, many people have tried to answer these questions already, so I’ve teamed up with Matt Dziubinski to put together a set of existing resources on error handling. Use it to educate yourselves and others, and please submit issues/pull requests to make it better. But I believe that there’s still work to be done, so if you’re a prospective PhD or Masters student and you’re looking for a topic, consider helping out the industry by researching the real costs of different error handling techniques in realistic code bases with modern compilers and modern hardware.

throw end_of_blog_post;

Functional exceptionless error-handling with optional and expected

In software things can go wrong. Sometimes we might expect them to go wrong. Sometimes it’s a surprise. In most cases we want to build in some way of handling these misfortunes. Let’s call them disappointments. I’m going to exhibit how to use std::optional and the proposed std::expected to handle disappointments, and show how the types can be extended with concepts from functional programming to make the handling concise and expressive.

One way to express and handle disappointments is exceptions:

void foo() {
    try {
        do_thing();
    }
    catch (...) {
        //oh no
        handle_error();
    }
}

There are a myriad of discussions, resources, rants, tirades, debates about the value of exceptions123456, and I will not repeat them here. Suffice to say that there are cases in which exceptions are not the best tool for the job. For the sake of being uncontroversial, I’ll take the example of disappointments which are expected within reasonable use of an API.

The internet loves cats. The hypothetical you and I are involved in the business of producing the cutest images of cats the world has ever seen. We have produced a high-quality C++ library geared towards this sole aim, and we want it to be at the bleeding edge of modern C++.

A common operation in feline cutification programs is to locate cats in a given image. How should we express this in our API? One option is exceptions:

// Throws no_cat_found if a cat is not found.
image_view find_cat (image_view img);

This function takes a view of an image and returns a smaller view which contains the first cat it finds. If it does not find a cat, then it throws an exception. If we’re going to be giving this function a million images, half of which do not contain cats, then that’s a lot of exceptions being thrown. In fact, we’re pretty much using exceptions for control flow at that point, which is A Bad Thing™.

What we really want to express is a function which either returns a cat if it finds one, or it returns nothing. Enter std::optional.

std::optional<image_view> find_cat (image_view img);

std::optional was introduced in C++17 for representing a value which may or may not be present. It is intended to be a vocabulary type – i.e. the canonical choice for expressing some concept in your code. The difference between this signature and the last is powerful; we’ve moved the description of what happens on an error from the documentation into the type system. Now it’s impossible for the user to forget to read the docs, because the compiler is reading them for us, and you can be sure that it’ll shout at you if you use the type incorrectly.

The most common operations on a std::optional are:

std::optional<image_view> full_view = my_view;
std::optional<image_view> empty_view;
std::optional<image_view> another_empty_view = std::nullopt;

full_view.has_value(); //true
empty_view.has_value(); //false

if (full_view) { this_works(); }

my_view = full_view.value();
my_view = *full_view;

my_view = empty_view.value(); //throws bad_optional_access
my_view = *empty_view; //undefined behaviour

Now we’re ready to use our find_cat function along with some other friends from our library to make embarrassingly adorable pictures of cats:

std::optional<image_view> get_cute_cat (image_view img) {
    auto cropped = find_cat(img);
    if (!cropped) {
        return std::nullopt;
    }

    auto with_tie = add_bow_tie(*cropped);
    if (!with_tie) {
        return std::nullopt;
    }

    auto with_sparkles = make_eyes_sparkle(*with_tie);
    if (!with_sparkles) {
        return std::nullopt;
    }

    return add_rainbow(make_smaller(*with_sparkles));
}

Well this is… okay. The user is made to explicitly handle what happens in case of an error, so they can’t forget about it, which is good. But there are two issues with this:

  1. There’s no information about why the operations failed.
  2. There’s too much noise; error handling dominates the logic of the code.

I’ll address these two points in turn.

Why did something fail?

std::optional is great for expressing that some operation produced no value, but it gives us no information to help us understand why this occurred; we’re left to use whatever context we have available, or (God forbid) output parameters. What we want is a type which either contains a value, or contains some information about why the value isn’t there. This is called std::expected.

Now don’t go rushing off to cppreference to find out about std::expected; you won’t find it there yet, because it’s currently a standards proposal rather than a part of C++ proper. However, its interface follows std::optional pretty closely, so you already understand most of it. Again, here are the most common operations:

std::expected<image_view,error_code> full_view = my_view;
std::expected<image_view,error_code> empty_view = std::unexpected(that_is_a_dog);

full_view.has_value(); //true
empty_view.has_value(); //false

if (full_view) { this_works(); }

my_view = full_view.value();
my_view = *full_view;

my_view = empty_view.value(); //throws bad_expected_access
my_view = *empty_view; //undefined behaviour

auto code = empty_view.error();
auto oh_no = full_view.error(); //undefined behaviour

With std::expected our code might look like this:

std::expected<image_view, error_code> get_cute_cat (image_view img) {
    auto cropped = find_cat(img);
    if (!cropped) {
        return std::unexpected(no_cat_found);
    }

    auto with_tie = add_bow_tie(*cropped);
    if (!with_tie) {
        return std::unexpected(cannot_see_neck);
    }

    auto with_sparkles = make_eyes_sparkle(*with_tie);
    if (!with_sparkles) {
        return std::unexpected(cat_has_eyes_shut);
    }

    return add_rainbow(make_smaller(*with_sparkles));
}

Now when we call get_cute_cat and don’t get a lovely image back, we have some useful information to report to the user as to why we got into this situation.
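
To make that concrete, here’s a sketch of what the calling code might look like (display and log are hypothetical functions, and the error values are the ones returned above):

auto result = get_cute_cat(img);
if (result) {
    display(*result);
} else {
    switch (result.error()) {
    case no_cat_found:      log("no cat in this image");         break;
    case cannot_see_neck:   log("could not fit a bow tie");      break;
    case cat_has_eyes_shut: log("try again when the cat wakes"); break;
    }
}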

Noisy error handling

Unfortunately, with both the std::optional and std::expected versions, there’s still a lot of noise. This is a disappointing solution to handling disappointments. It’s also the limit of what C++17’s std::optional and the most recent proposed std::expected give us.

What we really want is a way to express the operations we want to carry out while pushing the disappointment handling off to the side. As is becoming increasingly trendy in the world of C++, we’ll look to the world of functional programming for help. In this case, the help comes in the form of what I’ll call map and and_then.

If we have some std::optional and we want to carry out some operation on it if and only if there’s a value stored, then we can use map:

widget do_thing (const widget&);

std::optional<widget> result = maybe_get_widget().map(do_thing);
auto result = maybe_get_widget().map(do_thing); //or with auto

This code is roughly equivalent to:

widget do_thing (const widget&);

auto opt_widget = maybe_get_widget();
if (opt_widget) {
widget result = do_thing(*opt_widget);
}

If we want to carry out some operation which could itself fail then we can use and_then:

std::optional<widget> maybe_do_thing (const widget&);

std::optional<widget> result = maybe_get_widget().and_then(maybe_do_thing);
auto result = maybe_get_widget().and_then(maybe_do_thing); //or with auto

This code is roughly equivalent to:

std::optional<widget> maybe_do_thing (const widget&);

auto opt_widget = maybe_get_widget();
if (opt_widget) {
std::optional<widget> result = maybe_do_thing(*opt_widget);
}

The real power of this comes when we begin to chain operations together. Let’s look at that original get_cute_cat implementation again:

std::optional<image_view> get_cute_cat (image_view img) {
    auto cropped = find_cat(img);
    if (!cropped) {
        return std::nullopt;
    }

    auto with_tie = add_bow_tie(*cropped);
    if (!with_tie) {
        return std::nullopt;
    }

    auto with_sparkles = make_eyes_sparkle(*with_tie);
    if (!with_sparkles) {
        return std::nullopt;
    }

    return add_rainbow(make_smaller(*with_sparkles));
}

With map and and_then, our code transforms into this:

tl::optional<image_view> get_cute_cat (image_view img) {
    return crop_to_cat(img)
           .and_then(add_bow_tie)
           .and_then(make_eyes_sparkle)
           .map(make_smaller)
           .map(add_rainbow);
}

With these two functions we’ve successfully pushed the error handling off to the side, allowing us to express a series of operations which may fail without interrupting the flow of logic to test an optional.
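
If you’re wondering what map and and_then do under the hood, here’s a minimal sketch written as free functions over std::optional (tl::optional provides them as members so that calls chain, and the real implementations also handle const and ref-qualified overloads, void-returning callables, and more):

#include <optional>
#include <type_traits>
#include <utility>

// map: if a value is present, apply f to it and wrap the result in an optional.
template <class T, class F>
auto map(const std::optional<T>& o, F&& f)
    -> std::optional<std::decay_t<decltype(f(*o))>> {
    if (o) return std::forward<F>(f)(*o);
    return std::nullopt;
}

// and_then: f itself returns an optional, so no extra wrapping is needed.
template <class T, class F>
auto and_then(const std::optional<T>& o, F&& f) -> decltype(f(*o)) {
    if (o) return std::forward<F>(f)(*o);
    return std::nullopt;
}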

A theoretical aside

I didn’t make up map and and_then off the top of my head; other languages have had equivalent features for a long time, and the theoretical concepts are common subjects in Category Theory.

I won’t attempt to explain all the relevant concepts in this post, as others have done it far better than I could. The basic idea is that map comes from the concept of a functor, and and_then comes from monads. These two functions are called fmap and >>= (bind) in Haskell. The best description of these concepts which I have read is Functors, Applicatives, And Monads In Pictures by Aditya Bhargava. Give it a read if you’d like to learn more about these ideas.

A note on overload sets

One use-case which is annoyingly verbose is passing overloaded functions to map or and_then. For example:

int foo (int);

tl::optional<int> o;
o.map(foo);

The above code works fine. But as soon as we add another overload to foo:

int foo (int);
int foo (double);

tl::optional<int> o;
o.map(foo);

then it fails to compile with a rather unhelpful error message:

test.cpp:7:3: error: no matching member function for call to 'map'
o.map(foo);
~~^~~
/home/simon/projects/optional/optional.hpp:759:52: note: candidate template ignored: couldn't infer template argument 'F'
  template <class F> TL_OPTIONAL_11_CONSTEXPR auto map(F &&f) & {
                                                   ^
/home/simon/projects/optional/optional.hpp:765:52: note: candidate template ignored: couldn't infer template argument 'F'
  template <class F> TL_OPTIONAL_11_CONSTEXPR auto map(F &&f) && {
                                                   ^
/home/simon/projects/optional/optional.hpp:771:37: note: candidate template ignored: couldn't infer template argument 'F'
  template <class F> constexpr auto map(F &&f) const & {
                                    ^
/home/simon/projects/optional/optional.hpp:777:37: note: candidate template ignored: couldn't infer template argument 'F'
  template <class F> constexpr auto map(F &&f) const && {
                                    ^
1 error generated.

One solution for this is to use a generic lambda:

tl::optional<int> o;
o.map([](auto x){return foo(x);});

Another is a LIFT macro:

#define FWD(...) std::forward<decltype(__VA_ARGS__)>(__VA_ARGS__)
#define LIFT(f) \
[](auto&&... xs) noexcept(noexcept(f(FWD(xs)...))) -> decltype(f(FWD(xs)...)) \
{ return f(FWD(xs)...); }

tl::optional<int> o;
o.map(LIFT(foo));

Personally I hope to see overload set lifting get into the standard so that we don’t need to bother with the above solutions.

Current status

Maybe I’ve persuaded you that these extensions to std::optional and std::expected are useful and you would like to use them in your code. Fortunately I have written implementations of both with the extensions shown in this post, among others. tl::optional and tl::expected are on GitHub as single-header libraries under the CC0 license, so they should be easy to integrate with projects new and old.

As far as the standard goes, there are a few avenues being entertained for adding this functionality. I have a proposal to extend std::optional with new member functions. Vicente Escribá has a proposal for a generalised monadic interface for C++. Niall Douglas’ operator try() paper suggests an analogue to Rust’s try! macro for removing some of the boilerplate associated with this style of programming. It turns out that you can use coroutines for doing this stuff, although my gut feeling puts this more to the “abuse” end of the spectrum. I’d also be interested in evaluating how Ranges could be leveraged for these goals.

Ultimately I don’t care how we achieve this as a community so long as we have some standardised solution available. As C++ programmers we’re constantly finding new ways to leverage the power of the language to make expressive libraries, thus improving the quality of the code we write day to day. Let’s apply this to std::optional and std::expected. They deserve it.


Meeting C++17 Trip Report

This year was my first time at Meeting C++. It was also the first time I gave a full-length talk at a conference. But most of all it was a wonderful experience filled with smart, friendly people and excellent talks. This is my report on the experience. I hope it gives you an idea of some talks to watch when they’re up on YouTube, and maybe convince you to go along and submit talks next year! I’ll be filling out links to the talks as they go online.

The Venue

The conference was held at the Andel’s Hotel in Berlin, which is a very large venue on the East side of town, about a 15-minute tram ride from Alexanderplatz. All the conference areas were very spacious, and there were always places to escape to when you needed a break from the noise, or armchairs to sink into when you needed a break from conference chairs.

The People

Some people say they go to conferences for the talks, and for the opportunity to interact with the speakers in the Q&As afterwards. The talks are great, but I went for the people. I had a wonderful time meeting those whom I knew from Twitter or the CppLang Slack, but had never encountered in real life; catching up with those who I’ve met at other events; and talking to those I’d never interacted with before. There were around 600 people at the conference, primarily from Europe, but many from further afield. I didn’t have any negative encounters with anyone at the conference, and Jens was very intentional about publicising the Code of Conduct and having a diverse team to enforce it, so it was a very positive atmosphere.

Day 1

Keynote: Sean Parent – Better Code: Human interface

After a short introduction from Jens, the conference kicked off with a keynote from Sean Parent. Although he didn’t say it in these words, the conclusion I took out of it was that all technology requires some thought about user experience (UX if we’re being trendy). Many of us will think of UX as something centred around user interfaces, but it’s more than that. If you work on compilers (like I do), then your UX work involves consistent compiler flags, friendly error messages and useful tooling. If you work on libraries, then your API is your experience. Do your functions do the most obvious thing? Are they well-documented? Do they come together to create terse, correct, expressive, efficient code? Sean’s plea was for us to consider all of these things, and to embed them into how we think about coding. Of course, it being a talk about good code from Sean Parent, it had some great content about composing standard algorithms. I would definitely recommend checking out the talk if what I’ve just written sounds thought-provoking to you.

Peter Goldsborough – Deep Learning with C++

Peter gave a very well-structured talk, which went down the stack of technology involved in deep learning – from high-level libraries to hardware – showing how it all works in theory. He then went back up the stack showing the practical side, including a few impressive live demos. It covers a lot of content, giving a pretty comprehensive introduction. It does move at a very quick pace though, so you probably don’t want to watch this one on 1.5x on YouTube!

Jonathan Boccara – Strong types for strong interfaces

This talk was quite a struggle against technical difficulties; starting 15 minutes late after laptop/projector troubles, then having Keynote crash as soon as the introduction was over. Fortunately, Jonathan handled it incredibly well, seeming completely unfazed as he gave an excellent presentation and even finished on time.

The talk itself was about using strong types (a.k.a. opaque typedefs) to strengthen your code’s resilience to programmer errors and increase its readability. This was a distillation of some of the concepts which Jonathan has written about in a series on his blog. He had obviously spent a lot of time rehearsing this talk and put a lot of thought into the visual design of the slides, which I really appreciated.

Simon Brand (me!) – How C++ Debuggers Work

This was my first full length talk at a conference, so I was pretty nervous about it, but I think it went really well: there were 150-200 people there, I got some good questions, and a bunch of people came to discuss the talk with me in the subsequent days. I screwed up a few things – like referring to the audience as “guys” near the start and finishing a bit earlier than I would have liked – but all in all it was a great experience.

My talk was an overview of almost all of the commonly-used parts of systems-level debuggers. I covered breakpoints, stepping, debug information, object files, operating system interaction, expression evaluation, stack unwinding and a whole lot more. It was a lot of content, but I think I managed to get a good presentation on it. I’d greatly appreciate any feedback of how the talk could be improved in case I end up giving it at some other conference in the future!

Joel Falcou – The Three Little Dots and the Big Bad Lambdas

I’m a huge metaprogramming nerd, so this talk on using lambdas as a kind of code injection was very interesting for me. Joel started the talk with an introduction to how OCaml allows code injection, and then went on to show how you could emulate this feature in C++ using templates, lambdas and inlining. It turned out that he was going to mostly talk about the indices trick for unpacking tuple-like things, fold expressions (including how to emulate them in C++11/14), and template-driven loop unrolling – all of which are techniques I’m familiar with. However, I hadn’t thought about these tools in the context of compile-time code injection, and Joel presented them very well, so it was still a very enjoyable talk. I particularly enjoyed hearing that he uses these in production in his company in order to fine-tune performance. He showed a number of benchmarks of the linear algebra library he has worked on against another popular implementation which hand-tunes the code for different architectures, and his performance was very competitive. A great talk for demonstrating how templates can be used to optimise your code!
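
If you haven’t met fold expressions and their C++11/14 emulation before, here’s my own minimal example of the kind of technique Joel covered (not his code):

#include <initializer_list>

// C++17: a fold expression sums the pack directly.
template <class... Ts>
int sum(Ts... ts) { return (0 + ... + ts); }

// C++11/14 emulation: expand the pack inside a braced initializer list,
// using the comma operator to accumulate into total as a side effect.
template <class... Ts>
int sum11(Ts... ts) {
    int total = 0;
    (void)std::initializer_list<int>{ (total += ts, 0)... };
    return total;
}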

Diego Rodriguez-Losada – Conan C++ Quiz

This was one of the highlights of the whole conference. The questions were challenging, thought-provoking, and hilarious, and Diego was a fantastic host. He delivered all the content with charisma and wit, and was obviously enjoying himself: maybe my favourite part of the quiz was the huge grin he wore as everyone groaned in incredulity when the questions were revealed. I don’t know if the questions will go online at some point, but I’d definitely recommend giving them a shot if you get the chance.

Day 2

Keynote: Kate Gregory – It’s complicated!

Kate gave a really excellent talk which examined the complexities of writing good code, and those of C++. The slide which I saw the most people get their phones out for said “Is it important to your ego that you’re really good at a complicated language?”, which I think really hit home for many of us. C++ is like an incredibly intricate puzzle, and when you solve a piece of it, you feel good about it and want to share your success with others. However, sometimes, we can make that success, that understanding, part of our identity. Neither Kate nor I am saying that we shouldn’t be proud of our domination over the obscure parts of the language, but a bit of introspection about how we reflect this understanding and share it with others will go a long way.

Another very valuable part of her talk was her discussion on the importance of good names, and how being able to describe some idiom or pattern helps us to talk about it and instruct others. Of course, this is what design patterns were originally all about, but Kate frames it in a way which helps build an intuition around how this common language works to make us better programmers. A few days after the conference, this is the talk which has been going around my head the most, so if you watch one talk from this year’s conference, you could do a lot worse than this one.

Felix Petriconi – There Is A New Future

Felix presented the std::future-a-like library which he and Sean Parent have been working on. It was a very convincing talk, showing the inadequacies of std::future, and how their implementation fixes these problems. In particular, Felix discussed continuations, cancellations, splits and forks, and demonstrated their utility. You can find the library here.

Lukas Bergdoll – Is std::function really the best we can do?

If you’ve read a lot of code written by others, I’m sure you’ll have seen people misusing std::function. This talk showed why so many uses are wrong, including live benchmarks, and gave a number of alternatives which can be used in different situations. I particularly liked a slide which listed the conditions for std::function to be a good choice, which were:

  • It has to be stored inside a container
  • You really can’t know the layout at compile time
  • All your callable types are move and copyable
  • You can constrain the user to exactly one signature

#include meeting

Guy Davidson led an open discussion about #include, which is a diversity group for C++ programmers started by Guy and Kate Gregory (I’ve also been helping out a bit). It was a very positive discussion in which we shared stories, tried to grapple with some of the finer points of diversity and inclusion in the tech space, and came up with some aims to consider moving forward. Some of the key outcomes were:

  • We should be doing more outreach into schools and trying to encourage those in groups who tend to be discouraged from joining the tech industry.
  • Some people would greatly value a resource for finding companies which really value diversity rather than just listing it as important on their website.
  • Those of us who care about these issues can sometimes turn people away from the cause by not being sensitive to cultural, psychological or sociological issues; perhaps we need education on how to educate.

I thought this was a great session and look forward to helping develop this initiative and seeing where we can go with it. I’d encourage anyone who is interested to get on to the #include channel on the CppLang Slack, or to send pull requests to the information repository.

Juan Bolívar – The most valuable values

Juan gave a very energetic, engaging talk about value semantics and his Redux-/Elm-like C++ library, lager. He showed how these techniques can be used for writing clear, correct code without shared mutable state. I particularly like the time-travelling debugger which he presented, which allows you to step around the changes to your data model. I don’t want to spoil anything, but there’s a nice surprise in the talk which gets very meta.

Day 3

Lightning talks

I spent most of day three watching lightning talks. There were too many to discuss individually, so I’ll just mention my favourites.

Arvid Gerstmann – A very quick view into a compiler

Arvid gave a really clear, concise description of the different stages of compilation. Recommended if you want a byte-sized introduction to compilers.

Réka Nikolett Kovács – std::launder

It’s become a bit of a meme on the CppLang Slack that we don’t talk about std::launder, as it’s only needed if you’re writing standard-library-grade generic libraries. Still, I was impressed by this talk, which explains the problem well and shows how std::launder (mostly) solves it. I would have liked more of a disclaimer about who the feature is for, and that there are still unsolved problems, but it was still a good talk.

Andreas Weis – Type Punning Done Right

Strict aliasing violations are a common source of bugs if you’re writing low level code and aren’t aware of the rules. Andreas gave a good description of the problem, and showed how memcpy is the preferred way to solve it. I do wish he’d included an example of std::bit_cast, which was just voted into C++20, but it’s a good lightning talk to send to your colleagues the next time they go type punning.
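
For reference, the memcpy approach he recommended looks something like this (my example, not his):

#include <cstdint>
#include <cstring>

// Read a float's bit pattern without violating strict aliasing; compilers
// recognise this pattern and typically optimise the memcpy down to a single move.
std::uint32_t bits_of(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}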

Miro Knejp – Is This Available?

Miro presented a method of avoiding awful #ifdef blocks by providing a template-based interface to everything you would otherwise preprocess. This is somewhat similar to what I talked about in my if constexpr blog post, and I do something similar in production for abstracting the differences in hardware features in a clean, composable manner. As such, it’s great to have a lightning talk to send to people when I want to explain the concept, and I’d recommend watching it if the description piques your interest. Now we just need a good name for it. Maybe the Template Abstracted Preprocessor (TAP) idiom? Send your thoughts to me and Miro!
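
To give a flavour of the idea (my sketch with made-up names, not Miro’s code), the capability gets exposed as a compile-time constant and dispatched with if constexpr, rather than sprinkling #ifdef HAS_FANCY_SIMD through the implementation:

template <class Platform>
void process(Platform& platform) {
    if constexpr (Platform::has_fancy_simd) {
        platform.process_simd();   // only instantiated when the capability exists
    } else {
        platform.process_scalar(); // portable fallback
    }
}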

Simon Brand (me!) – std::optional and the M word

Someone who was scheduled to do a lightning talk didn’t make the session, so I ended up giving an improvised talk about std::optional and monads. I didn’t have any slides, so described std::optional as a glass which could either be empty, or have a value (in this case represented by a pen). I got someone from the audience to act as a function which would give me a doily when I gave it a pen, and acted out functors and monads using that. It turned out very well, even if I forgot how to write Haskell’s bind operator (>>=) halfway through the talk!

Overall

I really enjoyed the selection of lightning talks. There was a very diverse range of topics, and everything flowed very smoothly without any technical problems. It would have been great if there were more speakers, since some people gave two or even three talks, but that means that more people need to submit! If you’re worried about talking, a lightning talk is a great way to practice; if things go terribly wrong, then the worst that can happen is that they move on to the next speaker. It would have also been more fun if they were on the main track, or if there wasn’t anything scheduled against them.

Secret Lightning Talks

Before the final keynote there was another selection of lightning talks given on the keynote stage. Again, I’ll just discuss my favourites.

Guy Davidson – Diversity and Inclusion

Guy gave a heartfelt talk about diversity in the C++ community, giving reasons why we should care, and what we can do to try and encourage involvement for those in under-represented groups in our communities. This is a topic which is also very important to me, so it was great to see it being given an important place in the conference.

Kate Gregory – Five things I learned when I should have been dying

An even more personal talk was given by Kate about important things she learned when she was given her cancer diagnosis. It was a very inspiring call to not care about the barriers which we may perceive to be in the way of achieving something – like, say, going to talk at a conference – and instead just trying your best and “doing the work”. Her points really resonated with me; I, like many of us, have suffered from a lot of impostor syndrome in my short time in the industry, and talks like this help me to battle through it.

Phil Nash – A Composable Command Line Parser

Phil’s talk was on Clara, which is a simple, composable command line parser for C++. It was split off from his Catch test framework into its own library, and he’s been maintaining it independently of its parent. The talk gave a strong introduction to the library and why you might want to use it instead of one of the other hundreds of command line parsers which are out there. I’m a big fan of the design of Clara, so this was a pleasure to watch.

Keynote: Wouter van Ooijen – What can C++ offer embedded, what can embedded offer C++?

To close out the conference, Wouter gave a talk about the interaction between the worlds of C++ and embedded programming. The two have been getting more friendly in recent years thanks to libraries like Kvasir and talks by people like Dan Saks and Odin Holmes. Wouter started off talking about his work on space rockets, then motivated using C++ templates for embedded programming, showed us examples of how to do it, then finished off talking about what the C++ community should learn from embedded programming. If I’m honest, I didn’t enjoy this keynote as much as the others, as I already had an understanding of the techniques he showed, and I was unconvinced by the motivating examples, which seemed like they hadn’t been optimised properly (I had to leave early to catch a plane, so please someone tell me if he ended up clarifying them!). If you’re new to using C++ for embedded programming, this may be a good introduction.

Closing

That’s it for my report. I want to thank Jens for organising such a great conference, as well as his staff and volunteers for making sure everything ran smoothly. Also thanks to the other speakers for being so welcoming to new blood, and all the other attendees for their discussions and questions. I would whole-heartedly recommend attending the conference and submitting talks, especially if you haven’t done so before. Last but not least, a big thanks to my employer, Codeplay for sending me (we’re hiring, you get to work on compilers and brand new hardware and go to conferences, it’s pretty great).

The World’s First Distributed C++ Meet-up (*)

(* Probably)

Last week my London based C++ user group, C++ London, joined forces with SwedenCpp, based in Stockholm, for a distributed event where we shared video streams with each other. The whole thing was hosted by King, who took care of the audio-visual link up, so we just had to organise the speakers.

We did this as a series of lightning talks (5 or 10 minutes each) - 5 in Stockholm and 7 in London.

This was an experiment. The idea was that, if successful, we could do more like this. I know for a fact that other user groups have been waiting to hear how it went as they think about doing something similar too.

So how did it go? Well, you can see for yourself. The video is here (if you view it on YouTube you'll also find a Table Of Contents to jump to individual talks):

Overall I think it went very well. There were certainly some rough edges, and opportunities to do better next time - but the basic concept worked well, I think. Feedback from attendees in both locations also bore that out. Everyone is keen to do this again!

Survey results

Paul Dreik, in Stockholm, sent out a survey to all attendees after the meeting and collected some stats and comments. Regarding the overall event, over 80% said "Great, let's do it again!". Of the rest, all but one said it was at least as good as a normal meet-up. Only one person thought it was a step down (you can't please everyone).

Interestingly, while lightning talks, as a format, was popular (over 50% thought at least most of the time should be spent on them), a sizeable 40% thought more time should be spent on longer talks. So maybe some combination could work well?

Beyond that there were a couple of people who said that it was too long without a break, and some people wanted time for questions as we go (which is hard to make work for lightning talks).

Room for improvement?

So, some of the things that could be improved (and notes for other groups looking to try):

We could have planned more interaction between the locations. We were very much focused on our own schedules and, other than the handover in the middle, there was not much beyond two independent sites that happened to be watching each other. There was an "open questions" section at the end. Jean Guegant, in Sweden, had mentioned that we'd do this in the intro but I'd missed it - and I think others had too, so we weren't really prepared for it. Between that and everyone forgetting any questions they'd had during the talks, we didn't get many questions. I think we can do better here - perhaps collecting questions during the talks via a web app, or maybe a Twitter hashtag?

Are there other points during the course of the event that we could interact more between the sites? It can be a tough balance, but I think there is scope to experiment here.

And finally, while we're very grateful to King, and their AV staff, for providing, and setting up, the live stream, as well as recording, I think we made too many assumptions about how that was going to work. In particular we were left with an audio recording that was not as good as we had hoped for. In the London audio, especially, the hand mic and ambient mics were mixed together before recording - so we couldn't separate them out - which would have been very useful in the editing.

So. Would we do it again? Yes, definitely! I think this is a great way to expand the reach of our speakers, and give our members more variety of speakers and topics to listen to. In that respect this was a great success, and I'm excited to see what happens next.

Catch2 released


I've been talking about Catch2 for a while - but now it's finally here! The big news for Catch2 is that it drops all support for pre-C++11 compilers. Other than meaning that some users will not be supported (you can still use Catch "Classic" (1.x) - which will get some bug fix updates for a while, at least) that's mostly an internal change - however it enables a number of user-facing changes, both now and in the future. Let's take a look at what they are.

New and shiny

New, composable, command line processor

Clara is the command line parser in Catch. In Catch 1.0 I spun it out into its own library (but it's still embedded in the Catch single header). Like Catch 1.0 itself, Clara was constrained to C++98 compatibility. For Catch2 I've rewritten Clara from the ground up, not only to fully embrace C++11, but also to be composable. What that means here is that each individual command line option or argument can be represented using its own, self-contained, parser. A composite parser of all the options is then assembled, or composed, from those smaller parsers.

The main advantage of this approach is that the set of available options is now trivially extendible outside of Catch, so users can easily specify command line options that can tune their test code.
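
As a rough illustration (adapted from my memory of the documentation, so check there for the authoritative version), adding your own option when you provide your own main looks something like this:

#define CATCH_CONFIG_RUNNER
#include "catch.hpp"

int main( int argc, char* argv[] ) {
    Catch::Session session;

    int height = 0; // an option specific to your own tests

    using namespace Catch::clara;
    auto cli = session.cli()              // Catch's own options...
             | Opt( height, "height" )    // ...composed with yours
                   ["--height"]
                   ("how high should the cat jump?");
    session.cli( cli );

    int returnCode = session.applyCommandLine( argc, argv );
    if( returnCode != 0 )
        return returnCode;
    return session.run();
}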

See my lightning talk at CppCon this year for a bit more

Commas in assertions

As Catch's assertions are implemented using macros, they were susceptible to the old problem of how macros interpret commas within macro arguments. Commas may occur in contexts that macros don't know about, such as within angle brackets (e.g. for template instantiations) - and so get interpreted as argument separators for the macro itself.

Now that we can rely on C++11, which includes variadic macros, we can make the assertions variadic, and just reassemble all the arguments again inside. That means we can now write code like the following:

  REQUIRE( getPair() == std::pair<int, std::string>( 1, "banana" ) );
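
Mechanically, the trick is that a variadic macro swallows any number of comma-separated tokens and __VA_ARGS__ pastes them back together. A simplified sketch of the shape (not Catch's actual macro):

void my_require_impl( const char* expr_text, bool result );

#define MY_REQUIRE( ... ) \
    my_require_impl( #__VA_ARGS__, static_cast<bool>( __VA_ARGS__ ) )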

Microbenchmarking (experimental)

Catch2 gains initial support for micro-benchmarking. This is where small pieces of code are timed, usually in a loop so they are repeated enough times to be significant compared to the system clock accuracy. Some extra adjustments need to be made to allow for other sources of jitter and slowdown on the host machine - and, even then, multiple samples should be taken so they can be subject to statistical analysis.

There are many shortcomings with micro-benchmarks - not least that the performance of a piece of code in isolation can often be drastically different to how well it performs in conjunction with other code. This is not only due to the way the compiler may inline or otherwise optimise code together, but even on the CPU instructions can be reordered, pipelined or run in parallel - and with cache levels and branch prediction, the relationship between these things becomes hugely unpredictable.

Nonetheless they can still be useful - and it can be convenient to use the same test framework that you use to write functional tests - not least because there is much shared infrastructure.

Catch's benchmarking support is incomplete at time of this writing, lacking the multi-sampling, statistical analysis and richer reporting that fully-fledged frameworks offer. The intention is to grow this, but only if it can be done without any significant impact to non-benchmarking tests. In lieu of full documentation, see the example tests for now.
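
As far as I recall from those example tests, the experimental syntax looks roughly like this (treat the details as an assumption until you check the repository):

TEST_CASE( "vector push_back", "[!benchmark]" ) {
    std::vector<int> v;
    BENCHMARK( "push back 1000 ints" ) {
        v.clear();
        for( int i = 0; i < 1000; ++i )
            v.push_back( i );
    }
}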

Performance

Both runtime and compile time performance are becoming increasingly important for Catch, and a lot of work is going into improving both. Runtime performance was a non-goal initially, so there has been plenty of low-hanging fruit. As a result we're already seeing some significant improvements, and there is more to come.

Compile time is harder. This has always been important, but as Catch has grown over the years, it has begun to suffer. Improving it significantly means making some trade-offs. So far some features that drag compile times have been made configurable - e.g. whether breaking into the debugger on a failed assertion happens in the code that caused it (meaning the debugger code gets compiled into every assertion macro) or one level up the stack (so can be "hidden" in a function). Other areas to look at are whether to use (non-standard, potentially brittle) forward declarations of some standard library types. Again, this is an ongoing area of active development - but much is already in Catch2 at launch.

See some of the toggle macros for more details

A new name

Believe it or not "Catch2" is now the name - it's not just a version reference! In fact the current intention is that, even when we move to v3.x it will still be called Catch2. E.g. Catch2 v3.0. Why? Well there have been calls for a name change - for searchability reasons. Catch is obviously a common keyword in C++, and also in unit testing. So getting search terms sufficiently narrow has been tricky. But I didn't want to use an entirely different name (although I did toy with the idea of "Catfish" for a while) - because that would lose too much of the momentum behind Catch overall. A derivative name doesn't fully solve the problem, because people will still refer to it as Catch, casually - but at least it gives a slight advantage. So that's what I've gone with. There are also a few interesting numerological aspects to it. It stops short of being Catch22 - but if you consider the C++11 requirement you could multiply them to get 22. And you can add the digits in C++11 to get 2.

Upcoming features

It's always dangerous to talk about what's planned - and I've fallen into this trap with Catch before. So there are some feature promises that have been outstanding for a long time now. In fact most of those have been deferred to Catch2 for quite some time - either because C++11 has features that make them much easier/ possible to implement (e.g. threading) - or just because they involved a lot of code that gets less noisy in C++11. So we'll talk about those again here.

Threading

This was unfeasible in Catch 1.x due to not having C++11 threading primitives, or being able to use external dependencies like Boost for threading. Providing a basic level of support should now be fairly straightforward, as a lot of groundwork has been laid (e.g. how singletons are organised).

The idea is that if, within a test case, you use additional threads, you should be able to make assertions from those threads - as long as the test case is still in scope at the time. The aim is for this to be done without locks in the assertions. Running multiple test cases in parallel is not immediately planned (and may be best implemented at the process level, anyway).

Generators/ Property Based Testing

Generators give you what other frameworks might call (Data-)Parameterised Tests - i.e. being able to use the same test code with different inputs. An experimental version of generators was included in Catch from very early on. Besides being incomplete and having some limitations, it had a serious issue: it didn't work at all with Sections! This is because both features relied on the ability to re-enter test cases - but they were independent of each other. I rewrote the test-case tracking code a couple of years ago to be able to support this properly - and had a proof-of-concept new implementation of Generators working with it - enough to give a demo at a talk I gave. However the implementation was getting noisy with C++98 syntax, so I deferred work on it for Catch2. Now that Catch2 is released I'll be looking at this again. Closely related to Generators - in fact it builds on them - is the idea of Property Based Testing. The proof-of-concept I mentioned actually had an initial version of this, too. There's more work involved in getting it right, but having Generators is a first step.

Breaking Changes

As a major version change we've taken advantage of the permission that Semantic Versioning gives us to introduce a few breaking changes. These should have little, if any, impact on most users - but it's worth checking these before making the move to be sure you're ready.

toString() has been removed

This is probably the biggest change, and the most likely to affect people. For a long time there have been three ways to tell Catch how to convert values into strings for reporting purposes (plus a final fallback). In order, the pipeline was:
  1. toString() overload
  2. StringMaker<> specialisation
  3. ostream& operator << overload
  4. give up and use {?}

If your types already have << overloads for ostream then you're good. If not then, in theory, overloading toString() was the simplest option.

However toString() had a number of limitations - mostly due to the point of template instantiation. Compiler differences with two-phase lookup, and other implementation-defined factors, meant that toString() overloads were unreliable and caused a lot of confusion - hardly the simplest option after all!

Specialising StringMaker<> is slightly more work, but is more reliable, stable, and flexible. So this is now the recommended way to provide string conversion functionality for your types. In Catch2, toString() has been completely removed!

If you have code that calls toString() there is a new function that plays that role: Catch::detail::stringify(). However, note that (a) this should never be overloaded - it just wraps the call into the pipeline that starts with StringMaker<> and (b) the detail part of the namespace should be a clue that this is really an internal part of Catch and is subject to change.

To specialise StringMaker<> see the documentation.
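
For reference, a specialisation follows the shape sketched below - MyType (and its id member) is a stand-in for one of your own types:

namespace Catch {
    template <>
    struct StringMaker<MyType> {
        static std::string convert(MyType const& value) {
            // build whatever representation you want to see in test reports
            return "MyType{ id: " + std::to_string(value.id) + " }";
        }
    };
}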

Other removals and changes

As well as C++98 support and toString(), a number of deprecated features and interfaces have been removed, as well as a few other tweaks and changes that may impact some code-bases. See the "Breaking Changes" section in the release notes for the full list. In fact the release notes in general give a good overview of all the many small changes and improvements that have gone into Catch2 that have not been mentioned here.

A new home

The Catch(2) repository has moved! You may not have noticed, as it has been transferred within GitHub, which means GitHub maintains redirects for all the old links. However they do recommend updating your own URLs: in bookmarks, direct download links and, of course, git remotes. We've made this move for two reasons:

1. As Catch has grown it has become more of a community effort. It already has an additional lead maintainer in Martin Hořeňovský, and others may be added. But while the repository lived under my personal GitHub account there were some things that only I could do (webhooks and other integrations, for example). So as not to be a bottleneck we've created a GitHub "Organization" account, CatchOrg, which allows multiple admin users. That's where we've moved it to.

2. For Catch2 to get any traction as a new name it was important for it to be reflected in the repo name, so we've taken advantage of the move to change the repo name, too. Catch "Classic" (1.x) has also moved here, but is now on a branch. If you cannot move to Catch2 for C++98 compatibility reasons you can stay on Catch Classic on this branch. It will continue to receive critical fixes, at least for now, but is no longer the active development branch. Please try to move to Catch2 as soon as possible.

If you notice anything broken as a result of this move, please let us know so we can fix it.

Thanks!

As always, a huge thanks to all who have supported and contributed to Catch and Catch2 - especially for your patience when I wasn't getting to issues and PRs as quickly as was needed!

An extra thanks to Martin, who has been doing the majority of the work on Catch this year!

Mutable

The mutable keyword seems to be one of the less known corners of C++. Yet it can be very useful, or even unavoidable if you want to write const-correct code […]


Detection Idiom – A Stopgap for Concepts

Concepts is a proposed C++ feature which allows succinct, expressive, and powerful constraining of templates. They have been threatening to get into C++ for a long time now, with the first proposal being rejected for C++11. They were finally merged into the C++20 draft a few months ago, which means we need to hold on for another few years before they're in the official standard rather than a Technical Specification. In the meantime, there have been various attempts to implement parts of Concepts as a library so that we can have access to some of their power without waiting for the new standard. The detection idiom, which is part of the Library Fundamentals 2 Technical Specification, is one such solution. This post will outline its utility, and show the techniques which underlie its implementation.


A problem

We are developing a generic library. At some point in our library we need to calculate the foo factor for whatever types the user passes in.

template <class T>
int calculate_foo_factor (const T& t);

Some types will have specialized implementations of this function, but for types which don’t we’ll need some generic calculation.

struct has_special_support {
  int get_foo() const;
};

struct will_need_generic_calculation {
  // no get_foo member function
};

Using concepts we could write calculate_foo_factor like so:

template <class T>
concept bool SupportsFoo = requires (T t) {
  { t.get_foo() } -> int;
};

template <SupportsFoo T>
int calculate_foo_factor (const T& t) {
    return t.get_foo();
}

template <class T>
int calculate_foo_factor (const T& t) {
    // insert generic calculation here
    return 42;
}

This is quite succinct and clear: SupportsFoo is a concept which checks that we can call get_foo on t with no arguments, and that the type of that expression is int. The first calculate_foo_factor will be selected by overload resolution for types which satisfy the SupportsFoo concept, and the second will be chosen for those which don’t.

Unfortunately, our library has to support C++14. We’ll need to try something different. I’ll demonstrate a bunch of possible solutions to this problem in the next section. Some of them may seem complex if you aren’t familiar with the metaprogramming techniques used, but for now, just note the differences in complexity and abstraction between them. The metaprogramming tricks will all be explained in the following section.

Solutions

Here’s a possible solution using expression SFINAE:

namespace detail {
  template <class T>
  auto calculate_foo_factor (const T& t, int)
    -> decltype(std::declval<T>().get_foo()) {
    return t.get_foo();
  }

  template <class T>
  int calculate_foo_factor (const T& t, ...) {
    // insert generic calculation here
    return 42;
  }
}

template <class T>
int calculate_foo_factor (const T& t) {
  return detail::calculate_foo_factor(t, 0);
}

Well, it works, but it’s not exactly clear. What’s the int and ... there for? What’s std::declval? Why do we need an extra overload? The answers are not the important part here; what is important is that unless you’ve got a reasonable amount of metaprogramming experience, it’s unlikely you’ll be able to write this code offhand, or even copy-paste it without error.

The code could be improved by abstracting out the check for the presence of the member function into its own metafunction:

template <class... Ts>
using void_t = void;

template <class T, class=void>
struct supports_foo : std::false_type{};

template <class T>
struct supports_foo<T, void_t<decltype(std::declval<T>().get_foo())>>
: std::true_type{};

Again, some more expert-only template trickery which I’ll explain later. Using this trait, we can use std::enable_if to enable and disable the overloads as required:

template <class T, std::enable_if_t<supports_foo<T>::value>* = nullptr>
auto calculate_foo_factor (const T& t) {
  return t.get_foo();
}

template <class T, std::enable_if_t<!supports_foo<T>::value>* = nullptr>
int calculate_foo_factor (const T& t) {
  // insert generic calculation here
  return 42;
}

This works, but you’d need to understand how to implement supports_foo, and you’d need to repeat all the boilerplate again if you needed to write a supports_bar trait. It would be better if we could completely abstract away the mechanism for detecting the member function and just focus on what we want to detect. This is what the detection idiom provides.

template <class T>
using foo_t = decltype(std::declval<T>().get_foo());

template <class T>
using supports_foo = is_detected<foo_t,T>;

is_detected here is part of the detection idiom. Read it as “is it valid to instantiate foo_t with T?” std::declval<T>() essentially means “pretend that I have a value of type T” (more on this later). This still requires some metaprogramming magic, but it’s a whole lot simpler than the previous examples. With this technique we can also easily check for the validity of arbitrary expressions:

template <class T, class U>
using equality_t = decltype(std::declval<T>() == std::declval<U>());

template <class T, class U>
using supports_equality = is_detected<equality_t, T, U>;

struct foo{};
struct bar{};
struct baz{};

bool operator== (foo, bar);

static_assert( supports_equality<foo,bar>::value, "wat");
static_assert(!supports_equality<foo,baz>::value, "wat");

We can also compose traits using std::conjunction:

template <class T>
using is_regular = std::conjunction<std::is_default_constructible<T>,
                                    std::is_copy_constructible<T>,
                                    supports_equality<T,T>,
                                    supports_inequality<T,T>, //assume impl
                                    supports_less_than<T,T>>; //ditto

If you want to use is_detected today, then you can check if your standard library supports std::experimental::is_detected. If not, you can use the implementation from cppreference or the one which we will go on to write in the next section. If you aren’t interested in how this is written, then turn back, for here be metaprogramming dragons.


Metaprogramming demystified

I’ll now work backwards through the metaprogramming techniques used, leading up to implementing is_detected.

Type traits and _v and _t suffixes

A type trait is some template which can be used to get information about characteristics of a type. For example, you can find out if some type is an arithmetic type using std::is_arithmetic:

template <class T>
void foo(T t) {
     static_assert(std::is_arithmetic<T>::value, "Argument must be of an arithmetic type");
}

Type traits either "return" types with a ::type member alias, or values with a ::value alias. _t and _v suffixes are shorthand for these respectively. So std::is_arithmetic_v<T> is the same as std::is_arithmetic<T>::value, std::add_pointer_t<T> is the same as typename std::add_pointer<T>::type [1].

decltype specifiers

decltype gives you access to the type of an entity or expression. For example, with int i;, decltype(i) is int.
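
A couple of concrete cases, as a minimal illustration (including the classic gotcha that parenthesising a variable name turns it into an lvalue expression):

#include <type_traits>

int i = 0;
double d = 0.0;

static_assert(std::is_same<decltype(i), int>::value, "");
static_assert(std::is_same<decltype((i)), int&>::value, "");     // parenthesised: lvalue expression
static_assert(std::is_same<decltype(i + d), double>::value, ""); // type of the whole expression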

std::declval trickery

std::declval is a template function which helps create values inside decltype specifiers. std::declval<foo>() essentially means “pretend I have some value of type foo”. This is needed because the types you want to inspect inside decltype specifiers may not be default-constructible. For example:

struct foo {
    foo() = delete;
    foo(int);
    void do_something();
};

decltype(foo{}.do_something())               // invalid
decltype(std::declval<foo>().do_something()) // fine

SFINAE and std::enable_if

SFINAE stands for Substitution Failure Is Not An Error. Due to this rule, some constructs which are usually hard compiler errors just cause a function template to be ignored by overload resolution. std::enable_if is a way to gain easy access to this. It takes a boolean template argument and either contains or does not contain a ::type member alias depending on the value of that boolean. This allows you to do things like this:

template <class T>
std::enable_if_t<std::is_integral_v<T>> foo(T t);

template <class T>
std::enable_if_t<std::is_floating_point_v<T>> foo(T t);

If T is integral, then the second overload will be SFINAEd out, so the first is the only candidate. If T is floating point, then the reverse is true.

Expression SFINAE

Expression SFINAE is a special form of SFINAE which applies to arbitrary expressions. This is what allowed this example from above to work:

namespace detail {
  template <class T>
  auto calculate_foo_factor (const T& t, int)
    -> decltype(std::declval<T>().get_foo()) {
    return t.get_foo();
  }

  template <class T>
  int calculate_foo_factor (const T& t, ...) {
    // insert generic calculation here
    return 42;
  }
}

template <class T>
int calculate_foo_factor (const T& t) {
  return detail::calculate_foo_factor(t, 0);
}

The first overload will be SFINAEd out if calling get_foo on an instance of T is invalid. The difficulty here is that both overloads will be valid if the get_foo call is valid. For this reason, we add some dummy parameters to the overloads (int and ...) to specify an order for overload resolution to follow [2].

void_t magic

void_t is a C++17 feature (although it's implementable in C++11) which makes writing traits and using expression SFINAE a bit easier. The implementation is deceptively simple [3]:

template <class... Ts>
using void_t = void;

You can see this used in this example which we used above:

template <class T, class=void>
struct supports_foo : std::false_type{};

template <class T>
struct supports_foo<T, void_t<decltype(std::declval<T>().get_foo())>>
: std::true_type{};

The relevant parts are the class=void and void_t<...>. If the expression inside void_t is invalid, then that specialization of supports_foo will be SFINAEd out, so the primary template will be used. Otherwise, the whole expression will evaluate to void due to the void_t, and this will match the =void default argument to the template. This gives a pretty succinct way to check arbitrary expressions.

That covers all of the ingredients we need to implement is_detected.

The implementation

For sake of simplicity I’ll just implement is_detected rather than all of the entities which the Lib Fundamentals 2 TS provides.

namespace detail {
  template <template <class...> class Trait, class Enabler, class... Args>
  struct is_detected : std::false_type{};

  template <template <class...> class Trait, class... Args>
  struct is_detected<Trait, void_t<Trait<Args...>>, Args...> : std::true_type{};
}

template <template <class...> class Trait, class... Args>
using is_detected = typename detail::is_detected<Trait, void, Args...>::type;

Let's start with that last alias template. We take a variadic template template parameter (yes, really) called Trait and any number of other type template arguments. We forward these to our detail::is_detected helper with an extra void argument which will act as the class=void construct from the previous section on void_t [4]. We then have a primary template which will "return" false as the result of the trait. The magic then happens in the following partial specialization. Trait<Args...> is evaluated inside void_t. If instantiating Trait with Args... is invalid, then the partial specialization will be SFINAEd out, and if it's valid, it'll evaluate to void due to the void_t. This successfully abstracts away the implementation of the detection and allows us to focus on what we want to detect.
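
To sanity-check the implementation, we can plug in the foo_t alias and the two example types from the start of the post:

// foo_t, has_special_support and will_need_generic_calculation are as defined earlier
static_assert( is_detected<foo_t, has_special_support>::value, "wat");
static_assert(!is_detected<foo_t, will_need_generic_calculation>::value, "wat");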


That covers it for the detection idiom. This is a handy utility for clearing up some hairy metaprogramming, and is even more useful since it can be implemented in old versions of the standard. Let me know if there are any other parts of the code you’d like explained down in the comments or on Twitter.


  1. See here for information on why the typename keyword is needed in some places. 

  2. Have a look here for a better way to do this. 

  3. Due to standards defect CWG1558, implementations needed an extra level of abstraction to work on some compilers. 

  4. The reason we can’t use the same trick is that parameter packs must appear at the end of the template parameter list, so we need to put the void in the middle. 

Learning C++

C++ is one of the most powerful programming languages in the world. However, it can also be one of the most daunting for beginners. There is so much to learn, so many corner cases, so many little languages within the language itself, that mastering it can seem an insurmountable task. Fortunately, it’s not quite as impossible as it looks. This post outlines some resources and advice for learning C++ today. Some of these also will be useful for those who have used C++ in the past and want to get familiar with modern C++, which is like an entirely different language.

If you want to learn C++ from scratch, don't rely on random online tutorials. Get a good book. One of these. I read the entirety of Accelerated C++ the day before the interview for my current job, and it was enough to get stuck in to real C++ development. It's a bit outdated now (i.e. it doesn't cover C++11/14/17), but is easily the most important book in my career development. The Effective C++ series by Scott Meyers should be mandatory reading for C++ developers, particularly Effective Modern C++, which is the go-to introduction to C++11 and 14. If you prefer online resources, use reputable ones like Kate Gregory's Pluralsight courses rather than whatever Google throws out. There are a lot of terrible C++ tutorials out there which will only harm you. A good rule of thumb: if you see #include <iostream.h>, run [1].

However, no matter how much you read, you need to practice in order to improve. I find it helpful to have a mix of small, self-contained problems to try out new techniques, and larger applications to make sure you can use those ideas in a real-world context. Think of it as unit tests and integration tests for your programming skills: can you use these ideas in isolation, and can you fit them together into something comprehensible and maintainable?

Stack Overflow is a great source of skill unit tests. I learned most of my initial metaprogramming knowledge by answering Stack Overflow questions incorrectly and having people tell me why I was wrong. Unfortunately it can be an unforgiving community, and isn’t friendly to very simple questions. Be sure to read through the rules thoroughly before contributing, and feel free to send me an answer you were going to post to get feedback on it if you’re worried!

For skill integration tests, I’d recommend coming up with an application in a domain you’re interested in and trying to develop it from start to finish. If you haven’t done much of this kind of thing before, it can be a daunting task. The main piece of advice I have on this is to ensure your application has a strict scope so that you always know what you’re aiming towards and when you’re finished. Set a number of checkpoints or milestones so that you have explicit tasks to work on and don’t get lost. Some ideas for fun projects:

  • A compiler for a simple C subset
  • A mini text editor
  • Implement some simplified standard library classes, like std::vector
  • A clone of a simple video game like Snake
  • An ELF file manipulation tool

One of the most valuable knowledge sources is a good C++ community to be part of. Find out if your city or a nearby one has a User Group and go along to meet other developers. They might even have a mentorship program or advice channels for people learning the language. Definitely get on to the C++ Slack Channel and lurk, ask questions, join in discussions. Many of the foremost C++ experts in the world hang out there, and it’s also a very friendly and welcoming environment to those learning. There are even dedicated subchannels for those learning or having problems debugging C++.

Don’t be discouraged when you find things difficult, and, trust me, you will. C++ is hard, and even the experts get things wrong, or can’t even agree on what’s right.

If you do get good, pass on the knowledge. Start a blog, give advice, mentor those who need it, talk at conferences. You do not need to be an expert to do these things, and we need more speakers and writers from a diverse set of experience levels, as well as diversity in background, race, gender, and sexuality. The major C++ conferences all have codes of conduct and organisers who care about these issues. Many of them also have student or accessibility tickets available for those who need further assistance to attend. If in doubt, just ask the organisers, or drop me a message and I can get in touch with the relevant people.

Finally, find some good, regular blogs to learn from. Here are some I would recommend:

I hope that this post helps you in your journey to learn C++. It’s a long, but very rewarding path, with a lot of wonderful people to meet, work with, and debate metaprogramming techniques with along the way.


  1. iostream.h, rather than just iostream, was the name of the header before C++ was standardised. People are still writing tutorials which teach using this form and ancient compilers. 

void C::C::C::A::A::A::foo() – valid syntax monstrosity

Here’s an odd bit of C++ syntax for you. Say we have the following class hierarchy:

class A {
public:
    virtual void foo() = 0;
};

class B {
public:
    virtual void foo() = 0;
};

class C : public A, public B {
public:
    virtual void foo();
};

The following definitions are all well-formed:

void C::foo(){
  std::cout << "C";
}
void C::A::foo(){
  std::cout << "A";
}
void C::B::foo(){
  std::cout << "B";
}

The first one defines C::foo, the second defines A::foo and the third defines B::foo. This is valid because of an entity known as the injected-class-name:

A class-name is inserted into the scope in which it is declared immediately after the class-name is seen. The class-name is also inserted into the scope of the class itself; this is known as the injected-class-name. For purposes of access checking, the injected-class-name is treated as if it were a public member name. […]

Since A is a base class of C and the name A is injected into the scope of A, A is also visible from C.

The injected-class-name exists to ensure that the class is found during name lookup instead of entities in an enclosing scope. It also makes referring to the class name in a template instantiation easier. But since we're awful people, we can use this perfectly reasonable feature to do horribly perverse things like this:

void C::C::C::A::A::A::foo(){
    std::cout << "A";
}

Yeah, don’t do that.

Writing a Linux Debugger Part 10: Advanced topics

We’re finally here at the last post of the series! This time I’ll be giving a high-level overview of some more advanced concepts in debugging: remote debugging, shared library support, expression evaluation, and multi-threaded support. These ideas are more complex to implement, so I won’t walk through how to do so in detail, but I’m happy to answer questions about these concepts if you have any.


Series index

  1. Setup
  2. Breakpoints
  3. Registers and memory
  4. Elves and dwarves
  5. Source and signals
  6. Source-level stepping
  7. Source-level breakpoints
  8. Stack unwinding
  9. Handling variables
  10. Advanced topics

Remote debugging

Remote debugging is very useful for embedded systems or debugging the effects of environment differences. It also sets a nice divide between the high-level debugger operations and the interaction with the operating system and hardware. In fact, debuggers like GDB and LLDB operate as remote debuggers even when debugging local programs. The general architecture is this:

[Figure: debugarch - debugger, debug stub and debuggee architecture]

The debugger is the component which we interact with through the command line. Maybe if you're using an IDE there'll be another layer on top which communicates with the debugger through the machine interface. On the target machine (which may be the same as the host) there will be a debug stub, which in theory is a very small wrapper around the OS debug library which carries out all of your low-level debugging tasks like setting breakpoints on addresses. I say "in theory" because stubs are getting larger and larger these days. The LLDB debug stub on my machine is 7.6MB, for example. The debug stub communicates with the debuggee process using some OS-specific features (in our case, ptrace), and with the debugger through some remote protocol.

The most common remote protocol for debugging is the GDB remote protocol. This is a text-based packet format for communicating commands and information between the debugger and debug stub. I won’t go into detail about it, but you can read all you could want to know about it here. If you launch LLDB and execute the command log enable gdb-remote packets then you’ll get a trace of all packets sent through the remote protocol. On GDB you can write set remotelogfile <file> to do the same.

As a simple example, here’s the packet to set a breakpoint:

$Z0,400570,1#43

$ marks the start of the packet. Z0 is the command to insert a memory breakpoint. 400570 and 1 are the arguments, where the former is the address to set a breakpoint on and the latter is a target-specific breakpoint kind specifier. Finally, the #43 is a checksum to ensure that there was no data corruption.
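
The checksum itself is simply the modulo-256 sum of the characters between the $ and the #, printed as two hex digits. A minimal sketch:

#include <cstdio>
#include <cstring>

// Sum the payload bytes modulo 256 - for "Z0,400570,1" this gives 0x43,
// matching the packet above.
unsigned packet_checksum(const char* payload) {
    unsigned sum = 0;
    for (std::size_t i = 0; i < std::strlen(payload); ++i)
        sum += static_cast<unsigned char>(payload[i]);
    return sum % 256;
}

int main() {
    std::printf("%02x\n", packet_checksum("Z0,400570,1")); // prints 43
}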

The GDB remote protocol is very easy to extend for custom packets, which is very useful for implementing platform- or language-specific functionality.


Shared library and dynamic loading support

The debugger needs to know what shared libraries have been loaded by the debuggee so that it can set breakpoints, get source-level information and symbols, etc. As well as finding libraries which have been dynamically linked against, the debugger must track libraries which are loaded at runtime through dlopen. To facilitate this, the dynamic linker maintains a rendezvous structure. This structure maintains a linked list of shared library descriptors, along with a pointer to a function which is called whenever the linked list is updated. This structure is stored where the .dynamic section of the ELF file is loaded, and is initialized before program execution.

A simple tracing algorithm is this:

  • The tracer looks up the entry point of the program in the ELF header (or it could use the auxiliary vector stored in /proc/<pid>/auxv)
  • The tracer places a breakpoint on the entry point of the program and begins execution.
  • When the breakpoint is hit, the address of the rendezvous structure is found by looking up the load address of .dynamic in the ELF file.
  • The rendezvous structure is examined to get the list of currently loaded libraries.
  • A breakpoint is set on the linker update function.
  • Whenever the breakpoint is hit, the list is updated.
  • The tracer infinitely loops, continuing the program and waiting for a signal until the tracee signals that it has exited.
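
As a rough sketch of the library-listing step: once the address of the rendezvous structure (struct r_debug from <link.h>) is known, the debugger can walk the link_map list through the debuggee's memory. The read_word and read_string helpers here are hypothetical wrappers over PTRACE_PEEKDATA, not part of any real API:

#include <link.h>        // struct r_debug, struct link_map
#include <sys/types.h>
#include <cstddef>       // offsetof
#include <cstdint>
#include <string>
#include <vector>

// Assumed helpers, implemented elsewhere in the debugger:
std::uint64_t read_word(pid_t pid, std::uint64_t addr);    // PTRACE_PEEKDATA wrapper
std::string   read_string(pid_t pid, std::uint64_t addr);  // reads a NUL-terminated string

std::vector<std::string> loaded_libraries(pid_t pid, std::uint64_t rendezvous_addr) {
    std::vector<std::string> libs;
    // r_debug::r_map is the head of the linked list of link_map entries
    auto map_addr = read_word(pid, rendezvous_addr + offsetof(r_debug, r_map));
    while (map_addr) {
        auto name_addr = read_word(pid, map_addr + offsetof(link_map, l_name));
        libs.push_back(read_string(pid, name_addr));
        map_addr = read_word(pid, map_addr + offsetof(link_map, l_next));
    }
    return libs;
}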

I’ve written a small demonstration of these concepts, which you can find here. I can do a more detailed write up of this in the future if anyone is interested.


Expression evaluation

Expression evaluation is a feature which lets users evaluate expressions in the original source language while debugging their application. For example, in LLDB or GDB you could execute print foo() to call the foo function and print the result.

Depending on how complex the expression is, there are a few different ways of evaluating it. If the expression is a simple identifier, then the debugger can look at the debug information, locate the variable and print out the value, just like we did in the last part of this series. If the expression is a bit more complex, then it may be possible to compile the code to an intermediate representation (IR) and interpret that to get the result. For example, for some expressions LLDB will use Clang to compile the expression to LLVM IR and interpret that. If the expression is even more complex, or requires calling some function, then the code might need to be JITted to the target and executed in the address space of the debuggee. This involves calling mmap to allocate some executable memory, then the compiled code is copied to this block and is executed. LLDB does this by using LLVM’s JIT functionality.
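
To make that last step concrete, here is a minimal sketch of the "map executable memory, copy in compiled code, run it" mechanism on x86-64 Linux. In a real debugger the memory would be allocated and the code copied in the debuggee's address space (via ptrace), but the principle is the same:

#include <sys/mman.h>
#include <cstdio>
#include <cstring>

int main() {
    // Machine code for: mov eax, 42 ; ret
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void* mem = mmap(nullptr, sizeof code, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    std::memcpy(mem, code, sizeof code);

    auto jitted = reinterpret_cast<int (*)()>(mem);
    std::printf("%d\n", jitted()); // prints 42

    munmap(mem, sizeof code);
}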

If you want to know more about JIT compilation, I’d highly recommend Eli Bendersky’s posts on the subject.


Multi-threaded debugging support

The debugger shown in this series only supports single threaded applications, but to debug most real-world applications, multi-threaded support is highly desirable. The simplest way to support this is to trace thread creation and parse the procfs to get the information you want.

The Linux threading library is called pthreads. When pthread_create is called, the library creates a new thread using the clone syscall, and we can trace this syscall with ptrace (assuming your kernel is 2.5.46 or newer). To do this, you'll need to set some ptrace options after attaching to the debuggee:

ptrace(PTRACE_SETOPTIONS, m_pid, nullptr, PTRACE_O_TRACECLONE);

Now when clone is called, the process will be signaled with our old friend SIGTRAP. For the debugger in this series, you can add a case to handle_sigtrap which can handle the creation of the new thread:

case (SIGTRAP | (PTRACE_EVENT_CLONE << 8)):
    //get the new thread ID
    unsigned long event_message = 0;
    ptrace(PTRACE_GETEVENTMSG, pid, nullptr, &event_message);

    //handle creation
    //...

Once you’ve got that, you can look in /proc/<pid>/task/ and read the memory maps and suchlike to get all the information you need.

GDB uses libthread_db, which provides a bunch of helper functions so that you don’t need to do all the parsing and processing yourself. Setting up this library is pretty weird and I won’t show how it works here, but you can go and read this tutorial if you’d like to use it.

The most complex part of multithreaded support is modelling the thread state in the debugger, particularly if you want to support non-stop mode or some kind of heterogeneous debugging where you have more than just a CPU involved in your computation.


The end!

Whew! This series took a long time to write, but I learned a lot in the process and I hope it was helpful. Get in touch on Twitter @TartanLlama or in the comments section if you want to chat about debugging or have any questions about the series. If there are any other debugging topics you’d like to see covered then let me know and I might do a bonus post.

Writing a Linux Debugger Part 9: Handling variables

Variables are sneaky. At one moment they’ll be happily sitting in registers, but as soon as you turn your head they’re spilled to the stack. Maybe the compiler completely throws them out of the window for the sake of optimization. Regardless of how often variables move around in memory, we need some way to track and manipulate them in our debugger. This post will teach you more about handling variables in your debugger and demonstrate a simple implementation using libelfin.


Series index

  1. Setup
  2. Breakpoints
  3. Registers and memory
  4. Elves and dwarves
  5. Source and signals
  6. Source-level stepping
  7. Source-level breakpoints
  8. Stack unwinding
  9. Handling variables
  10. Advanced topics

Before you get started, make sure that the version of libelfin you are using is the fbreg branch of my fork. This contains some hacks to support getting the base of the current stack frame and evaluating location lists, neither of which are supported by vanilla libelfin. You might need to pass -gdwarf-2 to GCC to get it to generate compatible DWARF information. But before we get into the implementation, I’ll give a more detailed description of how locations are encoded in DWARF 5, which is the most recent specification. If you want more information than what I write here, then you can grab the standard from here.

DWARF locations

The location of a variable in memory at a given moment is encoded in the DWARF information using the DW_AT_location attribute. Location descriptions can be either single location descriptions, composite location descriptions, or location lists.

  • Simple location descriptions describe the location of one contiguous piece (usually all) of an object. A simple location description may describe a location in addressable memory, or in a register, or the lack of a location (with or without a known value).
    • Example:
      • DW_OP_fbreg -32
      • A variable which is entirely stored -32 bytes from the stack frame base
  • Composite location descriptions describe an object in terms of pieces, each of which may be contained in part of a register or stored in a memory location unrelated to other pieces.
    • Example:
      • DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
      • A variable whose first four bytes reside in register 3 and whose next two bytes reside in register 10.
  • Location lists describe objects which have a limited lifetime or change location during their lifetime.
    • Example:
      • <loclist with 3 entries follows>
        • [ 0]<lowpc=0x2e00><highpc=0x2e19>DW_OP_reg0
        • [ 1]<lowpc=0x2e19><highpc=0x2e3f>DW_OP_reg3
        • [ 2]<lowpc=0x2ec4><highpc=0x2ec7>DW_OP_reg2
      • A variable whose location moves between registers depending on the current value of the program counter

The DW_AT_location is encoded in one of three different ways, depending on the kind of location description. exprlocs encode simple and composite location descriptions. They consist of a byte length followed by a DWARF expression or location description. loclists and loclistptrs encode location lists. They give indexes or offsets into the .debug_loclists section, which describes the actual location lists.

DWARF Expressions

The actual location of the variables is computed using DWARF expressions. These consist of a series of operations which operate on a stack of values. There are an impressive number of DWARF operations available, so I won't explain them all in detail. Instead I'll give a few examples from each class of expression to give you a taste of what is available. Also, don't get scared off by these; libelfin will take care of all of this complexity for us.

  • Literal encodings
    • DW_OP_lit0, DW_OP_lit1, …, DW_OP_lit31
      • Push the literal value on to the stack
    • DW_OP_addr <addr>
      • Pushes the address operand on to the stack
    • DW_OP_constu <unsigned>
      • Pushes the unsigned value on to the stack
  • Register values
    • DW_OP_fbreg <offset>
      • Pushes the value found at the base of the stack frame, offset by the given value
    • DW_OP_breg0, DW_OP_breg1, …, DW_OP_breg31 <offset>
      • Pushes the contents of the given register plus the given offset to the stack
  • Stack operations
    • DW_OP_dup
      • Duplicate the value at the top of the stack
    • DW_OP_deref
      • Treats the top of the stack as a memory address, and replaces it with the contents of that address
  • Arithmetic and logical operations
    • DW_OP_and
      • Pops the top two values from the stack and pushes back the logical AND of them
    • DW_OP_plus
      • Same as DW_OP_and, but adds the values
  • Control flow operations
    • DW_OP_le, DW_OP_eq, DW_OP_gt, etc.
      • Pops the top two values, compares them, and pushes 1 if the condition is true and 0 otherwise
    • DW_OP_bra <offset>
      • Conditional branch: if the top of the stack is not 0, skips back or forward in the expression by offset
  • Type conversions
    • DW_OP_convert <DIE offset>
      • Converts value on the top of the stack to a different type, which is described by the DWARF information entry at the given offset
  • Special operations
    • DW_OP_nop
      • Do nothing!
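
To make the stack-machine model concrete, here is a toy hand-evaluation of the expression DW_OP_lit5 DW_OP_lit3 DW_OP_plus (in the debugger itself, libelfin performs this evaluation for us):

#include <cassert>
#include <cstdint>
#include <vector>

int main() {
    std::vector<std::int64_t> stack;
    stack.push_back(5);                        // DW_OP_lit5 -> [5]
    stack.push_back(3);                        // DW_OP_lit3 -> [5, 3]
    auto rhs = stack.back(); stack.pop_back(); // DW_OP_plus pops two values...
    auto lhs = stack.back(); stack.pop_back();
    stack.push_back(lhs + rhs);                // ...and pushes their sum -> [8]
    assert(stack.back() == 8);                 // the expression's result
}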

DWARF types

DWARF’s representation of types needs to be strong enough to give debugger users useful variable representations. Users most often want to be able to debug at the level of their application rather than at the level of their machine, and they need a good idea of what their variables are doing to achieve that.

DWARF types are encoded in DIEs along with the majority of the other debug information. They can have attributes to indicate their name, encoding, size, endianness, etc. A myriad of type tags are available to express pointers, arrays, structures, typedefs, and anything else you could see in a C or C++ program.

Take this simple structure as an example:

struct test{
    int i;
    float j;
    int k[42];
    test* next;
};

The parent DIE for this struct is this:

< 1><0x0000002a>    DW_TAG_structure_type
                      DW_AT_name                  "test"
                      DW_AT_byte_size             0x000000b8
                      DW_AT_decl_file             0x00000001 test.cpp
                      DW_AT_decl_line             0x00000001

The above says that we have a structure called test of size 0xb8, declared at line 1 of test.cpp. There are then many child DIEs which describe the members.

< 2><0x00000032>      DW_TAG_member
                        DW_AT_name                  "i"
                        DW_AT_type                  <0x00000063>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000002
                        DW_AT_data_member_location  0
< 2><0x0000003e>      DW_TAG_member
                        DW_AT_name                  "j"
                        DW_AT_type                  <0x0000006a>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000003
                        DW_AT_data_member_location  4
< 2><0x0000004a>      DW_TAG_member
                        DW_AT_name                  "k"
                        DW_AT_type                  <0x00000071>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000004
                        DW_AT_data_member_location  8
< 2><0x00000056>      DW_TAG_member
                        DW_AT_name                  "next"
                        DW_AT_type                  <0x00000084>
                        DW_AT_decl_file             0x00000001 test.cpp
                        DW_AT_decl_line             0x00000005
                        DW_AT_data_member_location  176(as signed = -80)

Each member has a name, a type (which is a DIE offset), a declaration file and line, and a byte offset into the structure where the member is located. The types which are pointed to come next.

< 1><0x00000063>    DW_TAG_base_type
                      DW_AT_name                  "int"
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_byte_size             0x00000004
< 1><0x0000006a>    DW_TAG_base_type
                      DW_AT_name                  "float"
                      DW_AT_encoding              DW_ATE_float
                      DW_AT_byte_size             0x00000004
< 1><0x00000071>    DW_TAG_array_type
                      DW_AT_type                  <0x00000063>
< 2><0x00000076>      DW_TAG_subrange_type
                        DW_AT_type                  <0x0000007d>
                        DW_AT_count                 0x0000002a
< 1><0x0000007d>    DW_TAG_base_type
                      DW_AT_name                  "sizetype"
                      DW_AT_byte_size             0x00000008
                      DW_AT_encoding              DW_ATE_unsigned
< 1><0x00000084>    DW_TAG_pointer_type
                      DW_AT_type                  <0x0000002a>

As you can see, int on my laptop is a 4-byte signed integer type, and float is a 4-byte float. The integer array type is defined by pointing to the int type as its element type, a sizetype (think size_t) as the index type, with 0x2a (42) elements. The test* type is a DW_TAG_pointer_type which references the test DIE.


Implementing a simple variable reader

As mentioned, libelfin will deal with most of the complexity for us. However, it doesn’t implement all of the different methods for representing variable locations, and handling a lot of them in our code would get pretty complex. As such, I’ve chosen to only support exprlocs for now. Feel free to add support for more types of expression. If you’re really feeling brave, submit some patches to libelfin to help complete the necessary support!

Handling variables is mostly down to locating the different parts in memory or registers, then reading or writing is the same as you’ve seen before. I’ll only show you how to implement reading for the sake of simplicity.

First we need to tell libelfin how to read registers from our process. We do this by creating a class which inherits from expr_context and uses ptrace to handle everything:

class ptrace_expr_context : public dwarf::expr_context {
public:
    ptrace_expr_context (pid_t pid) : m_pid{pid} {}

    dwarf::taddr reg (unsigned regnum) override {
        return get_register_value_from_dwarf_register(m_pid, regnum);
    }

    dwarf::taddr pc() override {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, m_pid, nullptr, &regs);
        return regs.rip;
    }

    dwarf::taddr deref_size (dwarf::taddr address, unsigned size) override {
        //TODO take into account size
        return ptrace(PTRACE_PEEKDATA, m_pid, address, nullptr);
    }

private:
    pid_t m_pid;
};

The reading will be handled by a read_variables function in our debugger class:

void debugger::read_variables() {
    using namespace dwarf;

    auto func = get_function_from_pc(get_pc());

    //...
}

The first thing we do above is find the function which we’re currently in. Then we need to loop through the entries in that function, looking for variables:

    for (const auto& die : func) {
        if (die.tag == DW_TAG::variable) {
            //...
        }
    }

We get the location information by looking up the DW_AT_location entry in the DIE:

            auto loc_val = die[DW_AT::location];

Then we ensure that it’s an exprloc and ask libelfin to evaluate the expression for us:

            if (loc_val.get_type() == value::type::exprloc) {
                ptrace_expr_context context {m_pid};
                auto result = loc_val.as_exprloc().evaluate(&context);

Now that we’ve evaluated the expression, we need to read the contents of the variable. It could be in memory or a register, so we’ll handle both cases:

                switch (result.location_type) {
                case expr_result::type::address:
                {
                    auto value = read_memory(result.value);
                    std::cout << at_name(die) << " (0x" << std::hex << result.value << ") = "
                              << value << std::endl;
                    break;
                }

                case expr_result::type::reg:
                {
                    auto value = get_register_value_from_dwarf_register(m_pid, result.value);
                    std::cout << at_name(die) << " (reg " << result.value << ") = "
                              << value << std::endl;
                    break;
                }

                default:
                    throw std::runtime_error{"Unhandled variable location"};
                }

As you can see I’ve simply printed out the value without interpreting it based on the type of the variable. Hopefully from this code you can see how you could support writing variables, or searching for variables with a given name.

Finally we can add this to our command parser:

    else if(is_prefix(command, "variables")) {
        read_variables();
    }

Testing it out

Write a few small functions which have some variables, compile them without optimizations and with debug info, then see if you can read the values of your variables. Try writing to the memory address where a variable is stored and see the behaviour of the program change.


Nine posts down, one to go! Next time I'll be talking about some more advanced concepts which might interest you. For now you can find the code for this post here.

C++17 attributes – maybe_unused, fallthrough and nodiscard

C++17 adds three new attributes for programmers to better express their intent to the compiler and readers of the code: maybe_unused, fallthrough, and nodiscard. This is a quick post to outline what they do and why they are useful.

In case you aren’t familiar with the concept, attributes in C++ allow you mark functions, variables, and other entities with compiler-specific or standard properties. The standard attributes prior to C++17 are noreturn (function doesn’t return), deprecated (entity is going to be removed in a later version) and carries_dependency (used for optimizing atomic operations). You mark an entity with an attribute like this:

[[deprecated]]
void do_thing_the_old_way();

void do_thing_the_new_way();

do_thing_the_old_way(); // Compiler warning: deprecated
do_thing_the_new_way(); // No warning

With that out of the way, on to the new attributes!


maybe_unused

[[maybe_unused]] suppresses compiler warnings about unused entities. Usually unused functions or variables indicate a programmer error - if you never use it, why is it there? - but sometimes they can be intentional, such as variables which are only used in debug-mode assertions, or functions only called when logging is enabled.

Variables

void emits_warning(bool b) {
     bool result = get_result();
     // Warning emitted in release mode
     assert (b == result);
     //...
}

void warning_suppressed(bool b [[maybe_unused]]) {
     [[maybe_unused]] bool result = get_result();
     // Warning suppressed by [[maybe_unused]]
     assert (b == result);
     //...
}

Functions

static void log_with_warning(){}
[[maybe_unused]] static void log_without_warning() {}

#ifdef LOGGING_ENABLED
log_with_warning();    // Warning emitted if LOGGING_ENABLED is not defined
log_without_warning(); // Warning suppressed by [[maybe_unused]]
#endif

I feel like this attribute is more useful to suppress compiler warnings than to document your code for others, but at least we now have a standard way to do so.


fallthrough

[[fallthrough]] indicates that a fallthrough in a switch statement is intentional. Missing a break or return in a switch case is a very common programmer error, so compilers usually warn about it, but sometimes a fallthrough can result in some very terse code.

Say we want to process an alert message. If it’s green, we do nothing; if it’s yellow, we record the alert; if it’s orange we record and trigger the alarm; if it’s red, we record, trigger the alarm and evacuate.

void process_alert (Alert alert) {
     switch (alert) {
     case Alert::Red:
         evacuate();
         // Warning: this statement may fall through

     case Alert::Orange:
         trigger_alarm();
         [[fallthrough]]; //yes, you do need the semicolon
         // Warning suppressed by [[fallthrough]]

     case Alert::Yellow:
         record_alert();
         return;

     case Alert::Green:
         return;
     }
}

The most important function of fallthrough is as documentation for maintainers. The presence of it in the code above shows anyone looking at the code that an orange alert is absolutely supposed to be recorded. Without the fallthrough in the Alert::Red case, it is not obvious whether a red alert is supposed to trigger the alarm and be recorded, or just evacuate everyone.


nodiscard

Functions declared with [[nodiscard]] should not have their return values ignored by the caller. This can be useful if you want to ensure that callers check a return value, or that some scope guard object has a reasonable lifetime. Types can be marked [[nodiscard]] to implicitly mark all functions returning that type as the same.

Functions

[[nodiscard]] error do_something (thing&);
error do_something_else (thing&);

do_something(my_thing); // Warning: ignored return value
do_something_else(my_thing);

Classes

[[nodiscard]]
struct lock_guard;

lock_guard lock (mutex& m);

{
    lock(my_mutex); // Warning: ignored return value
    // critical section
}

Those warnings will help the user notice that the ignored error from do_something might leave my_thing in a bad state for do_something_else, or that the critical section won't actually be locked.


Compilers have shipped non-standard extensions to express these concepts for years, but it’s great that we now have standard methods to do the same. Leave a comment if you have ideas for other attributes you would like to be added to the language!

For some more examples of using these attributes, see posts by Kenneth Benzie, Bartłomiej Filipek, and Arne Mertz.

2017 in the programming language standards’ world

Yesterday I was at the British Standards Institution for a meeting of IST/5, the committee responsible for programming languages.

The amount of management control over those wanting to get to the meeting room, from outside the building, has increased. There is now a sensor activated sliding door between the car-park and side-walk from the rear of the building to the front, and there are now two receptions; the ground floor reception gets visitors a pass to the first floor, where a pass to the fifth floor is obtained from another reception (I was totally confused by being told to go to the first floor, which housed the canteen last time I was there, and still does, the second reception is perched just inside the automatic barriers to the canteen {these barriers are also new; the food is reasonable, but not free}).

Visitors are supposed to show proof that they are attending a meeting, such as a meeting calling notice or an agenda. I have always managed to look sufficiently important/knowledgeable/harmless to get in without showing any such documents. I was asked to show them this time, perhaps my image is slipping, but my obvious bafflement at the new setup rescued me.

Why does BSI do this? My theory is that it’s all about image, BSI is the UK’s standard setting body and as such has to be seen to follow these standards. There is probably some security standard for rules to follow to prevent people sneaking into buildings. It could be argued that the name British Standards is enough to put anybody off wanting to enter the building in the first place, but this does not sound like a good rationale for BSI to give. Instead, we have lots of sliding doors/gates, multiple receptions (I suspect this has more to do with a building management cat fight over reception costs), lifts with no buttons ‘inside’ for selecting floors, and proof of reasons to be in the building.

There are also new chairs in the open spaces. The chairs have very high backs and side-baffles that surround the head area, excellent for having secret conversations and in-tune with all the security. These open areas are an image of what people in the 1970s thought the future would look like (BSI is a traditional organization after all).

So what happened in the meeting?

Cobol standard’s work becomes even more dead. PL22.4, the US Cobol group is no more (there were insufficient people willing to pay membership fees, so the group was closed down).

People are continuing to work on Fortran (still the language of choice for supercomputer Apps), Ada (some new people have started attending meetings and support for @ is still being fought over), C, Internationalization (all about character sets these days). Unprompted somebody pointed out that the UK C++ panel seemed to be attracting lots of people from the financial industry (I was very professional and did not relay my theory that it’s all about bored consultants wanting an outlet for their creative urges).

SC22, the ISO committee responsible for programming languages, is meeting at BSI next month, and our chairman asked if any of us planned to attend. The chair’s response, to my request to sell the meeting to us, was that his vocabulary was not up to the task; a two-day management meeting (no technical discussions permitted at this level) on programming languages is that exciting (and they are setting up a special reception so that visitors don’t have to go to the first floor to get a pass to attend a meeting on the ground floor).

Metaclasses for embedded domain specific languages

Metaclasses are a proposed feature to C++ which will allow the creation of abstractions over classes and the extension of the language’s type definition system. As an example, consider this definition of IShape:

class IShape {
public:
 virtual int area() const =0;
 virtual void scale_by(double factor) =0;
 // ... etc.
 virtual ~IShape() noexcept { };
};

This class is obviously defining an interface of some kind: all of the member functions are public, pure virtual, and it has a virtual destructor. Other classes will need to be added which inherit from this interface for it to be useful. But those properties I listed are really just boilerplate; it’s just noise that we need to type to get the compiler to do The Right Thing™. It would be a lot clearer and less error-prone if we could push the boilerplate aside and just focus on the details which are important to us. This is the essence of metaclasses. If there were a metaclass called interface, you could write something like this:

interface IShape {
 int area() const;
 void scale_by(double factor);
};

I’m sure you’d agree this is a huge improvement over the last sample. The magic happens in the definition for the interface metaclass, which could look like this:

$class interface {
  ~interface() noexcept { }
  constexpr {
    compiler.require($interface.variables().empty(),
                     "interfaces may not contain data");
    for (auto f : $interface.functions()) {
      compiler.require(!f.is_copy() && !f.is_move(),
                       "interfaces may not copy or move; consider a"
                       " virtual clone() instead");
      if (!f.has_access()) f.make_public();
      compiler.require(f.is_public(),
                       "interface functions must be public");
      f.make_pure_virtual();
    }
  }
};

The ~interface() noexcept { } defines a destructor for instances of the metaclass; this is essentially a default implementation so that you don’t need to type one yourself. Inside the constexpr block we have code to ensure that there are no data members and no copy or move constructors, and to change each function to be virtual and public. Be sure to read over this code a few times to really understand what it’s doing. The first time I saw this code my mind was well and truly blown.

But enough of stealing examples from the paper. I’ve been interested in metaclasses since seeing Herb talk about them at ACCU 2017, and since then I’ve always been looking for places where they could make C++ programming safer, more elegant, more maintainable. In this post I’ll talk about how they can aid creation of embedded domain specific languages.


Embedded what?

Before I get into the meat of what I want to discuss, I have some terms to define.

A domain specific language (DSL) is a language which is specifically designed for some problem area. C++ is a general-purpose language because it can be used for solving any kind of programming problem. SQL is a domain-specific language because it is designed only for reading and manipulating databases.

An embedded DSL (EDSL) is a domain specific language hosted inside a more general language. For example, Boost.Spirit is a parser library which embeds a language for parsing code into C++ by using expression templates and operator overloading.
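To make that concrete, here is a toy illustration (not taken from Boost.Spirit) of the basic trick: overloading operators so that ordinary-looking C++ builds a description of a computation rather than performing it.

#include <iostream>
#include <string>

// Toy EDSL: operator+ doesn't add numbers, it builds up a description of
// an addition which the "library" can interpret however it likes later.
struct expr {
    std::string repr;
};

expr lit(int v) { return expr{std::to_string(v)}; }

expr operator+(const expr& a, const expr& b) {
    return expr{"(" + a.repr + " + " + b.repr + ")"};
}

int main() {
    auto e = lit(1) + lit(2) + lit(3); // ordinary-looking C++...
    std::cout << e.repr << '\n';       // ...prints ((1 + 2) + 3)
}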

My case study

The example I’m going to be using is from my ETKF keyboard firmware library. A keyboard definition for ETKF looks like this:

struct my_keyboard {
    using rows = pin_set<b4, b6, f1, f0>;
    using columns = pin_set<f6, f7, c7, c6, d3, d2, d1, d0, b7, b3, b2, b1, b0>;

    using key_positions = typelist<
        row<1,1,1,1,1,1,1,1,1,1,1,1,1>,
        row<1,1,1,1,1,1,1,1,1,1,1,0,1>,
        row<1,0,1,1,1,1,1,1,1,1,1,1,1>,
        row<1,1,1,0,1,1,0,1,1,1,1,1,1>
    >;

    using layouts = typelist<
        row<tab, quot,comm,dot, p,   y,   f,   g,   c,   r,   l,   bspc, null>,
        row<lctl,a,   o,   e,   u,   i,   d,   h,   t,   n,   s,   ent>,
        row<lsft,scln,q,   j,   k,   x,   b,   m,   w,   v,   z,   del>,
        row<caps,lgui,esc,lalt, spc,      null,null,left,down,up,  righ>
    >;
};

This is an EDSL which is hosted inside the C++ template system. It is very declarative in nature; the class declares how the switches are wired up, which parts of the switch matrix have holes in them, and how the keymap should look, but it doesn’t say how the firmware should carry out any computations. We are declaring our problem at a high level of abstraction and the implementation of the library is in charge of mapping this abstraction onto real operations. There are no metaclasses in this example; the implementation is all in C++17. But what could metaclasses add to this?

The first improvement is as embarrassingly simple as it is expressive: changing struct my_keyboard to keyboard my_keyboard. Metaclasses allow us to express not only the name of our class, but also what kind of entity it is. If we had a bunch of declarations like this kicking around – maybe some define processors, some define hardware pins – having a strong indication of the classes’ nature right in the declaration is powerful. my_keyboard is no longer a struct, it is no longer a class, it is a keyboard.
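To show what that substitution looks like, here is a sketch of the declaration above written against the proposed metaclass syntax; nothing compiles this today, it just follows the paper’s style.

keyboard my_keyboard {
    using rows    = pin_set<b4, b6, f1, f0>;
    using columns = pin_set<f6, f7, c7, c6, d3, d2, d1, d0, b7, b3, b2, b1, b0>;
    // key_positions and layouts exactly as before
};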

The second is in compiler diagnostics. When I was writing ETKF I wanted to walk the line between carrying out crazy compile-time optimisations and having a clean interface which even non-programmers could use. “This’ll be great”, I thought naively, “I’ll just have a ton of static_asserts which run over the templates before generating the firmware code”. This is indeed possible, but have a look at the resulting compiler errors:

In file included from /home/simon/etkf/src/main.cpp:17:0:
/home/simon/etkf/include/validations.hpp: In instantiation of 'void validate_layout(etkf::typelist<Rows ...>) [with Kbd = test_keyboard; Rows = {etkf::row<(etkf::keys::key)43, (etkf::keys::key)52, (etkf::keys::key)54, (etkf::keys::key)55, (etkf::keys::key)19, (etkf::keys::key)28, (etkf::keys::key)9, (etkf::keys::key)10, (etkf::keys::key)6, (etkf::keys::key)21, (etkf::keys::key)15, (etkf::keys::key)42, (etkf::keys::key)111>, etkf::row<(etkf::keys::key)100, (etkf::keys::key)4, (etkf::keys::key)18, (etkf::keys::key)8, (etkf::keys::key)24, (etkf::keys::key)12, (etkf::keys::key)7, (etkf::keys::key)11, (etkf::keys::key)23, (etkf::keys::key)17, (etkf::keys::key)22, (etkf::keys::key)40>, etkf::row<(etkf::keys::key)57, (etkf::keys::key)103, (etkf::keys::key)41, (etkf::keys::key)102, (etkf::keys::key)44, (etkf::keys::key)109, (etkf::keys::key)111, (etkf::keys::key)80, (etkf::keys::key)81, (etkf::keys::key)82, (etkf::keys::key)79>}]':
/home/simon/etkf/include/validations.hpp:14:26:   required from 'void validate_layouts(etkf::typelist<Rows ...>) [with Kbd = test_keyboard; Layouts = {etkf::typelist<etkf::row<(etkf::keys::key)43, (etkf::keys::key)52, (etkf::keys::key)54, (etkf::keys::key)55, (etkf::keys::key)19, (etkf::keys::key)28, (etkf::keys::key)9, (etkf::keys::key)10, (etkf::keys::key)6, (etkf::keys::key)21, (etkf::keys::key)15, (etkf::keys::key)42, (etkf::keys::key)111>, etkf::row<(etkf::keys::key)100, (etkf::keys::key)4, (etkf::keys::key)18, (etkf::keys::key)8, (etkf::keys::key)24, (etkf::keys::key)12, (etkf::keys::key)7, (etkf::keys::key)11, (etkf::keys::key)23, (etkf::keys::key)17, (etkf::keys::key)22, (etkf::keys::key)40>, etkf::row<(etkf::keys::key)57, (etkf::keys::key)103, (etkf::keys::key)41, (etkf::keys::key)102, (etkf::keys::key)44, (etkf::keys::key)109, (etkf::keys::key)111, (etkf::keys::key)80, (etkf::keys::key)81, (etkf::keys::key)82, (etkf::keys::key)79> >, etkf::typelist<etkf::row<(etkf::keys::key)43, (etkf::keys::key)30, (etkf::keys::key)31, (etkf::keys::key)32, (etkf::keys::key)33, (etkf::keys::key)34, (etkf::keys::key)35, (etkf::keys::key)36, (etkf::keys::key)37, (etkf::keys::key)38, (etkf::keys::key)39, (etkf::keys::key)42, (etkf::keys::key)111>, etkf::row<(etkf::keys::key)100, (etkf::keys::key)47, (etkf::keys::key)48, (etkf::keys::key)56, (etkf::keys::key)46, (etkf::keys::key)45, (etkf::keys::key)50, (etkf::keys::key)49, (etkf::keys::key)53, (etkf::keys::key)17, (etkf::keys::key)22, (etkf::keys::key)40>, etkf::row<(etkf::keys::key)101, (etkf::keys::key)58, (etkf::keys::key)59, (etkf::keys::key)60, (etkf::keys::key)61, (etkf::keys::key)62, (etkf::keys::key)63, (etkf::keys::key)64, (etkf::keys::key)65, (etkf::keys::key)66, (etkf::keys::key)67, (etkf::keys::key)76>, etkf::row<(etkf::keys::key)57, (etkf::keys::key)103, (etkf::keys::key)41, (etkf::keys::key)102, (etkf::keys::key)44, (etkf::keys::key)109, (etkf::keys::key)111, (etkf::keys::key)80, (etkf::keys::key)81, (etkf::keys::key)82, (etkf::keys::key)79> >, etkf::typelist<etkf::row<(etkf::keys::key)43, (etkf::keys::key)20, (etkf::keys::key)26, (etkf::keys::key)8, (etkf::keys::key)21, (etkf::keys::key)23, (etkf::keys::key)28, (etkf::keys::key)24, (etkf::keys::key)12, (etkf::keys::key)18, (etkf::keys::key)19, (etkf::keys::key)42, (etkf::keys::key)111>, etkf::row<(etkf::keys::key)100, (etkf::keys::key)4, (etkf::keys::key)22, (etkf::keys::key)7, (etkf::keys::key)9, (etkf::keys::key)10, (etkf::keys::key)11, (etkf::keys::key)13, (etkf::keys::key)14, (etkf::keys::key)15, (etkf::keys::key)51, (etkf::keys::key)40>, etkf::row<(etkf::keys::key)101, (etkf::keys::key)29, (etkf::keys::key)27, (etkf::keys::key)6, (etkf::keys::key)25, (etkf::keys::key)5, (etkf::keys::key)17, (etkf::keys::key)16, (etkf::keys::key)54, (etkf::keys::key)55, (etkf::keys::key)56, (etkf::keys::key)76>, etkf::row<(etkf::keys::key)57, (etkf::keys::key)103, (etkf::keys::key)41, (etkf::keys::key)102, (etkf::keys::key)44, (etkf::keys::key)109, (etkf::keys::key)111, (etkf::keys::key)80, (etkf::keys::key)81, (etkf::keys::key)82, (etkf::keys::key)79> >}]'
/home/simon/etkf/include/validations.hpp:19:26:   required from 'void validate_keyboard() [with Kbd = test_keyboard]'
/home/simon/etkf/src/main.cpp:267:44:   required from here
/home/simon/etkf/include/validations.hpp:8:5: error: static assertion failed: A layout has the wrong number of rows
     static_assert(sizeof...(Rows) == variadic_size<typename Kbd::rows>::value,

Yeah.

I got my wish of a nice little assert message at the bottom which tells you exactly what you did wrong, but if you’re not a seasoned C++ developer you don’t get that far. You run. Fast.

However, with the help of metaclasses and requires clauses from concepts, we could write something like this1:

$class keyboard {
    constexpr {
        compiler.require(requires(keyboard kbd) { keyboard::layouts{}; },
                         "must have type 'layouts'");
        compiler.require(requires(keyboard kbd) { keyboard::rows{}; },
                         "must have type 'rows'");
        compiler.require(requires(keyboard kbd) { keyboard::columns{}; },
                         "must have type 'columns'");
        compiler.require(requires(keyboard kbd) { keyboard::key_positions{}; },
                         "must have type 'key_positions'");

        if (!all_types<keyboard::key_positions> ([](auto type){
                return variadic_size<decltype(type)::type> == variadic_size<keyboard::columns>;
            })) {
            compiler.error("Each row of 'key_positions' must be the same size as 'columns'");
            compiler.error("Check the definition of 'key_positions' here",
                           $(keyboard::key_positions).source_location());
            compiler.error("Check the definition of 'columns' here",
                           $(keyboard::columns).source_location());
        }
    }
};

Now the validations are run as part of generating a class from the metaclass, and the diagnostics are placed according to the source location given to the compiler.error calls. With sufficiently fine-grained control over the placement of diagnostics, all error messages can be emitted at the abstraction level of the EDSL rather than having C++ template guff injecting itself into the party.

Code to generate firmware from the high-level descriptions can now also be placed in the implementation of the keyboard metaclass, so that the firmware is run by calling my_keyboard::run_firmware():

$class keyboard {
    //validations as above
    
    template <class Layouts>
    static pressed_keys scan_matrix() {
        //...
    }
    
    static void run_firmware() {
        while (true) {
            auto pressed_keys = scan_matrix<keyboard::layouts>();
            //...
        }
    }
};

The above also somewhat addresses the problem of cohesion and code locality. In my C++17 ETKF implementation, the validations which are run over the keyboard descriptions are quite separate from the code which generates the firmware from the template declarations. But really, these are both part of the abstraction which I’m trying to express in the interface. Metaclasses provide a means to tie together the constraints on the declaration and the code which lowers the EDSL into normal C++ land.


That’s it for my contribution to the metaclass hype train. Maybe I’ll write some more posts as I come up with more ideas, but I’m particularly interested in exploring the design space for declarative EDSLs in C++. Templates are a powerful host for other languages, and metaclasses only make them more so.


  1. There’s not an implementation fully-featured enough to compile this yet, but it seems to be in line with the paper’s definition. 

Writing a Linux Debugger Part 8: Stack unwinding

Sometimes the most important thing to know about your current program state is how it got there. This is typically provided by a backtrace command, which gives you the chain of function calls that led to where the program is right now. This post will show you how to implement stack unwinding on x86_64 in order to generate such a backtrace.


Series index

  1. Setup
  2. Breakpoints
  3. Registers and memory
  4. Elves and dwarves
  5. Source and signals
  6. Source-level stepping
  7. Source-level breakpoints
  8. Stack unwinding
  9. Handling variables
  10. Advanced topics

Take the following program as an example:

void a() {
    //stopped here
}

void b() {
     a();
}

void c() {
     a();
}

int main() {
    b();
    c();
}

If the debugger is stopped at the //stopped here line, there are two ways in which it could have got there: main->b->a or main->c->a. If we set a breakpoint there with LLDB, continue, and ask for a backtrace, then we get the following:

* frame #0: 0x00000000004004da a.out`a() + 4 at bt.cpp:3
  frame #1: 0x00000000004004e6 a.out`b() + 9 at bt.cpp:6
  frame #2: 0x00000000004004fe a.out`main + 9 at bt.cpp:14
  frame #3: 0x00007ffff7a2e830 libc.so.6`__libc_start_main + 240 at libc-start.c:291
  frame #4: 0x0000000000400409 a.out`_start + 41

This says that we are currently in function a, which we got to from function b, which we got to from main and so on. Those final two frames are how the compiler and the C runtime bootstrap the main function.

The question now is how we implement this on x86_64. The most robust way to do this is to parse the .eh_frame section of the ELF file and work out how to unwind the stack from there, but this is a pain. You could use libunwind or something similar to do it for you, but that’s boring. Instead, we’ll assume that the compiler has laid out the stack in a certain way and we’ll just walk it manually. In order to do this, we first need to understand how the stack is laid out.

            High
        |   ...   |
        +---------+
     +24|  Arg 1  |
        +---------+
     +16|  Arg 2  |
        +---------+
     + 8| Return  |
        +---------+
EBP+--> |Saved EBP|
        +---------+
     - 8|  Var 1  |
        +---------+
ESP+--> |  Var 2  |
        +---------+
        |   ...   |
            Low

As you can see, the frame pointer for the previous stack frame is stored at the start of the current stack frame, creating a linked list of frame pointers. The stack is unwound by following this linked list. We can find out which function the next frame in the list belongs to by looking up the return address in the DWARF info. Some compilers will omit tracking the frame base in the frame-pointer register (shown as EBP in the diagram; rbp on x86_64), since it can be represented as an offset from the stack pointer and omitting it frees up an extra register. Passing -fno-omit-frame-pointer to GCC or Clang should force it to follow the convention we’re relying on, even when optimisations are enabled.

We’ll do all our work in a print_backtrace function:

void debugger::print_backtrace() {

Something to decide early is what format to print the frame information in. I used a little lambda to push this out of the way:

    auto output_frame = [frame_number = 0] (auto&& func) mutable {
        std::cout << "frame #" << frame_number++ << ": 0x" << dwarf::at_low_pc(func)
                  << ' ' << dwarf::at_name(func) << std::endl;
    };

The first frame to print out will be the one which is currently being executed. We can get the information for this frame by looking up the current program counter in the DWARF:

    auto current_func = get_function_from_pc(get_pc());
    output_frame(current_func);

Next we need to get the frame pointer and return address for the current function. The frame pointer is stored in the rbp register, and the return address is 8 bytes up the stack from the frame pointer.

    auto frame_pointer = get_register_value(m_pid, reg::rbp);
    auto return_address = read_memory(frame_pointer+8);

Now we have all the information we need to unwind the stack. I’m just going to keep unwinding until the debugger hits main, but you could also choose to stop when the frame pointer is 0x0, which will get you the functions which your implementation called before main as well (see the sketch after the full listing below). We’ll grab the frame pointer and return address from each frame and print out the information as we go.

    while (dwarf::at_name(current_func) != "main") {
        current_func = get_function_from_pc(return_address);
        output_frame(current_func);
        frame_pointer = read_memory(frame_pointer);
        return_address = read_memory(frame_pointer+8);
    }
}

That’s it! The whole function is here for your convenience:

void debugger::print_backtrace() {
    auto output_frame = [frame_number = 0] (auto&& func) mutable {
        std::cout << "frame #" << frame_number++ << ": 0x" << dwarf::at_low_pc(func)
                  << ' ' << dwarf::at_name(func) << std::endl;
    };

    auto current_func = get_function_from_pc(get_pc());
    output_frame(current_func);

    auto frame_pointer = get_register_value(m_pid, reg::rbp);
    auto return_address = read_memory(frame_pointer+8);

    while (dwarf::at_name(current_func) != "main") {
        current_func = get_function_from_pc(return_address);
        output_frame(current_func);
        frame_pointer = read_memory(frame_pointer);
        return_address = read_memory(frame_pointer+8);
    }
}
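
If you prefer the alternative mentioned earlier and want to keep going past main, a sketch of the loop using the frame-pointer-is-zero condition might look like this (it reuses the same helpers as above; the extra guard stops us dereferencing a null frame pointer):

    while (frame_pointer != 0) {
        current_func = get_function_from_pc(return_address);
        output_frame(current_func);
        frame_pointer = read_memory(frame_pointer);
        if (frame_pointer == 0) break; // end of the chain, nothing left to read
        return_address = read_memory(frame_pointer+8);
    }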

Adding commands

Of course, we have to expose this command to the user.

    else if(is_prefix(command, "backtrace")) {
        print_backtrace();
    }

Testing it out

A good way to test this functionality is by writing a test program with a bunch of small functions which call each other. Set a few breakpoints, jump around the code a bit, and make sure that your backtrace is accurate.
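
Something along these lines works well as a test target; the names are mine, and you should compile it with -g (and ideally -fno-omit-frame-pointer) so the frame-pointer walk described above holds:

void leaf() {
    // set a breakpoint here and ask for a backtrace
}

void middle() {
    leaf();
}

void outer() {
    middle();
}

int main() {
    outer();
}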


We’ve come a long way from a program which can merely spawn and attach to other programs. The penultimate post in this series will finish up the implementation of the debugger by supporting the reading and writing of variables. Until then you can find the code for this post here.

Writing a Linux Debugger Part 7: Source-level breakpoints

Setting breakpoints on memory addresses is all well and good, but it doesn’t provide the most user-friendly tool. We’d like to be able to set breakpoints on source lines and function entry addresses as well, so that we can debug at the same abstraction level as our code.

This post will add source-level breakpoints to our debugger. With all of the support we already have available to us, this is a lot easier than it may first sound. We’ll also add a command to get the type and address of a symbol, which can be useful for locating code or data and understanding linking concepts.


Series index

  1. Setup
  2. Breakpoints
  3. Registers and memory
  4. Elves and dwarves
  5. Source and signals
  6. Source-level stepping
  7. Source-level breakpoints
  8. Stack unwinding
  9. Handling variables
  10. Advanced topics

Breakpoints

DWARF

The Elves and dwarves post described how DWARF debug information works and how it can be used to map the machine code back to the high-level source. Recall that DWARF contains the address ranges of functions and a line table which lets you translate code positions between abstraction levels. We’ll be using these capabilities to implement our breakpoints.

Function entry

Setting breakpoints on function names can be complex if you want to take overloading, member functions and such into account, but we’re going to keep things simple: we’ll iterate through all of the compilation units and search for functions with names which match what we’re looking for. The DWARF information will look something like this:

< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              clang version 3.9.1 (tags/RELEASE_391/final)
                    DW_AT_language              DW_LANG_C_plus_plus
                    DW_AT_name                  /super/secret/path/MiniDbg/examples/variable.cpp
                    DW_AT_stmt_list             0x00000000
                    DW_AT_comp_dir              /super/secret/path/MiniDbg/build
                    DW_AT_low_pc                0x00400670
                    DW_AT_high_pc               0x0040069c

LOCAL_SYMBOLS:
< 1><0x0000002e>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400670
                      DW_AT_high_pc               0x0040069c
                      DW_AT_name                  foo
                      ...
...
<14><0x000000b0>    DW_TAG_subprogram
                      DW_AT_low_pc                0x00400700
                      DW_AT_high_pc               0x004007a0
                      DW_AT_name                  bar
                      ...

We want to match against DW_AT_name and use DW_AT_low_pc (the start address of the function) to set our breakpoint.

void debugger::set_breakpoint_at_function(const std::string& name) {
    for (const auto& cu : m_dwarf.compilation_units()) {
        for (const auto& die : cu.root()) {
            if (die.has(dwarf::DW_AT::name) && at_name(die) == name) {
                auto low_pc = at_low_pc(die);
                auto entry = get_line_entry_from_pc(low_pc);
                ++entry; //skip prologue
                set_breakpoint_at_address(entry->address);
            }
        }
    }
}

The only bit of that code which looks a bit weird is the ++entry. The problem is that the DW_AT_low_pc for a function doesn’t point at the start of the user code for that function, it points to the start of the prologue. The compiler will usually output a prologue and epilogue for a function, which carry out tasks like saving and restoring registers and manipulating the stack pointer. This isn’t very useful for us, so we increment the line entry by one to get the first line of the user code instead of the prologue. The DWARF line table actually has some functionality to mark an entry as the first line after the function prologue, but not all compilers output this, so I’ve taken the naive approach.
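
If you do want to honour that marker when it is present, the body of the if above could be adapted along these lines. This assumes the line-table entries expose a prologue_end flag mirroring the DWARF field (libelfin models the standard fields, but check your version), and a fuller version would also stop searching at the function’s high_pc:

const auto& lt = cu.get_line_table();
auto entry = get_line_entry_from_pc(low_pc);
auto target = entry;
++target;                                     // naive fallback: skip one entry
for (auto it = entry; it != lt.end(); ++it) {
    if (it->prologue_end) { target = it; break; }
}
set_breakpoint_at_address(target->address);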

Source line

To set a breakpoint on a high-level source line, we translate this line number into an address by looking it up in the DWARF. We’ll iterate through the compilation units looking for one whose name matches the given file, then look for the entry which corresponds to the given line.

The DWARF will look something like this:

.debug_line: line number info for a single cu
Source lines (from CU-DIE at .debug_info offset 0x0000000b):

NS new statement, BB new basic block, ET end of text sequence
PE prologue end, EB epilogue begin
IS=val ISA number, DI=val discriminator value
<pc>        [lno,col] NS BB ET PE EB IS= DI= uri: "filepath"
0x004004a7  [   1, 0] NS uri: "/super/secret/path/a.hpp"
0x004004ab  [   2, 0] NS
0x004004b2  [   3, 0] NS
0x004004b9  [   4, 0] NS
0x004004c1  [   5, 0] NS
0x004004c3  [   1, 0] NS uri: "/super/secret/path/b.hpp"
0x004004c7  [   2, 0] NS
0x004004ce  [   3, 0] NS
0x004004d5  [   4, 0] NS
0x004004dd  [   5, 0] NS
0x004004df  [   4, 0] NS uri: "/super/secret/path/ab.cpp"
0x004004e3  [   5, 0] NS
0x004004e8  [   6, 0] NS
0x004004ed  [   7, 0] NS
0x004004f4  [   7, 0] NS ET

So if we want to set a breakpoint on line 5 of ab.cpp, we look up the entry which corresponds to that line (0x004004e3) and set a breakpoint there.

void debugger::set_breakpoint_at_source_line(const std::string& file, unsigned line) {
    for (const auto& cu : m_dwarf.compilation_units()) {
        if (is_suffix(file, at_name(cu.root()))) {
            const auto& lt = cu.get_line_table();

            for (const auto& entry : lt) {
                if (entry.is_stmt && entry.line == line) {
                    set_breakpoint_at_address(entry.address);
                    return;
                }
            }
        }
    }
}

My is_suffix hack is there so you can type c.cpp for a/b/c.cpp. Of course you should actually use a sensible path handling library or something; I’m lazy. The entry.is_stmt is checking that the line table entry is marked as the beginning of a statement, which is set by the compiler on the address it thinks is the best target for a breakpoint.
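
For reference, the helper is nothing fancy; a sketch of it looks something like this (it needs <algorithm> and <string>):

bool is_suffix(const std::string& s, const std::string& of) {
    // true if 'of' ends with 's', e.g. is_suffix("c.cpp", "a/b/c.cpp")
    if (s.size() > of.size()) return false;
    auto diff = of.size() - s.size();
    return std::equal(s.begin(), s.end(), of.begin() + diff);
}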


Symbol lookup

When we get down to the level of object files, symbols are king. Functions are named with symbols, global variables are named with symbols, you get a symbol, we get a symbol, everyone gets a symbol. In a given object file, some symbols might reference other object files or shared libraries, where the linker will patch things up to create an executable program from the symbol reference spaghetti.

Symbols can be looked up in the aptly-named symbol table, which is stored in ELF sections in the binary. Fortunately, libelfin has a fairly nice interface for doing this, so we don’t need to deal with all of the ELF nonsense ourselves. To give you an idea of what we’re dealing with, here is a dump of the .symtab section of a binary, produced with readelf:

Num:    Value          Size Type    Bind   Vis      Ndx Name
 0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
 1: 0000000000400238     0 SECTION LOCAL  DEFAULT    1
 2: 0000000000400254     0 SECTION LOCAL  DEFAULT    2
 3: 0000000000400278     0 SECTION LOCAL  DEFAULT    3
 4: 00000000004002c8     0 SECTION LOCAL  DEFAULT    4
 5: 0000000000400430     0 SECTION LOCAL  DEFAULT    5
 6: 00000000004004e4     0 SECTION LOCAL  DEFAULT    6
 7: 0000000000400508     0 SECTION LOCAL  DEFAULT    7
 8: 0000000000400528     0 SECTION LOCAL  DEFAULT    8
 9: 0000000000400558     0 SECTION LOCAL  DEFAULT    9
10: 0000000000400570     0 SECTION LOCAL  DEFAULT   10
11: 0000000000400714     0 SECTION LOCAL  DEFAULT   11
12: 0000000000400720     0 SECTION LOCAL  DEFAULT   12
13: 0000000000400724     0 SECTION LOCAL  DEFAULT   13
14: 0000000000400750     0 SECTION LOCAL  DEFAULT   14
15: 0000000000600e18     0 SECTION LOCAL  DEFAULT   15
16: 0000000000600e20     0 SECTION LOCAL  DEFAULT   16
17: 0000000000600e28     0 SECTION LOCAL  DEFAULT   17
18: 0000000000600e30     0 SECTION LOCAL  DEFAULT   18
19: 0000000000600ff0     0 SECTION LOCAL  DEFAULT   19
20: 0000000000601000     0 SECTION LOCAL  DEFAULT   20
21: 0000000000601018     0 SECTION LOCAL  DEFAULT   21
22: 0000000000601028     0 SECTION LOCAL  DEFAULT   22
23: 0000000000000000     0 SECTION LOCAL  DEFAULT   23
24: 0000000000000000     0 SECTION LOCAL  DEFAULT   24
25: 0000000000000000     0 SECTION LOCAL  DEFAULT   25
26: 0000000000000000     0 SECTION LOCAL  DEFAULT   26
27: 0000000000000000     0 SECTION LOCAL  DEFAULT   27
28: 0000000000000000     0 SECTION LOCAL  DEFAULT   28
29: 0000000000000000     0 SECTION LOCAL  DEFAULT   29
30: 0000000000000000     0 SECTION LOCAL  DEFAULT   30
31: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS init.c
32: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
33: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   17 __JCR_LIST__
34: 00000000004005a0     0 FUNC    LOCAL  DEFAULT   10 deregister_tm_clones
35: 00000000004005e0     0 FUNC    LOCAL  DEFAULT   10 register_tm_clones
36: 0000000000400620     0 FUNC    LOCAL  DEFAULT   10 __do_global_dtors_aux
37: 0000000000601028     1 OBJECT  LOCAL  DEFAULT   22 completed.6917
38: 0000000000600e20     0 OBJECT  LOCAL  DEFAULT   16 __do_global_dtors_aux_fin
39: 0000000000400640     0 FUNC    LOCAL  DEFAULT   10 frame_dummy
40: 0000000000600e18     0 OBJECT  LOCAL  DEFAULT   15 __frame_dummy_init_array_
41: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS /super/secret/path/MiniDbg/
42: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
43: 0000000000400818     0 OBJECT  LOCAL  DEFAULT   14 __FRAME_END__
44: 0000000000600e28     0 OBJECT  LOCAL  DEFAULT   17 __JCR_END__
45: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
46: 0000000000400724     0 NOTYPE  LOCAL  DEFAULT   13 __GNU_EH_FRAME_HDR
47: 0000000000601000     0 OBJECT  LOCAL  DEFAULT   20 _GLOBAL_OFFSET_TABLE_
48: 0000000000601028     0 OBJECT  LOCAL  DEFAULT   21 __TMC_END__
49: 0000000000601020     0 OBJECT  LOCAL  DEFAULT   21 __dso_handle
50: 0000000000600e20     0 NOTYPE  LOCAL  DEFAULT   15 __init_array_end
51: 0000000000600e18     0 NOTYPE  LOCAL  DEFAULT   15 __init_array_start
52: 0000000000600e30     0 OBJECT  LOCAL  DEFAULT   18 _DYNAMIC
53: 0000000000601018     0 NOTYPE  WEAK   DEFAULT   21 data_start
54: 0000000000400710     2 FUNC    GLOBAL DEFAULT   10 __libc_csu_fini
55: 0000000000400570    43 FUNC    GLOBAL DEFAULT   10 _start
56: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
57: 0000000000400714     0 FUNC    GLOBAL DEFAULT   11 _fini
58: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
59: 0000000000400720     4 OBJECT  GLOBAL DEFAULT   12 _IO_stdin_used
60: 0000000000601018     0 NOTYPE  GLOBAL DEFAULT   21 __data_start
61: 00000000004006a0   101 FUNC    GLOBAL DEFAULT   10 __libc_csu_init
62: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT   22 __bss_start
63: 0000000000601030     0 NOTYPE  GLOBAL DEFAULT   22 _end
64: 0000000000601028     0 NOTYPE  GLOBAL DEFAULT   21 _edata
65: 0000000000400670    44 FUNC    GLOBAL DEFAULT   10 main
66: 0000000000400558     0 FUNC    GLOBAL DEFAULT    9 _init

You can see lots of symbols for sections in the object file, symbols which are used by the implementation for setting up the environment, and at the end you can see the symbol for main.

We’re interested in the type, name, and value (address) of the symbol. We’ll have a symbol_type enum for the type and use a std::string for the name and std::uintptr_t for the address:

enum class symbol_type {
    notype,            // No type (e.g., absolute symbol)
    object,            // Data object
    func,              // Function entry point
    section,           // Symbol is associated with a section
    file,              // Source file associated with the object file
};

std::string to_string (symbol_type st) {
    switch (st) {
    case symbol_type::notype: return "notype";
    case symbol_type::object: return "object";
    case symbol_type::func: return "func";
    case symbol_type::section: return "section";
    case symbol_type::file: return "file";
    }
}

struct symbol {
    symbol_type type;
    std::string name;
    std::uintptr_t addr;
};

We’ll need to map between the symbol type we get from libelfin and our enum since we don’t want the dependency poisoning this interface. Fortunately I picked the same names for everything, so this is dead easy:

symbol_type to_symbol_type(elf::stt sym) {
    switch (sym) {
    case elf::stt::notype: return symbol_type::notype;
    case elf::stt::object: return symbol_type::object;
    case elf::stt::func: return symbol_type::func;
    case elf::stt::section: return symbol_type::section;
    case elf::stt::file: return symbol_type::file;
    default: return symbol_type::notype;
    }
}

Lastly we want to look up the symbol. For illustrative purposes I loop through the sections of the ELF looking for symbol tables, then collect any symbols I find in them into a std::vector. A smarter implementation would build up a map from names to symbols so that you only have to look at all the data once.

std::vector<symbol> debugger::lookup_symbol(const std::string& name) {
    std::vector<symbol> syms;

    for (auto &sec : m_elf.sections()) {
        if (sec.get_hdr().type != elf::sht::symtab && sec.get_hdr().type != elf::sht::dynsym)
            continue;

        for (auto sym : sec.as_symtab()) {
            if (sym.get_name() == name) {
                auto &d = sym.get_data();
                syms.push_back(symbol{to_symbol_type(d.type()), sym.get_name(), d.value});
            }
        }
    }

    return syms;
}
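
For completeness, the smarter implementation mentioned above might cache the symbols the first time through; this is only a sketch, and m_symbol_cache (a std::unordered_multimap<std::string, symbol> member) is a name I have made up:

void debugger::build_symbol_cache() {
    // Walk the ELF symbol tables once, up front, and index every symbol by name.
    for (auto &sec : m_elf.sections()) {
        if (sec.get_hdr().type != elf::sht::symtab && sec.get_hdr().type != elf::sht::dynsym)
            continue;

        for (auto sym : sec.as_symtab()) {
            auto &d = sym.get_data();
            m_symbol_cache.emplace(sym.get_name(),
                                   symbol{to_symbol_type(d.type()), sym.get_name(), d.value});
        }
    }
}

std::vector<symbol> debugger::lookup_symbol(const std::string& name) {
    // Lookups now just copy the cached entries for that name.
    std::vector<symbol> syms;
    auto range = m_symbol_cache.equal_range(name);
    for (auto it = range.first; it != range.second; ++it) {
        syms.push_back(it->second);
    }
    return syms;
}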

Adding commands

As always, we need to add some more commands to expose the functionality to users. For breakpoints I’ve gone for a GDB-style interface, where the kind of breakpoint is inferred from the argument you pass rather than requiring explicit switches:

  • 0x<hexadecimal> -> address breakpoint
  • <line>:<filename> -> line number breakpoint
  • <anything else> -> function name breakpoint

    else if(is_prefix(command, "break")) {
        if (args[1][0] == '0' && args[1][1] == 'x') {
            std::string addr {args[1], 2};
            set_breakpoint_at_address(std::stol(addr, 0, 16));
        }
        else if (args[1].find(':') != std::string::npos) {
            auto file_and_line = split(args[1], ':');
            set_breakpoint_at_source_line(file_and_line[0], std::stoi(file_and_line[1]));
        }
        else {
            set_breakpoint_at_function(args[1]);
        }
    }

For symbols, we look up the symbol and print out any matches we find:

else if(is_prefix(command, "symbol")) {
    auto syms = lookup_symbol(args[1]);
    for (auto&& s : syms) {
        std::cout << s.name << ' ' << to_string(s.type) << " 0x" << std::hex << s.addr << std::endl;
    }
}

Testing it out

Fire up your debugger on a simple binary, play around with setting source-level breakpoints. Setting a breakpoint on some foo and seeing my debugger stop on it was one of the most rewarding moments of this project for me.

Symbol lookup can be tested by adding some functions or global variables to your program and looking up the names of them. Note that if you’re compiling C++ code you’ll need to take name mangling into account as well.
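
As a concrete example of what mangling means here: with the Itanium C++ ABI used by GCC and Clang, a function’s symbol-table name encodes its signature, so looking up the plain name finds nothing unless you suppress mangling with extern "C" (the mangled spelling below is what those compilers typically produce):

int square(int x) { return x * x; }               // symbol table name: _Z6squarei
extern "C" int cube(int x) { return x * x * x; }  // symbol table name: cube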

That’s all for this post. Next time I’ll show how to add stack unwinding support to the debugger.

You can find the code for this post here.

Stack Overflow With Custom JsonConverter

[There is a Gist on GitHub that contains a minimal working example and summary of this post.]

We recently needed to change our data model so that what was originally a list of one type, became a list of objects of different types with a common base, i.e. our JSON deserialization now needed to deal with polymorphic types.

Naturally we googled the problem to see what support, if any, Newtonsoft’s JSON.Net had. Although it has some built-in support, like many built-in solutions it stores fully qualified type names, which we didn’t want in our JSON; we just wanted simple technology-agnostic type names like “cat” or “dog” that we would be happy to map manually somewhere in our code. We didn’t want to write all the deserialization logic manually, but were happy to give the library a leg-up with the mapping of types.

JsonConverter

Our searching quickly led to the following question on Stack Overflow: “Deserializing polymorphic json classes without type information using json.net”. The lack of type information mentioned in the question meant the exact .Net type (i.e. name, assembly, version, etc.), and so the answer describes how to do it where you can infer the resulting type from one or more attributes in the data itself. In our case it was a field unsurprisingly called “type” that held a simplified name as described earlier.

The crux of the solution involves creating a JsonConverter and implementing the two methods CanConvert and ReadJson. If we follow that Stack Overflow post’s top answer we end up with an implementation something like this:

public class CustomJsonConverter : JsonConverter
{
  public override bool CanConvert(Type objectType)
  {
    return typeof(BaseType).
                       IsAssignableFrom(objectType);
  }

  public override object ReadJson(JsonReader reader,
           Type objectType, object existingValue,
           JsonSerializer serializer)
  {
    JObject item = JObject.Load(reader);

    if (item.Value<string>("type") == "Derived")
    {
      return item.ToObject<DerivedType>();
    }
    else
    . . .
  }
}

This all made perfect sense and even agreed with a couple of other blog posts on the topic we unearthed. However when we plugged it in we ended up with an infinite loop in the ReadJson method that resulted in a StackOverflowException. Doing some more googling and checking the Newtonsoft JSON.Net documentation didn’t point out our “obvious” mistake and so we resorted to the time honoured technique of fumbling around with the code to see if we could get this (seemingly promising) solution working.

A Blind Alley

One avenue that appeared to fix the problem was manually adding the JsonConverter to the list of Converters in the JsonSerializerSettings object instead of using the [JsonConverter] attribute on the base class. We went back and forth with some unit tests to prove that this was indeed the solution and even committed this fix to our codebase.

However I was never really satisfied with this outcome and so decided to write this incident up. I started to work through the simplest possible example to illustrate the behaviour but when I came to repro it I found that neither approach worked – attribute or serializer settings - I always got into an infinite loop.

Hence I questioned our original diagnosis and continued to see if there was a more satisfactory answer.

ToObject vs Populate

I went back and re-read the various hits we got with those additional keywords (recursion, infinite loop and stack overflow) to see if we’d missed something along the way. The two main candidates were “Polymorphic JSON Deserialization failing using Json.Net” and “Custom inheritance JsonConverter fails when JsonConverterAttribute is used”. Neither of these explicitly references the answer we initially found and what might be wrong with it – they give a different answer to a slightly different question.

However, these answers suggest deserializing the object in a different way: instead of using ToObject<DerivedType>() to do all the heavy lifting, they suggest creating the uninitialized object yourself and then using Populate() to fill in the details, like this:

{
  JObject item = JObject.Load(reader);

  if (item.Value<string>("type") == "Derived")
  {
    var @object = new DerivedType();
    serializer.Populate(item.CreateReader(), @object);
    return @object;
  }
  else
    . . .
}

Plugging this approach into my minimal example worked, and for both the converter techniques too: attribute and serializer settings.

Unanswered Questions

So I’ve found another technique that works, which is great, but I still lack closure around the whole affair. For example, how come the answer in the original Stack Overflow question “Deserializing polymorphic json classes” didn’t work for us? That answer has plenty of up-votes and so should be considered pretty reliable. Has there been a change to Newtonsoft’s JSON.Net library that has somehow caused this answer to break for others? Is there a new bug that we’ve literally only just discovered (we’re using v10)? Why don’t the JSON.Net docs warn against this if it really is an issue, or are we looking in the wrong part of the docs?

As described right at the beginning I’ve published a Gist with my minimal example and added a comment to the Stack Overflow answer with that link so that anyone else on the same journey has some other pieces of the jigsaw to work with. Perhaps over time my comment will also acquire up-votes to help indicate that it’s not so cut-and-dried. Or maybe someone who knows the right answer will spot it and point out where we went wrong.

Ultimately though this is probably a case of not seeing the wood for the trees. It’s so easy when you’re trying to solve one problem to get lost in the accidental complexity and not take a step back. Answers on Stack Overflow generally carry a large degree of gravitas, but they should not be assumed to be infallible. All documentation can go out of date even if there are (seemingly) many eyes watching over it.

When your mindset is to assume that bugs are of your own making unless the evidence is overwhelming, the times when you might actually not be entirely at fault feel all the more embarrassing when you realise the answer was probably there all along, but you discounted it too early because your train of thought was elsewhere.

Accelerating your C++ on GPU with SYCL

Leveraging the power of graphics cards for compute applications is all the rage right now in fields such as machine learning, computer vision and high-performance computing. Technologies like OpenCL expose this power through a hardware-independent programming model, allowing you to write code which abstracts over different architecture capabilities. The dream is “write once, run anywhere”, be it an Intel CPU, AMD discrete GPU, DSP, etc. Unfortunately, for everyday programmers, OpenCL has something of a steep learning curve; a simple Hello World program can be a hundred or so lines of pretty ugly-looking code. However, to ease this pain, the Khronos group have developed a new standard called SYCL, which is a C++ abstraction layer on top of OpenCL. Using SYCL, you can develop these general-purpose GPU (GPGPU) applications in clean, modern C++ without most of the faff associated with OpenCL. Here’s a simple example, written in SYCL using the parallel STL implementation, which multiplies each element of a vector by two:

#include <array>
#include <numeric>
#include <iostream>

#include <sycl/execution_policy>
#include <experimental/algorithm>
#include <sycl/helpers/sycl_buffers.hpp>

using namespace std::experimental::parallel;
using namespace sycl::helpers;

int main() {
  constexpr size_t array_size = 1024*512;
  std::array<cl::sycl::cl_int, array_size> a;
  std::iota(begin(a),end(a),0);

  {
    cl::sycl::buffer<int> b(a.data(), cl::sycl::range<1>(a.size()));
    cl::sycl::queue q;
    sycl::sycl_execution_policy<class Mul> sycl_policy(q);
    transform(sycl_policy, begin(b), end(b), begin(b),
              [](int x) { return x*2; });
  }
}

For comparison, here’s a mostly equivalent version written in OpenCL using the C++ API (don’t spend much time reading this, just note that it looks ugly and is really long):

#include <iostream>
#include <vector>
#include <array>
#include <numeric>
#include <CL/cl.hpp>

int main(){
    std::vector<cl::Platform> all_platforms;
    cl::Platform::get(&all_platforms);
    if(all_platforms.size()==0){
        std::cout<<" No platforms found. Check OpenCL installation!\n";
        exit(1);
    }
    cl::Platform default_platform=all_platforms[0];

    std::vector<cl::Device> all_devices;
    default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
    if(all_devices.size()==0){
        std::cout<<" No devices found. Check OpenCL installation!\n";
        exit(1);
    }

    cl::Device default_device=all_devices[0];
    cl::Context context({default_device});

    cl::Program::Sources sources;
    std::string kernel_code=
        "   void kernel mul2(global int* A){"
        "       A[get_global_id(0)]=A[get_global_id(0)]*2;"
        "   }";
    sources.push_back({kernel_code.c_str(),kernel_code.length()});

    cl::Program program(context,sources);
    if(program.build({default_device})!=CL_SUCCESS){
        std::cout<<" Error building: "<<program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device)<<"\n";
        exit(1);
    }

    constexpr size_t array_size = 1024*512;
    std::array<cl_int, array_size> a;
    std::iota(begin(a),end(a),0);

    cl::Buffer buffer_A(context,CL_MEM_READ_WRITE,sizeof(int)*a.size());
    cl::CommandQueue queue(context,default_device);

    if (queue.enqueueWriteBuffer(buffer_A,CL_TRUE,0,sizeof(int)*a.size(),a.data()) != CL_SUCCESS) {
        std::cout << "Failed to write memory;n";
        exit(1);
    }

    cl::Kernel kernel_add = cl::Kernel(program,"mul2");
    kernel_add.setArg(0,buffer_A);

    if (queue.enqueueNDRangeKernel(kernel_add,cl::NullRange,cl::NDRange(a.size()),cl::NullRange) != CL_SUCCESS) {
        std::cout << "Failed to enqueue kernel\n";
        exit(1);
    }

    if (queue.finish() != CL_SUCCESS) {
        std::cout << "Failed to finish kernel\n";
        exit(1);
    }

    if (queue.enqueueReadBuffer(buffer_A,CL_TRUE,0,sizeof(int)*a.size(),a.data()) != CL_SUCCESS) {
        std::cout << "Failed to read result\n";
        exit(1);
    }
}

In this post I’ll give an introduction on using SYCL to accelerate your C++ code on the GPU.


Lightning intro to GPGPU

Before I get started on how to use SYCL, I’ll give a brief outline of why you might want to run compute jobs on the GPU for those who are unfamiliar. If you’ve already used OpenCL, CUDA or similar, feel free to skip ahead.

The key difference between a GPU and a CPU is that, rather than having a small number of complex, powerful cores (1-8 for common consumer desktop hardware), a GPU has a huge number of small, simple processing elements.

[Diagram: CPU architecture]

Above is a comically simplified diagram of a CPU with four cores. Each core has a set of registers and is attached to various levels of cache (some might be shared, some not), and then main memory.

[Diagram: GPU architecture]

In the GPU, tiny processing elements are grouped into execution units. Each processing element has a bit of memory attached to it, and each execution unit has some memory shared between its processing elements. After that, there’s some GPU-wide memory, then the same main memory which the CPU uses. The elements within an execution unit execute in lockstep, where each element executes the same instruction on a different piece of data.

The benefit which GPUs bring is the amount of data which can be processed at the same time. If you’re on a CPU, maybe you can process a handful of pixels at a given time if you use multithreading and vector instructions, but GPUs can process orders of magnitude more than this. The sheer amount of data which can be processed at once makes GPUs very well-suited to applications like graphics (duh), mathematical processing, neural networks, etc.

There are many aspects of GPGPU programming which make it an entirely different beast from everyday CPU programming. For example, transferring data from main memory to the GPU is slow. Really slow. Like, kill all your performance and get you fired slow. Therefore, the trade-off in GPU programming is to make enough use of the accelerator’s ridiculously high throughput to hide the latency of shipping the data to and from it.

There are other issues which might not be immediately apparent, like the cost of branching. Since the processing elements in an execution unit work in lockstep, nested branches which cause them to take different paths (divergent control flow) are a real problem. This is often solved by executing all branches for all elements and masking out the unneeded results, so the cost can blow up with each level of nesting, which is A Bad Thing™. Of course, there are optimizations which can aid this, but the idea stands: simple assumptions and knowledge you bring from the CPU world might cause you big problems in the GPU world.

Before we get back to SYCL, some short pieces of terminology. The host is the main CPU running on your machine which executes your normal C or C++ code, and the device is what will be running your OpenCL code. A device could be the same as the host, or it could be some accelerator sitting in your machine, a simulator, whatever. A kernel is a special function which is the entry point to the code which will run on your device. It will often be supplied with buffers for input and output data which have been set up by the host.


Back to SYCL

There are currently two implementations of SYCL available: triSYCL, an experimental open source version by Xilinx (mostly used as a testbed for the standard), and ComputeCpp, an industry-strength implementation by Codeplay1 (currently in open beta). Only ComputeCpp supports execution of kernels on the GPU, so we’ll be using that in this post.

Step 1 is to get ComputeCpp up and running on your machine. The main components are a runtime library which implements the SYCL API, and a Clang-based compiler which compiles both your host code and your device code. At the time of writing, Intel CPUs and some AMD GPUs are officially supported on Ubuntu and CentOS. It should be pretty easy to get it working on other Linux distributions (I got it running on my Arch system, for instance). Support for more hardware and operating systems is being worked on, so check the supported platforms document for an up-to-date list. The dependencies and components are listed here. You might also want to download the SDK, which contains samples, documentation, build system integration files, and more. I’ll be using the SYCL Parallel STL in this post, so get that if you want to play along at home.

Once you’re all set up, we can get GPGPUing! As noted in the introduction, my first sample used the SYCL parallel STL implementation. We’ll now take a look at how to write that code with bare SYCL.

#include <CL/sycl.hpp>

#include <array>
#include <numeric>
#include <iostream>

int main() {
      const size_t array_size = 1024*512;
      std::array<cl::sycl::cl_int, array_size> in,out;
      std::iota(begin(in),end(in),0);

      {
          cl::sycl::queue device_queue;
          cl::sycl::range<1> n_items{array_size};
          cl::sycl::buffer<cl::sycl::cl_int, 1> in_buffer(in.data(), n_items);
          cl::sycl::buffer<cl::sycl::cl_int, 1> out_buffer(out.data(), n_items);

          device_queue.submit([&](cl::sycl::handler &cgh) {
              constexpr auto sycl_read = cl::sycl::access::mode::read;
              constexpr auto sycl_write = cl::sycl::access::mode::write;

              auto in_accessor = in_buffer.get_access<sycl_read>(cgh);
              auto out_accessor = out_buffer.get_access<sycl_write>(cgh);

              cgh.parallel_for<class VecScalMul>(n_items,
                  [=](cl::sycl::id<1> wiID) {
                      out_accessor[wiID] = in_accessor[wiID]*2;
                  });
         });
     }
}

I’ll break this down piece-by-piece.

#include <CL/sycl.hpp>

The first thing we do is include the SYCL header file, which will put the SYCL runtime library at our command.

const size_t array_size = 1024*512;
std::array<cl::sycl::cl_int, array_size> in,out;
std::iota(begin(in),end(in),0);

Here we construct a large array of integers and initialize it with the numbers from 0 to array_size-1 (this is what std::iota does). Note that we use cl::sycl::cl_int to ensure compatibility.

{
    //...
}

Next we open up a new scope. This achieves two things:

  1. device_queue will be destructed at the end of the scope, which will block until the kernel has completed.
  2. in_buffer and out_buffer will also be destructed, which will force data transfer back to the host and allow us to access the data from in and out.

cl::sycl::queue device_queue;

Now we create our command queue. The command queue is where all work (kernels) will be enqueued before being dispatched to the device. There are many ways to customise the queue, such as providing a device to enqueue on or setting up asynchronous error handlers, but the default constructor will do for this example; it looks for a compatible GPU and falls back on the host CPU if it fails.
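
For example, a sketch of a queue which explicitly asks for a GPU and installs an asynchronous error handler might look like this (check the SYCL 1.2.1 spec or the ComputeCpp documentation for the exact behaviour):

cl::sycl::gpu_selector select_gpu;
cl::sycl::queue device_queue(select_gpu, [](cl::sycl::exception_list errors) {
    // Asynchronous errors are delivered to the handler as a list of exception_ptrs.
    for (std::exception_ptr e : errors) {
        try { std::rethrow_exception(e); }
        catch (const cl::sycl::exception& ex) {
            std::cerr << "SYCL async error: " << ex.what() << '\n';
        }
    }
});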

cl::sycl::range<1> n_items{array_size};

Next we create a range, which describes the shape of the data which the kernel will be executing on. In our simple example, it’s a one-dimensional array, so we use cl::sycl::range<1>. If the data was two-dimensional we would use cl::sycl::range<2> and so on. Alongside cl::sycl::range, there is cl::sycl::nd_range, which allows you to specify work group sizes as well as an overall range, but we don’t need that for our example.

cl::sycl::buffer<cl::sycl::cl_int, 1> in_buffer(in.data(), n_items);
cl::sycl::buffer<cl::sycl::cl_int, 1> out_buffer(out.data(), n_items);

In order to control data sharing and transfer between the host and devices, SYCL provides a buffer class. We create two SYCL buffers to manage our input and output arrays.

      device_queue.submit([&](cl::sycl::handler &cgh) {/*...*/});

After setting up all of our data, we can enqueue our actual work. There are a few ways to do this, but a simple method for setting up a parallel execution is to call the .submit function on our queue. To this function we pass a command group functor2 which will be executed when the runtime schedules that task. A command group handler sets up any last resources needed by the kernel and dispatches it.

constexpr auto sycl_read = cl::sycl::access::mode::read;
constexpr auto sycl_write = cl::sycl::access::mode::write;

auto in_accessor = in_buffer.get_access<sycl_read>(cgh);
auto out_accessor = out_buffer.get_access<sycl_write>(cgh);

In order to control access to our buffers and to tell the runtime how we will be using the data, we need to create accessors. It should be clear that we are creating one accessor for reading from in_buffer, and one accessor for writing to out_buffer.

cgh.parallel_for<class VecScalMul>(n_items,
    [=](cl::sycl::id<1> wiID) {
         out_accessor[wiID] = in_accessor[wiID]*2;
    });

Now that we’ve done all the setup, we can actually do some computation on our device. Here we dispatch a kernel on the command group handler cgh over our range n_items. The actual kernel itself is a lambda which takes a work-item identifier and carries out our computation. In this case, we are reading from in_accessor at the index of our work-item identifier, multiplying it by 2, then storing the result in the relevant place in out_accessor. That <class VecScalMul> is an unfortunate byproduct of how SYCL needs to work within the confines of standard C++, so we need to give a unique class name to the kernel for the compiler to be able to do its job.

}

After this point, our kernel will have completed and we could access out and expect to see the correct results.
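
As a quick sanity check (my addition, not part of the original sample), you could verify the results on the host once the scope has closed; this assumes <cassert> has also been included:

for (size_t i = 0; i != array_size; ++i) {
    // Every element should have been doubled by the kernel.
    assert(out[i] == in[i] * 2);
}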

There are quite a few new concepts at play here, but hopefully you can see the power and expressibility we get using these techniques. However, if you just want to toss some code at your GPU and not worry about the customisation, then you can use the SYCL Parallel STL implementation.


SYCL Parallel STL

The SYCL Parallel STL is an implementation of the Parallelism TS which dispatches your algorithm function objects as SYCL kernels. We already saw an example of this at the top of the page, so let’s run through it quickly.

#include <array>
#include <numeric>
#include <vector>
#include <iostream>

#include <sycl/execution_policy>
#include <experimental/algorithm>
#include <sycl/helpers/sycl_buffers.hpp>

using namespace std::experimental::parallel;
using namespace sycl::helpers;

int main() {
  constexpr size_t array_size = 1024*512;
  std::array<cl::sycl::cl_int, array_size> in,out;
  std::iota(begin(in),end(in),0);

  {
    cl::sycl::buffer<int> in_buffer(in.data(), cl::sycl::range<1>(in.size()));
    cl::sycl::buffer<int> out_buffer(out.data(), cl::sycl::range<1>(out.size()));
    cl::sycl::queue q;
    sycl::sycl_execution_policy<class Mul> sycl_policy(q);
    transform(sycl_policy, begin(in_buffer), end(in_buffer), begin(out_buffer),
              [](int x) { return x*2; });
  }
}

constexpr size_t array_size = 1024*512;
std::array<cl::sycl::cl_int, array_size> in, out;
std::iota(begin(in),end(in),0);

So far, so similar. Again we’re creating a couple of arrays to hold our input and output data.

cl::sycl::buffer<int> in_buffer(in.data(), cl::sycl::range<1>(in.size()));
cl::sycl::buffer<int> out_buffer(out.data(), cl::sycl::range<1>(out.size()));
cl::sycl::queue q;

Here we are creating our buffers and our queue like in the last example.

sycl::sycl_execution_policy<class Mul> sycl_policy(q);

Here’s where things get interesting. We create a sycl_execution_policy from our queue and give it a name to use for the kernel. This execution policy can then be used like std::execution::par or std::execution::seq.

transform(sycl_policy, begin(in_buffer), end(in_buffer), begin(out_buffer),
          [](int x) { return x*2; });

Now our kernel dispatch looks like a call to std::transform with an execution policy provided. That closure we pass in will be compiled for and executed on the device without us having to do any more complex set up.

Of course, you can do more than just transform. At the time of writing, the SYCL Parallel STL supports these algorithms:

  • sort
  • transform
  • for_each
  • for_each_n
  • count_if
  • reduce
  • inner_product
  • transform_reduce
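
To give a flavour of one of the others, here is a hedged sketch of mine (not from the post) of summing the input with the same kind of policy; it assumes the implementation provides the usual (policy, first, last, init, binary_op) reduce overload over SYCL buffer iterators, and the kernel name Sum is just a placeholder:

cl::sycl::buffer<int> values(in.data(), cl::sycl::range<1>(in.size()));
sycl::sycl_execution_policy<class Sum> sum_policy(q);
// Dispatches the reduction as a SYCL kernel and returns the result to the host.
auto total = reduce(sum_policy, begin(values), end(values), 0,
                    [](int x, int y) { return x + y; });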

That covers things for this short introduction. If you want to keep up to date with developments in SYCL, be sure to check out sycl.tech. Notable recent developments have been porting Eigen and Tensorflow to SYCL to bring expressive artificial intelligence programming to OpenCL devices. Personally, I’m excited to see how the high-level programming models can be exploited for automatic optimization of heterogeneous programs, and how they can support even higher-level technologies like HPX or SkelCL.


  1. I work for Codeplay, but this post was written in my own time with no suggestion from my employer. 

  2. Hey, “functor” is in the spec, don’t @ me. 

C++ iterator wrapping a stream not 1-1

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

Sometimes we want to write an iterator that consumes items from some underlying iterator but produces its own items slower than the items it consumes, like this:

ColonSep items("aa:foo::x");
// Prints "aa, foo, , x"
for(auto s : items)
{
    std::cout << s << ", ";
}

When we pass a 9-character string (i.e. an iterator that yields 9 items) to ColonSep, above, we only repeat 4 times in our loop, because ColonSep provides an iterable range that yields one value for each whole word it finds in the underlying iterator of 9 characters.

To do something like this I'd recommend consuming the items of the underlying iterator early, so it is ready when requested with operator*. We also need our iterator class to hold on to the end of the underlying iterator as well as the current position.

First we need a small container to hold the next item we will provide:

struct maybestring
{
    std::string value_;
    bool at_end_;

    explicit maybestring(const std::string value)
    : value_(value)
    , at_end_(false)
    {}

    explicit maybestring()
    : value_("--error-past-end--")
    , at_end_(true)
    {}
};

A maybestring either holds the next item we will provide, or at_end_ is true, meaning we have reached the end of the underlying iterator and we will report that we are at the end ourselves when asked.

Like the simpler iterators we have looked at, we still need a little container to return from the postincrement operator:

class stringholder
{
    const std::string value_;
public:
    stringholder(const std::string value) : value_(value) {}
    std::string operator*() const { return value_; }
};
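
As a brief illustration (not in the original post), the stringholder is what lets the postincrement operator hand back the old value while the iterator itself moves on:

ColonSep items("aa:foo");
auto it = items.begin();
std::string first = *it++;   // "aa" - read from the returned stringholder
std::string second = *it;    // "foo" - the iterator has already advanced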

Now we are ready to write our iterator class, which always has the next value ready in its next_ member, and holds on to the current and end positions of the underlying iterator in wrapped_ and wrapped_end_:

class myit
{
private:
    typedef std::string::const_iterator wrapped_t;
    wrapped_t wrapped_;
    wrapped_t wrapped_end_;
    maybestring next_;

The constructor holds on to the underlying iterators, and immediately fills next_ with the next value by calling next_item, passing in true to indicate that this is the first item:

public:
    myit(wrapped_t wrapped, wrapped_t wrapped_end)
    : wrapped_(wrapped)
    , wrapped_end_(wrapped_end)
    , next_(next_item(true))
    {
    }

    // Previously provided by std::iterator
    typedef std::string             value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef std::string*            pointer;
    typedef std::string&            reference;
    typedef std::input_iterator_tag iterator_category;

next_item looks like this:

private:
    maybestring next_item(bool first_time)
    {
        if (wrapped_ == wrapped_end_)
        {
            return maybestring();  // We are at the end
        }
        else
        {
            if (!first_time)
            {
                ++wrapped_;
            }
            return read_item();
        }
    }

next_item recognises whether we've reached the end of the underlying iterator and returns the empty maybestring if so. Otherwise, it skips forward once (unless we are on the first element) and then calls read_item:

    maybestring read_item()
    {
        std::string ret = "";
        for (; wrapped_ != wrapped_end_; ++wrapped_)
        {
            char c = *wrapped_;
            if (c == ':')
            {
                break;
            }
            ret += c;
        }
        return maybestring(ret);
    }

read_item implements the real logic of looping through the underlying iterator and combining those values together to create the next item to provide.

The hard part of the iterator class is done, leaving only the more normal functions we must provide:

public:
    value_type operator*() const
    {
        assert(!next_.at_end_);
        return next_.value_;
    }

    bool operator==(const myit& other) const
    {
        // We only care about whether we are at the end
        return next_.at_end_ == other.next_.at_end_;
    }

    bool operator!=(const myit& other) const { return !(*this == other); }

    stringholder operator++(int)
    {
        assert(!next_.at_end_);
        stringholder ret(next_.value_);
        next_ = next_item(false);
        return ret;
    }

    myit& operator++()
    {
        assert(!next_.at_end_);
        next_ = next_item(false);
        return *this;
    }
};

Note that operator== is only concerned with whether or not we are an end iterator. Nothing else matters for providing correct iteration.

Our final bit of bookkeeping is the range class that allows our new iterator to be used in a for loop:

class ColonSep
{
private:
    const std::string str_;
public:
    ColonSep(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_), std::end(str_)); }
    myit end()   { return myit(std::end(str_),   std::end(str_)); }
};

A lot of the code above is needed for all code that does this kind of job. Next time we'll look at how to use templates to make it usable in the general case.

Why I don’t like getter and setter functions in C++, reason #314.15

This is a post I wrote several years ago and it’s been languishing in my drafts folder ever since. I’m not working on this particular codebase any more. That said, the problems caused by using Java-like getter and setter functions as the sole interface to the object in the context described in the post have […]

The post Why I don’t like getter and setter functions in C++, reason #314.15 appeared first on The Lone C++ Coder's Blog.

LINQ: Did You Mean First(), or Really Single()?

TL;DR: if you see someone using the LINQ method First() without a comparator it’s probably a bug and they should have used Single().

I often see code where the author “knows” that a sequence (i.e. an IEnumerable<T>) will result in just one element and so they use the LINQ method First() to retrieve the value, e.g.

var value = sequence.First();

However there is also the Single() method which could be used to achieve a similar outcome:

var value = sequence.Single();

So what’s the difference and why do I think it’s probably a bug if you used First?

Both First and Single have the same semantics for the case where the sequence is empty (they throw) and similarly when the sequence contains only a single element (they return it). The difference however is when the sequence contains more than one element – First discards the extra values and Single will throw an exception.

If you’re used to SQL it’s the difference between using “top” to filter and trying to extract a single scalar value from a subquery:

select top 1 x as [value] from . . .

and

select a, (select x from . . .) as [value] from . . .

(The latter tends to complain loudly if the result set from the subquery is not just a single scalar value or null.)

While you might argue that in the face of a single-value sequence both methods could be interchangeable, to me they say different things, with Single being the only “correct” choice.

Seeing First says to me that the author knows the sequence might contain multiple values and they have expressed an ordering which ensures the right value will remain after the others have been consciously discarded.

Whereas Single suggests to me that the author knows this sequence contains one (and only one) element and that any other number of elements is wrong.

Hence another big clue that the use of First is probably incorrect is the absence of a comparator function used to order the sequence. Obviously it’s no guarantee as the sequence might be being returned from a remote service or function which will do the sorting instead but I’d generally expect to see the two used together or some other clue (method or variable name, or parameter) nearby which defines the order.

The consequence of getting this wrong is that you don’t detect a break in your expectations (a multi-element sequence). If you’re lucky it will just be a test that starts failing for a strange reason, which is where I mostly see this problem showing up. If you’re unlucky then it will silently fail and you’ll be using the wrong data which will only manifest itself somewhere further down the road where it’s harder to trace back.

Catch Up

It's been just over six years since I first announced Catch to the world as a brand new C++ test framework!

In that time it has matured to the point that it can take on the heavyweights - while still staying true to its original goals of being lightweight, easy to get started with and low-friction to work with.
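
As a reminder of what that low friction looks like, a complete Catch test file really can be this small (a standard tutorial-style example; it assumes catch.hpp is on the include path):

// One translation unit defines CATCH_CONFIG_MAIN and Catch supplies main() for you.
#define CATCH_CONFIG_MAIN
#include "catch.hpp"

unsigned int factorial(unsigned int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

TEST_CASE("factorials are computed", "[factorial]") {
    REQUIRE(factorial(1) == 1);
    REQUIRE(factorial(3) == 6);
    REQUIRE(factorial(10) == 3628800);
}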

In the last couple of years or so it has also increased dramatically in popularity! That sounds like a good thing - and it is - but with that comes a greater diversity of environments and usage, and more people raising issues and submitting pull requests.

Again, it's great to have so much input from the community - especially in the form of pull requests - where other developers have gone to some effort to implement a change, or a fix, and present it back for inclusion in the main project. So it's been heart-breaking for me that, between this increase in volume and finding my meagre free-time stretched even further, so many issues and PRs have been left unacknowledged - many not even seen by me in the first place.

But two things have happened, recently, that completely change this state of affairs. We're moving firmly in the right direction again.

Firstly, as mentioned in On Joining JetBrains, I've recently changed jobs to one that should give me much more time and opportunity to work on Catch - as well as the opportunity to do so in my home office - with stable internet (as opposed to on the train while commuting to and from work). The first few months were a bit of a wash for the reasons discussed in that post, but, as I also suggested there, this year has seen that change and I've been able to put in quite a lot of work on Catch already.

But that's not really enough. There's a huge back-log - and I'm still only doing this part time - and I want to spend time working on Catch2 as well (more on that soon). I don't want to end up back in the situation where everything is backing up and there's no hope of recovery.

I've been hoping to find someone else to be a key maintainer of Catch for a couple of years now. I've not been very active in this search - for all the same reasons - but it's been on my mind.

But, just last month, after I appeared on CppCast talking about JetBrains and Catch, there was a thread on Reddit about it - with many expressing concern over the Catch situation. I brought the subject up on there again and got the attention of one of the commenters.

I didn't know it at the time, but Martin Hořeňovský has been responsible for a good number of those PRs and issues that had been left unaddressed - as well as an active community member in helping address other people's issues. So it's with great pleasure (and relief!) that I can announce that Martin now has full commit rights to Catch on GitHub and has been prolific in working through the currently outstanding tickets.

Martin seems to really "get" Catch, and the design goals around it - so working with him on this the last couple of weeks has been very rewarding. From some queries I just ran on GitHub it looks like 39 issues have been closed and 38 PRs merged or closed in that time! That's compared to 9 new issues and 7 PRs - about half of which were created by Martin and I in the process. And that's not to mention all the labels we've been using to categorise the other tickets - with many marked as "Resolved - pending review" - which usually means we think it's resolved but we're just waiting for feedback (or a chance for more testing).

With 219 open issues and 41 PRs still outstanding, at time of writing, there's a lot more work to do yet - but I hope this reassures you that we're going in the right direction - and fast!

And we're not stopping with Martin. We have at least one other volunteer that I'll be bringing up to speed soon.

Catch2

I've referred to Catch2 a number of times now, and talked a little about what it will be. The biggest reason for making it a major release, according to Semantic Versioning, is that it will drop support for pre-C++11. For that reason Catch Classic (1.x) will continue to receive at least bug fix updates - but no more new features once Catch2 is fully released. A few major features in the pipeline have been explicitly deferred to Catch2: concurrency support and generators/ property-based testing in particular.

Moving to C++11 provides a very large scope for cleaning up the code-base - which has a significant volume of code dedicated to platform-specific workarounds for compiler shortcomings, missing library features such as smart pointers, and boilerplate that will no longer be necessary with things like range-based-for, auto and others. Lambdas will be useful too, but are not quite so important.

Because taking advantage of C++11 has the potential to touch almost every line of code, I'm taking the opportunity to rewrite the core of Catch - primarily the assertion macros and the infrastructure to support that. This is code that is #included in every test file, and expanded (in the case of macros) in every test case or even every assertion. Keeping this code lightweight is essential to avoiding a compile time hit. There are a number of ways this footprint can be reduced and the rewrite will strive for this as much as possible.

The rest of the code, concerned with maintaining the registry of tests, parsing and interpreting the command line, running tests and reporting results, will be updated more incrementally.

I already have a (not-yet-public) proof-of-concept version of the re-written code. It's not yet complete but, so far, has only one standard library dependency and minimal templates. The compile-time overhead is imperceptible.

In addition to compile-time, runtime performance is also a goal of Catch2. It's not an overriding goal - I won't be obfuscating the code in the name of wringing out the last few milliseconds of performance - but this is a definite change from Catch Classic where runtime performance was a non-goal. This is in recognition of the fact that Catch is used for more than just isolated unit tests - and will also become more important with property based testing.

I don't have a timeline, yet, for when I expect Catch2 to be ready - and in the immediate term getting Catch Classic back under control is the priority. Despite the partial re-write, and the major version increment, I expect tests written against Catch Classic to mostly "just work" with Catch2 - or require very minimal changes in some rare cases.

You

As already mentioned many developers have also spent time and effort contributing issues, fixes and even feature PRs over the years. So Catch has really been a community project for years now and I'm very grateful for all the help and support. I think Catch has shown that having a low-friction approach to testing C++ code is very important to a lot of people and I'm hoping we'll continue to build on that. Thank you all.

C++17 – Why it’s better than you might think

C++20 Horizon

From Mark Isaacson's Meeting C++ talk, "Exploring C++ and beyond"

I was recently interviewed for CppCast and one of the news items that came up was a trip report from a recent C++ standards meeting (Issaquah, Nov 2016). This was one of the final meetings before the C++17 standard is wrapped up, so things are looking pretty set at this point. During the discussion I made the point that, despite initially being disappointed that so many headline features were not making it in (Concepts, Modules, Coroutines and Ranges - as well as dot operator and uniform call syntax), I'm actually very happy with how C++17 is shaping up. There are some very nice refinements and features (constexpr if is looking quite big on its own) - including a few surprise ones (structured bindings being the main one for me).
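
For anyone who hasn't seen those two features yet, here's a small sketch of my own (not from the interview) showing constexpr if and structured bindings together:

#include <iostream>
#include <map>
#include <string>
#include <type_traits>

// constexpr if discards the branch that doesn't apply for a given T,
// where previously you'd have reached for tag dispatch or SFINAE.
template <typename T>
std::string stringify(const T& value) {
    if constexpr (std::is_same_v<T, std::string>)
        return value;
    else
        return std::to_string(value);
}

int main() {
    std::map<std::string, int> scores{{"alice", 3}};

    // Structured bindings unpack the iterator/bool pair returned by insert()...
    auto [pos, inserted] = scores.insert({"bob", 5});
    std::cout << pos->first << " inserted: " << inserted << "\n";

    // ...and the key/value of each element when iterating.
    for (const auto& [name, score] : scores)
        std::cout << name << ": " << stringify(score) << "\n";
}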

But the part of what I said that surprised even me (because I hadn't really thought of it until a couple of hours before we recorded) was that perhaps it is for the best that we don't get those bigger features just yet! The thinking was that if you take them all together - or even just two or three of them - they have the potential to change the language - and the way we write "modern C++" perhaps even more so than C++11 did - and that's really saying something! Now that's a good thing, in my opinion, but I do wonder if it would be too soon for such large scale changes just yet.

After the 98 standard, C++ went into a thirteen-year period in the wilderness (there was C++03, which fixed a couple of problems with the 98 standard - but didn't actually add any new features - except value initialisation). As this period coincided with the rise of other mainstream languages - Java and C# in particular - it seemed that C++ was a dying language - destined for a drawn out, Cobolesque, old-age at best.

But C++11 changed all that and injected a vitality and enthusiasm into the community not seen since the late 90s - if ever! Again the timing was a factor - with Moore's Law no longer influencing single-core performance there was a resurgence of interest in low/ zero overhead systems languages - and C++11 was getting modern enough to be palatable again. "There's no such thing as a free lunch" turns out to be true if you wait long enough.

So the seismic changes in C++11 were overdue, welcome and much needed at that time. Since then the standardisation process has moved to the "train model", which has settled on a new standard every three years. Whatever is ready (and fits) makes it in. If it's not baked it's dropped - or is moved into a TS that can be given more real-world testing before being reconsidered. This has allowed momentum to be maintained and reassures us that we won't be stuck without an update to the standard for too long again.

On the other hand many code-bases are still catching up to C++11. There are not many breaking changes - and you can introduce newer features incrementally and to only parts of the code-base - but this can lead to some odd looking code and once you start converting things you tend to want to go all in. Even if that's not true for your own codebase it may be true of libraries and frameworks you depend on! Those features we wanted in C++17 could have a similar - maybe even greater - effect and my feeling is that, while they would certainly be welcomed by many (me included) - there would also be many more that might start to see the churn on the language as a sign of instability. "What? We've only just moved on to C++11 and you want us to adopt these features too?". Sometimes it can be nice to just know where you are with a language - especially after a large set of changes. 2011 might seem like a long time ago but there's a long lag in compiler conformance, then compiler adoption, then understanding and usage of newer features. Those just starting to experiment with C++11 language features are still very common.

I could be wrong about this, but it feels like there's something in it based on my experience. And I think the long gap between C++98 and C++11 is responsible for at least amplifying the effect. People got used to C++ being defined a single way and now we have three standards already in use, with another one almost ready. It's a lot to keep up with - even for those of us that enjoy that sort of thing!

So I'm really looking forward to those bigger features that we'll hopefully get in C++20 (and don't forget you can even use the TS's now if your compiler supports them - and the Ranges library is available on GitHub) - but I'm also looking forward to updating the language with C++17 and the community gaining a little more experience with the new, rapidly evolving, model of C++ before the next big push.

Surprising Defaults – HttpClient ExpectContinue

One of the things you quickly discover when moving from building services on-premise to “the cloud” is quite how many more bits of wire and kit suddenly sit between you and your consumer. Performance-wise this already elongated network path can then be further compounded when the framework you’re using invokes unintuitive behaviour by default [1].

The Symptoms

The system was a new REST API built in C# on the .Net framework (4.6) and hosted in the cloud with AWS. This AWS endpoint was then further fronted by Akamai for various reasons. The initial consumer was an on-premise adaptor (also written in C#) which itself had to go through an enterprise grade web proxy to reach the outside world.

Naturally monitoring was added in fairly early on so that we could start to get a feel for how much added latency moving to the cloud would bring. Our first order approximation to instrumentation allowed us to tell how long the HTTP requests took to handle along with a breakdown of the major functions, e.g. database queries and 3rd party requests. Outside the service we had some remote monitoring too that could tell us the performance from a more customer-like position.

When we integrated with the 3rd party service some poor performance stats caused us to look closer into our metrics. The vast majority of big delays were outside our control, but it also raised some other questions as the numbers didn’t quite add up. We had expected the following simple formula to account for virtually all the time:

HTTP Request Time ~= 3rd Party Time + Database Time

However we were seeing a 300 ms discrepancy in many (but not all) cases. It was not our immediate concern as there were bigger fish to fry but some extra instrumentation was added to the OWIN pipeline and we did a couple of quick local profile runs to look out for anything obviously out of place. The finger seemed to point to time lost somewhere in the Nancy part of the pipeline, but that didn’t entirely make sense at the time so it was mentally filed away and we moved on.

Serendipity Strikes

Whilst talking to the 3rd party about our performance woes with their service they came back to us and asked if we could stop sending them an “Expect: 100-Continue” header in our HTTP requests.

This wasn’t something anyone in the team was aware of and as far as we could see from the various RFCs and blog posts it was something “naturally occurring” on the internet. We also didn’t know if it was us adding it or one of the many proxies in between us and them.

We discovered how to turn it off, and did, but it made little difference to the performance problems we had with them, which were in the order of seconds, not milliseconds. Feeling uncomfortable about blindly switching settings off without really understanding them we reverted the change.

The mention of this header also cropped up when we started investigating some errors we were getting from Akamai that seemed to be more related to a disparity in idle connection timeouts.

Eventually, as we learned more about this mysterious header someone in the team put two-and-two together and realised this was possibly where our missing time was going too.

The Cause

Our REST API uses PUT requests to add resources and it appears that the default behaviour of the .Net HttpClient class is to enable the sending of this “Expect: 100-Continue” header for those types of requests. Its purpose is to tell the server that the headers have been sent but that it will delay sending the body until it receives a 100-Continue style response. At that point the client sends the body, the server can then process the entire request and the response is handled by the client as per normal.

Yes, that’s right, it splits the request up so that it takes two round trips instead of one!

Now you can probably begin to understand why our request handling time appeared elongated and why it also appeared to be consumed somewhere within the Nancy framework. The request processing is started and handled by the OWIN middleware as that only depends on the headers; it then enters Nancy which finds a handler, and so requests the body in the background (asynchronously). When it finally arrives the whole request is then passed to our Nancy handler just as if it had been sent all as a single chunk.

The Cure

When you google this problem with relation to .Net you’ll see that there are a couple of options here. We were slightly nervous about choosing the nuclear option (setting it globally on the ServicePointManager) and instead added an extra line into our HttpClient factory so that it was localised:

var client = new HttpClient(...);
...
client.DefaultRequestHeaders.ExpectContinue = false;

We re-deployed our services, checked our logs to ensure the header was no longer being sent, and then checked the various metrics to see if the time was now all accounted for, and it was.

Epilogue

In hindsight this all seems fairly obvious, at least, once you know what this header is supposed to do, and yet none of the people in my team (who are all pretty smart) joined up the dots right away. When something like this goes astray I like to try and make sense of why we didn’t pick it up as quickly as perhaps we should have.

In the beginning there were so many new things for the team to grasp. The difference in behaviour between our remote monitoring and on-premise adaptor was assumed to be one of infrastructure especially when we had already battled the on-premise web proxy a few times [2]. We saw so many other headers in our requests that we never added so why would we assume this one was any different (given none of us had run across it before)?

Given the popularity and maturity of the Nancy framework we surmised that no one would use it if there was the kind of performance problems we were seeing, so once again were confused as to how the time could appear to be lost inside it. Although we were all aware of what the async/await construct does none of us had really spent any serious time trying to track down performance anomalies in code that used it so liberally and so once again we had difficulties understanding perhaps what the tool was really telling us.

Ultimately though the default behaviour just seems so utterly wrong that none of us could imagine the out-of-the-box settings would cause the HttpClient to behave this way. By choosing this default we are in essence optimising PUT requests for the scenario where the body does not need sending, which we all felt is definitely the exception not the norm. Aside from large file uploads or massive write contention we were struggling to come up with a plausible use case.

I don’t know what forces caused this decision to be made as I clearly wasn’t there and I can’t find any obvious sources that might explain it either. The internet and HTTP have evolved so much over the years that it’s possible this behaviour provides the best compatibility with web servers out-of-the-box. My own HTTP experience only covers the last few years along with a few more around the turn of the millennium, but my colleagues easily cover the decades I’m missing so I don’t feel I’m missing anything obvious.

Hopefully some kind soul will use the comments section to link to the rationale so we can all get a little closure on the issue.

 

[1] Violating The Principle of Least Astonishment for configuration settings was something I covered more generally before in “Sensible Defaults”.

[2] See “The Curse of NTLM Based HTTP Proxies”.

Automated Integration Testing with TIBCO

In the past few years I’ve worked on a few projects where TIBCO has been the message queuing product of choice within the company. Naturally being a test-oriented kind of guy I’ve used unit and component tests for much of the donkey work, but initially had to shy away from writing any automated integration tests due to the inherent difficulties of getting the system into a known state in isolation.

Organisational Barriers

For any automated integration tests to run reliably we need to control the whole environment, which ideally is our development workstations but also our CI build environment (see “The Developer’s Sandbox”). The main barriers to this with a commercial product like TIBCO are often technological, but also more often than not, organisational too.

In my experience middleware like this tends to be proprietary, very expensive, and owned within the organisation by a dedicated team. They will configure the staging and production queues and manage the fault-tolerant servers, which is probably what you’d expect as you near production. A more modern DevOps friendly company would recognise the need to allow teams to test internally first and would help them get access to the product and tools so they can build their test scaffolding that provides the initial feedback loop.

Hence just being given the client access libraries to the product is not enough; we need a way to bring up and tear down the service endpoint, in isolation, so that we can test connectivity and failover scenarios and message interoperability. We also need to be able to develop and test our logic around poisoned messages and dead-letter queues. And all this needs to be automatable so that as we develop and refactor we can be sure that we’ve not broken anything; manually testing this stuff is just not scalable in a shared test environment at the pace modern software is developed.

That said, the TIBCO EMS SDK I’ve been working with (v6.3.0) has all the parts I needed to do this stuff, albeit with some workarounds to avoid needing to run the tests with administrator rights which we’ll look into later.

The only other thorny issue is licensing. You would hope that software product companies would do their utmost to get developers on their side and make it easy for them to build and test their wares, but it is often hard to get clarity around how the product can be used outside of the final production environment. For example trying to find out if the TIBCO service can be run on a developer’s workstation or in a cloud hosted VM solely for the purposes of running some automated tests has been a somewhat arduous task.

This may not be solely the fault of the underlying product company, although the old-fashioned licensing agreements often do little to distinguish production and modern development use [1]. No, the real difficulty is finding the right person within the client’s company to talk to about such matters. Unless they are au fait with the role modern automated integration testing plays in the development process you will struggle to convince them your intended use is in the interests of the 3rd party product, not stealing revenue from them.

Okay, time to step down from the soap box and focus on the problems we can solve…

Hosting TIBEMSD as a Windows Service

From an automated testing perspective what we need access to is the TIBEMSD.EXE console application. This provides us with one or more TIBCO message queues that we can host on our local machine. Owning this process means we can create, publish to and delete queues on demand and therefore tightly control the environment.

If you only want to do basic integration testing around the sending and receiving of messages you can configure it as a Windows service and just leave it running in the background. Then your tests can just rely on it always being there like a local database or the file-system. The build machine can be configured this way too.

Unfortunately because it’s a console application and not written to be hosted as a service (at least v6.3 isn’t), you need to use a shim like SRVANY.EXE from the Windows 2003 Resource Kit or something more modern like NSSM. These tools act as an adaptor to the console application so that the Windows SCM can control them.

One thing to be careful of when running TIBEMSD in this way is that it will stick its data files in the CWD (Current Working Directory), which for a service is %SystemRoot%\System32, unless you configure the shim to change it. Putting them in a separate folder makes them a little more obvious and easier to delete when having a clear out [2].

Running TIBEMSD On Demand

Running the TIBCO server as a service makes certain kinds of tests easier to write as you don’t have to worry about starting and stopping it, unless that’s exactly the kinds of test you want to write.

I’ve found it’s all too easy when adding new code or during a refactoring to accidentally break the service so that it doesn’t behave as intended when the network goes up and down, especially when you’re trying to handle poisoned messages.

Hence I prefer to have the TIBEMSD.EXE binary included in the source code repository, in a known place so that it can be started and stopped on demand to verify the connectivity side is working properly. For those classes of integration tests where you just need it to be running you can add it to your fixture-level setup and even keep it running across fixtures to ensure the tests run at an adequate pace.

If, like me, you don’t run as an Administrator all the time (or use elevated command prompts by default) you will find that TIBEMSD doesn’t run out-of-the-box in this way. Fortunately it’s easy to overcome these two issues and run in a LUA (Limited User Account).

Only Bind to the Localhost

One of the problems is that by default the server will try and listen for remote connections from anywhere which means it wants a hole in the firewall for its default port. This of course means you’ll get that firewall popup dialog which is annoying when trying to automate stuff. Whilst you could grant it permission with a one-off NETSH ADVFIREWALL command I prefer components in test mode to not need any special configuration if at all possible.

Windows will allow sockets that only listen for connections from the local host to avoid generating the annoying firewall popup dialog (and this was finally extended to include HTTP too). However we need to tell the TIBCO server to do just that, which we can achieve by creating a trivial configuration file (e.g. localhost.conf) with the following entry:

listen=tcp://127.0.0.1:7222

Now we just need to start it with the -config switch:

> tibemsd.exe -config localhost.conf

Suppressing the Need For Elevation

So far so good but our other problem is that when you start TIBEMSD it wants you to elevate its permissions. I presume this is a legacy thing and there may be some feature that really needs it but so far in my automated tests I haven’t hit it.

There are a number of ways to control elevation for legacy software that doesn’t have a manifest, like using an external one, but TIBEMSD does and that takes priority. Luckily for us there is a solution in the form of the __COMPAT_LAYER environment variable [3]. Setting this, either through a batch file or within our test code, suppresses the need to elevate the server and it runs happily in the background as a normal user, e.g.

> set __COMPAT_LAYER=RunAsInvoker
> tibemsd.exe -config localhost.conf

Spawning TIBEMSD From Within a Test

Once we know how to run TIBEMSD without it causing any popups we are in a position to do that from within an automated test running as any user (LUA), e.g. a developer or the build machine.

In C#, the language where I have been doing this most recently, we can either hard-code a relative path [4] to where TIBEMSD.EXE resides within the repo, or read it from the test assembly’s app.config file to give us a little more flexibility.

<appSettings>
  <add key="tibemsd.exe"
       value="..\..\tools\TIBCO\tibemsd.exe" />
  <add key="conf_file"
       value="..\..\tools\TIBCO\localhost.conf" />
</appSettings>

We can also add our special .conf file to the same folder and therefore find it in the same way. Whilst we could generate it on-the-fly it never changes so I see little point in doing this extra work.
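Reading those settings back in the test code is then just a matter of using the standard ConfigurationManager class, e.g. (a minimal sketch using the key names from the config above; it needs a reference to System.Configuration):

var exePath = ConfigurationManager.AppSettings["tibemsd.exe"];
var confPath = ConfigurationManager.AppSettings["conf_file"];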

Something to be wary of if you’re using, say, NUnit to write your integration tests is that it (and ReSharper) can copy the test assemblies to a random location to help ensure your tests have no accidental dependencies. In this instance we do, and a rather large one at that, so we need the relative distance between where the test assemblies are built and run (XxxIntTests\bin\Debug) and the TIBEMSD.EXE binary to remain fixed. Hence we need to disable this copying behaviour with the /noshadow switch (or “Tools | Unit Testing | Shadow-copy assemblies being tested” in ReSharper).

Given that we know where our test assembly resides we can use Assembly.GetExecutingAssembly() to create a fully qualified path from the relative one like so:

private static string GetExecutingFolder()
{
  var codebase = Assembly.GetExecutingAssembly()
                         .CodeBase;
  var folder = Path.GetDirectoryName(codebase);
  return new Uri(folder).LocalPath;
}
. . .
var thisFolder = GetExecutingFolder();
var tibcoFolder = @"..\..\tools\TIBCO";
var serverPath = Path.Combine(
            thisFolder, tibcoFolder, "tibemsd.exe");
var configPath = Path.Combine(
            thisFolder, tibcoFolder, "localhost.conf");

Now that we know where the binary and config lives we just need to stop the elevation by setting the right environment variable:

Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

Finally we can start the TIBEMSD.EXE console application in the background (i.e. no distracting console window) using System.Diagnostics.Process:

var process = new System.Diagnostics.Process
{
  // serverPath points at tibemsd.exe and configPath at our localhost.conf
  StartInfo = new ProcessStartInfo(serverPath, "-config " + configPath)
  {
    UseShellExecute = false,
    CreateNoWindow = true,
  }
};
process.Start();

Stopping the daemon involves calling Kill(). There are more graceful ways of remotely stopping a console application which you can try first, but Kill() is always the fall-back approach and of course the TIBCO server has been designed to survive such abuse.

Naturally you can wrap this up with the Dispose pattern so that your test code can be self-contained:

// Arrange
using (RunTibcoServer())
{
  // Act
}

// Assert
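RunTibcoServer() here is just our own helper; a minimal sketch of it, reusing the path resolution and process spawning shown above and falling back on Kill() when disposed, might look something like this:

// Requires: using System; using System.Diagnostics; using System.IO;
private sealed class TibcoServer : IDisposable
{
  private readonly Process _process;

  public TibcoServer(string serverPath, string configPath)
  {
    // Suppress the elevation prompt before spawning the server
    Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

    _process = new Process
    {
      StartInfo = new ProcessStartInfo(serverPath, "-config " + configPath)
      {
        UseShellExecute = false,
        CreateNoWindow = true,
      }
    };
    _process.Start();
  }

  public void Dispose()
  {
    if (!_process.HasExited)
    {
      _process.Kill();
      _process.WaitForExit();
    }
    _process.Dispose();
  }
}

private static IDisposable RunTibcoServer()
{
  var thisFolder = GetExecutingFolder();
  var tibcoFolder = @"..\..\tools\TIBCO";

  return new TibcoServer(
    Path.Combine(thisFolder, tibcoFolder, "tibemsd.exe"),
    Path.Combine(thisFolder, tibcoFolder, "localhost.conf"));
}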

Or if you want to amortise the cost of starting it across your tests you can use the fixture-level set-up and tear down:

private IDisposable _server;

[FixtureSetUp]
public void GivenMessageQueueIsAvailable()
{
  _server = RunTibcoServer();
}

[FixtureTearDown]
public void StopMessageQueue()
{
  _server?.Dispose();
  _server = null;
}

One final issue to be aware of, and it’s a common one with integration tests like this which start a process on demand, is that the server might still be running unintentionally across test runs. This can happen when you’re debugging a test and you kill the debugger whilst still inside the test body. The solution is to ensure that the server definitely isn’t already running before you spawn it, and that can be done by killing any existing instances of it:

Process.GetProcessesByName("tibemsd")
       .ToList()
       .ForEach(p => p.Kill());

Naturally this is a sledgehammer approach and assumes you aren’t using separate ports to run multiple disparate instances, or anything like that.

Other Gotchas

This gets us over the biggest hurdle, control of the server process, but there are a few other little things worth noting.

Due to the asynchronous nature and potential for residual state I’ve found it’s better to drop and re-create any queues at the start of each test to flush them. I also use the Assume.That construct in the arrangement to make it doubly clear I expect the test to start with empty queues.
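For example, with NUnit the arrangement can spell that out explicitly; the queue helper methods here are hypothetical, but Assume.That is the real NUnit construct:

// Arrange: start from a clean, empty queue
DeleteQueue("orders.test");
CreateQueue("orders.test");

Assume.That(CountMessages("orders.test"), Is.EqualTo(0),
            "the test expects to start with an empty queue");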

Also if you’re writing tests that cover background connect and failover be aware that the TIBCO reconnection logic doesn’t trigger unless you have multiple servers configured. Luckily you can specify the same server twice, e.g.

var connection = "tcp://localhost,tcp://localhost";
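That URL is then fed to the connection factory in the usual way, e.g. (a sketch assuming the TIBCO.EMS .NET client API):

var factory = new TIBCO.EMS.ConnectionFactory(connection);
var conn = factory.CreateConnection();
conn.Start();
// ... exercise the reconnect / failover behaviour ...
conn.Close();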

If you expect your server to shut down gracefully, even in the face of having no connection to the queue, you might find that calling Close() on the session and/or connection blocks whilst it’s trying to reconnect (at least in EMS v6.3 it does). This might not be an expected production scenario, but it can hang your tests if something goes awry, hence I’ve used a slightly distasteful workaround where the call to Close() happens on a separate thread with a timeout:

Task.Run(() => _connection.Close()).Wait(1000);

Conclusion

Writing automated integration tests against a middleware product like TIBCO is often an uphill battle that I suspect many don’t have the appetite or patience for. Whilst this post tackles the technical challenges, as they are at least surmountable, the somewhat harder problem of tackling the organisation is sadly still left as an exercise for the reader.

 

[1] The modern NoSQL database vendors appear to have a much simpler model – use it as much as you like outside production.

[2] If the data files get really large because you leave test messages in them by accident they can cause your machine to really grind after a restart as the service goes through recovery.

[3] How to Run Applications Manifested as Highest Available With a Logon Script Without Elevation for Members of the Administrators Group

[4] A relative path means the repo can then exist anywhere on the developer’s file-system and also means the code and tools are then always self-consistent across revisions.


Sharing Code with Git Subtree

The codebase I currently work on is split into a number of repositories. For example the infrastructure and deployment scripts are in separate repos as are each service-style “component”.

Manual Syncing

To keep things moving along the team decided that the handful of bits of code that were shared between the two services could easily be managed by a spot of manual copying. By keeping the shared code in a separate namespace it was also partitioned off to help make it apparent that this code was at some point going to be elevated to a more formal “shared” status.

This approach was clearly not sustainable but sufficed whilst the team was still working out what to build. Eventually we reached a point where we needed to bring the logging and monitoring stuff in-sync and I also wanted to share some other useful code like an Optional<T> type. It also became apparent that the shared code was missing quite a few unit tests as well.

Share Source or Binaries?

The gut reaction to such a problem in a language like C# would probably be to hive off the shared code into a separate repo and create another build pipeline for it that would result in publishing a package via a NuGet feed. And that is certainly what we expected to do. However the problem was where to publish the packages to as this was closed source. The organisation had its own license for an Enterprise-scale product but it wasn’t initially reachable from outside the premises where our codebase lay. Also there were some problems with getting NuGet to publish to it with an API key, which seemed to lie with the way the product’s permissions were configured.

Hence to keep the ball rolling we decided to share the code at the source level by pulling the shared repo into each component’s solution. There are two common ways of doing this with Git – subtrees and submodules.

Git Submodules

It seemed logical that we should adopt the more modern submodule approach as it felt easier to attach, update and detach later. It also appeared to have support in the Jenkins 1.x plugin for doing a recursive clone so we wouldn’t have to frig it with some manual Git voodoo.

As always there is a difference between theory and practice. Whilst I suspect the submodule feature in the Jenkins plugin works great with publicly accessible open-source repos it’s not quite up to scratch when it comes to private repos that require credentials. After much gnashing of teeth trying to convince the Jenkins plugin to recursively clone the submodules, we conceded defeat assuming that we’re another victim of JENKINS-20941.

Git Subtree

Given that our long term goal was to move to publishing a NuGet feed we decided to try using a Git subtree instead so that we could at least move forward and share code. This turned out (initially) to be much simpler because for tooling like Jenkins it appears no different to a single repo.

Our source tree looked (unsurprisingly) like this:

<solution>
  +- src
     +- app
     +- shared-lib
        +- .csproj
        +- *.cs

All we needed to do was replace the shared-lib folder with the contents of the new Shared repository.

First we needed to set up a Git remote. Just as the remote main branch of a cloned repo goes by the name origin/master, so we set up a remote for the Shared repository’s main branch:

> git remote add shared https://github/org/Shared.git

Next we removed the old shared library folder:

> git rm -r src/shared-lib

…and grafted the new one in from the remote branch:

> git subtree add --prefix src/shared shared master --squash

This effectively takes the shared/master branch and links it further down the repo source tree to src/shared which is where we had it before.

However the organisation of the new Shared repo is not exactly the same as the old shared-lib project folder. A single child project usually sits in its own folder, but a full-on repo has its own src folder and build scripts and so the source tree now looked like this:

<solution>
  +- src
     +- app
     +- shared
        +- src
           +- shared-lib
              +- .csproj
              +- *.cs

There are now two extra levels of indirection. First there is the shared folder which corresponds to the external repo, plus there is that repo’s src folder.

At this point all that was left to do was to fix up the build, i.e. fix up the path to the shared-lib project in the Visual Studio solution file (.sln) and push the changes.

We chose to use the --squash flag when creating the subtree as we weren’t interested in seeing the entire history of the shared library in the solution’s repository.

Updating the Subtree

Flowing changes from the parent repo down into the subtree of the child repo is as simple as a fetch & pull:

> git fetch shared master
> git subtree pull --prefix src/shared shared master --squash

The latter command is almost the same as the one we used earlier but we pull rather than add. Once again we’re squashing the entire history as we’re not interested in it.

Pushing Changes Back

Naturally you might want to make a change in the subtree in the context of the entire solution and then push it back up to the parent repo. This is doable but involves using git subtree push to normalise the change back into the folder structure of the parent repo.
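For example, to push any changes made under src/shared back up to a branch on the Shared repository you would do something like this (the branch name is illustrative):

> git subtree push --prefix src/shared shared shared-lib-changes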

In practice we decided just to make the changes test-first in the parent and always flow them down to the child. In the few cases where the child solution helped in debugging we worked on the fix in the child solution workspace, then simply copied the change over to the shared workspace by hand and pushed it out through the normal route. It’s by no means optimal but a NuGet feed was always our end game so we tolerated the little bit of friction in the short term.

The End of the Road

If we were only sucking in libraries that had no external dependencies themselves (up to that point our small shared code only relied on the .Net BCL) we might have got away with this technique for longer. But in the end the need to pull in 3rd party dependencies via NuGet in the shared project pushed it over the edge.

The problem is that NuGet packages are on a per-solution basis and the <HintPath> element in the project file assumes a relative path (essentially) from the solution file. When working in the real repo as part of the shared solution it was “..\..\packages\Xxx”, but when it’s part of the subtree based solution it needed to be two levels further up as “..\..\..\..\packages\Xxx”.
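To make that concrete, the kind of reference the shared project file carries looks like this (the package name and version are made up); it’s that leading run of ..\ that no longer matches up in the subtree-based solution:

<Reference Include="SomePackage">
  <HintPath>..\..\packages\SomePackage.1.2.3\lib\net45\SomePackage.dll</HintPath>
</Reference>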

Although I didn’t spend long looking I couldn’t find a simple way to overcome this problem and so we decided it was time to bite the bullet and fix the real issue, which was publishing the shared library as a NuGet feed.

Partial Success

This clearly is not anything like what you’d call an extensive use of git subtree to share code, but it certainly gave me a feel for what it can do and I think it was relatively painless. What caused us to abandon it was tooling specific (the relationship between the enclosing solution’s NuGet packages folder and the shared assembly project itself) and so a different toolchain may well fare much better if build configuration is only passed down from parent to subtree.

I suspect the main force that might deter you from this technique is how much you know, or feel you need to know, about how git works. When you’re inside a tool like Visual Studio it’s very easy to make a change in the subtree folder and check it in and not necessarily realise you’re modifying what is essentially read-only code. When you next update the subtree things get sticky. Hence you really need to be diligent about your changes and pay extra attention when you commit to ensure you don’t accidentally include edits within the subtree (if you’re not planning on pushing back that way). Depending on how experienced your team are this kind of tip-toeing around the codebase might be just one more thing you’re not willing to take on.

A Game of Tag

One of the tent-pole features of Catch is the ability to write test names as free-form strings. When you run a Catch executable from the command line you can specify a test case by name, to run just that one:

./MyTestExe "a very nice test case"

or you can use wildcards to run a group of test cases (or just one with less typing):

./MyTestExe "*very nice*"

If you want to use wildcards but you're not sure what they'll match you can combine this with the listing option, -l, to see which test cases match the pattern:

./MyTestExe "*very nice*" -l
Matching test cases:
  a very nice test case
  a not very nice test case
2 matching test cases

This is already quite a powerful way to group test cases into ad-hoc "suites". However we don't want to twist our test names into artificial schemes for this purpose (although, early on, that's exactly what I proposed). Instead Catch allows you to add "tags" to test cases.

TEST_CASE( "a very nice test case", "[nice][good]" ) { /* ... */ }
TEST_CASE( "a not very nice test case", "[nice][bad]" ) { /* ... */ }

Now we can run all tests with a certain tag:

./MyTestExe [good]

or combination of tags:

./MyTestExe [nice][good]

also with exclusions:

./MyTestExe [nice]~[bad]

unions are supported with ,:

./MyTestExe [nice],[pleasant]

Very powerful! And this functionality has been around for a while.

More recent, and less well known (mostly because they weren't documented until recently) are a set of "special tags": Instruction Tags, Hiding Tags, Tag Aliases and some automatically generated tags.

Let's see what they're all about.

Instruction Tags

In general all tags that start with a symbol are reserved by Catch (or, put another way, user defined tag names must start with an alpha-numeric character). This allows a nice rich range of namespaces for special tags. Tags that start with the ! character are Instruction tags. They inform Catch something about the test case that they apply to. At time of writing the following are defined:

  • [!hide] This "hides" the test from the default run (i.e. if you run the test executable without specifying any names or tags). This feature was originally introduced with the [hide] tag (note, no: !) - and is still supported, though deprecated. There is also a shortcut form, [.] which we'll revisit in a moment.
  • [!throws] This tells Catch that an exception may be thrown in the course of executing the test - even if it is caught and dealt with. If you've ever tried to track down a rogue exception in your debugger - and so have set the debugger to break on exceptions as they're thrown - you'll know how frustrating all the false positives coming from such tests are! So Catch provides a way to suppress exceptions it is expecting - through the -e or --nothrow options on the command line. This already skips over REQUIRE_THROWS... or CHECK_THROWS... assertions. The [!throws] tag covers you for cases where the exception is caught and handled in the code under test (or your test code).
  • [!shouldfail] This tells Catch that you're expecting this test to fail! Furthermore, if it does fail then it should treat that as a pass!
  • [!mayfail] Rather than explicitly inverting the pass/ fail logic as the previous tag does, this tag just says that the test may fail but that's ok (although it is still reported). It's also ok if it passes.

Hiding Tags

We already looked at [!hide] (and the deprecated [hide]) above, and mentioned that [.] was a shortcut for the same.

It turns out that when one of these tags is used it is often combined with another tag that is used when you do want to run the test. The classic example is where you write integration tests in the same executable as unit tests. By default you don't want the integration tests to run as you want the shortest possible path to running just unit tests. So you hide them but also tag them [integration], or something similar (the word "integration" has no significance to Catch). So pairings like, [.][integration] or [.][performance] are frequently found together.

So, as a convenience, Catch now supports . as a tag prefix. The rest of the tag can be completely custom and works exactly like any other normal tag - except that the test is also hidden. Our examples would, thus, be written as [.integration] and [.performance].
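So, for example, a hidden integration test that we also tolerate failing (using the [!mayfail] instruction tag from earlier) could be written like this (the test name is made up):

TEST_CASE( "orders flow through the message bus", "[.integration][!mayfail]" ) { /* ... */ }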

One final point to mention about hiding tags is that, due to the way they have evolved through a number of forms (including the severely deprecated "./" name prefix) whichever form is used will not only hide the test, but any of the other forms will match it in a tag pattern. e.g. if you tag a test with [.] you can match it with [!hide].

Tag Aliases

As we saw earlier, tags can be combined in fairly complex ways. While this is powerful and flexible, it can be a bit awkward if you often want to use the same tag expression. Wouldn't it be nice if there was a way of writing the expression once then getting Catch to remember it for you - and associate it with an easier to remember name?

Well there is! You can associate any tag pattern with a name that you can use just like any normal tag - except that it must begin with the @ character.

You create a tag alias, in code, using the CATCH_REGISTER_TAG_ALIAS macro. E.g.

CATCH_REGISTER_TAG_ALIAS( "[@not nice]", "~[nice]~[!hide]" );

This registers a tag alias, [@not nice] which, when expanded will match all tests that are not tagged [nice] but also are not hidden. The second part is important because if you have any hidden tests then they will usually be included any time you use a not expression (~) because the rule is that tests are only hidden if no pattern is specified!

Also did you notice that we had a space in the tag name? Surprised? I never said that tags could not include spaces. Of course they can.

You can register as many aliases as you like and you can put them anywhere you like (as long as catch.hpp is #included). However I recommend keeping them all in your main source file (the one you #define CATCH_CONFIG_MAIN, or equivalent) - simply so you only have to look in one place for them.

Filenames As Tags

The newest special tag form is the result of automatically generating a set of tags. The tags all begin with the # character (I've resisted the urge to call them "hash tags"). The rest of the tag is generated from the name of the source file that the test is implemented in. The full path (as reported by __FILE__) is stripped of its directories and extension - so all tests in /Development/Tests/SquirrelTests.cpp would be tagged, [#SquirrelTests].

At time of writing this feature is only available on the develop branch on GitHub - and must be specifically enabled running with the --filenames-as-tags or -# command line options. It's possible that situation may change by the time it makes it onto master.

The Tag Line

So tags not only provide a rich grouping mechanism in Catch - they also allow you to control some aspects of how Catch runs and treats test cases. Some tags can be generated for you - and some tags can be expanded from simpler forms. We've covered here the complete set of special tags at the time of writing. If you're reading this in the future there may be more - I'll try and be better at keeping the docs up-to-date there. Also any stock price tips you might have from the future would be welcome too.

Call for Papers: C++ track at NDC Oslo 2015, June 17-19

There was a lot of interest for the very strong C++ track that we did at the NDC conference in Oslo last summer. Here is a summary I wrote after the event, including links to the videos that we recorded.

We are repeating the success this year, June 17-19. A few big names have already been signed up, but we also need your contribution. If you would like to be part of the C++ track this year, please submit your proposal soon. The CFP closes February 15. Feel free to contact me directly if you have any questions about the C++ track.

Modern C++ Testing

Back in August I took my family to stay for a week at my brother's house in (Old) South Wales. I've not been to Wales for a long time and it was great to be back there - but that's not what this post is about, of course.

The thing about Wales is that it has mountains (and where there are no mountains there are plenty of hills and valleys and cliffs). Mobile cell coverage is non-existent in much of the country - particularly where my brother lives.

So the timing was particularly bad when, just as we were driving along the south coast (somewhere between Cardiff and Swansea, I think), I started getting emails and tweets from people pointing out that Catch was riding high on HackerNews! Someone had recently discovered Catch and was enjoying it enough that they wanted to share it with the community. Which is awesome!

Except that, between lack of mobile reception and spending time with my family, I didn't have opportunity to join the discussion.

When I got back home a week later I read through the comments. One of them stuck out because it called me out on describing Catch as a "Modern C++" framework (the commenter recommended another framework, Bandit, as being "more modern").

When I first released Catch, back in 2010, C++11 was still referred to as C++1x (or even C++0x!) and the final release date was still uncertain. So Catch was written to target C++03. It used a few "modern" idioms of the time - but the modernity was intended more as being a break from the past - where most C++ frameworks were just reimplementations of JUnit in C++. So I think the label was somewhat justified at the time.

Of course since then C++11 has not only been standardised but is fully, or nearly fully, implemented by many leading, mainstream, compilers. I think adoption is still not high enough, at this point, that I'd be willing to drop support for C++03 in Catch (there is even an actively maintained fork for VC6!). But it is enough that the baseline for what constitutes "modern C++" has definitely moved on. And now C++14 is here too - pushing it even further forward.

"Modern" is not what it used to be

What does it mean to be a "Modern C++ Test Framework" these days anyway? Well the most obvious thing for the user is probably the use of lambdas. Along with a few other features, lambdas allow for a lot of what previously required macros to be done in pure C++. I'm usually the first to hold this up as A Good Thing. In a moment I'll get to why I don't think it's necessarily as good a step as you might think.

But before I get to that; one other thing: For me, as a framework author, the biggest difference C++11/14 would make to something like Catch would be in the internals. Large chunks of code could be removed, reduced or at least cleaned up. The "no dependencies" policy means that Catch has complete implementations of things like shared pointers, optional types and function objects - as well as many things that must be done the long way round (such as iterating collections - I long for range for loops - or at least BOOST_FOREACH).

The competition

I've come across three frameworks that I'd say qualify as truly trying to be "modern C++ test frameworks". I'm sure there are others - and I've not really even used these ones extensively - but these are the ones I'll reference in this discussion. The three frameworks are:

  • Lest - by Martin Moene, an active contributor to Catch - and partly based on some Catch ideas - re-imagined for a C++11 world.
  • Bandit - this is the one mentioned in the Hacker News comment I kicked off with
  • Mettle - I saw this in a tweet from @MeetingCpp last week and it's what kicked off the train of thought that led me to this post

The case for test case macros

But why did I say that the use of lambdas is not such a good idea? Actually I didn't quite say that. I think lambdas are a very good idea - and in many ways they would certainly clean up at least the mechanics of defining and registering test cases and sections.

Before lambdas C++ had only one place you could write a block of imperative code: in a function (or method). That means that, in Catch, test cases are really just functions - which must have a function signature - including a name (which we hide - because in Catch the test name is a string). Those functions must be captured somehow. This is done by passing a pointer to the function to the constructor of a small class - whose sole purpose is to forward the function pointer onto a global registry. Later, when the tests are being run, the registry is iterated and the function pointers invoked.

So a test case like this:

    TEST_CASE( "test name", "[tags]" )
    {
        /* ... */
    }

...written out in full (after macro expansion) looks something like:

    static void generatedFunctionName();
    namespace{ ::Catch::AutoReg generatedNameAutoRegistrar
        (   &generatedFunctionName, 
       	    ::Catch::SourceLineInfo( __FILE__, static_cast<std::size_t>( __LINE__ ) ), 
            ::Catch::NameAndDesc( "test name", "[tags]") ); 
    }
    static void generatedFunctionName()
    {
        /* .... */
    }

(generatedFunctionName is generated by yet another macro, which combines a root name with the current line number. Because the function is declared static the identifier is only visible in the current translation unit (cpp file), so this should be unique enough)

So there's a lot of boilerplate here - you wouldn't want to write this all by hand every time you start a new test case!

With lambdas, though, blocks of code are now first class entities, and you can introduce them anonymously. So you could write them like:

    Catch11::TestCase( "test name", "[tags]", []() 
    {
        /* ... */
    } );

This is clearly far better than the expanded macro. But it's still noisier than the version that uses the macro. Most of the C++11/14 test frameworks I've looked at tend to group tests together at a higher level. The individual tests are more like Catch's sections - but the pattern is still the same - you get noise from the lambda syntax in the form of the []() or [&]() to introduce the lambda and an extra ); at the end.

Is that really worth worrying about?

Personally I find it's enough extra noise that I think I'd prefer to continue to use a macro - even if it used lambdas under the hood. But it's also small enough that I can certainly see the case for going macro free here.

Assert yourself

But that's just test cases (and sections). Assertions have traditionally been written using macros too. In this case the main reasons are twofold:

  1. It allows the expression evaluation to be wrapped in an exception handler.
  2. It allows us to capture the file and line number to report on.

(1) can arguably be handled in whatever is holding the current lambda (e.g. it or describe in Bandit, suite, subsuite or expect in Mettle). If these blocks are small enough we should get sufficient locality of exception handling - but it's not as tight as the per-expression handling with the macro approach.

(2) simply cannot be done without involving the preprocessor in some way (whether it's to pass __FILE__ and __LINE__ manually, or to encapsulate that with a macro). How much does that matter? Again it's a matter of taste but you get several benefits from having that information. Whether you use it to manually locate the failing assertion or if you're running the reporter in an IDE window that automatically allows you to double-click the failure message to take you to the line - it's really useful to be able to go straight to it. Do you want to give that up in order to go macro free? Perhaps. Perhaps not.
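To make (2) concrete, here's a minimal sketch (not Catch's actual implementation) of how a macro can forward the source location that a plain function call cannot capture by itself:

    void myCheckImpl( bool result, char const* expr,
                      char const* file, std::size_t line );

    #define MY_CHECK( expr ) \
        myCheckImpl( static_cast<bool>( expr ), #expr, __FILE__, __LINE__ )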

Interestingly, lest still uses a macro for assertions.

Weighing up

So we've seen that a truly modern C++ test framework, using lambdas in particular, can allow you to write tests without the use of macros - but at a cost!

So the other side of the equation must be: what benefit do you get from eschewing the macros?

Personally I've always striven to minimise or eliminate the use of macros in C++. In the early days that was mostly about using const, inline and templates. Now lambdas allow us to address some of the remaining cases and I'm all for that.

But I also tend to associate a much higher "cost" to macro usage when it generates imperative code. This is code that you're likely to find yourself needing to step through in a debugger at runtime - and macros really obfuscate this process. When I use macros it tends to be in declarative code. Code that generates purely declarative statements, or effectively declarative statements (such as the test case function registration code). It tends to always generate the exact same machinery - so should not be sensitive to its inputs in ways that will require debugging.

How do Catch's macros play out in that regard? Well the test case registration macros get a pass. Sections are a grey area - they are on the path of code that needs to be stepped over - and, worse, hide a conditional (a section is really just an if statement on a global variable!). So score a few points down there. Assertions are also very much runtime executable - and are frequently on the debugging path! In fact stepping into expressions being asserted on in Catch tests can be quite a pain as you end up stepping into some of the "hidden" calls before you get to the expression you supplied (in Visual Studio, at least, this can be mitigated by excluding the Catch namespace using the StepOver registry key).

Now, interestingly, the use of macros for the assertions was never really about C++03 vs C++11. It was about capturing extra information (file/ line) and wrapping in a try-catch. So if you're willing to make that trade-off there's no reason you can't have non-macro assertions even in C++03!

Back to the future

One of my longer arcs of development on Catch (that I edge towards on each refactoring) is to decouple the assertion mechanism from the guts of the test runner. You should be able to provide your own assertions that work with Catch. Many other test frameworks work this way and it allows them to be much more flexible. In particular it will allow me to decouple the matcher framework (and maybe allow third-party matchers to work with Catch).

Of course this would also allow macro-less assertions to be used (as it happens the assertions in bandit and mettle are both matcher-like already).

So, while I think Catch is committed to supporting C++03 for some time yet, that doesn't mean there is no scope for modernising it and keeping it relevant. And, modern or not, I still believe it is the simplest C++ test framework to get up and running with, and the least noisy to work with.

Videos from C++ track on NDC 2014

As the chair for the C++ track on NDC Oslo, I am happy to report that the C++ track was a huge and massive success! The C++ community in Norway is rather small so even if NDC is a big annual conference for programmers (~1600 geeks) and even with names like Nico, Scott and Andrei as headliners for the track, I was not sure how many people would actually turn up for the C++ talks. I was positively surprised. The first three sessions were packed with people (it was of course a cheap trick to make the first three sessions general/introductory on popular topics). All the other talks were also well attended. The NDC organizers have already confirmed that they want to do this next year as well.

NDC Oslo is an annual five day event. First two days of pre-conference workshops (Andrei did 2 days of Scalable design and implementation using C++) and then 9 tracks of talks for three full days. As usual, NDC records all the talks and generously share all the videos with the world (there are 150+ talks, kudos to NDC!).

I have listed the videos from the C++ track this year. I will also put out a link to the slides when I get them. Enjoy!

Day 1, June 4, 2014

  • C++14, Nico Josuttis (video)
  • Effective Modern C++, Scott Meyers (video)
  • Error Handling in C++, Andrei Alexandrescu (video)
  • Move, noexcept, and push_back(), Nico Josuttis (video)
  • C++ Type Deduction and Why You Care, Scott Meyers (video)
  • Generic and Generative Programming in C++, Andrei Alexandrescu (video)

Day 2, June 5, 2014

  • C++ – where are we headed?, Hubert Matthews (video)
  • Three Cool Things about D, Andrei Alexandrescu (video)
  • The C++ memory model, Mike Long (video, slides)
  • C++ for small devices, Isak Styf (video)
  • Brief tour of Clang, Ismail Pazarbasi (video)
  • Insecure coding in C and C++, Olve Maudal (video, slides)
  • So you think you can int? (C++), Anders Knatten (video)

With this great lineup I hope that NDC Oslo in June has established itself as a significant annual C++ event in Europe together with Meeting C++ Berlin in December and ACCU Bristol in April.

Save the date for NDC next year, June 15-19, 2015. I can already promise you a really strong C++ track at NDC Oslo 2015.

A final note: Make sure you stay tuned on isocpp.org for global news about C++.

How to write Boost.Python type converters

Boost.Python [1] makes it possible to write C++ that "feels" like Python. The library is powerful and sometimes subtle. This is as compared with the Python C API, where the experience is very far removed from writing Python code.

Part of making C++ feel more like Python is allowing natural assignment of C++ objects to Python variables. For instance, assigning a standard library string to a Python object looks like this:

// Create a C++ string
std::string msg("Hello, Python");

// Assign it to a python object
boost::python::object py_msg = msg;

Likewise (though somewhat less naturally), it is also important to be able to extract C++ objects from Python objects. Boost.Python provides the extract [2] type for this:

boost::python::object obj = ... ;
std::string msg = boost::python::extract<std::string>(obj);

To allow this kind of natural assignment, Boost.Python provides a system for registering converters between the languages. Unfortunately, the Boost.Python documentation does a pretty poor job of describing how to write them. A bit of searching on the internet will turn up a few links. [3]

While these are fine (and, in truth, are the basis for what I know about the conversion system), they are not as explicit as I would like.

So, in an effort to clarify the conversion system both for myself and (hopefully) others, I wrote this little primer. I'll step through a full example showing how to write converters for Qt's QString [4] class. In the end, you should have all the information you need to write and register your own converters.

Converting QString

A Boost.Python type converter consists of two major parts. The first part, which is generally the simpler of the two, converts a C++ type into a Python type. I'll refer to this as the to-python converter. The second part converts a Python object into a C++ type. I'll refer to this as the from-python converter.

In order to have your converters be used at runtime, the Boost.Python framework requires you to register them. The Boost.Python API provides separate methods for registering to-python and from-python converters. Because of this, you are free to provide conversion in only one direction for a type if you so choose.

Note that, for certain elements of what I'm about to describe, there is more than one way to do things. For example, in some cases where I choose to use static member functions, you could also use free functions. I won't point these out, but if you wear your C++ thinking-cap you should be able to see what is mandatory and what isn't.

To-python Converters

A to-python converter converts a C++ type to a Python object. From an API perspective, a to-python converter is used any time that you construct a boost::python::object [5] from another C++ type. For example:

// Construct object from an int
boost::python::object int_obj(42);

// Construct object from a string
boost::python::object str_obj = std::string("llama");

// Construct object from a user-defined type
Foo foo;
boost::python::object foo_obj(foo);

You implement a to-python converter using a struct with a static member function named convert(), which takes the C++ object to be converted as its argument and returns a PyObject*. A to-python converter for QStrings looks like this:

/* to-python convert to QStrings */
struct QString_to_python_str
{
    static PyObject* convert(QString const& s)
    {
        return boost::python::incref(
            boost::python::object(
                s.toLatin1().constData()).ptr());
    }
};

The crux of what this does is as follows:

  1. Extract the QString's underlying character data using toLatin1().constData()
  2. Construct a boost::python::object with the character data
  3. Retrieve the boost::python::object's PyObject* with ptr()
  4. Increment the reference count on the PyObject* and return that pointer.

That last step bears a little explanation. Suppose that you didn't increment the reference count on the returned pointer. As soon as the function returned, the boost::python::object in the function would destruct, thereby reducing the ref-count to zero. When the PyObject's reference count goes to zero, Python will consider the object dead and it may be garbage-collected, meaning you would return a deallocated object from convert().

Once you've written the to-python converter for a type, you need to register it with Boost.Python's runtime. You do this with the aptly-named to_python_converter [6] template:

// register the QString-to-python converter
boost::python::to_python_converter<
    QString,
    QString_to_python_str>();

The first template parameter is the C++ type for which you're registering a converter. The second is the converter struct. Notice that this registration process is done at runtime; you need to call the registration functions before you try to do any custom type converting.

From-python Converters

From-python converters are slightly more complex because, beyond simply providing a function to convert from Python to C++, they also have to provide a function that determines if a Python type can safely be converted to the requested C++ type. Likewise, they often require more knowledge of the Python C API.

From-python converters are used whenever Boost.Python's extract type is called. For example:

// get an int from a python object
int x = boost::python::extract<int>(int_obj);

// get an STL string from a python object
std::string s = boost::python::extract<std::string>(str_obj);

// get a user-defined type from a python object
Foo foo = boost::python::extract<Foo>(foo_obj);

The recipe I use for creating from-python converters is similar to to-python converters: create a struct with some static methods and register those with the Boost.Python runtime system.

The first method you'll need to define is used to determine whether an arbitrary Python object is convertible to the type you want to extract. If the conversion is OK, this function should return the PyObject*; otherwise, it should return NULL. So, for QStrings you would write:

struct QString_from_python_str
{

    . . .

    // Determine if obj_ptr can be converted in a QString
    static void* convertible(PyObject* obj_ptr)
    {
        if (!PyString_Check(obj_ptr)) return 0;
        return obj_ptr;
    }

    . . .

};

This simply says that a PyObject* can be converted to a QString if it is a Python string.

The second method you'll need to write does the actual conversion. The primary trick in this method is that Boost.Python will provide you with a chunk of memory into which you must in-place construct your new C++ object. All of the funny "rvalue_from_python" stuff just has to do with Boost.Python's method for providing you with that memory chunk:

struct QString_from_python_str
{

    . . .

    // Convert obj_ptr into a QString
    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        // Extract the character data from the python string
        const char* value = PyString_AsString(obj_ptr);

        // Verify that obj_ptr is a string (should be ensured by
        // convertible())
        assert(value);

        // Grab pointer to memory into which to construct the new QString
        void* storage = (
            (boost::python::converter::rvalue_from_python_storage<QString>*)
            data)->storage.bytes;

        // in-place construct the new QString using the character data
        // extracted from the python object
        new (storage) QString(value);

        // Stash the memory chunk pointer for later use by boost.python
        data->convertible = storage;
    }

  . . .

};

The final step for from-python converters is, of course, to register the converter. To do this, you use boost::python::converter::registry::push_back(). [7] The first argument is a pointer to the function which tests for convertibility, the second is a pointer to the conversion function, and the third is a boost::python::type_id for the C++ type. In this case, we'll put the registration into the constructor for the struct we've been building up:

struct QString_from_python_str
{
    QString_from_python_str()
    {
        boost::python::converter::registry::push_back(
            &convertible,
            &construct,
            boost::python::type_id<QString>());
    }

    . . .

};

Now, if you simply construct a single QString_from_python_str object in your initialization code (just as you called to_python_converter() for the to-python registration), conversion from Python strings to QString will be enabled.
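
As a quick check that both directions now work, imagine wrapping a small function that takes and returns a QString (shout() and the example module are made up for illustration):

// Hypothetical wrapped function: only usable from Python once both
// converters have been registered.
QString shout(QString const& s) { return s.toUpper(); }

// In the module's init code, next to the to-python registration:
//     QString_from_python_str();            // enable str -> QString
//     boost::python::def("shout", shout);

// From the Python side a plain str now converts transparently:
//   >>> import example
//   >>> example.shout("hello")
//   'HELLO'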

Taking a reference to the PyObject in construct()

One gotcha to be aware of in your construct() function is that the PyObject argument is a 'borrowed' reference. That is, its reference count has not already been incremented for you. [8] If you plan to keep a reference to that object, you must use Boost.Python's borrowed construct. For example:

class MyClass
{
public:
    MyClass(boost::python::object obj) : obj_ (obj) {}

private:
    boost::python::object obj_;
};

struct MyClass_from_python
{
    . . .

    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        using namespace boost::python;

        void* storage = (
            (converter::rvalue_from_python_storage*)
                data)->storage.bytes;

        // Use borrowed to construct the object so that a reference
        // count will be properly handled.
        handle<> hndl(borrowed(obj_ptr));
        new (storage) MyClass(object(hndl));

        data->convertible = storage;
    }
};

Failing to use borrowed() in this situation will generally lead to memory corruption and/or garbage collection errors in the Python runtime.
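
For completeness (the post only shows construct()), the remaining pieces of MyClass_from_python might plausibly look like this; since any Python object can be held by a boost::python::object, convertible() can simply accept everything:

struct MyClass_from_python
{
    MyClass_from_python()
    {
        boost::python::converter::registry::push_back(
            &convertible,
            &construct,
            boost::python::type_id<MyClass>());
    }

    // Every Python object can be wrapped in a boost::python::object,
    // so every object is considered convertible.
    static void* convertible(PyObject* obj_ptr)
    {
        return obj_ptr;
    }

    // construct() as shown above
    . . .
};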

There are a number of useful resources on the web for finding more information on Boost.Python objects, handles, and reference counting. [9]

When converters don't exist

Finally, a cautionary note. The Boost.Python type-conversion system works well, not only at moving objects across the C++/Python language barrier, but also at making code easier to read and understand. Keep in mind, though, that this convenience comes at the cost of very little compile-time checking.

That is, the templated boost::python::object constructor accepts any type without complaint. This means that your code will compile just fine even if you're constructing boost::python::object instances from types that have no registered converter. At runtime these constructors will find that they have no converter for the requested type, and this will result in exceptions.

These exceptions [10] will tend to happen in unexpected places, and you could spend quite a bit of time trying to figure them out. I say all of this so that maybe, when you encounter strange exceptions when using Boost.Python, you'll remember to check that your converters are registered first. Hopefully it'll save you some time.
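
If you do hit one of these, the pending Python error explains what went wrong. A minimal sketch of how you might surface it (some_unregistered_type is just a placeholder for any value whose type has no converter):

try
{
    // Compiles fine, but throws at runtime if no to-python converter
    // has been registered for the type of some_unregistered_type.
    boost::python::object obj(some_unregistered_type);
}
catch (boost::python::error_already_set const&)
{
    // Prints the pending Python exception, e.g. a TypeError saying
    // no to_python (by-value) converter was found for the C++ type.
    PyErr_Print();
}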

Resources

Boost.Python is fairly complex and can be difficult to understand all at once. Here are a few more useful resources that might help you come up to speed on this useful technology:

  • This IPython notebook-based tutorial covers a lot of the major (and some of the more obscure) topics in Boost.Python.
  • The Boost.Python wiki contains a lot of collected Boost.Python knowledge.
  • And of course, the Boost.Python documentation itself is very useful.

Appendix: Full code for QString converter

struct QString_to_python_str
{
    static PyObject* convert(QString const& s)
    {
        return boost::python::incref(
            boost::python::object(
                s.toLatin1().constData()).ptr());
    }
};

struct QString_from_python_str
{
    QString_from_python_str()
    {
        boost::python::converter::registry::push_back(
            &convertible,
            &construct,
            boost::python::type_id<QString>());
    }

    // Determine if obj_ptr can be converted into a QString
    static void* convertible(PyObject* obj_ptr)
    {
        if (!PyString_Check(obj_ptr)) return 0;
        return obj_ptr;
    }

    // Convert obj_ptr into a QString
    static void construct(
        PyObject* obj_ptr,
        boost::python::converter::rvalue_from_python_stage1_data* data)
    {
        // Extract the character data from the python string
        const char* value = PyString_AsString(obj_ptr);

        // Verify that obj_ptr is a string (should be ensured by convertible())
        assert(value);

        // Grab pointer to memory into which to construct the new QString
        void* storage = (
            (boost::python::converter::rvalue_from_python_storage*)
            data)->storage.bytes;

        // in-place construct the new QString using the character data
        // extracted from the python object
        new (storage) QString(value);

        // Stash the memory chunk pointer for later use by boost.python
        data->convertible = storage;
    }
};

void initializeConverters()
{
    using namespace boost::python;

    // register the to-python converter
    to_python_converter<
        QString,
        QString_to_python_str>();

    // register the from-python converter
    QString_from_python_str();
}
[1] The Boost.Python homepage.
[2] boost::python::extract<> documentation.
[3] For example, the Boost.Python FAQ.
[4] The Qt QString documentation.
[5] The boost::python::object documentation.
[6] The to_python_converter documentation.
[7] The boost::python::converter::registry documentation.
[8] Python reference counting details.
[9] For example, this discussion from the C++-sig discussion list, the Boost.Python documentation, and David Abrahams' guidelines for handle<> on the Python wiki.
[10] Boost.Python uniformly uses boost::python::error_already_set to communicate exceptions from Python to C++.

A CppQuiz a day, keeps the debugger away!

C++ is a difficult language to master. Very difficult. It takes no more than a few days away from the keyboard before you start forgetting some of the details that will bite you when you visit the dark and dusty corners of the language (sometimes because you work with code written by others).

Last month, Anders Schau Knatten officially launched a great tool for practicing your C++ language skills:

http://cppquiz.org

I recommend that you visit this site once in a while to challenge yourself. A good score on this quiz does not make you a great programmer, but it does suggest that you have a deeper understanding of the language than most. Being fluent in a programming language makes it much easier to avoid the dark and dusty corners so that you can concentrate on writing high quality software instead of spending time in the debugger.

Optional streaming

Catch has a number of macros that allow values of arbitrary types to be streamed into an ostringstream. The canonical example is the INFO macro:

INFO( "There were " << bottles.size() << " green bottles, hanging on the wall" );

This macro builds up a string that will be passed to the next assertion, to be included as an annotation. Note that, unlike with a naked ostringstream, there is no leading <<. This makes it clean and uncluttered when you just want to log a single value (such as a string), for example:

INFO( "Weirdness" );
The obvious way to do this is for the macro to provide the leading << prior to its argument. Conceptually something like this:
#define INFO( log ) { \
	std::ostringstream oss; \
	oss << log; \
	useTheString( oss.str() ); 
}

This all works quite nicely. But there are a few other macros that use this idiom, too: WARN, SUCCEED and FAIL.

The last two are of interest because the logging behaviour is more of a secondary concern. The primary behaviour is to appear like a passing or failing assertion, respectively, without the need to actually assert on anything. SUCCEED can be useful if you otherwise have no assertions in a test and you don't want to see warnings about it. FAIL is useful if the situation that leads to the failure is not captured in an expression for some reason. It can also be useful to force a test to fail, perhaps as a placeholder. These are useful macros to have available, but they are not often needed in practice. So when they are it's nice to be able to annotate their usage inline - hence the streamed argument.

This is all well and good. But I've found there are still enough cases where I don't want to annotate that having to pass an empty string or make something up is a little annoying. I also use a similar idiom in other projects where it would be nice to be able to make the stream completely optional.

This is not as easy as it sounds, though. The first, and most obvious, issue is that this requires support for variadic macros. Catch has made use of variadic macros, where available, for some time now. In theory they are available to any C++11 compiler. In practice most, if not all, compilers that support any reasonable chunk of C++11 support variadic macros - and most supported them as an extension even before that. That's certainly true of Visual C++, GCC and Clang.

The technically more interesting problem, though, is dealing with that initial <<. Remember the first << is being supplied inside the macro. It will still be there even if the caller does not supply an argument to the macro. If we wrote FAIL the same way we presented INFO earlier (but with variadic macros) it might look something like this:

#define FAIL( ... ) { \
	std::ostringstream oss; \
	oss << __VA_ARGS__; \
	notifyFail( oss.str() ); \
}

... which, with no argument provided, would expand to...

{ 
	std::ostringstream oss; 
	oss << ; 
	notifyFail( oss.str() ); 
}

Do you see the problem? With nothing following the << this will not compile.

So do we need a different operator? What properties would we need? It seems we'd need an operator that comes in two forms: a binary operator that allows us to capture an argument, and a unary operator that allows us to omit the argument. Furthermore the binary form must not require its argument to be enclosed in any sort of brackets. Finally it must have higher precedence than << so we can switch over to normal stream insertion at that point.

That's a long list. Does such an operator exist? Fortunately there's not just one but two such operators to choose from! + and -. The only slight hitch is that the unary form is right-to-left associative, whereas the binary form is left-to-right. So how can we work these in?

Let's pick one of the operators. I've gone with +, but I don't think there is any advantage either way. Because unary + is right-to-left associative it needs to prefix something. So we can't use it at the start of our streaming expression. We can, however, use it at the end. Then we'll need an object to apply it to. The object doesn't actually need to do anything else. I've gone with this implementation of StreamEndStop in Catch:

struct StreamEndStop {
    std::string operator+() {
        return std::string();
    }
};

With this definition, the expression +StreamEndStop() now yields an empty string - which leaves the stringstream untouched. Which means we can write:

{
	std::ostringstream oss; 
	oss << +StreamEndStop();
	notifyFail( oss.str() ); 
}

And oss.str() evaluates to an empty string. Perfect. But what about when we do stream something? Well that would expand to:

{
	std::ostringstream oss; 
	oss << something +StreamEndStop();
	notifyFail( oss.str() ); 
}

... where something could be a string or variable or literal of any type. So we need some way for the expression:

something +StreamEndStop()

to yield the value of something. That's where the binary form of operator+ comes in:

template<typename T>
T const& operator + ( T const& value, StreamEndStop const& ) {
	return value;
}

Now, whether we supply nothing, a single value or multiple values joined by <<s, we'll end up with a stringstream containing what we expect. The relevant bit of code in Catch actually looks like this:

Catch::ExpressionResultBuilder( messageType ) \
	<< __VA_ARGS__ \
	+::Catch::StreamEndStop()

which yields an ExpressionResultBuilder that gets passed on elsewhere. This is all protected by CATCH_CONFIG_VARIADIC_MACROS. Otherwise it falls back to:

Catch::ExpressionResultBuilder( messageType ) << log

So a lot of work to save a few explicit empty strings, but sometimes it's the little things.
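
Pulling the pieces together outside of Catch, a minimal self-contained sketch of the idiom might look like the following (notifyFail() and this FAIL macro are simplified stand-ins of mine, not Catch's real implementation):

#include <iostream>
#include <sstream>
#include <string>

struct StreamEndStop {
    std::string operator+() { return std::string(); }
};

// Binary form: yields the left-hand value unchanged, so the trailing
// +StreamEndStop() disappears whenever something was streamed.
template<typename T>
T const& operator + ( T const& value, StreamEndStop const& ) {
    return value;
}

void notifyFail( std::string const& msg ) {
    std::cout << "FAILED: " << msg << '\n';
}

// Relies on the compiler accepting an empty __VA_ARGS__, as discussed above.
#define FAIL( ... ) { \
    std::ostringstream oss; \
    oss << __VA_ARGS__ +StreamEndStop(); \
    notifyFail( oss.str() ); \
}

int main() {
    int bottles = 9;
    FAIL( "There were " << bottles << " green bottles" );  // annotated failure
    FAIL();                                                // no annotation at all
}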

Capturing lvalue references in C++11 lambdas

Recently the question "what is the type of an lvalue reference when captured by reference in a C++11 lambda?" was asked. It turns out that it's a reference to whatever the original reference was too. This is just like taking a reference to an existing reference, e.g.

int foo = 7;
int& rfoo = foo;
int& rfoo1 = rfoo;
int& rfoo2 = rfoo1;

All references refer to foo rather than rfoo2->rfoo1->rfoo->foo, meaning the following code

std::cout << "foo:" << foo << ", rfoo:" << rfoo 
          << ", rfoo1:" << rfoo1 << ", rfoo2:" << rfoo2 
          << '\n';
++foo;

std::cout << "foo:" << foo << ", rfoo:" << rfoo 
          << ", rfoo1:" << rfoo1 << ", rfoo2:" << rfoo2 
          << '\n';

std::cout << "&foo:" << &foo << ", &rfoo:" << &rfoo 
          << ", &rfoo1:" << &rfoo1 << ", &rfoo2:" << &rfoo2 
          << '\n';

Which gives:

foo:7, rfoo:7, rfoo1:7, rfoo2:7
foo:8, rfoo:8, rfoo1:8, rfoo2:8
&foo:00D3FB0C, &rfoo:00D3FB0C, &rfoo1:00D3FB0C, &rfoo2:00D3FB0C

I.e. all the references are aliases for the original foo, hence the same value is displayed (even after the original is modified), and each variable has the same address: that of foo.

There is nothing surprising here, it's just basic C++, but it's a long time since I've thought about it, which is why, with lambdas, lvalue, rvalue and universal references, I sometimes do a double take on what was once obvious.

The same happens with lambda capture but it's a slightly more interesting story. Take the following example:

int foo = 99;
int& rfoo = foo;
int& rfoo1 = foo;

std::cout << "foo:" << foo << ", rfoo:" << rfoo 
          << ", rfoo1:" << rfoo1 
          << '\n';

std::cout << "&foo:" << &foo << ", &rfoo:" << &rfoo 
          << ", &rfoo1:" << &rfoo1 
          << '\n';

auto l = [foo, rfoo, &rfoo1]()
{
    std::cout << "foo:" << foo << '\n';
    std::cout << "rfoo:" << rfoo << '\n';
    std::cout << "rfoo1:" << rfoo1 << '\n';

    std::cout << "&foo:" << &foo << ", &rfoo:" 
              << &rfoo << ", &rfoo1:" << &rfoo1 
              << '\n';
};

foo = 100;

l();

Which gives:

foo:99, rfoo:99, rfoo1:99
&foo:00D3FB0C, &rfoo:00D3FB0C, &rfoo1:00D3FB0C
foo:99
rfoo:99
rfoo1:100
&foo:00D3FAE0, &rfoo:00D3FAE4, &rfoo1:00D3FB0C

To begin with it behaves as per the first example: foo, rfoo and rfoo1 all give the same value, as rfoo and rfoo1 are effectively aliases for foo, which is confirmed when displaying their addresses; they're all the same.

However, when these same variables are captured it's a different story: The capture of foo is of no surprise as this is by-value, so it displays the captured value of 99 despite the original foo being changed to 100 prior to the lambda being invoked. Its address is that of a new variable, a member of the lambda.

It starts to get interesting with the capture of rfoo. When the lambda is invoked this too displays 99, the original captured value. Also, its address is not that of the original foo. It seems that the reference itself has not been captured but rather what it refers to, in this case an int with the value of 99. It appears to have been magically dereferenced as part of the capture.

This is the correct behaviour and when thought about becomes somewhat obvious. It's just like assigning a variable from a reference, e.g.

int foo = 7;
int& rfoo = foo;
int bar = rfoo;

bar doesn't become an int&, and rfoo is "dereferenced", except in this scenario there is nothing magical at all; it's exactly what is expected. If int were replaced with auto, e.g.

auto bar = rfoo;

then it would be expected that bar is an int, as auto strips off cv- and reference qualifiers.
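
A quick way to convince yourself of this deduction (my own check, not from the original post) is a couple of static_asserts:

#include <type_traits>

int foo = 7;
int& rfoo = foo;

auto bar = rfoo;    // deduces int: the reference (and any cv-qualifier) is stripped
static_assert(std::is_same<decltype(bar), int>::value, "auto drops the reference");

auto& rbar = rfoo;  // deduces int&: the reference must be asked for explicitly
static_assert(std::is_same<decltype(rbar), int&>::value, "auto& keeps the reference");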

Finally, there is rfoo1. This looks odd at first because it is capturing, by reference, a variable that is itself a reference. As seen in the first example this is perfectly fine: there is no such thing as a reference to a reference, and every such reference is just an alias for the original variable.

This is pretty much what's happening here. It's irrelevant that the target of the capture is a reference: in the end the capture by reference is capture by reference of the underlying variable, i.e. what rfoo1 refers to - in this case foo, not rfoo1 itself. This is demonstrated twofold: rfoo1 within the lambda displays the updated value of foo, and the address of rfoo1 within the lambda is that of foo outside it.

This is as per the standard, section 5.1.2 (Lambda expressions), paragraph 14:

An entity is captured by copy if it is implicitly captured and the capture-default is = or if it is explicitly
captured with a capture that does not include an &. For each entity captured by copy, an unnamed nonstatic
data member is declared in the closure type. The declaration order of these members is unspecified.
The type of such a data member is the type of the corresponding captured entity if the entity is not a
reference to an object, or the referenced type otherwise. [ Note: If the captured entity is a reference to a
function, the corresponding data member is also a reference to a function. —end note ]

This states that for a reference captured by copy, the type of the captured member is the type referred to, i.e. the reference aspect has been removed; the crucial part is "or the referenced type otherwise". (NOTE: I haven't experimented with references to functions.)

Finally, a vivid example showing that a reference captured by value involves a dereference.

class Bar
{
private:
    int mValue;

public:
    Bar(const Bar&) : mValue(9999)
    {
    }

public:
    Bar(const int value) : mValue(value) {}
    int GetValue() const { return mValue; }
    void SetValue(const int value) { mValue = value; }
};

Bar bar(1);
Bar& rbar = bar;
Bar& rbar1 = bar;

std::cout << "&bar:" << &bar << ", &rbar:" << &rbar << ", &rbar1:" << &rbar1 << '\n';

auto l2 = [bar, rbar, &rbar1]()
{
    std::cout << "bar:" << bar.GetValue() << '\n';
    std::cout << "rbar:" << rbar.GetValue() << '\n';
    std::cout << "rbar1:" << rbar1.GetValue() << '\n';

    std::cout << "&bar:" << &bar << ", &rbar:" << &rbar << ", &rbar1:" << &rbar1 << '\n';
};

bar.SetValue(2);

l2();

The class Bar provides a crude copy-constructor that sets the stored value to 9999. The following output is similar to that of the previous example: the addresses of bar and rbar inside the lambda differ from that of the original bar, showing they're copies, whilst rbar1's address is the same. Secondly, the value of mValue stored within Bar is shown as 9999 for the first two captured variables, meaning they were copy-constructed.

&bar:00D3FB0C, &rbar:00D3FB0C, &rbar1:00D3FB0C
bar:9999
rbar:9999
rbar1:2
&bar:00D3FAE0, &rbar:00D3FAE4, &rbar1:00D3FB0C

Making the copy-constructor private (by commenting out the seemingly unnecessary 'public:') prevents compilation.

1>------ Build started: Project: References, Configuration: Debug Win32 ------
1>  main.cpp
1>c:\users\pete\desktop\references\references\main.cpp(85): error C2248: 'Bar::Bar' : cannot access private member declared in class 'Bar'
1>          c:\users\pete\desktop\references\references\main.cpp(59) : see declaration of 'Bar::Bar'
1>          c:\users\pete\desktop\references\references\main.cpp(54) : see declaration of 'Bar'
1>          c:\users\pete\desktop\references\references\main.cpp(59) : see declaration of 'Bar::Bar'
1>          c:\users\pete\desktop\references\references\main.cpp(54) : see declaration of 'Bar'

Writing this post has clarified the situation for me; I hope it helps you as well.

The sample code is available here.

Deep C (and C++)

Programming is hard. Programming correct C and C++ is particularly hard. Indeed, both in C and certainly in C++, it is uncommon to see a screenful containing only well defined and conforming code. Why do professional programmers write code like this? Because most programmers do not have a deep understanding of the language they are using. While they sometimes know that certain things are undefined or unspecified, they often do not know why it is so. In these slides we will study small code snippets in C and C++, and use them to discuss the fundamental building blocks, limitations and underlying design philosophies of these wonderful but dangerous programming languages.

Jon Jagger and I just released a slide deck to discuss the fundamentals of C and C++ (slideshare, pdf).

Solid C++ Code by Example

Sometimes I see code that is perfectly OK according to the definition of the language but which is flawed because it breaks too many established idioms and conventions of the language. I just gave a 90 minute workshop about Solid C++ Code at the ACCU 2010 conference in Oxford.

When discussing solid code it is important to work on “real” problems, not just toy examples and coding katas, because they lack the required complexity to make discussions interesting. So, as a preparation I had developed, from scratch, an NTLM Authentication Library (pal) that can be used by a client to do NTLM authentication when retrieving a protected webpage on an IIS server. Then I picked out a few files, the encoding and decoding of NTLM messages, and tried to make them as solid as possible after useful discussions with ACCU friends and some top coders within my company. Then I “doped” the code, injecting impurities and bad stuff, to produce these handouts. At the ACCU talk/workshop the audience read through the “doped” code and came up with things that could be improved while I did online coding (in Emacs of course) fixing the issues as they popped up. With loads of solid C++ coders in the room, I think we found most of the issues worth caring about, and we ended up with something that can be considered to be solid C++, something that appears to have been developed by somebody who cares about high quality code. Here are the slides that I used to summarize our findings. Feel free to use these slides for whatever you want. Perhaps you would like to run a similar talk in your development team? Contact me if you want the complete source code for the authentication library, or if you want to discuss ideas for running a similar talk yourself. I plan to publish the code on github soon – so stay tuned.

UPDATE June 2010: The PAL library is now published on github. A much improved slide set is also available on slideshare.