2-day More Concurrent Thinking class at CppCon 2022

I am excited to be going to CppCon again this year, where I will be running a 2-day class: More Concurrent Thinking in C++: Beyond the Basics.

The class is onsite at the conference venue in Aurora, Colorado, USA, on Saturday 10th September 2022 and Sunday 11th September 2022, immediately before the main conference.

I'll be teaching you to think about the synchronization properties of the constructs you use, and how to work with the low level facilities provided by the C++20 standard to build high level abstractions. We'll look at actors, thread pools and executors, and how to design code that works with those abstractions.

We'll look at how to use low-level atomic operations to build lock-free data structures, and the issues that can arise when doing so. We'll also look at how scalability concerns can impact your design choices.

Finally, we'll look at the important issue of testing multithreaded code, and the tools we can use to help us find the causes of problems.

If this sounds interesting, sign up for the class when you register for the main conference.

I'll also be doing a presentation on Tuesday: An Introduction to Multithreading in C++20. This is a whistle-stop tour of the multithreading features of C++20, briefly covering each feature and what it is used for, and which you should choose by default.

Whether you come to my class and presentation or not, I hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Online Concurrency Workshop at C++ on Sea 2021

The restrictions brought upon us by COVID-19 are not over yet, and C++ on Sea is the latest conference that will be running as an online-only conference.

I will be running my More Concurrent Thinking class as an online workshop for C++ on Sea on 30th June and 1st July 2021.

The workshop will run from 09:30 UTC to 18:15 UTC each day. For attendees from North and South America, this is likely quite an early morning, and may be a late night for attendees from the far East, so please check the times in your local timezone.

Tickets include the full day of "normal" conference presentations on 2nd July 2021. Get yours from the C++ On Sea tickets page.

I hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Using atomics for thread synchronization in C++

In my previous blog post I wrote about spin locks, and how compilers must not move the locking loop above a prior unlock.

After thinking about this done more, I realised that is not something specific to locks — the same issue arises with any two step synchronization between threads.

Consider the following code

std::atomic<bool> ready1{false};
std::atomic<bool> ready2{false};

void thread1(){
  ready1.store(true, std::memory_order_release);
  while(!ready2.load(std::memory_order_acquire)){}
}

void thread2() {
  while(!ready1.load(std::memory_order_acquire)) {}
  ready2.store(true, std::memory_order_release);
}

thread1 sets ready1 to true, then waits for thread2 to set ready2 to true. Meanwhile, thread2 waits for ready1 to be true, then sets ready2 to true.

This is almost identical to the unlock/lock case from the previous blog post, except the waiting thread is just using plain load rather than exchange.

If the compiler moves the wait loop in thread1 above the store then both threads will hang forever. However it cannot do this for the same reason the spinlocks can't deadlock in the previous post: the store has to be visible to the other thread in a finite period if time, so must be issued before the wait loop. https://eel.is/c++draft/intro.multithread#intro.progress-18

An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

If the optimizer moved the store across the loop in thread1, then it could not guarantee that the value became visible to the other thread in a finite period of time. Therefore such an optimization is forbidden.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Can non-overlapping spinlocks deadlock in C++?

There has been discussion on Twitter recently about whether or not the C++ memory model allows spinlocks to deadlock if they just use memory_order_acquire in lock and memory_order_release in unlock, due to compiler optimizations. The case in question is where a thread locks one mutex, unlocks it, and locks a second: can the compiler reorder the second lock above the first unlock? If it does, and another thread does the same in the reverse order, with the same optimization, then sequential locks could deadlock.

Here is the code in question, with all the lock/unlock code inlined.

std::atomic<bool> mutex1{false};
std::atomic<bool> mutex2{false};

int x=0;
int y=0;

void thread1(){
  while(mutex1.exchange(true,std::memory_order_acquire)){}  // #1
  x=1;
  mutex1.store(false,std::memory_order_release); // #2

  while(mutex2.exchange(true,std::memory_order_acquire)){} // #3
  y=1;
  mutex2.store(false,std::memory_order_release); // #4
}

void thread2(){
  while(mutex2.exchange(true,std::memory_order_acquire)){} // #5
  x=2;
  mutex2.store(false,std::memory_order_release); // #6

  while(mutex1.exchange(true,std::memory_order_acquire)){} // #7
  y=2;
  mutex1.store(false,std::memory_order_release); // #8
}

For there to even be the possibility of deadlock, thread1 must successfully execute line #1 before thread2 successfully executes line #7, and thread2 must successfully execute line #5 before thread1 successfully executes line #3. Because these are RMW operations, the threads must agree on the ordering.

The modification order of mutex1 must thus be #1(success), #2, #7(success), #8. Similarly, the modification order of mutex2 must be #5(success), #6, #3(success), #4.

All threads must agree on these modification orders. https://eel.is/c++draft/intro.multithread#intro.races-4

From the point of view of thread1, everything must run in program order: compilers can only optimize things as long as they run "as if" in program order.

The store to mutex1 at #2 is guaranteed to be visible to thread2 in "a finite period of time". https://eel.is/c++draft/intro.multithread#intro.progress-18

Consequently, thread2 must eventually see that store at line #7, even if it executes line #7 a large number of times first.

Therefore, the compiler cannot move line #3 completely above line #2, since doing so would not guarantee the visibility of #2 to other threads in a finite period of time. It can move an arbitrary number of executions of line #3 above line #2 (all of which will see that mutex2 is still true), but not all the executions of line #3.

Given that thread2 eventually sees the store from #2 at line #7, the exchange at line #7 will eventually succeed, and thread2 will eventually complete.

Likewise, the store at #6 must become visible to thread1 in a finite period of time. Therefore the exchange at line #3 will eventually see the value stored by

6, the exchange will succeed, and thread1 will complete, and the compiler is

not allowed to move all the executions of line #7 above #6.

No amount of compiler optimization is allowed to break this, so no: spinlocks cannot deadlock if they don't overlap.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Ticket Maps

It has been an increasingly common scenario that I've encountered where you have some ID that's monotonically increasing, such as a subscription or connection index, or user ID, and you need your C++ program to hold some data that's associated with that ID value. The program can then pass round the ID, and use that ID to access the associated data at a later point.

Over time, IDs can become invalidated, so the data associated with that value is removed, and you end up with a sparse set of currently-active IDs. You would therefore naturally lean towards using a map (whether a std::map, std::unordered_map or some other custom map) to associate the data with the ID.

Often such maps are implemented as node-based containers, which means that the nodes can be allocated at disparate memory addresses, which is bad for cache performance. Adding and removing nodes also always requires memory allocation/deallocation.

In his "Better Code: Relationships" presentation, Sean Parent describes an alternative implementation, which he calls the "Russian Coat-Check Algorithm". In this algorithm, the map is implemented as a vector of pairs of key/optional data. Because the keys come from a monotonically increasing index, the vector is always sorted, and inserts are always at the end. Entries can be removed by clearing the data, and if there are too many empty entries then the vector can be compacted. Lookups are always fast, because the vector is always sorted, so a simple binary search will find the right element.

Inspired by watching Sean's presentation at ACCU 2021 last week, I implemented what I call a Ticket Map (it maps a Ticket to a Value). This is an implementation of the algorithm Sean described, fleshed out to a full container. When you insert a value, it is assigned the next available ticket value. You can later access or erase the value using that ticket.

#include <string>
#include <iostream>
#include "ticket_map.hpp"

int main(){
    jss::ticket_map<int,std::string> map;

    auto ticket1=map.insert("hello");
    auto ticket2=map.insert("world");

    std::cout<<map[ticket1]<<" "<<map[ticket2]<<std::endl;

    map.erase(ticket1);
}

You can of course iterate through the container: in this case the iterators are Input Iterators, where the value_type is a std::pair<Ticket const&,Value&>. This allows you to access both the tickets and the raw elements, but also allows the iterator to provide a nice view over the data without exposing the std::optional implementation detail.

#include <string>
#include <iostream>
#include "ticket_map.hpp"

int main(){
    jss::ticket_map<int,std::string> map;

    auto ticket1=map.insert("hello");
    auto ticket2=map.insert("world");
    auto ticket3=map.insert("goodbye");

    for(auto& [ticket,value]: map){
      std::cout<<ticket<<": "<<value<<std::endl;
    }
}

The code is available on GitHub under the Boost Software License.

Enjoy!

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Online Concurrency Classes

With all the restrictions brought upon us by COVID-19, many C++ conferences are moving online.

This includes Cppcon and NDC Tech Town, both of which are being run as 100% virtual conferences this year.

I will be running my More Concurrent Thinking class as an online class for both conferences.

The class at NDC Tech Town will be a 2-day class running on 31st August 2020 and 1st September 2020, 9am - 5pm CEST, which is 0700 - 1500 UTC.

The class at Cppcon will be a 3-day class running on 9th September 2020 to 11th September 2020, 11am - 5pm EDT, which is 1500 - 2100 UTC.

The content is the same in both cases, though the timings are clearly different. Hopefully one or other fits in with your local timezone.

I hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Invariants and Preconditions

I tend to think about invariants and preconditions a lot. Pretty much every class has invariants, and most functions have preconditions. I don't think they are complicated concepts, but somehow they seem to confuse people anyway, so I decided it was time to write down some of my thoughts.

Invariants

A class invariant is something that is always true for every instance of that class. Users of objects of that class never see an object for which the invariant is not true; it is true for all states of all objects of that class at all times. This is what makes it invariant.

In order to perform the required operations, class member functions may temporarily break invariants, and then restore them afterwards. Unless you allow concurrent invocations of member functions on the same object, or deliberately pass the object to another function from inside a member function, this is OK, as the outside world will not be able to interact with the object when it is in such a state. It does mean you need to be especially careful when calling other functions from inside a member function, to ensure that the invariants still hold.

Invariants are important. If you writing a function that operates on an object, the only thing that is guaranteed to be true about it is that the class invariants hold, unless you choose to impose additional preconditions.

Preconditions

Preconditions are those that must be true when a function is called. Every operation on a class has the implicit precondition that the class invariants hold, but operations can add additional preconditions. For example, you always call v.empty() to check if a vector is empty, but you only call v.front() to retrieve the first element if the vector is not empty. v[i] and v.at(i) differ only in their preconditions; v[i] requires the vector to have more than i elements, whereas v.at(i) is specified to work for any vector, throwing if i is out of range.

Some people like to write classes with two-stage construction — first you default-construct the object, then you call x.init() or similar to "finish off" the initialization, and you can't call any other function on that object until after the initialization is complete. Though I don't like this pattern it is valid code; every member function has a precondition that x.init() has been called.

Some people like to say that such a class can have "invariants" that do not hold until after initialization is complete. I think this is a misuse of the term; invariants must always hold for every object of a class, outside the internals of its member functions. As I just stated above, what such a class really has is a precondition on every function, not an invariant.

If I write a function

void do_stuff(X& x){
  // my code here
}

then I can rely on the invariants of X holding, but cannot rely on anything else unless I add preconditions to do_stuff.

Likewise, it should not be possible to do anything to an object that breaks its invariants. If you can then either you gave a bug, or they are not really invariants, just preconditions for most operations.

Some people like to write classes that have operations to transfer the internals from one object to another, leaving the source in a special "emptier than empty" state. This can be good for efficiency, especially when doing otherwise would require allocating resources to return the source to the plain "empty" state, and it is likely that the source will be destroyed immediately anyway.

Again, this is a perfectly reasonable thing to do, particularly when the resources are expensive to allocate and performance of the code is important. The problem comes when people then want to say that the class invariants don't hold for this state. Just as with the pre-initialization state above, I think this is a misuse of the term; they are not invariants, it is just that most operations have a precondition that the object is not "emptier than empty".

Move semantics

Since C++11, C++ has had the concept of "moving" objects, transferring resources from the source to the destination. This is an important facility for expressiveness of code and efficiency. Objects that own resources such as std::unique_ptr, std::thread or std::fstream can now be moved, transferring ownership of the resource from the source to the destination. This allows such objects to be put in containers, and transferred between scopes, which can greatly simplify code. Likewise, containers like std::vector can be moved, transferring the entire set of contained objects from one vector to another without reallocation. These facilities are of great importance for writing clean, efficient code.

One thing all these operations have in common is that after a move, the class invariants still hold for the source object. This should go without saying: the source is still an object of its type, so of course the invariants hold.

The issue is with things like std::list, where some implementations allocate a sentinel node for the head/tail of the list in the default constructor, which is thus transferred during move operations, so the move constructor has to allocate a new sentinel node for the source, in order to ensure the class invariants still hold. A similar thing occurs with anything that uses the pimpl idiom: if the implementation pointer is transferred then either the class invariants must allow for there to be a null implementation pointer, or a new implementation object must be allocated.

Some people therefore argue that move operations should allow the source to be left with broken invariants, as a hollow shell of an object, in an "emptier than empty" state. I think this is misuse of the term "invariants". By all means, steal the internals and leave the source with a null pointer to the internals, or whatever. However, you now need to change the invariants of your class to allow this, or they are not invariants, because they would no longer apply to all objects of that class. There is no need to update any of the functions on your class to handle this new state, but if you do not do so, then any functions that don't work for objects in this "emptier than empty" state now have an additional precondition that the object is not in that state.

This is all perfectly fine, objects with such states are plentiful, and have legitimate reasons for existing, but it is important to accept that their functions now have preconditions. My do_stuff function above can only call member functions that are safe to use in such a state unless it too has a precondition that the object x is not in such a state.

From a user perspective, it would be nice to be able to query such a state, so I could know what operations are permitted. For example, std::future provides a member function valid so you can call f.valid() before trying to do anything with it, and std::thread provides the joinable member function, which can be used to see if the object holds a thread handle. My do_stuff function can then call x.is_emptier_than_empty() in order to check for the special state and take appropriate action. That said, this is a "nice to have": the absence of such a function doesn't mean that the state can't exist, just that it's potentially harder to deal with.

Interaction with other code

If you pass an object to a function or class, then you need to know what that function requires of your object and ensure that those expectations are met. If the function expects your object to be a container with at least one element and you pass an empty container, then you've broken the expectations and your code has a bug. Likewise, if the function expects to be able to call x.foo() on your object, but foo cannot be called on an "emptier than empty" object, and you passed such an object to the function, your code has a bug.

The difficulty comes when such a state arises as a consequence of other actions. If x.foo() can leave x in an "emptier than empty" state, then do_stuff had better be prepared to handle the scenario, otherwise there is a bug somewhere. Where the bug is depends on the documentation: if do_stuff requires that you can always perform certain actions on the object it is passed as a parameter, and those actions require that the object is not "emptier than empty", then the bug is in the caller, or maybe the implementation of class X. If there is no such documentation, the bug is in do_stuff.

Note that requiring that x.foo() can only be called with objects that are not "emptier than empty" is a precondition. It should be perfectly acceptable for do_stuff to call any functions on x that do not have preconditions, even if x ends up in an "emptier than empty" state.

This is the exactly the same scenario we get with other code. For example if you pass a vector to a function that requires that the vector isn't empty, the function can erase the last element without a problem. If it wishes to erase the last element again, thus removing two elements in total, then it needs to check for and handle the case that the first erasure left the vector empty. Calling v.clear() or v.empty() or v.push_back(y) would be fine without checking, as those functions have no preconditions.

If x.foo() is instead spelled y=std::move(x), then nothing changes: it is perfectly fine for x to end up in an "emptier than empty" state if do_stuff knows how to handle such a state, or doesn't care, because it doesn't touch x again.

One option is for do_stuff to say that after a move-assignment like this, it can't rely on x having any particular value, but it is still an instance of class X, so its invariants must hold, and therefore any operation without a precondition can be used. It can therefore call x.is_emptier_than_empty() and behave appropriately.

The other option is for do_stuff to place stronger requirements on x, such as requiring that it does not end up "emptier than empty" after a move, or even requiring that it has a specific state.

Valid but unspecified states

The standard library has chosen option 1: after move, objects must be valid - i.e. still actually be objects of the expected type, and not destroyed - but they have an unspecified state, so the function cannot rely on any specific value, and can therefore only perform operations without preconditions, until it has verified that the preconditions for other operations hold.

This holds both for standard library types such as std::string or std::vector<int>, but also for user defined types that you use with the standard library. If you write

std::string s="hello";
std::string s2=std::move(s);

then s was the source of a move operation, and thus is in a valid, but unspecified state. For implementations that use the Small String Optimization, such that short strings are stored entirely within the string object itself, without requiring dynamic allocation, this might mean that s still has the value hello after the move, because then it doesn't have to store an empty string value in s as well as copying the contents to s2. It is also possible that the implementation might clear s, for consistency with longer strings. Both are valid choices, as in either case s is still a std::string object, and the exact details of the state are unspecified.

Likewise, if you write

std::vector<MyWidget> v=create_some_widgets();
v.insert(v.begin(),MyWidget());

then that call to v.insert is going to have to move the widgets around in order to make room at the beginning for the extra one. The library requires that moving widgets leaves them in a valid, but unspecified, state. In this case, that means that having moved a widget from one position to another, it is OK to move another widget on top of the first one, as that is the only operation the vector will perform anyway. If you pass your object to a standard library function that does something other than move things around (such as std::sort or std::remove_if), then you need to check that the other operations the function might do can still be done on a moved-from object. By calling the library function you are stating that your objects meet (and will continue to meet) any preconditions you have imposed on the operations that the library function specification says it might perform.

Invariants and Concurrency

Right back at the beginning of this article I stated that "Users of objects of that class never see an object for which the invariant is not true", but also that "class member functions may temporarily break invariants, and then restore them afterwards". These two things don't work together very well if an object can be accessed concurrently from multiple threads, and this is one aspect that makes writing concurrent code hard, especially if you try to use fine-grained locks or atomic operations.

Consider a simple class Names which holds two vectors, one for first names and one for last names:

class Names{
  std::vector<std::string> firstNames,lastNames;
};

I want the invariant of this class to be that the number of elements in firstNames and lastNames is always the same, so that I can add operations to this class knowing that is always true. There is a strong argument that this ought to be a std::vector<std::pair<std::string,std::string>> rather than two vectors, but assume that the class author has a legitimate reason for the separate vectors.

At first glance, a member function to add an entry is quite simple:

void Names::addEntry(std::string const& first,std::string const& last){
    firstNames.push_back(first);
    lastNames.push_back(last);
}

However, even for the single-threaded case, this isn't good: if lastNames.push_back(last) throws an exception, then our invariant is broken, as we successfully added an element to firstNames but not to lastNames. We therefore need to handle that case:

void Names::addEntry(std::string const& first,std::string const& last){
  firstNames.push_back(first);
  try{
    lastNames.push_back(last);
  } catch(...){
    firstNames.resize(lastNames.size());
    throw;
  }
}

Now, if lastNames.push_back(last) throws, then std::vector guarantees that lastNames is unchanged, so we can ensure our invariant holds again by shrinking firstNames back to the same size, and the single-threaded case is now sorted. If our Names object is only ever accessed by a single thread, then the invariant holds at all times outside the internals of a member function.

What about the concurrent case? If we call Names::addEntry from multiple threads, then everything is broken: std::vector is not safe for concurrent access from multiple threads, so we have data races and undefined behaviour. Using a ThreadSafeVector class instead which provides the operations we need but is safe for concurrent access removes these data races and the undefined behaviour, but doesn't fix the overall problem. Thread safety is not composable. In this case, the invariants are still broken during the call to Names::addEntry, so a concurrent call will see an object with broken invariants: the thread safety of the two vectors doesn't matter if the second thread can see firstNames and lastNames with different sizes.

We can fix the problem by using a mutex: the mutex lock prevents the second thread from accessing the internals of the object until the first thread has finished, so the second thread cannot see the state with a broken invariant. It also allows us to revert back to std::vector rather than ThreadSafeVector since the individual vectors are only ever accessed by one thread at a time:

class Names{
  std::mutex mutex;
  std::vector<std::string> firstNames,lastNames;

public:
  void addEntry(std::string const& first,std::string const& last){
    std::lock_guard guard(mutex);
    firstNames.push_back(first);
    try{
      lastNames.push_back(last);
    } catch(...){
      firstNames.resize(lastNames.size());
      throw;
    }
  }
};

This is one of the big reasons why lock-free data structures are so hard to write: we cannot just stick a mutex lock on the outside to ensure other threads never see broken invariants. Instead, we must ensure that the invariants hold at all times, even during the operation of member functions. For this reason, lock-free data structures are often designed so that as much as possible is done off to the side, without modifying the core data structures, and then either the entire change is committed with a single atomic operation, or care is taken to ensure that the invariants still hold after each incremental step.

End note

Invariants are important. They are vital to working with objects, and without them it is much harder to write correct code. However, you have to choose your invariants carefully: invariants that make the rest of your code easier to reason about can have a cost, so you have to be aware of the trade-offs. Like everything in programming it is up to you what trade-off you choose: simplicitly vs performance is a common choice, but it can also be a choice of which operations are fast and which are slow. Whatever you choose, there will be cases where the other choice was more appropriate.

Whatever you choose, you must be honest with yourself, and the users of your class. Don't claim something is an invariant just because it holds most of the time; it must hold all of the time. Obviously, you can make it so a precondition of a function that objects it operates on are only in certain states, and then write your program logic to ensure that the conditions are met, but don't confuse the two.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

CppCon 2019 Trip Report and Slides

Having been back from CppCon 2019 for over a week, I thought it was about time I wrote up my trip report.

The venue

This year, CppCon was at a new venue: the Gaylord Rockies Resort near Denver, Colorado, USA. This is a huge conference centre, currently surrounded by vast tracts of empty space, though people told me there were many plans for developing the surrounding area.

There were hosting multiple conferences and events alongside CppCon; it was quite amusing to emerge from the conference rooms and find oneself surrounded by people in ballgowns and fancy evening wear for an event in the nearby ballroom!

There were a choice of eating establishments, but they all had one thing in common: they were overpriced, taking advantage of the captured nature of the hotel clientelle. The food was reasonably nice though.

The size of the venue did make for a fair amount of walking around between sessions.

Overall the venue was nice, and the staff were friendly and helpful.

Pre-conference Workshop

I ran a 2-day pre-conference class, entitled More Concurrent Thinking in C++: Beyond the Basics, which was for those looking to move beyond the basics of threads and locks to the next level: high level library and application design, as well as lock-free programming with atomics. This was well attended, and I had interesting discussions with people over lunch and in the evening.

If you would like to book this course for your company, please see my training page.

The main conference

Bjarne Stroustrup kicked off the main conference with his presentation on "C++20: C++ at 40". Bjarne again reiterated his vision for C++, and outlined some of the many nice language and library features we have to make development easier, and code clearer and less error-prone.

Matt Godbolt's presentation on "Compiler Explorer: Behind the Scenes" was good and entertaining. Matt showed how he'd evolved Compiler Explorer from a simple script to the current website, and demonstrated some nifty things about it along the way, including features you might not have known about such as the LLVM instruction cost view, or the new "run your code" facility.

In "If You Can't Open It, You Don't Own It", Matt Butler talked about security and trust, and how bad things can happen if something you trust is compromised. Mostly this was obvious if you thought about it, but not something we necessarily do think about, so it was nice to be reminded, especially with the concrete examples. His advice on what we can do to build more secure systems, and existing and proposed C++ features that help was also good.

Barbara Geller and Ansel Sermersheim made an enthusiastic duo presenting "High performance graphics and text rendering on the GPU for any C++ application". I am excited about the potential for their Copperspice wrapper for the Vulkan rendering library: rendering 3D graphics portably is hard, and text more so.

Andrew Sutton's presentation on "Reflections: Compile-time Introspection of Source Code" was an interesting end to Monday. There is a lot of scope for eliminating boilerplate if we can use reflection, so it is good to see the progress being made on it.

Tuesday morning began with a scary question posed by Michael Wong, Paul McKenney and Maged Michael: "Will Your Code Survive the Attack of the Zombie Pointers?" Currently, if you delete an object or call free then all copies of those pointers immediately become invalid across all threads. Since invalid pointers can't even be compared, this can result in zombies eating your brains. Michael, Paul and Maged looked at what we can do in our code to avoid this, and what they are proposing for the C++ Standard to fix the problem.

Andrei Alexandrescu's presentation on "Speed is found in the minds of people" was an insightful look at optimizing sort. Andrei showed how compiler and processor features mean that performance can be counter-intuitive, and code with a higher algorithmic complexity can run faster in the right conditions. Always use infinite loops (except for most cases).

I love the interactive slides in Hana Dusikova's presentation "A State of Compile Time Regular Expressions". She is pushing the boundaries of compile-time coding to make our code perform better at runtime. std::regex can be slow compared to other regular expression libraries, but ctre can be much better. I am excited to see how this can be extended to compile-time parsing of other DSLs.

In "Applied WebAssembly: Compiling and Running C++ in Your Web Browser", Ben Smith showed the use of WebAssembly as a target to allow you to write high-performance C++ code that will run in a suitable web browser on any platform, much like the "Write once, run anywhere" promise of Java. I am interested to see where this can lead.

Samy Al Bahra and Paul Khuong presented the final session I attended: "Abusing Your Memory Model for Fun and Profit". They discussed how they have written code that relies on the stronger memory ordering requirements imposed by X86 CPUs over and above the standard C++ memory model in order to write high-performance concurrent data structures. I am intrigued to see if any of their techniques can be used in a portable fashion, or used to improve Just::Thread Pro.

Whiteboard code

This year there were a few whiteboards around the conference area for people to use for impromptu discussions. One of them had a challenge written on it:

"Can you write a requires expression that ensures a class has a member function with a specified signature?"

This led to a lot of discussion, which Arthur O'Dwyer wrote up as a blog post. Though the premise of the question is wrong (we shouldn't want to constrain on such specifics), it was fun, interesting and enlightening trying to think how one might do it — it allows you to explore the corner cases of the language in ways that might turn out to be useful later.

My presentation

As well as the workshop, I presented a talk on "Concurrency in C++20 and beyond", which was on Tuesday afternoon. It was in an intermediate-sized room, and I believe was well attended, though it was hard to see the audience with the bright stage lighting. There were a number of interesting questions from the audience addressing the issues raised in my presentation, which is always good, though the acoustics did make it hard to hear some of them.

Slides are available here.

~trip_report()

So that was an overview of another awesome CppCon. I love the in-person interactions with so many people involved in using C++ for such a wide variety of things. Everyone has their own perspective, and I always learn something.

The videos are being uploaded incrementally to the CppCon YouTube channel, so hopefully the video of my presentation and the ones above that aren't already available will be uploaded soon.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

CppCon 2019 Class, Presentation and Book Signing

It is now less than a month to this year's CppCon, which is going to be in Aurora, Colorado, USA for the first time this year, in a change from Bellevue where it has been for the last few years.

The main conference runs from 15th-20th September 2019, but there are also pre-conference classes on 13th and 14th September, and post-conference classes on 21st and 22nd September.

I will be running a 2-day pre-conference class, entitled More Concurrent Thinking in C++: Beyond the Basics, which is for those looking to move beyond the basics of threads and locks to the next level: high level library and application design, as well as lock-free programming with atomics. You can book your place as part of the normal CppCon registration.

I will also be presenting a session during the main conference on "Concurrency in C++20 and beyond".

Finally, I will also be signing copies of the second edition of my book C++ Concurrency In Action now that it is in print.

I look forward to seeing you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

The Power of Hidden Friends in C++

"Friendship" in C++ is commonly thought of as a means of allowing non-member functions and other classes to access the private data of a class. This might be done to allow symmetric conversions on non-member comparison operators, or allow a factory class exclusive access to the constructor of a class, or any number of things.

However, this is not the only use of friendship in C++, as there is an additional property to declaring a function or function template a friend: the friend function is now available to be found via Argument-Dependent Lookup (ADL). This is what makes operator overloading work with classes in different namespaces.

Argument Dependent Lookup at Work

Consider the following code snippet:

namespace A{
  class X{
  public:
    X(int i):data(i){}
  private:
    int data;
    friend bool operator==(X const& lhs,X const& rhs){
      return lhs.data==rhs.data;
    }
  };
}
int main(){
  A::X a(42),b(43);
  if(a==b) do_stuff();
}

This code snippet works as you might expect: the compiler looks for an implementation of operator== that works for A::X objects, and there isn't one in the global namespace, so it also looks in the namespace where X came from (A), and finds the operator defined as a friend of class X. Everything is fine. This is ADL at work: the argument to the operator is an A::X object, so the namespace that it comes from (A) is searched as well as the namespace where the usage is.

Note, however, that the comparison operator is not declared anywhere other than the friend declaration. This means that it is only considered for name lookup when one of the arguments is an X object (and thus is "hidden" from normal name lookup). To demonstrate this, let's define an additional class in namespace A, which is convertible to 'X':

namespace A{
  class Y{
  public:
    operator X() const{
      return X(data);
    }
    Y(int i):data(i){}
  private:
    int data;
  };
}
A::Y y(99);
A::X converted=y; // OK

Our Y class has a conversion operator defined, so we can convert it to an X object at will, and it is also in namespace A. You might think that we can compare Y objects, because our comparison operator takes an X, and Y is convertible to X. If you did, you'd be wrong: the comparison operator is only visible to name lookup if one of the arguments is an X object.

int main(){
  A::Y a(1),b(2);
  if(a==b) // ERROR: no available comparison operator
    do_stuff();
}

If we convert one of the arguments to an X then it works, because the comparison operator is now visible, and the other argument is converted to an X to match the function signature:

int main(){
  A::Y a(1),b(2);
  if(A::X(a)==b) // OK
    do_stuff();
}

Similarly, if we declare the comparison operator at namespace scope, everything works too:

namespace A{
  bool operator==(X const& lhs,X const& rhs);
}
int main(){
  A::Y a(1),b(2);
  if(a==b) // OK now
    do_stuff();
}

In this case, the arguments are of type Y, so namespace A is searched, which now includes the declaration of the comparison operator, so it is found, and the arguments are converted to X objects to do the comparison.

If we omit this namespace scope definition, as in the original example, then this function is a hidden friend.

This isn't just limited to operators: normal functions can be defined in friend declarations too, and just as with the comparison operator above, if they are not also declared at namespace scope then they are hidden from normal name lookup. For example:

struct X{
  X(int){}
  friend void foo(X){};
};
int main(){
    X x(42);
    foo(x); // OK, calls foo defined in friend declaration
    foo(99); // Error: foo not found, as int is not X
    ::foo(x); // Error: foo not found as ADL not triggered
}

Benefits of Hidden Friends

The first benefit of hidden friends is that it avoids accidental implicit conversions. In our example above, comparing Y objects doesn't implicitly convert them to X objects to use the X comparison unless you explicitly do something to trigger that behaviour. This can avoid accidental uses of the wrong function too: if I have a function wibble that takes an X and wobble that takes a Y, then a typo in the function name won't trigger the implicit conversion to X:

class X{
friend void wibble(X const&){}
};

class Y{
friend void wobble(Y const&){}
public:
operator X() const;
};

int main(){
  Y y;
  wibble(y); // Error no function wibble(Y)
}

This also helps spot errors where the typo was on the definition: we meant to define wibble(Y) but misspelled it. With "normal" declarations, the call to wibble(y) would silently call wibble(X(y)) instead, leading to unexpected behaviour. Hopefully this would be caught by tests, but it might make it harder to identify the problem as you'd be checking the definition of wobble, wondering why it didn't work.

Another consequence is that it makes it easier for the compiler: the hidden friends are only checked when there is a relevant argument provided. This means that there are fewer functions to consider for overload resolution, which makes compilation quicker. This is especially important for operators: if you have a large codebase, you might have thousands of classes with operator== defined. If they are declared at namespace scope, then every use of == might have to check a large number of them and perform overload resolution. If they are hidden friends, then they are ignored unless one of the expressions being compared is already of the right type.

In order to truly understand the benefits and use them correctly, we need to know when hidden friends are visible.

Rules for Visibility of Hidden Friends

Firstly, hidden friends must be functions or function templates; callable objects don't count.

Secondly, the call site must use an unqualified name — if you use a qualified name, then that checks only the specified scope, and disregards ADL (which we need to find hidden friends).

Thirdly, normal unqualified lookup must not find anything that isn't a function or function template. If you have a local variable int foo;, and try to call foo(my_object) from the same scope, then the compiler will rightly complain that this is invalid, even if the type of my_object has a hidden friend named foo.

Finally, one of the arguments to the function call must be of a user-defined type, or a pointer or reference to that type.

We now have the circumstances for calling a hidden friend if there is one:

my_object x;
my_object* px=&x;

foo(x);
foo(px);

Both calls to foo in this code will trigger ADL, and search for hidden friends.

ADL searches a set of namespaces that depend on the type of my_object, but that doesn't really matter for now, as you could get to normal definitions of foo in those namespaces by using appropriate qualification. Consider this code:

std::string x,y;
swap(x,y);

ADL will find std::swap, since std::string is in the std namespace, but we could just as well have spelled out std::swap in the first place. Though this is certainly useful, it isn't what we're looking at right now.

The hidden friend part of ADL is that for every argument to the function call, the compiler builds a set of classes to search for hidden friend declarations. This lookup list is built as follows from a source type list, which is initially the types of the arguments supplied to the function call.

Our lookup list starts empty. For each type in the source type list:

  • If the type being considered is a pointer or reference, add the pointed-to or referenced type to the source type list
  • Otherwise, if the type being considered is a built-in type, do nothing
  • Otherwise, if the type is a class type then add it to the lookup list, and check the following:
    • If the type has any direct or indirect base classes, add them to the lookup list
    • If the type is a member of a class, add the containing class to the lookup list
    • If the type is a specialization of a class template, then:
    • add the types of any template type arguments (not non-type arguments or template template arguments) to the source type list
    • if any of the template parameters are template template parameters, and the supplied arguments are member templates, then add the classes of which those templates are members to the lookup list
  • Otherwise, if the type is an enumerated type that is a member of a class, add that class to the lookup list
  • Otherwise, if the type is a function type, add the types of the function return value and function parameters to the source type list
  • Otherwise, if the type is a pointer to a member of some class X, add the class X and the type of the member to the source type list

This gets us a final lookup list which may be empty (e.g. in foo(42)), or may contain a number of classes. All the classes in that lookup list are now searched for hidden friends. Normal overload resolution is used to determine which function call is the best match amongst all the found hidden friends, and all the "normal" namespace-scope functions.

This means that you can add free functions and operators that work on a user-defined type by adding normal namespace-scope functions, or by adding hidden friends to any of the classes in the lookup list for that type.

Adding hidden friends via base classes

In a recent blog post, I mentioned my strong_typedef implementation. The initial design for that used an enum class to specify the permitted operations, but this was rather restrictive, so after talking with some others (notably Peter Sommerlad) about alternative implementation strategies, I switched it to a mixin-based implementation. In this case, the Properties argument is now a variadic parameter pack, which specifies types that provide mixin classes for the typedef. jss::strong_typedef<Tag,Underlying,Prop> then derives from Prop::mixin<jss::strong_typedef<Tag,Underlying,Prop>,Underlying>. This means that the class template Prop::mixin can provide hidden friends that operate on the typedef type, but are not considered for "normal" lookup. Consider, for example, the implementation of jss::strong_typedef_properties::post_incrementable:

struct post_incrementable {
    template <typename Derived, typename ValueType> struct mixin {
        friend Derived operator++(Derived &self, int) noexcept(
            noexcept(std::declval<ValueType &>()++)) {
            return Derived{self.underlying_value()++};
        }
    };
};

This provides an implementation of operator++ which operates on the strong typedef type Derived, but is only visible as a hidden friend, so if you do x++, and x is not a strong typedef that specifies it is post_incrementable then this operator is not considered, and you don't get accidental conversions.

This makes the strong typedef system easily extensible: you can add new property types that define mixin templates to provide both member functions and free functions that operate on the typedef, without making these functions generally visible at namespace scope.

Hidden Friends and Enumerations

I had forgotten that enumerated types declared inside a class also triggered searching that class for hidden friends until I was trying to solve a problem for a client recently. We had some enumerated types that were being used for a particular purpose, which we therefore wanted to enable operations on that wouldn't be enabled for "normal" enumerated types.

One option was to specialize a global template as I described in my article on Using Enum Classes as Bitfields, but this makes it inconvenient to deal with enumerated types that are members of a class (especially if they are private members), and impossible to deal with enumerated types that are declared at local scope. We also wanted to be able to declare these enums with a macro, which would mean we couldn't use the specialization as you can only declare specializations in the namespace in which the original template is declared, and the macro wouldn't know how to switch namespaces, and wouldn't be usable at class scope.

This is where hidden friends came to the rescue. You can define a class anywhere you can define an enumerated type, and hidden friends declared in the enclosing class of an enumerated type are considered when calling functions that take the enumerated as a parameter. We could therefore declare our enumerated types with a wrapper class, like so:

struct my_enum_wrapper{
  enum class my_enum{
    // enumerations
  };
};
using my_enum=my_enum_wrapper::my_enum;

The using declaration means that other code can just use my_enum directly without having to know or care about my_enum_wrapper.

Now we can add our special functions, starting with a function to verify this is one of our special enums:

namespace xyz{
  constexpr bool is_special_enum(void*) noexcept{
    return false;
  }
  template<typename T>
  constexpr bool is_special_enum() noexcept{
    return is_special_enum((T*)nullptr);
  }
}

Now we can say xyz::is_special_enum<T>() to check if something is one of our special enumerated types. By default this will call the void* overload, and thus return false. However, the internal call passes a pointer-to-T as the argument, which invokes ADL, and searches hidden friends. We can therefore add a friend declaration to our wrapper class which will be found by ADL:

struct my_enum_wrapper{
  enum class my_enum{
    // enumerations
  };
  constexpr bool is_special_enum(my_enum*) noexcept
  {
    return true;
  }
};
using my_enum=my_enum_wrapper::my_enum;

Now, xyz::is_special_enum<my_enum>() will return true. Since this is a constexpr function, it can be used in a constant expression, so can be used with std::enable_if to permit operations only for our special enumerated types, or as a template parameter to specialize a template just for our enumerated types. Of course, some additional operations can also be added as hidden friends in the wrapper class.

Our wrapper macro now looks like this:

#define DECLARE_SPECIAL_ENUM(enum_name,underlying_type,...)\
struct enum_name##_wrapper{\
  enum class enum_name: underlying_type{\
    __VA_ARGS__\
  };\
  constexpr bool is_special_enum(enum_name*) noexcept\
  {\
    return true;\
  }\
};\
using enum_name=enum_name##_wrapper::enum_name;

so you can declare a special enum as DECLARE_SPECIAL_ENUM(my_enum,int,a,b,c=42,d). This works at namespace scope, as a class member, and at local scope, all due to the hidden friend.

Summary

Hidden Friends are a great way to add operations to a specific type without permitting accidental implicit conversions, or slowing down the compiler by introducing overloads that it has to consider in other contexts. They also allow declaring operations on types in contexts that otherwise you wouldn't be able to do so. Every C++ programmer should know how to use them, so they can be used where appropriate.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

strong_typedef – Create distinct types for distinct purposes

One common problem in C++ code is the use of simple types for many things: a std::string might be a filename, a person's name, a SQL query string or a piece of JSON; an int could be a count, an index, an ID number, or even a file handle. In his 1999 book "Refactoring" (which has a second edition as of January 2019), Martin Fowler called this phenomenon "Primitive Obsession", and recommended that we use dedicated classes for each purpose rather than built-in or library types.

The difficulty with doing so is that built-in types and library types have predefined sets of operations that can be done with them from simple operations like incrementing/decrementing and comparing, to more complex ones such as replacing substrings. Creating a new class each time means that we have to write implementations for all these functions every time. This duplication of effort raises the barrier to doing this, and means that we often decide that it isn't worthwhile.

However, by sticking to the built-in and library types, we can end up in a scenario where a function takes multiple parameters of the same type, with distinct meanings, and no clear reason for any specific ordering. In such a scenario, it is easy to get the parameters in the wrong order and not notice until something breaks. By wrapping the primitive type in a unique type for each usage we can eliminate this class of problem.

My strong_typedef class template aims to make this easier. It wraps an existing type, and associates it with a tag type to define the purpose, and which can therefore be used to make it unique. Crucially, it then allows you to specify which sets of operations you want to enable: it might not make sense to add ID numbers, but it might make perfect sense to add counters, even if both are represented by integers. You might therefore using jss::strong_typedef<struct IdTag,unsigned,jss::strong_typedef_properties::equality_comparable> for an ID number, but jss::strong_typedef<struct IndexTag,unsigned,jss::strong_typedef_properties::comparable | jss::strong_typedef_properties::incrementable | jss::strong_typedef_properties::decrementable> for an index type.

I've implemented something similar to this class for various clients over the years, so I decided it was about time to make it publicly available. The implementation on github condenses all of the solutions to this problem that I've written over the years to provide a generic implementation.

Basic Usage

jss::strong_typedef takes three template parameters: Tag, ValueType and Properties.

The first (Tag) is a tag type. This is not used for anything other than to make the type unique, and can be incomplete. Most commonly, this is a class or struct declared directly in the template parameter, and nowhere else, as in the examples struct IdTag and struct IndexTag above.

The second (ValueType) is the underlying type of the strong typedef. This is the basic type that you would otherwise be using.

The third (Properties) is an optional parameter that specifies the operations you wish the strong typedef to support. By default it is jss::strong_typedef_properties::none — no operations are supported. See below for a full list.

Declaring Types

You create a typedef by specifying these parameters:

using type1=jss::strong_typedef<struct type1_tag,int>;
using type2=jss::strong_typedef<struct type2_tag,int>;
using type3=jss::strong_typedef<struct type3_tag,std::string,
    jss::strong_typedef_properties::comparable>;

type1, type2 and type3 are now separate types. They cannot be implicitly converted to or from each other or anything else.

Creating Values

If the underlying type is default-constructible, then so is the new type. You can also construct the objects from an object of the wrapped type:

type1 t1;
type2 t2(42);
// type2 e2(t1); // error, type1 cannot be converted to type2

Accessing the Value

strong_typedef can wrap built-in or class type, but that's only useful if you can access the value. There are two ways to access the value:

  • Cast to the stored type: static_cast<unsigned>(my_channel_index)
  • Use the underlying_value member function: my_channel_index.underlying_value()

Using the underlying_value member function returns a reference to the stored value, which can thus be used to modify non-const values, or to call member functions on the stored value without taking a copy. This makes it particularly useful for class types such as std::string.

using transaction_id=jss::strong_typedef<struct transaction_tag,std::string>;

bool is_a_foo(transaction_id id){
    auto& s=id.underlying_value();
    return s.find("foo")!=s.end();
}

Other Operations

Depending on the properties you've assigned to your type you may be able to do other operations on that type, such as compare a == b or a < b, increment with ++a, or add two values with a + b. You might also be able to hash the values with std::hash<my_typedef>, or write them to a std::ostream with os << a. Only the behaviours enabled by the Properties template parameter will be available on any given type. For anything else, you need to extract the wrapped value and use that.

Examples

IDs

An ID of some description might essentially be a number, but it makes no sense to perform much in the way of operations on it. You probably want to be able to compare IDs, possibly with an ordering so you can use them as keys in a std::map, or with hashing so you can use them as keys in std::unordered_map, and maybe you want to be able to write them to a stream. Such an ID type might be declared as follows:

using widget_id=jss::strong_typedef<struct widget_id_tag,unsigned long long,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

using froob_id=jss::strong_typedef<struct froob_id_tag,unsigned long long,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

Note that froob_id and widget_id are now different types due to the different tags used, even though they are both based on unsigned long long. Therefore any attempt to use a widget_id as a froob_id or vice-versa will lead to a compiler error. It also means you can overload on them:

void do_stuff(widget_id my_widget);
void do_stuff(froob_id my_froob);

widget_id some_widget(421982);
do_stuff(some_widget);

Alternatively, an ID might be a string, such as a purchase order number of transaction ID:

using transaction_id=jss::strong_typedef<struct transaction_id_tag,std::string,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::hashable |
    jss::strong_typedef_properties::streamable>;

transaction_id some_transaction("GBA283-HT9X");

That works too, since strong_typedef can wrap any built-in or class type.

Indexes

Suppose you have a device that supports a number of channels, so you want to be able to retrieve the data for a given channel. Each channel yields a number of data items, so you also want to access the data items by index. You could use strong_typedef to wrap the channel index and the data item index, so they can't be confused. You can also make the index types incrementable and decrementable so they can be used in a for loop:

using channel_index=jss::strong_typedef<struct channel_index_tag,unsigned,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::incrementable |
    jss::strong_typedef_properties::decrementable>;

using data_index=jss::strong_typedef<struct data_index_tag,unsigned,
    jss::strong_typedef_properties::comparable |
    jss::strong_typedef_properties::incrementable |
    jss::strong_typedef_properties::decrementable>;

Data get_data_item(channel_index channel,data_index item);
data_index get_num_items(channel_index channel);
void process_data(Data data);

void foo(){
    channel_index const num_channels(99);
    for(channel_index channel(0);channel<num_channels;++channel){
        data_index const num_data_items(get_num_items(channel));
        for(data_index item(0);item<num_data_items;++item){
            process_data(get_data_item(channel,item));
        }
    }
}

The compiler will complain if you pass the wrong parameters, or compare the channel against the item.

Behaviour Properties

The Properties parameter specifies behavioural properties for the new type. It must be one of the values of jss::strong_typedef_properties, or a value obtained by or-ing them together (e.g. jss::strong_typedef_properties::hashable | jss::strong_typedef_properties::streamable | jss::strong_typedef_properties::comparable). Each property adds some behaviour. The available properties are:

  • equality_comparable => Can be compared for equality (st==st2) and inequality (st!=st2)
  • pre_incrementable => Supports preincrement (++st)
  • post_incrementable => Supports postincrement (st++)
  • pre_decrementable => Supports predecrement (--st)
  • post_decrementable => Supports postdecrement (st--)
  • addable => Supports addition (st+value, value+st, st+st2) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.
  • subtractable => Supports subtraction (st-value, value-st, st-st2) where the result is convertible to the underlying type. The result is a new instance of the strong typedef.
  • ordered => Supports ordering comparisons (st<st2, st>st2, st<=st2, st>=st2)
  • mixed_ordered => Supports ordering comparisons where only one of the values is a strong typedef
  • hashable => Supports hashing with std::hash
  • streamable => Can be written to a std::ostream with operator<<
  • incrementable => pre_incrementable | post_incrementable
  • decrementable => pre_decrementable | post_decrementable
  • comparable => ordered | equality_comparable

Guideline and Implementation

I strongly recommend using strong_typedef or an equivalent implementation anywhere you would otherwise reach for a built-in or library type such as int or std::string when designing an interface.

My strong_typedef implementation is available on github under the Boost Software License.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

ACCU 2019 Slides and Trip Report

I attended ACCU 2019 a couple of weeks ago, where I was presenting my session Here's my number; call me, maybe. Callbacks in a multithreaded world.

The conference proper started on Wednesday, after a day of pre-conference workshops on the Tuesday, and continued until Saturday. I was only there Wednesday to Friday.

Wednesday

I didn't arrive until Wednesday lunchtime, so I missed the first keynote and morning sessions. I did, however get to see Ivan Čukić presenting his session on Ranges for distributed and asynchronous systems. This was an interesting talk that covered similar ground to things I've thought about before. It was good to see Ivan's take, and think about how it differed to mine. It was was also good to see how modern C++ techniques can produce simpler code than I had when I thought about this a few years ago. Ivan's approach is a clean design for pipelined tasks that allows implicit parallelism.

After the break I then went to Gail Ollis's presentation and workshop on Helping Developers to Help Each Other . Gail shared some of her research into how developers feel about various aspects of software development, from the behaviour of others to the code that they write. She then got us to try one of the exercises she talked about in small groups. By picking developer behaviours from the cards she provided to each group, and telling stories about how that behaviour has affected us, either positively or negatively, we can share our experiences, and learn from each other.

Thursday

First up on Thursday was Herb Sutter's keynote: De-fragmenting C++: Making exceptions more affordable and usable . Herb was eloquent as always, talking about his idea for making exceptions in C++ lower cost, so that they can be used in all projects: a significant number of projects currently ban exceptions from at least some of their code. I think this is a worthwhile aim, and hope to see something like Herb's ideas get accepted for C++ in a future standard.

Next up was my session, Here's my number; call me, maybe. Callbacks in a multithreaded world. It was well attended, with interesting questions from the audience. My slides are available here, and the video is available on youtube. Several people came up to me later in the conference to say that they had enjoyed my talk, and that they thought it would be useful for them in their work, which pleased me no end: this is what I always hope to achieve from my presentations.

Thursday lunchtime was taken up with book signings. I was one of four authors of recently-published programming books set up in the conservatory area of the hotel to sell copies of our books, and sign books for people. I sold plenty, and signed more, which was great.

Kate Gregory's talk on What Do We Mean When We Say Nothing At All? was after lunch. She discussed the various places in C++ where we can choose to specify something (such as const, virtual, or explicit), but we don't have to. Can we interpret meaning from the lack of an annotation? If your codebase uses override everywhere, except in one place, is that an accidental omission, or is it a flag to say "this isn't actually an override of the base class function"? Is it a good or bad idea to omit the names of unused parameters? There was a lot to think about with this talk, but the key takeaway for me is Consistency is Key: if you are consistent in your use of optional annotations, then deviation from your usual pattern can convey meaning to the reader, whereas if you are inconsistent then the reader cannot infer anything.

The final session I attended on Thursday was the C++ Pub Quiz, which was hosted by Felix Petriconi. The presented code was intended to confuse, and elicit exclamations of "WTF!", and succeeded on both counts. However, it was fun as ever, helped by the free drinks, and the fact that my team "Ungarian Notation" were the eventual winners.

Friday

Friday was the last day of the conference for me (though there the conference had another full day on Saturday). It started with Paul Grenyer's keynote on the trials and tribulations of trying to form a "community" for developers in Norwich, with meet-ups and conferences. Paul managed to be entertaining, but having followed Paul's blog for a few years, there wasn't anything that was new to me.

Interactive C++ : Meet Jupyter / Cling - The data scientist's geeky younger sibling was the next session I attended, presented by Neil Horlock. This was an interesting session about cling, a C++ interpreter, complete with a REPL, and how this can be combined with Jupyter notebooks to create a wiki with embedded code that you can edit and run. Support for various libraries allows to write code to plot graphs and maps and things, and have the graphs appear right there in the web page immediately. This is an incredibly powerful tool, and I had discussions with people afterwards about how this could be used both as an educational tool, and for "live" documentation and customer-facing tests: "here is sample code, try it out right now" is an incredibly powerful thing to be able to say.

After lunch I went to see Andreas Weis talk about Taming Dynamic Memory - An Introduction to Custom Allocators. This was a good introduction to various simple allocators, along with how and why you might use them in your C++ code. With John Lakos in the front row, Andreas had to field many questions. I had hoped for more depth, but I thought the material was well-paced, and so there wouldn't have been time; that would have been quite a different presentation, and less of an "introduction".

The final session I attended was Elsewhere Memory by Niall Douglas. Niall talked about the C++ object model, and how that can cause difficulties for code that wants to serialize the binary representation of objects to disk, or over the network, or wants to directly share memory with another process. Niall is working on a standardization proposal which would allow creating objects "fully formed" from a binary representation, without running a constructor, and would allow terminating the lifetime of an object without running its destructor. This is a difficult area as it interacts with compilers' alias analysis and the normal deterministic lifetime rules. However, this is an area where people currently do have "working" code that violates the strict lifetime rules of the standard, so it would be good to have a way of making such code standards-conforming.

Between the Sessions

The sessions at a conference at ACCU are great, and I always enjoy attending them, and often learn things. However, you can often watch these on Youtube later. One of the best parts of physically attending a conference is the discussions had in person before and after the sessions. It is always great to chat to people in person who you primarily converse with via email, and it is exciting to meet new people.

The conference tries to encourage attendees to be open to new people joining discussions with the "Pacman rule" — don't form a closed circle when having a discussion, but leave a space for someone to join. This seemed to work well in practice.

I always have a great time at ACCU conferences, and this one was no different.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

“The Developers” 2019 presentation and book signing

I will be presenting "Concurrency in C++20 and beyond" at The Developers 2019 in Romania on 23rd May 2019. Here is the abstract of my talk:

C++20 is set to add new facilities to make writing concurrent code easier. Some of them come from the previously published Concurrency TS, and others are new, but they all make our lives as developers easier. This talk will introduce the new features, and explain how and why we should use them.

The evolution of the C++ Concurrency support doesn't stop there though: the committee has a continuous stream of new proposals. This talk will also introduce some of the most important of these, including the new Executor model.

I will also be signing copies of the second edition of my book C++ Concurrency In Action now that it is finally in print.

I look forward to seeing you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Get the element index when iterating with an indexed_view

One crucial difference between using an index-based for loop and a range-based for loop is that the former allows you to use the index for something other than just identifying the element, whereas the latter does not provide you with access to the index at all.

The difference between index-based for loops and range-based for loops means that some people are unable to use simple range-based for loops in some cases, because they need the index.

For example, you might be initializing a set of worker threads in a thread pool, and each thread needs to know it's own index:

std::vector<std::thread> workers;

void setup_workers(unsigned num_threads){
    workers.resize(num_threads);
    for(unsigned i=0;i<num_threads;++i){
        workers[i]=std::thread(&my_worker_thread_func,i);
    }
}

Even though workers has a fixed size in the loop, we need the loop index to pass to the thread function, so we cannot use range-based for. This requires that we duplicate num_threads, adding the potential for error as we must ensure that it is correctly updated in both places if we ever change it.

jss::indexed_view to the rescue

jss::indexed_view provides a means of obtaining that index with a range-based for loop: it creates a new view range which wraps the original range, where each element holds the loop index, as well as a reference to the element of the original range.

With jss::indexed_view, we can avoid the duplication from the previous example and use the range-based for:

std::vector<std::thread> workers;

void setup_workers(unsigned num_threads){
    workers.resize(num_threads);
    for(auto entry: jss::indexed_view(workers)){
        entry.value=std::thread(&my_worker_thread_func,entry.index);
    }
}

As you can see from this example, the value field is writable: it is a reference to the underlying value if the iterator on the source range is a reference. This allows you to use it to modify the elements in the source range if they are non-const.

jss::indexed_view also works with iterator-based ranges, so if you have a pair of iterators, then you can still use range-based for loops. For example, the following code processes the elements up to the first zero in the supplied vector, or the whole vector if there is no zero.

void foo(std::vector<int> const& v){
    auto end=std::find(v.begin(),v.end(),0);
    for(auto entry: jss::indexed_view(v.begin(),end)){
        process(entry.index,entry.value);
    }
}

Finally, jss::indexed_view can also be used with algorithms that require iterator-based ranges, so our first example could also be written as:

std::vector<std::thread> workers;

void setup_workers(unsigned num_threads){
    workers.resize(num_threads);
    auto view=jss::indexed_view(workers);
    std::for_each(view.begin(),view.end(),[](auto entry){
        entry.value=std::thread(&my_worker_thread_func,entry.index);
    });
}

Final words

Having to use non-ranged for loop to get the loop index introduces a potential source of error: it is easy to mistype the loop index either in the for-loop header, or when using it to get the indexed element, especially in nested loops.

By using jss::indexed_view to wrap the range, you can eliminate this particular source of error, as well as making it clear that you are iterating across the entire range, and that you need the index.

Get the source from github and use it in your project now.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , , , , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

object_ptr – a safer replacement for raw pointers

Yesterday I uploaded my object_ptr<T> implementation to github under the Boost Software License.

This is an implementation of a class similar to std::experimental::observer_ptr<T> from the Library Fundamentals TS 2, but with various improvements suggested in WG21 email discussions of the feature.

The idea of std::experimental::observer_ptr<T> is that it provides a pointer-like object that does not own the pointee, and thus can be used in place of a raw pointer, but does not allow pointer arithmetic or use of delete, or pointer arithmetic, so is not as dangerous as a raw pointer. Its use also serves as documentation: this object is owned elsewhere, so explicitly check the lifetime of the pointed-to object — there is nothing to prevent a dangling std::experimental::observer_ptr<T>.

My implementation of this concept has a different name (object_ptr<T>). I feel that observer_ptr is a bad name, because it conjures up the idea of the Observer pattern, but it doesn't really "observe" anything. I believe object_ptr is better: it is a pointer to an object, so doesn't have any array-related functionality such as pointer arithmetic, but it doesn't tell you anything about ownership.

It also has slightly different semantics to std::experimental::observer_ptr: it allows incoming implicit conversions, and drops the release() member function. The implicit conversions make it easier to use as a function parameter, without losing any safety, as you can freely pass a std::shared_ptr<T>, std::unique_ptr<T>, or even a raw pointer to a function accepting object_ptr<T>. It makes it much easier to use jss::object_ptr<T> as a drop-in replacement for T* in function parameters. There is nothing you can do with a jss::object_ptr<T> that you can't do with a T*, and in fact there is considerably less that you can do: without explicitly requesting the stored T*, you can only use it to access the pointed-to object, or compare it with other pointers. The same applies with std::shared_ptr<T> and std::unique_ptr<T>: you are reducing functionality, so this is safe, and reducing typing for safe operations is a good thing.

I strongly recommend using object_ptr<T> or an equivalent implementation of the observer_ptr concept anywhere you have a non-owning raw pointer in your codebase that points to a single object.

If you have a raw pointer that does own its pointee, then I would strongly suggest finding a smart pointer class to use as a wrapper to encapsulate that ownership. For example, std::unique_ptr or std::shared_ptr with a custom deleter might well do the job.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,
Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Begin and End with range-based for loops

On slack the other day, someone mentioned that lots of companies don't use range-based for loops in their code because they use PascalCase identifiers, and their containers thus have Begin and End member functions rather than the expected begin and end member functions.

Having recently worked in a codebase where this was the case, I thought it would be nice to provide a solution to this problem.

The natural solution would be to provide global overloads of the begin and end functions: these are always checked by range-based for if the member functions begin() and end() are not found. However, when defining global function templates, you need to be sure that they are not too greedy: you don't want them to cause ambiguity in overload resolution or be picked in preference to std::begin or std::end.

My first thought was to jump through metaprogramming hoops checking for Begin() and End() members that return iterators, but then I thought that seemed complicated, so looked for something simpler to start with.

The simplest possible solution is just to declare the functions the same way that std::begin() and std::end() are declared:

template <class C> constexpr auto begin(C &c) -> decltype(c.Begin()) {
    return c.Begin();
}
template <class C> constexpr auto begin(const C &c) -> decltype(c.Begin()) {
    return c.Begin();
}

template <class C> constexpr auto end(C &c) -> decltype(c.End()) {
    return c.End();
}
template <class C> constexpr auto end(const C &c) -> decltype(c.End()) {
    return c.End();
}

Initially I thought that this would be too greedy, and cause problems, but it turns out this is fine.

The use of decltype(c.Begin()) triggers SFINAE, so only types which have a public member named Begin which can be invoked with empty parentheses are considered; for anything else these functions are just discarded and not considered for overload resolution.

The only way this is likely to be a problem is if the user has also defined a begin free function template for a class that has a suitable Begin member, in which case this would potentially introduce overload resolution ambiguity. However, this seems really unlikely in practice: most such function templates will end up being a better match, and any non-template functions are almost certainly a better match.

So there you have it: in this case, the simplest solution really is good enough! Just include this header and you're can freely use range-based for loops with containers that use Begin() and End() instead of begin() and end().

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

ACCU 2019 presentation and book signing

The ACCU 2019 conference is running from 9th-13 April 2019, in Bristol, UK.

This year I will be presenting "Here's my number; call me, maybe. Callbacks in a multithreaded world" on 11th April. The abstract is:

A common pattern in multithreaded applications is the use of callbacks, continuations and task pipelines to divide the processing of data across threads. This has the benefit of ensuring that threads can quickly move on to further processing, and can minimize blocking waits, since tasks are only scheduled when there is work to be done.

The downside is that they can weave a tangled web of connections, and managing object lifetimes can now become complicated.

This presentation will look at ways of managing this complexity and ensuring that your code is as clear as possible, and there is no possibility of dangling references or leaked objects.

I will also be signing copies of the second edition of my book C++ Concurrency In Action now that it is finally in print.

I look forward to seeing you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Just::Thread Pro v2.5.0 released with coroutines support

I am pleased to announce that Just::Thread Pro v2.5.0 has been released. This adds support for gcc 7, clang 4.0 and clang 5.0, but the big change with this version is the support for coroutines with Microsoft Visual Studio 2017, and clang 5.0 on ubuntu when used with libc++ 5.0.

Just::Thread Pro is our C++ concurrency extensions library which provides an Actor framework for easier concurrency, along with concurrent data structures: a thread-safe queue, and concurrent hash map, and a wrapper for ensuring synchronized access to single objects.

It also includes the new facilities from the Concurrency TS:

Coroutines support is here!

V2.5.0 adds support for coroutines with Microsoft Visual Studio 2017 and clang 5.0. This means that you can now use co_await to wait for a std::experimental::future, and can create coroutines that return a std::experimental::future.

Supported compilers

Just::Thread Pro is now fully supported on the following compiler/OS combinations (32-bit and 64-bit):

  • Microsoft Visual Studio 2015 for Windows
  • Microsoft Visual Studio 2017 for Windows
  • gcc 5 for Ubuntu 14.04 or later
  • gcc 6 for Ubuntu 14.04 or later
  • gcc 7 for Ubuntu 14.04 or later
  • clang 3.8 for Ubuntu 16.04 or later
  • clang 3.9 for Ubuntu 16.04 or later
  • clang 4.0 for Ubuntu 16.04 or later
  • clang 5.0 for Ubuntu 16.04 or later with libc++ or libstdc++
  • gcc 5 for Fedora 22 and 23
  • gcc 6 for Fedora 24 and 25
  • gcc 7 for Fedora 26
  • clang 3.8 for Fedora 24
  • clang 3.9 for Fedora 25
  • clang 4.0 for Fedora 26

Just::Thread Pro v2.2 is also supported with the Just::Thread compatibility library on the following compiler/OS combinations:

  • Microsoft Visual Studio 2005, 2008, 2010, 2012 and 2013 for Windows
  • TDM gcc 4.5.2, 4.6.1 and 4.8.1 for Windows
  • g++ 4.3 or later for Ubuntu 9.04 or later
  • g++ 4.4 or later for Fedora 13 or later
  • g++ 4.4 for Centos 6
  • MacPorts g++ 4.3 to 4.8 on MacOSX Snow Leopard or later

All licences include a free upgrade to point releases, so if you purchase now you'll get a free upgrade to all 2.x releases of Just::Thread Pro. Purchasers of the older Just::Thread library (now called the compatibility library) may upgrade to Just::Thread Pro for a small fee.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Libc++ v5 PPA for Ubuntu now available

I am please to announce that I have made v5.0 of libc++ available for Ubuntu Linux via a PPA. Initially, the binaries are only available for Ubuntu 16.04 Xenial.

I was frustrated at the lack of an up-to-date build of libc++ for linux, especially with the v5.0 release of clang. Even though the LLVM project provide Ubuntu packages for clang, they don't provide one for libc++, and the version in the Ubuntu repositories is woefully out of date (v3.7). I therefore decided to package it myself, in this PPA.

This is especially good, since in combination with clang v5.0, it provides support for the Coroutines TS!

Enjoy!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

CppCon 2017 class and presentation on concurrency

I am pleased to announce that I will be running my "Concurrent Thinking in C++" class at CppCon again this year. Here is the course description:

One of the most difficult issues around designing software with multiple threads of execution is synchronizing data.

Whether you use actors, active objects, futures and continuations or mutable shared state, every non-trivial system with multiple threads needs to transfer data between them. This means thinking about which data needs to be processed by which thread, and ensuring that the right data gets to the right threads in the right order. It also means thinking about API design to avoid race conditions.

In this workshop you will encounter a series of scenarios involving multithreaded code, and be guided through identifying the problem areas and the ways of handling them.

You will learn techniques for thinking about the scenarios to ease the analysis, as well as details of the tools we have available in C++ to mitigate the problems. You will also learn how to use the C++ standard library to help enforce the requirements of each scenario in code.

I will also be presenting on "Concurrency, Parallelism and Coroutines":

C++17 is adding parallel overloads of most of the Standard Library algorithms. There is a TS for Concurrency in C++ already published, and a TS for Coroutines in C++ and a second TS for Concurrency in C++ in the works.

What does all this mean for programmers? How are they all related? How do coroutines help with parallelism?

This session will attempt to answer these questions and more. We will look at the implementation of parallel algorithms, and how continuations, coroutines and work-stealing fit together. We will also look at how this meshes with the Grand Unified Executors Proposal, and how you will be able to take advantage of all this as an application developer.

My class is on 17th-18th September, and the main conference is running 19th-23rd, with my presentation on 20th September. If you haven't got your ticket already, head on over to CppCon Registration to get yours now.

Hope to see you there!

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Slides for my ACCU 2017 presentation

A couple of weeks ago I presented at the ACCU conference in Bristol (ACCU 2017). The title of my presentation was "Concurrency, Parallelism and Coroutines", and I was talking about the relationship between the Concurrency TS, the Coroutines TS and the Parallel algorithms from C++17:

C++17 is adding parallel overloads of most of the Standard Library algorithms. There is a TS for Concurrency in C++ already published, and a TS for Coroutines in C++ and a second TS for Concurrency in C++ in the works.

What does all this mean for programmers? How are they all related? How do coroutines help with parallelism?

This session will attempt to answer these questions and more. We will look at the implementation of parallel algorithms, and how continuations, coroutines and work-stealing fit together. We will also look at how this meshes with the Grand Unified Executors Proposal, and how you will be able to take advantage of all this as an application developer.

It was well attended, with a lot of interesting questions.

The slides are available here.

This year, my presentation was recorded. The video is available on youtube.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Getting tuple elements with a runtime index

std::tuple is great. It provides a nice, generic way of holding a fixed-size set of data items of whatever types you need.

However, sometimes it has limitations that mean it doesn't quite work as you'd like. One of these is accessing an item based on a runtime index.

std::get needs a compile-time index

The way to get the nth item in a tuple is to use std::get: std::get<n>(my_tuple). This works nicely, as long as n is a compile-time constant. If you've got a value that is calculated at runtime, this doesn't work: you can't use a value that isn't known until runtime as a template parameter.

std::tuple<int,int,int> my_tuple=...;
size_t index;
std::cin>>index;
int val=std::get<index>(my_tuple); // won't compile

So, what can we do? We need a new function, which I'll call runtime_get, to retrieve the nth value, where n is a runtime value.

template<typename Tuple>
... runtime_get(Tuple&& t,size_t index){
  ...
}

The question is: how do we implement it?

Fixed return type

The return type is easy: our function must have a single return type for any given Tuple. That means that all the elements in the tuple must have the same type, so we can just use the type of the first element. std::tuple_element will tell us this, though we must first adjust our template parameter so it's not a reference.

template<typename Tuple>
typename std::tuple_element<
  0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
  ...
}

Note: C++17 includes std::variant, so you might think we could use that to hold the return type, but that wouldn't actually help us: to get the value from a variant, you need to call std::get<n>(v), which requires n to be a constant (again)!

OK, so the return type is just a reference to the type of the first element. How do we get the element?

Retrieving the nth element

We can't do a straightforward switch, because that requires knowing all the cases in advance, and we want this to work for any size of tuple.

One way would be to have a recursive function that checked the runtime index against a compile-time index, and then called the function with the next compile-time index if they were different, but that would mean that the access time would depend on the index, and potentially end up with a deeply nested function call if we wanted the last element in a large tuple.

One thing we can do is use the index value as an array index. If we have an array of functions, each of which returns the corresponding element from the tuple, then we can call the appropriate function to return the relevant index.

The function we need is of course std::get; it's just a matter of getting the function signature right. Our overload of std::get has the following signature for const and non-const tuples:

template <size_t I, class... Types>
constexpr tuple_element_t<I, tuple<Types...>>&
get(tuple<Types...>&) noexcept;

template <size_t I, class... Types>
constexpr const tuple_element_t<I, tuple<Types...>>&
get(const tuple<Types...>&) noexcept;

so, we can capture the relevant instantiation of std::get for a given tuple type Tuple in a function pointer declared as:

using return_type=typename std::tuple_element<0,Tuple>::type&;
using get_func_ptr=return_type(*)(Tuple&) noexcept;

The signature is the same, regardless of the index, because we made the decision that we're only going to support tuples where all the elements are the same.

This makes it easy to build a function table: use a variadic pack expansion to supply a different index for each array element, and fill in std::get<N> for each entry:

template<
  typename Tuple,
  typename Indices=std::make_index_sequence<std::tuple_size<Tuple>::value>>
struct runtime_get_func_table;

template<typename Tuple,size_t ... Indices>
struct runtime_get_func_table<Tuple,std::index_sequence<Indices...>>{
    using return_type=typename std::tuple_element<0,Tuple>::type&;
    using get_func_ptr=return_type (*)(Tuple&) noexcept;
    static constexpr get_func_ptr table[std::tuple_size<Tuple>::value]={
        &std::get<Indices>...
    };
};

template<typename Tuple,size_t ... Indices>
constexpr typename
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::get_func_ptr
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::table[
  std::tuple_size<Tuple>::value];

We need the separate redeclaration of the table to satisfy a pre-C++17 compiler; with C++17 inline variables it is no longer needed.

Our final function is then just a simple wrapper around a table lookup:

template<typename Tuple>
constexpr
typename std::tuple_element<0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
    using tuple_type=typename std::remove_reference<Tuple>::type;
    if(index>=std::tuple_size<tuple_type>::value)
        throw std::runtime_error("Out of range");
    return runtime_get_func_table<tuple_type>::table[index](t);
}

It's constexpr safe, though in a constexpr context you could probably just use std::get directly anyway.

So, there you have it: a constant-time function for retrieving the nth element of a tuple where all the elements have the same type.

Final code

Here is the final code for a constant-time function to retrieve an item from a tuple based on a runtime index.

#include <tuple>
#include <utility>
#include <type_traits>
#include <stdexcept>

template<
  typename Tuple,
  typename Indices=std::make_index_sequence<std::tuple_size<Tuple>::value>>
struct runtime_get_func_table;

template<typename Tuple,size_t ... Indices>
struct runtime_get_func_table<Tuple,std::index_sequence<Indices...>>{
    using return_type=typename std::tuple_element<0,Tuple>::type&;
    using get_func_ptr=return_type (*)(Tuple&) noexcept;
    static constexpr get_func_ptr table[std::tuple_size<Tuple>::value]={
        &std::get<Indices>...
    };
};

template<typename Tuple,size_t ... Indices>
constexpr typename
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::get_func_ptr
runtime_get_func_table<Tuple,std::index_sequence<Indices...>>::table[std::tuple_size<Tuple>::value];

template<typename Tuple>
constexpr
typename std::tuple_element<0,typename std::remove_reference<Tuple>::type>::type&
runtime_get(Tuple&& t,size_t index){
    using tuple_type=typename std::remove_reference<Tuple>::type;
    if(index>=std::tuple_size<tuple_type>::value)
        throw std::runtime_error("Out of range");
    return runtime_get_func_table<tuple_type>::table[index](t);
}

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

C++ Concurrency in Action 2nd edition Early Access

I am happy to announce that the second edition of my book, C++ Concurrency in Action, is now available under the Manning Early Access Program.

The second edition is being updated to cover C++14, C++17 and the Concurrency TS, along with general improvements throughout the book.

This includes full coverage of the library changes from C++14 and C++17:

  • std::shared_mutex and std::shared_timed_mutex. These provide for multiple-reader/single-writer mutex locks.
  • std::scoped_lock from C++17 for locking multiple mutexes together.
  • Parallel overloads of many standard library algorithms include std::sort, std::for_each and std::transform_reduce.

Plus, full coverage of the library extensions from the concurrency TS:

  • std::experimental::latch to allow waiting for a set number of events to occur
  • std::experimental::barrier and std::experimental::flex_barrier to synchronize groups of threads
  • std::experimental::atomic_shared_ptr to allow atomic accesses to a single shared_ptr instance from multiple threads, as a better alternative that the std::atomic_load and std::atomic_store free functions.
  • Extended futures that allow continuations, so additional functions can be scheduled for when a future is ready.
  • std::experimental::when_all and std::experimental::when_any to allow waiting for either all of a set of futures to be ready, or the first of a set of futures to be ready.

Only the first few chapters are available at the moment, but if you sign up now, then you will get those chapters in PDF form now, as well as updated PDFs including the later chapters as they become ready, along with updates for the earlier ones, and a final print copy of the book when it's done.

50% Discount

If you use the code mlwilliams4 at the checkout when you sign up for the MEAP before 27th March 2017 then you'll get a 50% discount.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Just::Thread Pro v2.4.2 released with clang support

I am pleased to announce that Just::Thread Pro v2.4.2 has been released with support for clang on linux.

Just::Thread Pro is our C++ concurrency extensions library which provides an Actor framework for easier concurrency, along with concurrent data structures: a thread-safe queue, and concurrent hash map, and a wrapper for ensuring synchronized access to single objects.

It also includes the new facilities from the Concurrency TS:

Clang support is finally here!

V2.4.2 adds the much-anticipated support for clang. clang 3.8 and 3.9 are supported on ubuntu 16.04 or later, clang 3.8 is supported on Fedora 24, and clang 3.9 on Fedora 25.

Just::Thread Pro is now fully supported on the following compiler/OS combinations (32-bit and 64-bit):

  • Microsoft Visual Studio 2015 for Windows
  • Microsoft Visual Studio 2017 for Windows
  • gcc 5 for Ubuntu 14.04 or later
  • gcc 6 for Ubuntu 14.04 or later
  • clang 3.8 for Ubuntu 16.04 or later
  • clang 3.9 for Ubuntu 16.04 or later
  • gcc 5 for Fedora 22 and 23
  • gcc 6 for Fedora 24 and 25
  • clang 3.8 for Fedora 24
  • clang 3.9 for Fedora 25

Just::Thread Pro v2.2 is also supported with the Just::Thread compatibility library on the following compiler/OS combinations:

  • Microsoft Visual Studio 2005, 2008, 2010, 2012 and 2013 for Windows
  • TDM gcc 4.5.2, 4.6.1 and 4.8.1 for Windows
  • g++ 4.3 or later for Ubuntu 9.04 or later
  • g++ 4.4 or later for Fedora 13 or later
  • g++ 4.4 for Centos 6
  • MacPorts g++ 4.3 to 4.8 on MacOSX Snow Leopard or later

All licences include a free upgrade to point releases, so if you purchase now you'll get a free upgrade to all 2.x releases of Just::Thread Pro. Purchasers of the older Just::Thread library (now called the compatibility library) may upgrade to Just::Thread Pro for a small fee.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Does const mean thread-safe?

There was a discussion recently on the cpplang slack about whether const meant thread-safe.

As with everything in life, it depends. In some ways, yes it does. In others, it does not. Read on for the details.

What do we mean by "thread-safe"?

If someone says something is "thread-safe", then it is important to define what that means. Here is an incomplete list of some things people might mean when they say something is "thread safe".

  • Calling foo(x) on one thread and foo(y) on a second thread concurrently is OK.
  • Calling foo(x_i) on any number of threads concurrently is OK, provided each x_i is different.
  • Calling foo(x) on a specific number of threads concurrently is OK.
  • Calling foo(x) on any number of threads concurrently is OK.
  • Calling foo(x) on one thread and bar(x) on another thread concurrently is OK.
  • Calling foo(x) on one thread and bar(x) on any number of threads concurrently is OK.

Which one we mean in a given circumstance is important. For example, a concurrent queue might be a Single-Producer, Single-Consumer queue (SPSC), in which case it is safe for one thread to push a value while another pops a value, but if two threads try and push values then things go wrong. Alternatively, it might be a Multi-Producer, Single-Consumer queue (MPSC), which allows multiple threads to push values, but only one to pop values. Both of these are "thread safe", but the meaning is subtly different.

Before we look at what sort of thread safety we're after, let's just define what it means to be "OK".

Data races

At the basic level, when we say an operation is "OK" from a thread-safety point of view, we mean it has defined behaviour, and there are no data races, since data races lead to undefined behaviour.

From the C++ Standard perspective, a data race is where there are 2 operations that access the same memory location, such that neither happens-before the other, at least one of them modifies that memory location, and at least one of them is not an atomic operation.

An operation is thus "thread safe" with respect to the set of threads we wish to perform the operation concurrently if:

  • none of the threads modify a memory location accessed by any of the other threads, or
  • all accesses performed by the threads to memory locations which are modified by one or more of the threads are atomic, or
  • the threads use suitable synchronization to ensure that there are happens-before operations between all modifications, and any other accesses to the modified memory locations.

So: what sort of thread-safety are we looking for from const objects, and why?

Do as ints do

A good rule of thumb for choosing behaviour for a class in C++ is "do as ints do".

With regard to thread safety, ints are simple:

  • Any number of threads may read a given int concurrently
  • If any thread modifies a given int, no other threads may access that int for reading or writing concurrently.

This follows naturally from the definition of a data race, since ints cannot do anything special to provide synchronization or atomic operations.

If you have an int, and more than one thread that wants to access it, if any of those threads wants to modify it then you need external synchronization. Typically you'll use a mutex for the external synchronization, but other mechanisms can work too.

If your int is const, (or you have const int& that references it, or const int* that points to it) then you can't modify it.

What does that mean for your class? In a well-designed class, the const member functions do not modify the state of the object. It is "logically" immutable when accessed exclusively through const member functions. On the other hand, if you use a non-const member function then you are potentially modifying the state of the object. So far, so good: this is what ints do with regard to reading and modifying.

To do what ints do with respect to thread safety, we need to ensure that it is OK to call any const member functions concurrently on any number of threads. For many classes this is trivially achieved: if you don't modify the internal representation of the object in any way, you're home dry.

Consider an employee class that stores basic information about an employee, such as their name, employee ID and so forth. The natural implementation of const member functions will just read the members, perform some simple manipulation on the values, and return. Nothing is modified.

class employee{
    std::string first_name;
    std::string last_name;
    // other data
public:
    std::string get_full_name() const{
        return last_name + ", " + first_name;
    }
    // other member functions
}

Provided that reading from a const std::string and appending it to another string is OK, employee::get_full_name is OK to be called from any number of threads concurrently.

You only have to do something special to "do as ints do" if you modify the internal state in your const member function, e.g. to keep a tally of calls, or cache calculation values, or similar things which modify the internal state without modifying the externally-visible state. Of course, you would also need to add some synchronization if you were modifying externally-visible state in your const member function, but we've already decided that's not a good plan.

In employee::get_full_name, we're relying on the thread-safety of std::string to get us through. Is that OK? Can we rely on that?

Thread-safety in the C++ Standard Library

The C++ Standard Library itself sticks to the "do as ints do" rule. This is spelled out in the section on Data race avoidance (res.on.data.races). Specifically,

A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this.

and

Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.

This means that if you have a const std::string& referencing an object, then any calls to member functions on it must not modify the object, and any shared internal state must be protected against data races. The same applies if it is passed to any other function in the C++ Standard Library.

However, if you have a std::string& referencing the same object, then you must ensure that all accesses to that object must be synchronized externally, otherwise you may trigger a data race.

Our employee::get_full_name function is thus as thread-safe as an int: concurrent reads are OK, but any modifications will require external synchronization for all concurrent accesses.

There are two little words in the first paragraph quoted above which have a surprising consequence: "or indirectly".

Indirect Accesses

If you have two const std::vector<X>s, vx and vy, then calling standard library functions on those objects must not modify any objects accessible by other threads, otherwise we've violated the requirements from the "data race avoidance" section quoted above, since those objects would be "indirectly" modified by the function.

This means that any operations on the X objects within those containers that are performed by the operations we do on the vectors must also refrain from modifying any objects accessible by other threads. For example, the expression vx==vy compares each of the elements in turn. These comparisons must thus not modify any objects accessible by other threads. Likewise, std::for_each(vx.begin(),vx.end(),foo) must not modify any objects accessible by other threads.

This pretty much boils down to a requirement that if you use your class with the C++ Standard Library, then const operations on your class must be safe if called from multiple threads. There is no such requirement for non-const operations, or combinations of const and non-const operations.

You may of course decide that your class is going to allow concurrent modifications (even if that is by using a mutex to restrict accesses internally), but that is up to you. A class designed for concurrent access, such as a concurrent queue, will need to have the internal synchronization; a value class such as our employee is unlikely to need it.

Summary

Do as ints do: concurrent calls to const member functions on your class must be OK, but you are not required to ensure thread-safety if there are also concurrent calls to non-const member functions.

The C++ Standard Library sticks to this rule itself, and requires it of your code. In most cases, it's trivial to ensure; it's only classes with complex const member functions that modify internal state that may need synchronization added.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

just::thread Pro adds Visual Studio 2017 support

I am pleased to announce that just::thread Pro now supports Microsoft Visual Studio 2017 on Microsoft Windows.

This adds to the support for Microsoft Visual Studio 2015, g++ 5 and g++ 6 for the just::thread Pro enhancements, which build on top of the platform-supplied version of the C++14 thread library. For older compilers, and for MacOSX, the just::thread compatibility library is still required.

The new build features all the same facilities as the previous release:

Get your copy of just::thread Pro

Purchase your copy and get started now.

As usual, all customers with V2.x licenses of just::thread Pro will get a free upgrade to the new just::thread Pro Standalone edition.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Generating Sequences

I was having a discussion with my son over breakfast about C++ and Python, and he asked me if C++ had anything equivalent to Python's range() function for generating a sequence of integers. I had to tell him that no, the C++ standard library didn't supply such a function, but there were algorithms for generating sequences (std::generate and std::generate_n) into an existing container, and you could write something that would provide a "virtual" container that would supply a sequence as you iterated over it with range-for.

He was happy with that, but I felt the urge to write such a container, and blog about it. There are existing implementations available (e.g. boost::counting_range), but hopefully people will find this instructive, if not useful.

Iterating over a "virtual" container

Our initial goal is to be able write

for(int x: range(10))
  std::cout<<x<<std::endl;

and have it print out the numbers 0 to 9. This requires that our new range function returns something we can use with range-for — something for which we can call begin and end to get iterators.

    class numeric_range{
    public:
        class iterator;
        iterator begin();
        iterator end();
    };

    numeric_range range(int count);

Our container is virtual, so we don't have real values we can refer to. This means our iterators must be input iterators — forward iterators need real objects to refer to. That's OK though; we're designing this for use with range-for, which only does one pass anyway.

Input Iterator requirements

All iterators have 5 associated types. They can either be provided as member types/typedefs, or by specializing std::iterator_traits<>. If you provide them as part of your iterator class, then the std::iterator_traits<> primary template works fine, so we'll do that. The types are:

iterator_category

The tag type for the iterator category: for input iterators it's std::input_iterator_tag.

value_type

This is the type of the element in the sequence. For an integer sequence, it would be int, but we'll want to allow sequences of other types, such as unsigned, double, or even a custom class.

reference

The result of operator* on an iterator. For forward iterators it has to be an lvalue reference, but for our input iterator it can be anything convertible to value_type. For simplicity, we'll make it the same as value_type.

pointer

The return type of operator-> on an iterator. value_type* works for us, and is the simplest option.

difference_type

The type used to represent the difference between two iterators. For input iterators, this is irrelevant, as they provide a one-pass abstraction with no exposed size. We can therefore use void.

The types aren't the only requirements on input iterators: we also have to meet the requirements on valid expressions.

Input Iterator expression requirements

Input iterators can be valid or invalid. If an iterator is invalid (e.g. because it was invalidated by some operation), then doing anything to it except assigning to it or destroying it is undefined behaviour, so we don't have to worry about that. All requirements below apply only to valid iterators.

Valid iterators can either refer to an element of a sequence, in which case they are dereferencable, or they can be past-the-end iterators, which are not dereferencable.

Comparisons

For iterators a and b, a==b is only a valid expression if a and b refer to the same sequence. If they do refer to the same sequence, a==b is true if and only if both a and b are past-the-end iterators, or both a and b are dereferencable iterators that refer to the same element in the sequence.

a!=b returns !(a==b), so is again only applicable if a and b are from the same sequence.

Incrementing

Only dereferencable iterators can be incremented.

If a is dereferencable, ++a must move a to the next value in the sequence, or make a a past-the-end iterator. Incrementing an iterator a may invalidate all copies of a: input iteration is single-pass, so you cannot use iterators to an element after you've incremented any iterator that referenced that element. ++a returns a reference to a.

(void)a++ is equivalent to (void)++a. The return type is unspecified, and not relevant to the increment.

Dereferencing

For a dereferencable iterator a, *a must return the reference type for that iterator, which must be convertible to the value_type. Dereferencing the same iterator multiple times without modifying the iterator must return an equivalent value each time. If a and b are iterators such that a==b is true, *a and *b must be return equivalent values.

a->m must be equivalent to (*a).m.

*a++ must return a value convertible to the value_type of our iterator, and is equivalent to calling following the function:

iterator::value_type increment_and_dereference(iterator& a) {
  iterator::value_type temp(*a);
  ++a;
  return temp;
}

This is why the return type for a++ is unspecified: for some iterators it will need to be a special type that can generate the required value on dereferencing; for others it can be a reference to a real value_type object.

Basic implementation

Our initial implementation can be really simple. We essentially just need a current value, and a final value. Our dereferencable iterators can then just hold a pointer to the range object, and past-the-end iterators can hold a null pointer. Iterators are thus equal if they are from the same range. Comparing invalidated iterators, or iterators from the wrong range is undefined behaviour, so that's OK.

class numeric_range{
    int current;
    int final;
public:
    numeric_range(int final_):current(0),final(final_){}

    iterator begin(){ return iterator(this); }
    iterator end() { return iterator(nullptr); }

    class iterator{
      numeric_range* range;
    public:
      using value_type = T;
      using reference = T;
      using iterator_category=std::input_iterator_tag;
      using pointer = T*;
      using difference_type = void;

      iterator(numeric_range* range_):range(range_){}

      int operator*() const{ return range->current; }
      int* operator->() const{ return &range->current;}

      iterator& operator++(){ // preincrement
          ++range->current;
          if(range->current==range->final)
            range=nullptr;
            return *this;
      }

      ??? operator++(int); // postincrement

      friend bool operator==(iterator const& lhs,iterator const& rhs){
        return lhs.range==rhs.range;
      }
      friend bool operator!=(iterator const& lhs,iterator const& rhs){
        return !(lhs==rhs);
      }
    };
};

numeric_range range(int max){
  return numeric_range(max);
}

Note that I've left out the definition of the iterator postincrement operator. That's because it warrants a bit more discussion.

Remember the spec for *a++: equivalent to calling

iterator::value_type increment_and_dereference(iterator& a) {
  iterator::value_type temp(*a);
  ++a;
  return temp;
}

Since this is a combination of two operators, we can't do it directly: instead, our postincrement operator has to return a special type to hold the temporary value. Our postincrement operator thus looks like this:

class numeric_range::iterator{
  class postinc_return{
    int value;
  public:
    postinc_return(int value_): value(value_){}
    int operator*(){ return value; }
  };
public:
  postinc_return operator++(int){
    postinc_return temp(range->current);
    ++*this;
    return temp;
  }
};

That's enough for our initial requirement:

int main(){
    for(int x: range(10)){
        std::cout<<x<<",";
    }
    std::cout<<std::endl;
}

will now print 0 to 9.

More complete implementation

The Python range function provides more options than just this, though: you can also specify start and end values, or start, end and step values, and the step can be negative. We might also like to handle alternative types, such as unsigned or double, and we need to ensure we handle things like empty ranges.

Alternative types is mostly easy: just make numeric_range a class template, and replace int with our new template parameter T everywhere.

Setting the initial and final values is also easy: just provide a new constructor that takes both the current and final values, rather than initializing the current value to 0.

Steps are a bit more tricky: if the final value is not initial+n*step for an integer n, then the direct comparison of current==final to check for the end of the sequence won't work, as it will always be false. We therefore need to check for current>=final, to account for overshoot. This then doesn't work for negative steps, so we need to allow for that too: if the step is negative, we check for current<=final instead.

Final code

The final code is available for download with simple examples, under the BSD license.

Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter

Just::Thread Pro v2.4 released

I am pleased to announce that Just::Thread Pro v2.4 has been released.

Just::Thread Pro is our C++ concurrency extensions library which provides an Actor framework for easier concurrency, along with concurrent data structures: a thread-safe queue, and concurrent hash map, and a wrapper for ensuring synchronized access to single objects.

It also includes the new facilities from the Concurrency TS:

Unwrapping the future

V2.4 adds automatic future unwrapping, as specified by the Concurrency TS, so a future<T> can be constructed from a future<future<T>>. This also works with continuations, so if your continuation returns a future<T>, the return value from then is still a future<T>, whereas if your continuation returns a T that is not a future, then then returns future<T>.

This is great for continuation-style concurrency, as it allows you to compose continuations easily, and use asynchronous calls within continuations without adding an additional layer of future-wrapping.

std::experimental::future<InitialData> first_task();
std::experimental::future<IntermediateData> second_task(InitialData data);
std::experimental::future<FinalData> third_task(IntermediateData data);

std::experimental::future<FinalData> f=first_task.then([](auto data){
    return second_task(data.get());
}).then([](auto data){
    return third_task(data.get());
});

New compiler/OS support

V2.4 also adds support for gcc 5 on Fedora 22 and 23, and gcc 6 on Fedora 24 and 25.

Just::Thread Pro is now fully supported on the following compiler/OS combinations (32-bit and 64-bit):

  • Microsoft Visual Studio 2015 for Windows
  • gcc 5 for Ubuntu 14.04 or later
  • gcc 6 for Ubuntu 14.04 or later
  • gcc 5 on Fedora 22 and 23
  • gcc 6 on Fedora 24 and 25

Just::Thread Pro v2.2 is also supported with the Just::Thread compatibility library on the following compiler/OS combinations:

  • Microsoft Visual Studio 2005, 2008, 2010, 2012 and 2013 for Windows
  • TDM gcc 4.5.2, 4.6.1 and 4.8.1 for Windows
  • g++ 4.3 or later for Ubuntu 9.04 or later
  • g++ 4.4 or later for Fedora 13 or later
  • g++ 4.4 for Centos 6
  • MacPorts g++ 4.3 to 4.8 on MacOSX Snow Leopard or later

All licences include a free upgrade to point releases, so if you purchase now you'll get a free upgrade to all 2.x releases of Just::Thread Pro.

Posted by Anthony Williams
[/ news /] permanent link
Tags: , , ,

| Stumble It! stumbleupon logo | Submit to Reddit reddit logo | Submit to DZone dzone logo

Comment on this post

Follow me on Twitter