At Codeplay I currently work on a compiler backend for an embedded accelerator. The backend is the part of the compiler which takes some representation of the source code and translates it to machine code. In my case I’m working with LLVM, so this representation is LLVM IR.
It’s very common for accelerators like GPUs to have multiple regions of addressable memory – each with distinct properties. One important optimisation I’ve implemented recently is extending LLVM’s alias analysis functionality to handle the different address spaces for our target architecture.
This article will give an overview of the aim of alias analysis – along with an example based around address spaces – and show how custom alias analyses can be expressed in LLVM. You should be able to understand this article even if you don’t work with compilers or LLVM; hopefully it gives some insight into what compiler developers do to make the tools you use generate better code. LLVM developers may find the implementation section helpful, but may want to read the documentation or examples linked at the bottom for more details.
What is alias analysis
Alias analysis is the process of determining whether two pointers can point to the same object (alias) or not. This is very important for some valuable optimisations.
Take this C function as an example:
__attribute__s specify that
a points to an
int in address space
b points to an
int in address space
1. An important detail of the target architecture for this code is that address spaces
1 are completely distinct: modifying memory in address space
0 can never affect memory in address space
1. Here’s some LLVM IR which could be generated from this function:
For those unfamiliar with LLVM IR, the first
store is storing
*a, the second storing
%0 = ... line is like loading
*a into a temporary variable, which is then returned in the final line.
Now we want
foo to be optimised. Can you see an optimisation which could be made?
What we really want is for that load from
a (the line beginning
%0 = ...) to be removed and for the final statement to instead return
42. We want the optimised code to look like this:
However, we have to be very careful, because this optimisation is only valid if
b do not alias, i.e. they must not point at the same object. Forgetting about the address spaces for a second, consider this call to
foo where we pass pointers which do alias:
Inside the unoptimised version of
i will be set to
42, then to
20 will be returned. However, if we carry out desired optimisation then the two stores will occur, but
42 will be returned instead of
20. We’ve just broken the behaviour of our function.
The only way that a compiler can reasonably carry out the above optimisation is if it can prove that the two pointers cannot possibly alias. This reasoning is carried out through alias analysis.
Custom alias analysis in LLVM
As I mentioned above, address spaces
1 for our target architecture are distinct. However, this may not hold for some systems, so LLVM cannot assume that it holds in general: we need to make it explicit.
The magic happens in the
alias member function, which takes two
MemoryLocations and returns an
AliasResult. The result indicates whether the locations can never alias, may alias, partially alias, or precisely alias. We want our analysis to say “If the address spaces for the two memory locations are different, then they can never alias”. The resulting code is surprisingly close to this English description:
Alongside this you need a bunch of boilerplate for creating a pass out of this analysis (I’ll link to a full example at the end), but after that’s done you just register the pass and ensure that the results of it are tracked:
We also want to ensure that there is an optimisation pass which will remove unnecessary loads and stores which our new analysis will find. One example is Global Value Numbering.
After all this is done, running the optimiser on the LLVM IR from the start of the post will eliminate the unnecessary load, and the generated code will be faster as a result.
You can see an example with all of the necessary boilerplate in LLVM’s test suite. The AMDGPU target has a full implementation which is essentially what I presented above extended for more address spaces.
Hopefully you have got a taste of what alias analysis does and the kinds of work involved in writing compilers, even if I have not gone into a whole lot of detail. I may write some more short posts on other areas of compiler development if readers find this interesting. Let me know on Twitter or in the comments!