Opaque Pointers

The Opaque Pointer Type

Traditionally, LLVM IR pointer types have contained a pointee type. For example, i32* is a pointer that points to an i32 somewhere in memory. However, due to a lack of pointee type semantics and various issues with having pointee types, there is a desire to remove pointee types from pointers.

The opaque pointer type project aims to replace all pointer types containing pointee types in LLVM with an opaque pointer type. The new pointer type is represented textually as ptr.

Some instructions still need to know what type to treat the memory pointed to by the pointer as. For example, a load needs to know how many bytes to load from memory and what type to treat the resulting value as. In these cases, the instructions themselves contain a type argument. For example, the load instruction from older versions of LLVM

load i64* %p

becomes

load i64, ptr %p

Address spaces are still used to distinguish between different kinds of pointers where the distinction is relevant for lowering (e.g. data vs function pointers have different sizes on some architectures). Opaque pointers do not change anything related to address spaces and lowering. For more information, see DataLayout. Opaque pointers in a non-default address space are spelled ptr addrspace(N).
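As a brief illustration of that spelling, the sketch below uses address space 1. The number is an arbitrary choice and the function names are hypothetical; what a given address space means depends on the target.

```llvm
; Hypothetical example: address space 1 is an arbitrary choice.
define i32 @load_from_as1(ptr addrspace(1) %p) {
  ; The loaded type (i32) comes from the instruction, while the address
  ; space remains part of the pointer type.
  %v = load i32, ptr addrspace(1) %p
  ret i32 %v
}

define ptr @as1_to_default(ptr addrspace(1) %p) {
  ; Moving between address spaces still takes an explicit addrspacecast.
  %q = addrspacecast ptr addrspace(1) %p to ptr
  ret ptr %q
}
```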

The opaque pointer type was first proposed in 2015.

Issues with explicit pointee types

LLVM IR pointers can be cast back and forth between pointers with different pointee types. The pointee type does not necessarily represent the actual underlying type in memory. In other words, the pointee type carries no real semantics.

Historically LLVM was some sort of type-safe subset of C. Having pointee types provided an extra layer of checks to make sure that the Clang frontend matched its frontend values/operations with the corresponding LLVM IR. However, as other languages like C++ adopted LLVM, the community realized that pointee types were more of a hindrance for LLVM development and that the extra type checking with some frontends wasn’t worth it.

LLVM’s type system was originally designed <https://llvm.org/pubs/2003-05-01-GCCSummit2003.html> to support high-level optimization. However, years of LLVM implementation experience have demonstrated that the pointee type system design does not effectively support optimization. Memory optimization algorithms, such as SROA, GVN, and AA, generally need to look through LLVM’s struct types and reason about the underlying memory offsets. The community realized that pointee types hinder LLVM development, rather than helping it. Some of the initially proposed high-level optimizations have evolved into TBAA <https://llvm.org/docs/LangRef.html#tbaa-metadata> due to limitations with representing higher-level language information directly via SSA values.

Pointee types provide some value to frontends because the IR verifier uses types to detect straightforward type confusion bugs. However, frontends also have to deal with the complexity of inserting bitcasts everywhere that they might be required. The community consensus is that the costs of pointee types outweigh the benefits, and that they should be removed.

Many operations do not actually care about the underlying type. These operations, typically intrinsics, usually end up taking an arbitrary pointer type i8* and sometimes a size. This causes lots of redundant no-op bitcasts in the IR to and from a pointer with a different pointee type.
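To make this concrete, here is a sketch that uses the memcpy intrinsic as an assumed example of such an operation; the function names are hypothetical. The two halves belong to separate modules (typed and opaque pointers cannot be mixed in one module) and are shown together only for comparison.

```llvm
; Typed pointers: both operands must first be cast to i8*.
declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

define void @copy_typed(i32* %dst, i32* %src) {
  %d = bitcast i32* %dst to i8*
  %s = bitcast i32* %src to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 16, i1 false)
  ret void
}

; Opaque pointers: the operands are passed directly, with no casts.
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)

define void @copy_opaque(ptr %dst, ptr %src) {
  call void @llvm.memcpy.p0.p0.i64(ptr %dst, ptr %src, i64 16, i1 false)
  ret void
}
```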

No-op bitcasts take up memory/disk space and also take up compile time to look through. However, perhaps the biggest issue is the code complexity required to deal with bitcasts. When looking up through def-use chains for pointers, it’s easy to forget to call Value::stripPointerCasts() to find the true underlying pointer obfuscated by bitcasts. And when looking down through def-use chains, passes need to iterate through bitcasts to handle uses. Removing no-op pointer bitcasts prevents a category of missed optimizations and makes writing LLVM passes a little bit easier.

Having fewer no-op pointer bitcasts also reduces the chance of incorrect bitcasts with regard to address spaces. People maintaining backends that care a lot about address spaces have complained that frontends like Clang often incorrectly bitcast pointers, losing address space information.

An analogous transition that happened earlier in LLVM is integer signedness. Currently there is no distinction between signed and unsigned integer types; instead, each integer operation signals how to treat its operands, either through the choice of operation (e.g. sdiv vs. udiv) or through flags on it (e.g. nsw and nuw on add). Previously LLVM IR distinguished between unsigned and signed integer types and ran into similar issues of no-op casts. The transition from manifesting signedness in types to manifesting it in instructions happened early on in LLVM’s timeline to make LLVM easier to work with.
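A small sketch of what this looks like in today's IR, with a hypothetical function name: the same i32 values are interpreted as signed or unsigned purely by the instruction that consumes them.

```llvm
define i32 @divide_both_ways(i32 %a, i32 %b) {
  %s = sdiv i32 %a, %b       ; operands are treated as signed
  %u = udiv i32 %a, %b       ; the same bits are treated as unsigned
  %r = add nsw i32 %s, %u    ; nsw: signed overflow produces poison
  ret i32 %r
}
```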

Opaque Pointers Mode

During the transition phase, LLVM can be used in two modes: In typed pointer mode, all pointer types have a pointee type and opaque pointers cannot be used. In opaque pointer mode (the default), all pointers are opaque. Opaque pointer mode can be disabled using -opaque-pointers=0 in LLVM tools like opt, or -Xclang -no-opaque-pointers in clang. Additionally, opaque pointer mode is automatically disabled for IR and bitcode files that explicitly mention i8* style typed pointers.

In opaque pointer mode, all typed pointers used in IR, bitcode, or created using PointerType::get() and similar APIs are automatically converted into opaque pointers. This simplifies migration and allows testing existing IR with opaque pointers.

define i8* @test(i8* %p) {
  %p2 = getelementptr i8, i8* %p, i64 1
  ret i8* %p2
}

; Is automatically converted into the following if -opaque-pointers
; is enabled:

define ptr @test(ptr %p) {
  %p2 = getelementptr i8, ptr %p, i64 1
  ret ptr %p2
}

Migration Instructions

In order to support opaque pointers, two types of changes tend to be necessary. The first is the removal of all calls to PointerType::getElementType() and Type::getPointerElementType().

In the LLVM middle-end and backend, this is usually accomplished by inspecting the type of relevant operations instead. For example, memory access related analyses and optimizations should use the types encoded in the load and store instructions instead of querying the pointer type.
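The following sketch shows where those types appear in the textual IR once pointers are opaque. The struct, global, and function names are hypothetical; the comments point out the type that each operation carries.

```llvm
%struct.S = type { i32, i64 }

@g = global i32 0                  ; value type: i32 (@g itself has type ptr)

declare void @callee(ptr)

define i64 @example(ptr %p) {
  %a = alloca %struct.S            ; allocated type: %struct.S
  ; source element type: %struct.S
  %field = getelementptr %struct.S, ptr %p, i32 0, i32 1
  %v = load i64, ptr %field        ; loaded type: i64
  store i32 1, ptr @g              ; stored value type: i32
  call void @callee(ptr %a)        ; function type: void (ptr)
  ret i64 %v
}
```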

Here are some common ways to avoid pointer element type accesses:

  • For loads, use getType().

  • For stores, use getValueOperand()->getType().

  • Use getLoadStoreType() to handle both of the above in one call.

  • For getelementptr instructions, use getSourceElementType().

  • For calls, use getFunctionType().

  • For allocas, use getAllocatedType().

  • For globals, use getValueType().

  • For consistency assertions, use PointerType::isOpaqueOrPointeeTypeEquals().

  • To create a pointer type in a different address space, use PointerType::getWithSamePointeeType().

  • To check that two pointers have the same element type, use PointerType::hasSameElementTypeAs().

  • While it is preferred to write code in a way that accepts both typed and opaque pointers, Type::isOpaquePointerTy() and PointerType::isOpaque() can be used to handle opaque pointers specially. PointerType::getNonOpaquePointerElementType() can be used as a marker in code-paths where opaque pointers have been explicitly excluded.

  • To get the type of a byval argument, use getParamByValType(). Similar methods exist for other ABI-affecting attributes that need to know the element type, such as byref, sret, inalloca and preallocated.

  • Some intrinsics require an elementtype attribute, which can be retrieved using getParamElementType(). This attribute is required in cases where the intrinsic does not naturally encode a needed element type. This is also used for inline assembly.
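For the last two items, here is a rough sketch of how such attributes look in textual IR, with hypothetical struct and function names. The inline assembly case is shown for elementtype because it needs no intrinsic declaration; the exact constraint string is an assumption chosen for illustration.

```llvm
%struct.S = type { i32, i64 }

; sret and byval spell out the type that the pointer type no longer carries.
define void @callee(ptr sret(%struct.S) %out, ptr byval(%struct.S) %in) {
  ret void
}

define void @asm_user(ptr %p) {
  ; An indirect memory operand ("*m") carries its type via elementtype.
  call void asm sideeffect "", "=*m"(ptr elementtype(i32) %p)
  ret void
}
```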

Note that some of the methods mentioned above only exist to support both typed and opaque pointers at the same time, and will be dropped once the migration has completed. For example, isOpaqueOrPointeeTypeEquals() becomes meaningless once all pointers are opaque.

While direct usage of pointer element types is immediately apparent in code, there is a more subtle issue that opaque pointers need to contend with: A lot of code assumes that pointer equality also implies that the used load/store type or GEP source element type is the same. Consider the following examples with typed and opaque pointers:

define i64 @test(i32* %p) {
  store i32 0, i32* %p
  %bc = bitcast i32* %p to i64*
  %v = load i64, i64* %bc
  ret i64 %v
}

define i64 @test(ptr %p) {
  store i32 0, ptr %p
  %v = load i64, ptr %p
  ret i64 %v
}

Without opaque pointers, a check that the pointer operand of the load and store are the same also ensures that the accessed type is the same. Using a different type requires a bitcast, which will result in distinct pointer operands.

With opaque pointers, the bitcast is not present, and this check is no longer sufficient. In the above example, it could result in store to load forwarding of an incorrect type. Code making such assumptions needs to be adjusted to check the accessed type explicitly: LI->getType() == SI->getValueOperand()->getType().
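The same caveat applies to the GEP source element type mentioned earlier. In this sketch (hypothetical function name; the byte offsets assume that i32 and i64 occupy 4 and 8 bytes, as under typical data layouts), both instructions share one pointer operand and the same index, yet compute different addresses:

```llvm
define i1 @same_base_different_offsets(ptr %p) {
  %a = getelementptr i32, ptr %p, i64 1   ; %p + 4 bytes
  %b = getelementptr i64, ptr %p, i64 1   ; %p + 8 bytes
  %eq = icmp eq ptr %a, %b                ; the two addresses differ
  ret i1 %eq
}
```

Code that treats two getelementptr instructions as equivalent therefore needs to compare getSourceElementType() in addition to the operands.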

Frontends

Frontends need to be adjusted to track pointee types independently of LLVM, insofar as they are necessary for lowering. For example, clang now tracks the pointee type in the Address structure.

Frontends using the C API through an FFI should be aware that a number of C API functions are deprecated and will be removed as part of the opaque pointer transition:

LLVMBuildLoad -> LLVMBuildLoad2
LLVMBuildCall -> LLVMBuildCall2
LLVMBuildInvoke -> LLVMBuildInvoke2
LLVMBuildGEP -> LLVMBuildGEP2
LLVMBuildInBoundsGEP -> LLVMBuildInBoundsGEP2
LLVMBuildStructGEP -> LLVMBuildStructGEP2
LLVMBuildPtrDiff -> LLVMBuildPtrDiff2
LLVMConstGEP -> LLVMConstGEP2
LLVMConstInBoundsGEP -> LLVMConstInBoundsGEP2
LLVMAddAlias -> LLVMAddAlias2

Additionally, it will no longer be possible to call LLVMGetElementType() on a pointer type.

It is possible to control whether opaque pointers are used (if you want to override the default) using LLVMContext::setOpaquePointers or LLVMContextSetOpaquePointers().

Transition State

As of April 2022 both LLVM and Clang have complete support for opaque pointers, and opaque pointers are enabled by default in LLVM and Clang.

For users of the clang driver interface, it is possible to temporarily restore the old default using the -DCLANG_ENABLE_OPAQUE_POINTERS=OFF cmake option, or by passing -Xclang -no-opaque-pointers to a single clang invocation.

For users of the clang cc1 interface, -no-opaque-pointers can be passed. Note that the CLANG_ENABLE_OPAQUE_POINTERS cmake option has no effect on the cc1 interface.

Usage for LTO can be disabled by passing -Wl,-plugin-opt=no-opaque-pointers to the clang driver.

For users of LLVM as a library, opaque pointers can be disabled by calling setOpaquePointers(false) on the LLVMContext.

For users of LLVM tools like opt, opaque pointers can be disabled by passing -opaque-pointers=0.

The next steps for the opaque pointer migration are:

  • Migrate Clang/LLVM tests to use opaque pointers.

  • Remove support for typed pointers after the LLVM 15 branch has been created.

Version Support

LLVM 14: Supports all necessary APIs for migrating to opaque pointers and deprecates/removes incompatible APIs. However, using opaque pointers in the optimization pipeline is not fully supported. This release can be used to make out-of-tree code compatible with opaque pointers, but opaque pointers should not be enabled in production.

LLVM 15: Opaque pointers are enabled by default. Typed pointers are still available, but only supported on a best-effort basis and may be untested.

LLVM 16: Only opaque pointers will be supported. Typed pointers will not be supported.