CodeView Symbol Records
Introduction
This document describes the usage and serialization format of the various CodeView symbol records that LLVM understands. As with CodeView Type Records, we describe only the important records that are generated by modern C++ toolchains.
Record Categories
Symbol records share one major similarity with type records: they start with the same record prefix, which we will not describe again (refer to the CodeView Type Records document for a description). As a result, a sequence of symbol records can be processed with largely the same code that processes type records. There are several important differences between symbol and type records:
Symbol records only appear in the PDB Public Symbol Stream, the PDB Global Symbol Stream, and the Module Info Streams.
Type records only appear in the TPI & IPI streams.
While types are referenced from other CodeView records via type indices, symbol records are referenced by the byte offset of the record in the stream that it appears in.
Types can reference types (via type indices), and symbols can reference both types (via type indices) and symbols (via offsets), but types can never reference symbols.
There is no notion of Leaf Records and Member Records as there is with types. Every symbol record describes its own length.
Certain special symbol records begin a “scope”. For these records, all following records up until the matching S_END record are “children” of this symbol record. For example, given a symbol record which describes a certain function, all local variables of this function would appear following the function up until the corresponding S_END record.
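The shared record prefix and the scope rule above can be sketched together in a few lines of Python. This is a minimal illustration, not LLVM's implementation. It assumes the usual prefix layout (a 16-bit length that counts the bytes following the length field, then a 16-bit kind), that the buffer starts at the first record, and that the symbol kind values match those commonly listed in Microsoft's `cvinfo.h`. The helper names (`iter_records`, `scope_depths`, `make_record`) are hypothetical, and the demo payloads are zero-filled placeholders rather than real record bodies.

```python
import struct

# Symbol kinds used below. Values are assumed from cvinfo.h, not from this document.
S_END = 0x0006
S_BLOCK32 = 0x1103
S_LPROC32 = 0x110F
S_GPROC32 = 0x1110
S_LOCAL = 0x113E
SCOPE_OPENERS = {S_BLOCK32, S_LPROC32, S_GPROC32}  # not an exhaustive list


def iter_records(data):
    """Yield (offset, kind, payload) for each record in a symbol byte buffer."""
    pos = 0
    while pos + 4 <= len(data):
        # Prefix: u16 length (bytes after the length field), then u16 kind.
        rec_len, kind = struct.unpack_from("<HH", data, pos)
        end = pos + 2 + rec_len
        if rec_len < 2 or end > len(data):
            raise ValueError(f"malformed record at offset {pos}")
        yield pos, kind, data[pos + 4:end]
        pos = end


def scope_depths(data):
    """Return (offset, kind, depth) triples; children sit one level below their opener."""
    out, depth = [], 0
    for offset, kind, _payload in iter_records(data):
        if kind == S_END:
            if depth == 0:
                raise ValueError(f"unmatched S_END at offset {offset}")
            depth -= 1  # S_END closes the innermost open scope
        out.append((offset, kind, depth))
        if kind in SCOPE_OPENERS:
            depth += 1
    return out


def make_record(kind, payload=b""):
    """Build a record for the demo only; real payloads carry actual fields."""
    return struct.pack("<HH", 2 + len(payload), kind) + payload


stream = (make_record(S_GPROC32, b"\0" * 8)
          + make_record(S_LOCAL, b"\0" * 4)
          + make_record(S_END))
for offset, kind, depth in scope_depths(stream):
    print(f"offset={offset:3d} kind=0x{kind:04x} depth={depth}")
```

The byte offsets yielded here are the same kind of value that other symbol records use to refer to a record, which is why the iterator reports them alongside the kind.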
Finally, there are three general categories of symbol record, grouped by where they are legal to appear in a PDB file: public symbols (which appear only in the publics stream), global symbols (which appear only in the globals stream), and module symbols (which appear in the module info stream).
Public Symbols
Public symbols are the CodeView equivalent of DWARF .debug_pubnames. There is one public symbol record for every function or variable in the program that has a mangled name. The Publics Stream, which contains these records, additionally contains a hash table that allows one to quickly locate a record by mangled name.
S_PUB32 (0x110e)
There is only one type of public symbol, an S_PUB32, which describes a mangled name, a flag indicating what kind of symbol it is (e.g. function, variable), and the symbol’s address. The Section Map Substream of the DBI Stream can be consulted to determine what module this address corresponds to, and from there that module’s module debug stream can be consulted to locate full information for the symbol with the given address.
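As a rough illustration of the three pieces of information an S_PUB32 carries, the sketch below decodes one record from raw bytes. The field order and widths (32-bit flags, 32-bit offset, 16-bit segment, then a NUL-terminated name) and the flag bit values are assumptions taken from the `PUBSYM32` layout in Microsoft's `cvinfo.h`; this document does not spell them out. `parse_pub32` and `make_pub32` are hypothetical helper names.

```python
import struct

S_PUB32 = 0x110E
# Flag bits assumed from cvinfo.h (CV_PUBSYMFLAGS); not defined in this document.
PUB_CODE = 0x1
PUB_FUNCTION = 0x2

_FIXED = "<HHIIH"  # length, kind, flags, offset, segment (little-endian, unpadded)


def parse_pub32(record):
    """Decode one complete S_PUB32 record, prefix included, into a dict."""
    _rec_len, kind, flags, offset, segment = struct.unpack_from(_FIXED, record, 0)
    if kind != S_PUB32:
        raise ValueError(f"not an S_PUB32 record: kind=0x{kind:04x}")
    name_start = struct.calcsize(_FIXED)
    name_end = record.index(b"\0", name_start)  # the name is NUL-terminated
    return {
        "name": record[name_start:name_end].decode("utf-8"),
        "is_function": bool(flags & PUB_FUNCTION),
        "segment": segment,  # section number; the address is segment:offset
        "offset": offset,
    }


def make_pub32(name, flags, segment, offset):
    """Build a sample record for the demo, zero-padded to a 4-byte boundary."""
    payload = struct.pack("<IIH", flags, offset, segment) + name.encode("utf-8") + b"\0"
    while (4 + len(payload)) % 4:
        payload += b"\0"
    return struct.pack("<HH", 2 + len(payload), S_PUB32) + payload


sample = make_pub32("?foo@@YAHXZ", PUB_FUNCTION, segment=1, offset=0x1000)
print(parse_pub32(sample))
```

Note that the decoded address is a section-relative pair, not a virtual address, so it still has to be mapped to a module as described above before the full symbol information can be found.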
Global Symbols
While there is one public symbol for every symbol in the program with external linkage, there is one global symbol for every symbol in the program with linkage (including internal linkage). As a result, global symbols do not describe a mangled name or an address, since symbols with internal linkage need not have any mangling at all, and also may not have an address. Thus, all global symbols simply refer directly to the full symbol record via a module/offset combination.
Similarly to public symbols, all global symbols are contained in a single Globals Stream, which contains a hash table mapping each fully qualified name to the corresponding record in the globals stream (which, as mentioned, contains the information needed to locate the full record in the corresponding module symbol stream).
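To make the module/offset combination concrete, here is a hedged sketch that decodes one reference-style record. It assumes the `REFSYM2` layout from Microsoft's `cvinfo.h` as used by S_PROCREF (a 32-bit name checksum, a 32-bit byte offset into the module's symbol stream, a 16-bit module index, then a NUL-terminated name), and it assumes the on-disk module index is 1-based. None of these details are stated in this document, and `parse_procref` and `make_procref` are hypothetical names.

```python
import struct

S_PROCREF = 0x1125  # value assumed from cvinfo.h

_FIXED = "<HHIIH"  # length, kind, name checksum, symbol offset, module


def parse_procref(record):
    """Decode a reference record, prefix included, into a module/offset pair plus name."""
    _rec_len, kind, _sum_name, sym_offset, module = struct.unpack_from(_FIXED, record, 0)
    name_start = struct.calcsize(_FIXED)
    name_end = record.index(b"\0", name_start)
    return {
        "kind": kind,
        "module_index": module - 1,   # assumption: stored 1-based on disk
        "symbol_offset": sym_offset,  # byte offset of the full record in that module
        "name": record[name_start:name_end].decode("utf-8"),
    }


def make_procref(name, module, sym_offset):
    """Build a sample record for the demo; the checksum field is left as zero."""
    payload = struct.pack("<IIH", 0, sym_offset, module) + name.encode("utf-8") + b"\0"
    while (4 + len(payload)) % 4:
        payload += b"\0"
    return struct.pack("<HH", 2 + len(payload), S_PROCREF) + payload


ref = parse_procref(make_procref("ns::helper", module=3, sym_offset=0x2C4))
print(ref)
```

A reader following such a reference would open the module symbol stream for `module_index` and start decoding at `symbol_offset`, where the full function record is expected to live.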
Note that a consequence and limitation of this design is that program-wide lookup by anything other than an exact, textually matching, fully qualified name (whatever the compiler decided to emit) is impractical. This differs from DWARF, where, even though we do not necessarily have O(1) lookup by basename within a given scope, we at least have O(n) access within a given scope.
Important
Program-wide lookup of names by anything other than an exact textually matching fully qualified name is not possible.
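The limitation can be illustrated with a toy model in which a plain dictionary stands in for the globals hash table. The names, module numbers, and offsets below are made up for the example, and the real table uses its own hash function and on-disk layout, which this sketch does not attempt to reproduce.

```python
# Hypothetical name table: exact fully qualified name -> (module, offset).
globals_by_name = {
    "ns::Widget::draw": (2, 0x1A4),
    "ns::helper": (2, 0x2C4),
    "draw": (5, 0x090),
}


def lookup_exact(name):
    """Hash lookup: succeeds only when the text matches what the compiler emitted."""
    return globals_by_name.get(name)


def lookup_basename(basename):
    """No index serves this query, so every stored name has to be inspected."""
    # Naive split on "::"; real names with template arguments need more care.
    return sorted(n for n in globals_by_name if n.split("::")[-1] == basename)


print(lookup_exact("ns::Widget::draw"))  # exact text: found
print(lookup_exact("Widget::draw"))      # partially qualified: no match
print(lookup_basename("draw"))           # works, but only via a full scan
```

The second query fails even though a human would consider it an obvious match, and the third succeeds only by walking every entry, which is the cost the paragraph above describes as impractical for a whole program.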