Clang Transformer Tutorial
A tutorial on how to write a source-to-source translation tool using Clang Transformer.
What is Clang Transformer?
Clang Transformer is a framework for writing C++ diagnostics and program transformations. It is built on the clang toolchain and the LibTooling library, but aims to hide much of the complexity of clang’s native, low-level libraries.
The core abstraction of Transformer is the rewrite rule, which specifies how to change a given program pattern into a new form. Here are some examples of tasks you can achieve with Transformer:
- warn against using the name MkX for a declared function,
- change MkX to MakeX, where MkX is the name of a declared function,
- change s.size() to Size(s), where s is a string,
- collapse e.child().m() to e.m(), for any expression e and method named m.
All of the examples have a common form: they identify a pattern that is the target of the transformation, they specify an edit to the code identified by the pattern, and their pattern and edit refer to common variables, like s, e, and m, that range over code fragments. The first three examples also specify constraints on the pattern that aren’t apparent from the syntax alone, like “s is a string.” Even the first example (“warn …”) shares this form, even though it doesn’t change any of the code: its “edit” is simply a no-op.
Transformer helps users succinctly specify rules of this sort and easily execute them locally over a collection of files, apply them to selected portions of a codebase, or even bundle them as a clang-tidy check for ongoing application.
Who is Clang Transformer for?
Clang Transformer is for developers who want to write clang-tidy checks or write tools to modify a large number of C++ files in (roughly) the same way. What qualifies as “large” really depends on the nature of the change and your patience for repetitive editing. In our experience, automated solutions become worthwhile somewhere between 100 and 500 files.
Getting Started
Patterns in Transformer are expressed with clang’s AST matchers. Matchers are a language of combinators for describing portions of a clang Abstract Syntax Tree (AST). Since clang’s AST includes complete type information (within the limits of a single Translation Unit (TU)), these patterns can even encode rich constraints on the type properties of AST nodes.
We assume a familiarity with the clang AST and the corresponding AST matchers for the purpose of this tutorial. Users who are unfamiliar with either are encouraged to start with the recommended references in Related Reading.
Example: style-checking names
Assume you have a style-guide rule which forbids functions from being named “MkX” and you want to write a check that catches any violations of this rule. We can express this as a Transformer rewrite rule:
makeRule(functionDecl(hasName("MkX")).bind("fun"),
         noopEdit(node("fun")),
         cat("The name ``MkX`` is not allowed for functions; please rename"));
makeRule is our go-to function for generating rewrite rules. It takes three arguments: the pattern, the edit, and (optionally) an explanatory note. In our example, the pattern (functionDecl(...)) identifies the declaration of the function MkX. Since we’re just diagnosing the problem, but not suggesting a fix, our edit is a no-op. But, it contains an anchor for the diagnostic message: node("fun") says to associate the message with the source range of the AST node bound to “fun”; in this case, the ill-named function declaration. Finally, we use cat to build a message that explains the change. As for the name cat, we’ll discuss it in more detail below, but suffice it to say that it can also take multiple arguments and concatenate their results.
Note that the result of makeRule is a value of type clang::transformer::RewriteRule, but most users don’t need to care about the details of this type.
Example: renaming a function
Now, let’s extend this example to a transformation; specifically, the second example above:
makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
         changeTo(cat("MakeX")),
         cat("MkX has been renamed MakeX"));
In this example, the pattern (declRefExpr(...)) identifies any reference to the function MkX, rather than the declaration itself, as in our previous example. Our edit (changeTo(...)) says to change the code matched by the pattern to the text “MakeX”. Finally, we use cat again to build a message that explains the change.
Here are some example changes that this rule would make:
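Original | Result |
---|---|
X = MkX(3); | X = MakeX(3); |
CallFactory(MkX, 3); | CallFactory(MakeX, 3); |
auto f = MkX; | auto f = MakeX; |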
Example: method to function
Next, let’s write a rule to replace a method call with a (free) function call, applied to the original method call’s target object. Specifically, “change s.size() to Size(s), where s is a string.” We start with a simpler change that ignores the type of s. That is, it will modify any method call where the method is named “size”:
llvm::StringRef s = "str";
makeRule(
    cxxMemberCallExpr(
        on(expr().bind(s)),
        callee(cxxMethodDecl(hasName("size")))),
    changeTo(cat("Size(", node(s), ")")),
    cat("Method ``size`` is deprecated in favor of free function ``Size``"));
We express the pattern with the given AST matcher, which binds the method call’s target to s. For the edit, we again use changeTo, but this time we construct the term from multiple parts, which we compose with cat. The second part of our term is node(s), which selects the source code corresponding to the AST node s that was bound when a match was found in the AST for our rule’s pattern. node(s) constructs a RangeSelector, which, when used in cat, indicates that the selected source should be inserted in the output at that point.
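To make the evaluation of such a term concrete, here is a toy model in standard C++. It is only a sketch and does not use Transformer's real types: it assumes a match result can be reduced to a map from bound ids to source text, and the names Part, text, selected, and evalCat are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a match result: bound id -> source text of the bound node.
using Bindings = std::map<std::string, std::string>;

// One piece of a toy stencil: either raw text or a reference to a bound node.
struct Part {
  bool IsNode;
  std::string Value;  // the raw text, or the id of a bound node
};

Part text(std::string T) { return {false, std::move(T)}; }
Part selected(std::string Id) { return {true, std::move(Id)}; }

// Concatenates the parts, replacing node references with the bound source.
std::string evalCat(const std::vector<Part> &Parts, const Bindings &Match) {
  std::string Out;
  for (const Part &P : Parts) {
    if (!P.IsNode) {
      Out += P.Value;
      continue;
    }
    auto It = Match.find(P.Value);
    assert(It != Match.end() && "stencil refers to an unbound id");
    Out += It->second;
  }
  return Out;
}
```

In this model, evaluating the parts text("Size("), selected("str"), text(")") against a match that binds "str" to the source my_string yields Size(my_string).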
Now, we probably don’t want to rewrite all invocations of “size” methods, just those on std::strings. We can achieve this change simply by refining our matcher. The rest of the rule remains unchanged:
llvm::StringRef s = "str";
makeRule(
    cxxMemberCallExpr(
        on(expr(hasType(namedDecl(hasName("std::string"))))
               .bind(s)),
        callee(cxxMethodDecl(hasName("size")))),
    changeTo(cat("Size(", node(s), ")")),
    cat("Method ``size`` is deprecated in favor of free function ``Size``"));
Example: rewriting method calls
In this example, we delete an “intermediary” method call in a string of invocations. This scenario can arise, for example, if you want to collapse a substructure into its parent.
llvm::StringRef e = "expr", m = "member";
auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
                                    callee(cxxMethodDecl(hasName("child"))));
makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
         changeTo(cat(node(e), ".", member(m), "()")),
         cat("``child`` accessor is being removed; call ",
             member(m), " directly on parent"));
This rule isn’t quite what we want: it will rewrite my_object.child().foo() to my_object.foo(), but it will also rewrite my_ptr->child().foo() to my_ptr.foo(), which is not what we intend. We could fix this by restricting the pattern with not(isArrow()) in the definition of child_call. Yet, we do want to rewrite calls through pointers as well.
To capture this idiom, we provide the access combinator to intelligently construct a field/method access. In our example, the member access is expressed as:
access(e, cat(member(m)))
The first argument specifies the object being accessed and the second, a description of the field/method name. In this case, we specify that the method name should be copied from the source, specifically the source range of m’s member. To construct the method call, we would use this expression in cat:
cat(access(e, cat(member(m))), "()")
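The choice that access makes for us can be sketched in plain C++. This is a toy illustration, not Transformer's implementation: it models only the dot-versus-arrow decision described above, assumes the bound expression is summarized by its source text plus a flag saying whether its type is a pointer, and uses the hypothetical names BoundExpr and accessToy.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a bound expression: its source text plus one type fact.
struct BoundExpr {
  std::string Source;
  bool IsPointer;
};

// Builds "base.member" or "base->member" depending on the base's type.
std::string accessToy(const BoundExpr &Base, const std::string &Member) {
  assert(!Member.empty() && "member name must not be empty");
  return Base.Source + (Base.IsPointer ? "->" : ".") + Member;
}
```

With this helper, appending "()" to the result gives my_object.foo() for a non-pointer base and my_ptr->foo() for a pointer base, which is the behavior the hand-written "." in the earlier rule could not provide.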
Reference: ranges, stencils, edits, rules
The above examples demonstrate just the basics of rewrite rules. Every element we touched on has more available constructors: range selectors, stencils, edits and rules. In this section, we’ll briefly review each in turn, with references to the source headers for up-to-date information. First, though, we clarify what rewrite rules are actually rewriting.
Rewriting ASTs to… Text?
The astute reader may have noticed that we’ve been somewhat vague in our explanation of what the rewrite rules are actually rewriting. We’ve referred to “code”, but code can be represented both as raw source text and as an abstract syntax tree. So, which one is it?
Ideally, we’d be rewriting the input AST to a new AST, but clang’s AST is not terribly amenable to this kind of transformation. So, we compromise: we express our patterns and the names that they bind in terms of the AST, but our changes in terms of source code text. We’ve designed Transformer’s language to bridge the gap between the two representations, in an attempt to minimize the user’s need to reason about source code locations and other, low-level syntactic details.
Range Selectors
Transformer provides a small API for describing source ranges: the RangeSelector combinators. These ranges are most commonly used to specify the source code affected by an edit and to extract source code in constructing new text.
Roughly, there are two kinds of range combinators: ones that select a source range based on the AST, and others that combine existing ranges into new ranges. For example, node selects the range of source spanned by a particular AST node, as we’ve seen, while after selects the (empty) range located immediately after its argument range. So, after(node("id")) is the empty range immediately following the AST node bound to id.
For the full collection of RangeSelectors, see the header, clang/Tooling/Transformer/RangeSelector.h.
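As a rough mental model of these two kinds of ranges, the sketch below treats a source range as a pair of character offsets into a string. It is a toy in standard C++, not the RangeSelector API; ToyRange, afterToy, and replaceRange are hypothetical names, and real selectors are computed from a match result rather than from fixed offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy source range: half-open character offsets [Begin, End) into a buffer.
struct ToyRange {
  std::size_t Begin;
  std::size_t End;
};

// Toy analogue of `after`: the empty range immediately following R.
ToyRange afterToy(ToyRange R) { return {R.End, R.End}; }

// Replaces the text covered by R with Text; an empty R is a pure insertion.
std::string replaceRange(std::string Source, ToyRange R,
                         const std::string &Text) {
  assert(R.Begin <= R.End && R.End <= Source.size());
  Source.replace(R.Begin, R.End - R.Begin, Text);
  return Source;
}
```

For the source int x = f(a); with the call f(a) occupying offsets 8 to 12, replacing the empty range afterToy({8, 12}) with .get() produces int x = f(a).get(); while the original call text is left untouched.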
Stencils
Transformer offers a large and growing collection of combinators for constructing output. Above, we demonstrated cat, the core function for constructing stencils. It takes a series of arguments, of three possible kinds:
- Raw text: to be copied directly to the output.
- Selector: specified with a RangeSelector, indicates a range of source text to copy to the output.
- Builder: an operation that constructs a code snippet from its arguments. For example, the access function we saw above.
Data of these different types are all represented (generically) by a Stencil. cat takes text and RangeSelectors directly as arguments, rather than requiring that they be constructed with a builder; other builders are constructed explicitly.
In general, Stencils produce text from a match result. So, they are not limited to generating source code, but can also be used to generate diagnostic messages that reference (named) elements of the matched code, like we saw in the example of rewriting method calls.
Further details of the Stencil type are documented in the header file clang/Tooling/Transformer/Stencil.h.
Edits
Transformer supports additional forms of edits. First, in a changeTo, we can specify the particular portion of code to be replaced, using the same RangeSelector we saw earlier. For example, we could change the function name in a function declaration with:
makeRule(functionDecl(hasName("bad")).bind(f),
         changeTo(name(f), cat("good")),
         cat("bad is now good"));
We also provide simpler editing primitives for insertion and deletion: insertBefore, insertAfter and remove. These can all be found in the header file clang/Tooling/Transformer/RewriteRule.h.
We are not limited to one edit per match found. Some situations require making multiple edits for each match. For example, suppose we wanted to swap two arguments of a function call.
For this, we provide an overload of makeRule that takes a list of edits, rather than just a single one. Our example might look like:
makeRule(callExpr(...),
         {changeTo(node(arg0), cat(node(arg2))),
          changeTo(node(arg2), cat(node(arg0)))},
         cat("swap the first and third arguments of the call"));
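One way to picture how several edits from a single match combine is the sketch below. It is a toy in standard C++ under stated assumptions, not how Transformer applies replacements internally: each edit is a character range in the original text plus replacement text, the ranges do not overlap, and ToyEdit and applyEdits are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy edit: replace the half-open range [Begin, End) of the original source.
struct ToyEdit {
  std::size_t Begin;
  std::size_t End;
  std::string Replacement;
};

// Applies all edits to Source. Ranges refer to the original text, so edits are
// applied from the last range to the first to keep earlier offsets valid.
std::string applyEdits(std::string Source, std::vector<ToyEdit> Edits) {
  std::sort(Edits.begin(), Edits.end(),
            [](const ToyEdit &A, const ToyEdit &B) { return A.Begin > B.Begin; });
  std::size_t Limit = Source.size();
  for (const ToyEdit &E : Edits) {
    assert(E.Begin <= E.End && E.End <= Limit && "edits must not overlap");
    Source.replace(E.Begin, E.End - E.Begin, E.Replacement);
    Limit = E.Begin;
  }
  return Source;
}
```

Swapping the first and third arguments of f(a, b, c) then amounts to two edits, one replacing a (offsets 2 to 3) with c and one replacing c (offsets 8 to 9) with a. Both edits read their replacement text from the original source, which is why the swap needs no temporary.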
EditGenerators (Advanced)
The particular edits we’ve seen so far are all instances of the ASTEdit class, or a list of such. But, not all edits can be expressed as ASTEdits. So, we also support a very general signature for edit generators:
using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;
That is, an EditGenerator is a function that maps a MatchResult to a set of edits, or fails. This signature supports a very general form of computation over match results. Transformer provides a number of functions for working with EditGenerators, most notably flatten, which combines several EditGenerators into one, like list flattening. For the full list, see the header file clang/Tooling/Transformer/RewriteRule.h.
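The shape of this signature, and the idea behind flattening, can be sketched in standard C++. This is a toy model, not the real EditGenerator or flatten: it uses std::optional in place of Transformer's error handling, represents a match as a map of strings, assumes that one failing generator fails the combined generator, and all names (ToyEdit, ToyMatch, ToyEditGenerator, flattenToy) are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct ToyEdit {
  std::string Target;       // id of the code to change
  std::string Replacement;  // new text for that code
};

using ToyMatch = std::map<std::string, std::string>;
using MaybeEdits = std::optional<std::vector<ToyEdit>>;

// A generator maps a match to a list of edits, or fails (empty optional).
using ToyEditGenerator = std::function<MaybeEdits(const ToyMatch &)>;

// Combines generators into one whose result is the concatenation of theirs.
// Assumption of this sketch: the first failure fails the whole generator.
ToyEditGenerator flattenToy(std::vector<ToyEditGenerator> Gens) {
  return [Gens = std::move(Gens)](const ToyMatch &M) -> MaybeEdits {
    std::vector<ToyEdit> All;
    for (const ToyEditGenerator &G : Gens) {
      assert(G && "generator must be callable");
      MaybeEdits Part = G(M);
      if (!Part)
        return std::nullopt;
      All.insert(All.end(), Part->begin(), Part->end());
    }
    return All;
  };
}
```

Because the combined value is itself a generator, it can be passed anywhere a single generator is expected, which is what makes this kind of composition convenient.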
Rules
We can also compose multiple rules, rather than just edits within a rule, using applyFirst: it composes a list of rules as an ordered choice, where Transformer applies the first rule whose pattern matches, ignoring others in the list that follow. If the matchers are independent then order doesn’t matter. In that case, applyFirst is simply joining the set of rules into one.
The benefit of applyFirst is that, for some problems, it allows the user to more concisely formulate later rules in the list, since their patterns need not explicitly exclude the earlier patterns of the list. For example, consider a set of rules that rewrite compound statements, where one rule handles the case of an empty compound statement and the other handles non-empty compound statements. With applyFirst, these rules can be expressed compactly as:
applyFirst({
    makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
    makeRule(compoundStmt().bind("non-empty"), ...)
})
The second rule does not need to explicitly specify that the compound statement is non-empty; it follows from the rule’s position in applyFirst. For more complicated examples, this can lead to substantially more readable code.
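The ordered-choice behavior can be illustrated with a small standard C++ sketch. It is a toy, not applyFirst itself: a compound statement is modeled as a list of statement strings, a rule as a predicate plus a label, and ToyRule, applyFirstToy, and compoundRules are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy rule: a predicate over the input plus a label for the rule that fired.
struct ToyRule {
  std::function<bool(const std::vector<std::string> &)> Matches;
  std::string Name;
};

// Ordered choice: returns the name of the first rule that matches, or an
// empty string if none does. Later rules are not consulted after a match.
std::string applyFirstToy(const std::vector<ToyRule> &Rules,
                          const std::vector<std::string> &Stmts) {
  for (const ToyRule &R : Rules) {
    assert(R.Matches && "rule needs a pattern");
    if (R.Matches(Stmts))
      return R.Name;
  }
  return "";
}

// The two rules from the text. The second pattern accepts every compound
// statement, including empty ones; only its position restricts it.
std::vector<ToyRule> compoundRules() {
  std::vector<ToyRule> Rules;
  Rules.push_back(
      {[](const std::vector<std::string> &S) { return S.empty(); }, "empty"});
  Rules.push_back(
      {[](const std::vector<std::string> &) { return true; }, "non-empty"});
  return Rules;
}
```

An empty statement list selects the "empty" rule even though the second pattern would also accept it, and any other list falls through to "non-empty". Reversing the order of the two rules would make the first one unreachable, which is why order matters whenever patterns overlap.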
Sometimes, a modification to the code might require the inclusion of a particular header file. To this end, users can modify rules to specify include directives with addInclude.
For additional documentation on these functions, see the header file clang/Tooling/Transformer/RewriteRule.h.
Using a RewriteRule as a clang-tidy check
Transformer supports executing a rewrite rule as a clang-tidy check, with the class clang::tidy::utils::TransformerClangTidyCheck. It is designed to require minimal code in the definition. For example, given a rule MyCheckAsRewriteRule, one can define a tidy check as follows:
class MyCheck : public TransformerClangTidyCheck {
 public:
  MyCheck(StringRef Name, ClangTidyContext *Context)
      : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
};
TransformerClangTidyCheck implements the virtual registerMatchers and check methods based on your rule specification, so you don’t need to implement them yourself. If the rule needs to be configured based on the language options and/or the clang-tidy configuration, it can be expressed as a function taking these as parameters and (optionally) returning a RewriteRule. This would be useful, for example, for our function-renaming rule, which is parameterized by the original name and the target. For details, see clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h.