Exploring Clang Tooling – Using Build Tools with clang-tidy

This post is part of a regular series of posts where the C++ product team and other guests answer questions we have received from customers. The questions can be about anything C++ related: MSVC toolset, the standard language and library, the C++ standards committee, isocpp.org, CppCon, etc.

Today’s post is by guest author Stephen Kelly, who is a developer at Havok, a contributor to Qt and CMake and a blogger. This post is part of a series where he is sharing his experience using Clang tooling in his current team.

The previous series about clang-tidy on this blog covered the basics of creating a clang-tidy extension, along with clang-query as tooling to support that work.

While the series focused on single-file examples for simplicity, developers progressing in this direction will need to run the tooling on all of the files in their project at once, or on all files which match a specific pattern.

Delayed refactoring

The first problem with processing multiple files is that we can no longer change files as we process them and discover locations to refactor. Tools like clang-tidy only work if the source code compiles, so a process which changed a header file while processing the first source file could leave the next source file unable to compile.

To resolve this problem, clang-tidy has the ability to export refactoring changes to a .yaml file, instead of changing the files directly.

The clang-apply-replacements tool can then be run on a directory of .yaml files in order to apply the changes to all of the files at once.
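As a rough illustration of the two steps, the sketch below builds the command lines involved. It assumes clang-tidy's `--export-fixes` and `-p` options and a `clang-apply-replacements` executable on the PATH. The helper names, the `build` and `fixes` directory names and the one-yaml-per-source naming scheme are hypothetical choices made for this example.

```python
import shlex
from pathlib import Path


def export_fixes_command(source, build_dir, fixes_dir):
    """Build a clang-tidy command that records fixes instead of applying them."""
    # One .yaml file per translation unit, named after the source file.
    # Assumption: file names are unique; a real script needs a safer scheme.
    yaml_path = (Path(fixes_dir) / (Path(source).name + ".yaml")).as_posix()
    return [
        "clang-tidy",
        "-p", str(build_dir),            # directory containing compile_commands.json
        f"--export-fixes={yaml_path}",   # write replacements to a YAML file
        str(source),
    ]


def apply_fixes_command(fixes_dir):
    """Build the command that applies every .yaml file found in fixes_dir."""
    return ["clang-apply-replacements", str(fixes_dir)]


# Print the commands rather than running them, so the sketch has no side effects.
for src in ["src/a.cpp", "src/b.cpp"]:
    print(shlex.join(export_fixes_command(src, "build", "fixes")))
print(shlex.join(apply_fixes_command("fixes")))
```

The important property is the ordering: every translation unit is analysed against the unmodified sources first, and the source files are only touched once, at the end.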

The run-clang-tidy script in the clang repository helps with these tasks. It accepts a pattern of files and processes all matching files in parallel, making use of all available cores.
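The general shape of such a driver can be sketched in a few lines. This is not the run-clang-tidy implementation, only a simplified illustration of the same idea: read the compilation database, keep the files matching a pattern, and hand them to a worker pool. The function names and the yaml naming scheme are hypothetical.

```python
import os
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor


def select_files(entries, pattern):
    """Return the unique source files in a compilation database matching pattern."""
    regex = re.compile(pattern)
    files = []
    for entry in entries:
        # "file" may be relative to "directory" in compile_commands.json.
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if regex.search(path) and path not in files:
            files.append(path)
    return files


def tidy_one(path, build_dir, fixes_dir):
    """Run clang-tidy on one file, exporting fixes rather than editing the source."""
    # Flatten the path into a file name so outputs do not collide (assumption).
    yaml_name = re.sub(r"[^A-Za-z0-9_.-]", "_", path) + ".yaml"
    cmd = ["clang-tidy", "-p", build_dir,
           f"--export-fixes={os.path.join(fixes_dir, yaml_name)}", path]
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def run_parallel(files, worker, jobs=None):
    """Apply worker to every file using a pool sized to the available cores."""
    with ThreadPoolExecutor(max_workers=jobs or os.cpu_count()) as pool:
        return list(pool.map(worker, files))
```

A caller would load `compile_commands.json` with `json.load`, pass the resulting list to `select_files`, and then call `run_parallel(files, lambda f: tidy_one(f, "build", "fixes"))`. Threads are sufficient here because the real work happens in the clang-tidy child processes.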

Build tools

Consider the similarity between using a compiler with a .cpp file to produce an object file and using clang-tidy to produce a .yaml file.

This similarity implies that we can use build tools with clang-tidy.

We can use any tool to generate a Ninja buildsystem, but most are not currently optimized for generating commands which invoke clang-tidy instead of a compiler. Although CMake has clang-tidy support, it doesn’t have direct support for delayed refactoring, so the CMake integration is currently better suited to linting than to refactoring tasks.

For now, using some tricks, we can use CMake to generate a buildsystem from a compile_commands.json file which you have already generated. The generated ‘buildsystem’ simply uses clang-tidy in place of the compiler, so that it outputs .yaml files instead of object files.

We can instruct CMake to generate a Ninja ‘buildsystem’ and run a ‘build’ in the normal way to invoke the refactor:

cmake .. -G Ninja -DCMAKE_CXX_COMPILER=<path_to_clang_tidy>
cmake --build .

Ninja processes the inputs in parallel, so this results in a collection of .yaml files in the fixes directory. We can use clang-apply-replacements to apply those fixes to the source code.
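To make the idea of a clang-tidy ‘buildsystem’ concrete, here is a sketch that writes Ninja rules directly from the entries of a compile_commands.json file. This is a different route from the CMake script described above, shown only to illustrate the structure: one build edge per translation unit, with a .yaml file as the output. The function names and the output naming scheme are hypothetical, and the sketch assumes the build directory path contains no spaces.

```python
import os


def ninja_escape(path):
    """Escape the characters Ninja treats specially in build-line paths."""
    return path.replace("$", "$$").replace(" ", "$ ").replace(":", "$:")


def generate_ninja(entries, build_dir, fixes_dir="fixes"):
    """Return build.ninja text with one clang-tidy edge per translation unit."""
    lines = [
        "rule tidy",
        # $in is the source file and $out is the exported .yaml file.
        f"  command = clang-tidy -p {build_dir} --export-fixes=$out $in",
        "  description = TIDY $in",
        "",
    ]
    seen = set()
    for index, entry in enumerate(entries):
        source = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if source in seen:
            continue  # the same file may be listed more than once
        seen.add(source)
        # Prefix with the entry index so equally named files do not clash.
        output = f"{fixes_dir}/{index}_{os.path.basename(source)}.yaml"
        lines.append(f"build {ninja_escape(output)}: tidy {ninja_escape(source)}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a `build.ninja` file and running `ninja` in that directory would then schedule the clang-tidy invocations in parallel, in the same way Ninja schedules compiler invocations.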

Using CMake and Ninja brings advantages that the run-clang-tidy script doesn’t provide. Because we are modelling mechanical refactoring as a build task, we can use other build tools which work with Ninja and CMake. To start, we can convert the log of Ninja performing the refactor to a trace which is compatible with the Chrome about:tracing tool. This gives output showing the length of time taken for each translation unit.
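The conversion itself is small. The sketch below is a minimal version, not the tool used for this post. It assumes the tab-separated `.ninja_log` layout (start time in milliseconds, end time in milliseconds, mtime, output path, command hash) and emits ‘complete’ events in the Chrome trace event format. The function name and the lane assignment strategy are choices made for this example.

```python
import json


def ninja_log_to_trace(log_text):
    """Convert .ninja_log text into a list of Chrome trace 'complete' events."""
    targets = []
    for line in log_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the version header
        start_ms, end_ms, _mtime, output, _cmd_hash = line.split("\t")
        targets.append((int(start_ms), int(end_ms), output))

    events = []
    lane_free_at = []  # end time of the last task placed in each lane
    for start, end, output in sorted(targets):
        # Reuse the first lane that is idle at this start time, else open a new one.
        for lane, free_at in enumerate(lane_free_at):
            if free_at <= start:
                break
        else:
            lane = len(lane_free_at)
            lane_free_at.append(0)
        lane_free_at[lane] = end
        events.append({
            "name": output,
            "ph": "X",                    # a complete event with a duration
            "ts": start * 1000,           # the trace format uses microseconds
            "dur": (end - start) * 1000,
            "pid": 0,
            "tid": lane,                  # one row per concurrently running task
        })
    return events
```

Saving the result with `json.dump(events, open("trace.json", "w"))` produces a file which can be loaded in about:tracing, with one bar per .yaml output.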

We can also take advantage of the fact that we are now using CMake to handle the refactoring. Using Visual Studio Code and the CMake Tools plugin, we can simply open the folder containing the CMakeLists.txt and trigger the refactoring task from there.

Add a custom kit to the CMake Tools for running clang-tidy:

{
  "name": "Clang tidy",
  "compilers": {
    "CXX": "C:/dev/prefix/bin/clang-tidy.exe"
  }
}

Now, when we invoke Build in Visual Studio Code, the refactoring is started. Diagnostics are also collected with easy navigation to the source code.

Because CMake can generate Visual Studio solutions, it is also possible to control the refactoring from within Visual Studio. As this requires creating a Toolset file to replace the compiler with clang-tidy, it is beyond the scope of this post, but it follows the same pattern to achieve the result.

Distributing the refactor

Consider how we distribute our build tasks on the network.

If we treat clang-tidy as a compiler, then we should be able to use a build-distribution tool to distribute our refactoring task on the network.

One such build distribution tool is Icecream, which is popular on Linux systems and available under the GPL. Icecream works by sending an archive of the build tools to remote machines, so that the actual compilation is run on the remote machine and the resulting object file is sent back to the client.

By packaging the clang-tidy executable, renamed to clang so that Icecream accepts it, we can refactor on remote machines and send the resulting .obj files (named so that Icecream accepts them, but containing yaml content) back to the client. The Icecream Monitor tool then shows the progress of the distributed task among the build nodes.

This work distribution brings a significant increase in speed to the refactoring task. Using this technique I have been able to make mechanical changes to the LLVM/Clang source (millions of lines of code) in minutes which would otherwise take hours if run only locally. Because there is no need to link libraries while refactoring, each refactor does not conflict with any other and the process can be embarrassingly parallel.

Conclusion

Mechanical refactoring with clang-tidy requires distribution over a network in order to complete in reasonable time on large codebases. What other build tools do you think would be adaptable for refactoring tasks? Let us know in the comments below or contact the author directly via e-mail at stkelly@microsoft.com, or on Twitter @steveire.