How to Use CMake Without the Agonizing Pain - Part 2

Mon 31 May 2021

Welcome back to Part 2 of this series! I was very happy to see the warm reception Part 1 got over on /r/cpp. Before we get started, I thought I would take this opportunity to clarify a couple of points about this series.

First, this series is not a tutorial, at least not in the traditional sense. My hope with this project is to show you how to reason about CMake so that it feels intuitive. I want readers to see the big picture and to develop a taste for quality build code. Still, some space will be dedicated to specific effective practices, common mistakes, superseded features, and so on, always with an eye towards understanding why.

Second, while I complained about the ocean of bad CMake resources, I forgot to recognize the handful of good resources that have taught me well. I have added a list of these resources to the end of Part 1.

Today, I'd like to talk about what you should expect from a CMake build, and some common pitfalls that violate these expectations. Not every CMake project you encounter will meet these criteria. I would encourage you to begin a friendly dialogue with the maintainers of non-conforming projects to see if they can be fixed (and, in the spirit of open source, try opening a PR!).

Expect vanilla builds to work

I'm going to make a bold claim, here: it should be possible to build any CMake project using any generator with the following sequence of commands, assuming all its dependencies are installed to system locations:

# For a single-configuration generator:
$ cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
$ cmake --build build
$ cmake --install build --prefix /path/to/wherever

# For a multi-configuration generator:
$ cmake -S . -B build
$ cmake --build build --config Release
$ cmake --install build --config Release --prefix /path/to/wherever

Furthermore, if the code is standards-compliant and platform-independent, this sequence should work with any compiler on any operating system.

Pitfall: unnecessary flags and settings

Obviously, if you're building a Linux-only tool that depends on GNU extensions, then you will need GCC or Clang. Unfortunately, many CMake builds assume too much about the environment or toolchain and inject optional, compiler-specific flags into their builds. Often, they provide no way to disable them. Such projects might needlessly fail on a different compiler or even a different version of the same compiler used by the author.

The most common example is adding -Werror unconditionally. The meaning of -Wall changes across compiler versions, so while this code might work for you today, it is at high risk of bit-rotting:

# BAD: don't do this!
target_compile_options(target PRIVATE -Wall -Werror)

For a subtler example, both GCC and Clang provide warning flags for missing uses of the C++11 override keyword. On GCC 5.1 and newer, the flag is -Wsuggest-override. On Clang 10 and below, the check is split between two flags: -Winconsistent-missing-destructor-override and -Winconsistent-missing-override. Passing a Clang-only flag to GCC is an error, and passing the GCC-only flag to Clang produces a warning that is upgraded to an error if -Werror is also specified. Thus, if you naively write

# BAD: don't do this!
target_compile_options(target PRIVATE -Winconsistent-missing-override)

then your build will break with GCC! If you add -Wsuggest-override like this, then your build will break with -Werror on Clang 10! Ask yourself: do you really want to track warning flag compatibility across compiler vendors and versions? Is that a good use of your time?

I'm here to tell you that you don't want to, and that it's a waste of time. You can save yourself a lot of hassle if you only include firm build requirements in the CMakeLists.txt. Your code will build without any warnings enabled, so they don't belong there. In the past, you would have needed to create a toolchain file or at least guard these settings with the appropriate checks and an option() to disable them. However, since CMake 3.19, you can add these to a preset. Create a file named CMakePresets.json next to your CMakeLists.txt with these contents:

{
   "version": 1,
   "cmakeMinimumRequired": {
      "major": 3,
      "minor": 19,
      "patch": 0
   },
   "configurePresets": [
      {
         "name": "gcc",
         "displayName": "GCC",
         "description": "Default build options for GCC",
         "generator": "Ninja",
         "binaryDir": "${sourceDir}/build",
         "cacheVariables": {
            "CMAKE_CXX_FLAGS": "-Wsuggest-override"
         }
      },
      {
         "name": "clang",
         "displayName": "Clang",
         "description": "Default build options for Clang",
         "generator": "Ninja",
         "binaryDir": "${sourceDir}/build",
         "cacheVariables": {
            "CMAKE_CXX_FLAGS": "-Winconsistent-missing-override -Winconsistent-missing-destructor-override"
         }
      }
   ]
}

Then someone (an end-user, CI, you) can use your preset like so:

$ cmake --preset=gcc -DCMAKE_BUILD_TYPE=Release
$ cmake --build build

Presets will fundamentally change the way people work with CMake and share their optional (but desired) build settings with users. They also significantly reduce the risk of your build breaking with a different compiler or version. Remember: it is much easier to write a correct build by keeping your CMakeLists.txt minimal and writing an opt-in preset than by checking all the relevant factors (compiler vendor, version, active language, etc.) before adding a flag.

To really drive this point home, this code safely adds -Wsuggest-override. It should burn your eyeballs:

# My eyes! The goggles do nothing!
option(MyProj_ENABLE_WARNINGS "Compile MyProj with warnings used by upstream" OFF)
if (MyProj_ENABLE_WARNINGS)
   # keep line width low
   set(is_clang "$<COMPILE_LANG_AND_ID:CXX,Clang>")
   set(is_gcc "$<COMPILE_LANG_AND_ID:CXX,GNU>")
   set(ver "$<CXX_COMPILER_VERSION>")

   target_compile_options(
     target
     PRIVATE
       "$<$<AND:${is_clang},$<VERSION_GREATER_EQUAL:${ver},11>>:-Wsuggest-override>"
       "$<$<AND:${is_gcc},$<VERSION_GREATER_EQUAL:${ver},5>>:-Wsuggest-override>"
   )
endif ()

This sort of thing does not scale. If a preset doesn't work for an end-user, they can override it piecemeal at the command line. On the other hand, incorrect CMake code inflicts an error with no recourse but to patch your build.

Don't forget that other people besides your core development team will use your build. Package maintainers, consumers of your library (if applicable), and power users looking to be on the cutting edge will all want to build your package with a slightly different set of flags, compilers, versions, and operating systems. The path of least resistance (using presets) both makes your CMakeLists.txt easy to maintain for you and easy to consume for all your users.

Pitfall: bad dependency management

If you use only well-behaved CMake packages with find_package, this will largely take care of itself. Unfortunately, many CMake packages are not well-behaved. To keep this article focused, strategies for wrangling bad CMake (and non-CMake) dependencies will be covered in Part 3.

Expect incremental builds to work

Some particularly pathological projects require you to run CMake twice up front in order to get a correct build. This should never be necessary; the single-configure recipe above already rules it out. Thankfully, it's also fairly uncommon.

However, disappointingly many projects require you to manually re-run CMake before any incremental build. The whole point of CMake is to generate faithful implementations of the abstract build model. One configure step ought to be all you need. After the first run, the build tool (e.g. make) should know when it needs to re-run CMake.

The technical term here is "idempotence": running the CMake configure step twice with the same inputs should be no different from running it once. Any other behavior is unfriendly to developers and should be considered a bug in the project. (Note: Xcode has some architectural limitations that make this impossible; see this Discourse discussion for more details.)

Pitfall: terrifying cache behavior

There are several ways you can unintentionally break idempotence. If you use set(CACHE), there's a good chance your build is broken. Here's an example from a bug report I filed recently. If you were wondering what the "agonizing pain" I've been talking about is, look no further. This is the kind of thing nobody should ever need to know in the first place. Suppose you have the following:

cmake_minimum_required(VERSION 3.20)
project(test LANGUAGES NONE)

set(var 1)
set(var 2 CACHE STRING "")

message(STATUS "var = ${var}")

What does it print? Let's see:

$ cmake -S . -B build
-- var = 2

What happened here? Really take a minute to think about what the underlying rule could be. Now let's try running the same command again, without changing absolutely anything:

$ cmake -S . -B build
-- var = 1

Whatever you thought the rule was, I bet you did not expect this. Why is it 1, now?

I'll fill you in: when CMake runs, it loads the cache into a special, global scope. When set(CACHE) runs, it checks to see if there is already an entry in the cache. If not, then it creates one and deletes the normal variable binding to expose the newly cached value. Otherwise, it won't do anything at all (unless FORCE is specified). Don't ask me how it works if there are multiple variables of the same name in nested directory or function scopes. I'm not sure I even want to know.
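To watch both bindings at once, one option is to print the cache entry separately from the normal variable. The sketch below is a variant of the example above, not the code from the bug report. It assumes CMake 3.14 or newer for the DEFINED CACHE{var} test and the $CACHE{var} reference syntax.

```cmake
cmake_minimum_required(VERSION 3.20)
project(test LANGUAGES NONE)

set(var 1)

# Report whether the cache already holds an entry named "var".
if (DEFINED CACHE{var})
  message(STATUS "cache entry before set(CACHE): $CACHE{var}")
else ()
  message(STATUS "no cache entry before set(CACHE)")
endif ()

set(var 2 CACHE STRING "")

# ${var} resolves to the normal variable if one is still bound in this
# scope, and falls back to the cache entry otherwise.
message(STATUS "var = ${var}")
message(STATUS "cache entry after set(CACHE): $CACHE{var}")
```

On a fresh build directory the first message should report that there is no entry yet. On later runs it should report the cached value, which is exactly the case where set(CACHE) leaves the normal variable alone.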

Now let's try to set the cache variable at the command line:

$ cmake -S . -B build -Dvar=3
-- var = 3

What happened here?! Neither value mattered! The normal variable won before, but now set(CACHE) overwrote it? Why? Do command-line variables have their own, special, innermost scope? Are they immutable?

Well, here's the answer: setting a cache variable at the command line with no type deletes the type that was already established (what?), and so set(CACHE) will add the type when it runs (ok...), and when this happens it will also delete the normal binding as if the variable did not exist at all (what?!), and that isn't even documented behavior (WHAT?!). If you use -Dvar:STRING=3 instead, then it will print 1.

Here's what the docs do have to say about this:

If the cache entry does not exist prior to the call or the FORCE option is given then the cache entry will be set to the given value. Furthermore, any normal variable binding in the current scope will be removed to expose the newly cached value to any immediately following evaluation.

It is possible for the cache entry to exist prior to the call but have no type set if it was created on the cmake(1) command line by a user through the -D<var>=<value> option without specifying a type. In this case the set command will add the type.

Nowhere in that last sentence does it say it will delete the normal variable binding when the type is not set. This whole behavior is downright byzantine.

Thankfully, the devs have implemented a policy fix that will ship in CMake 3.21! With CMP0126 enabled, the set command will not touch normal variables, meaning that they always "win". This is how option() works and is yet another reason to use the newest CMake. Until then, I believe the best practice is to only cache an existing normal variable, guarded by a check if it already exists:

if (NOT DEFINED var)
  # compute default value for var
  set(var "${var}" CACHE <TYPE> "doc string")
endif ()

This ensures that the value of var is consistent no matter the state of the cache. After CMake 3.21, you may safely set the cache variable directly and to any default value.
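For projects that must keep supporting older releases, a minimal sketch of opting in to the new behavior when it is available might look like the following. The if(POLICY) guard keeps the file valid on CMake versions that do not know about CMP0126; the variable name is carried over from the earlier example.

```cmake
cmake_minimum_required(VERSION 3.20)

# Opt in to CMP0126 only when the running CMake knows about it (3.21+).
if (POLICY CMP0126)
  cmake_policy(SET CMP0126 NEW)
endif ()

project(test LANGUAGES NONE)

set(var 1)
set(var 2 CACHE STRING "")  # with CMP0126 NEW, the normal variable is kept

message(STATUS "var = ${var}")
```

Writing cmake_minimum_required(VERSION 3.20...3.21) should have the same effect, since the upper bound of the range enables every policy known up to that version.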

Pitfall: configure-step dependencies

Another common cause of build issues is failing to declare a dependency for the configure step. If your project makes heavy use of execute_process or otherwise reads and writes files during the configure step, those files should be added to the CMAKE_CONFIGURE_DEPENDS directory property, like so:

# both `.` and `file` are relative to current source directory
set_property(DIRECTORY . APPEND PROPERTY CMAKE_CONFIGURE_DEPENDS "file")

This will cause the generated build to check those files and re-run CMake if they have changed. Some commands, like configure_file, are smart enough to update this property automatically. Others, like file(COPY), are not; prefer configure_file over other "equivalent" commands when you can. Check the documentation (or better yet, write a test case) if you are ever unsure.
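As a concrete illustration, here is a sketch of a configure step that reads a version number from a file. The file name VERSION.txt and the MyProj_VERSION variable are hypothetical. To my knowledge, file(READ) is one of the commands that does not register the dependency on its own, so the sketch declares it explicitly.

```cmake
# Assumption: VERSION.txt sits next to this CMakeLists.txt and holds a
# single line such as "1.2.3".
set(version_file "${CMAKE_CURRENT_SOURCE_DIR}/VERSION.txt")

file(READ "${version_file}" MyProj_VERSION)
string(STRIP "${MyProj_VERSION}" MyProj_VERSION)

# Declare the file as a configure-step input so that editing it
# triggers a re-configure on the next incremental build.
set_property(DIRECTORY APPEND PROPERTY CMAKE_CONFIGURE_DEPENDS "${version_file}")

message(STATUS "MyProj version: ${MyProj_VERSION}")
```

With the property set, changing VERSION.txt and then running only the build tool should be enough to pick up the new value.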

Pitfall: file globbing

This same problem also affects globbing for source files in CMake:

# WARNING: this code breaks idempotence
file(GLOB sources "*.cpp")
add_executable(my_app ${sources})

If you have this code, then adding a new .cpp file to the directory will not trigger a re-configure in an incremental build. As we discussed above, this is bad behavior because it forces a developer to re-run CMake as opposed to just the build tool.

One solution is to use CONFIGURE_DEPENDS, which will cause the generated build to re-evaluate the globs and re-configure if anything changes. This code correctly sets dependencies.

# This code is fine, but with caveats.
file(GLOB sources CONFIGURE_DEPENDS "*.cpp")
add_executable(my_app ${sources})

However, the developers do not promise that it will work on every generator. Here's what the documentation says:

Note: We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate. The CONFIGURE_DEPENDS flag may not work reliably on all generators, or if a new generator is added in the future that cannot support it, projects using it will be stuck. Even if CONFIGURE_DEPENDS works reliably, there is still a cost to perform the check on every rebuild.

This is not a theoretical concern: the immensely popular Ninja generator has a bug until 1.10.2 (which at time of writing is the newest one). Here is a link to a GitHub issue about this.

I understand this is controversial, but given that the CMake maintainers are so explicit about not globbing, I strongly believe the best thing to do is to list source files explicitly. In general, it is a good idea to avoid doing things that are explicitly unsupported because when you run into problems, the maintainers will simply tell you to fix your code.

Besides, manually listing source files is typically only annoying at the start of a new project, when the code structure is much more fluid. In the steady state, file lists change only occasionally, and the pain of updating a file list is not very great. You can (and should) typically split up your file lists using target_sources and add_subdirectory. That way no one CMakeLists.txt gets too long.
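A sketch of what that split might look like is shown here. The directory name net and all source file names are made up for illustration, and the relative paths in the subdirectory assume CMake 3.13 or newer, where target_sources resolves them against the current source directory.

```cmake
# ---- CMakeLists.txt (top level) ----
cmake_minimum_required(VERSION 3.13)
project(my_app LANGUAGES CXX)

add_executable(my_app main.cpp)

# Each subdirectory appends its own sources to the target.
add_subdirectory(net)

# ---- net/CMakeLists.txt ----
target_sources(my_app PRIVATE socket.cpp http.cpp)
```

Adding a source file then means touching one short list in the directory where the file lives, and because a CMakeLists.txt changed, the build tool knows to re-run CMake.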

Update: a note on performance

An earlier version of this article repeated the old saw that globs are slow. In response to the discussion on Reddit, I ran some tests myself and got a mixed bag. Here's a table of my results:

| Disk                | Filesystem | OS                 | Generator | N      | Time (s) |
|---------------------|------------|--------------------|-----------|--------|----------|
| Samsung SSD 970 EVO | ext4 (WSL) | Ubuntu 20.04 (WSL) | Ninja     | 1000   | 0.0069   |
| SanDisk SDSSDHII    | ext4       | Ubuntu 20.04       | Ninja     | 1000   | 0.0162   |
| SanDisk SDSSDHII    | NTFS       | Windows 10         | Ninja     | 1000   | 0.0364   |
| Samsung SSD 970 EVO | ext4 (WSL) | Ubuntu 20.04 (WSL) | Ninja     | 10000  | 0.0481   |
| SanDisk SDSSDHII    | ext4       | Ubuntu 20.04       | Ninja     | 10000  | 0.0594   |
| SanDisk SDSSDHII    | NTFS       | Windows 10         | VS 2019   | 1000   | 0.0731   |
| Samsung SSD 970 EVO | NTFS       | Windows 10         | Ninja     | 1000   | 0.0832   |
| Samsung SSD 970 EVO | NTFS       | Windows 10         | VS 2019   | 1000   | 0.1012   |
| Samsung SSD 970 EVO | NTFS (3g)  | Ubuntu 20.04       | Ninja     | 1000   | 0.1146   |
| SanDisk SDSSDHII    | NTFS (3g)  | Ubuntu 20.04       | Ninja     | 1000   | 0.1170   |
| SanDisk SDSSDHII    | NTFS (9p)  | Ubuntu 20.04 (WSL) | Ninja     | 100    | 0.2062   |
| Samsung SSD 970 EVO | NTFS (9p)  | Ubuntu 20.04 (WSL) | Ninja     | 100    | 0.2268   |
| SanDisk SDSSDHII    | NTFS       | Windows 10         | Ninja     | 10000  | 0.2743   |
| Samsung SSD 970 EVO | ext4 (WSL) | Ubuntu 20.04 (WSL) | Ninja     | 100000 | 0.3712   |
| SanDisk SDSSDHII    | ext4       | Ubuntu 20.04       | Ninja     | 100000 | 0.4383   |
| SanDisk SDSSDHII    | NTFS       | Windows 10         | VS 2019   | 10000  | 0.4710   |
| Samsung SSD 970 EVO | NTFS       | Windows 10         | Ninja     | 10000  | 0.5616   |
| Samsung SSD 970 EVO | NTFS       | Windows 10         | VS 2019   | 10000  | 0.8158   |
| SanDisk SDSSDHII    | NTFS (3g)  | Ubuntu 20.04       | Ninja     | 10000  | 1.1119   |
| Samsung SSD 970 EVO | NTFS (3g)  | Ubuntu 20.04       | Ninja     | 10000  | 1.4825   |
| SanDisk SDSSDHII    | NTFS (9p)  | Ubuntu 20.04 (WSL) | Ninja     | 1000   | 1.9585   |
| Samsung SSD 970 EVO | NTFS (9p)  | Ubuntu 20.04 (WSL) | Ninja     | 1000   | 2.1879   |

From my testing, it seems ext4 is a remarkably resilient filesystem. I think there is no performance argument to be made against globbing on ext4. It's also pretty clear that you should not use ntfs-3g, or especially the WSL2 NTFS 9p FUSE drivers. Build on ext4 and copy the outputs to an NTFS volume if need be. VS 2019 is slower than Ninja, but even then it took under a second to scan 10000 sources, so this is likely not a problem in absolute terms.

For some strange reason, NTFS was slower on my NVMe drive than on my SATA drive. I tested both drives with winsat disk -drive X, and it showed my NVMe drive is significantly faster. Maybe there's some driver weirdness here since the fastest result for N=1000 was (virtualized!) ext4 on that drive.

I have published the Python script I used for testing this here. There's a GitHub Actions workflow that runs the script on Windows, macOS, and Linux for N=1000. I expected the virtualized disks on GitHub Actions to be slow, but they were actually plenty fast, with results very similar to what I reported above.

I am curious to hear reports from readers and from the Meson and Ninja developers to see if they have more data on why globs are too slow for their systems.

Expect standard CMake variables to be honored

A great number of variables in CMake are designed to be set externally. Perhaps the most famous of these is CMAKE_CXX_FLAGS and its configuration-specific variants CMAKE_CXX_FLAGS_DEBUG, CMAKE_CXX_FLAGS_RELEASE, etc. Do not touch these variables!

As a baseline, do not touch any standard variables if they are already defined when your build runs. Move your preferred defaults to presets or use the techniques above to update the cache safely. On older CMake versions, they may be set in a toolchain file as an alternative to presets. A full list of variables may be found in the documentation, but most start with CMAKE_. Notable exceptions include BUILD_SHARED_LIBS and <PackageName>_ROOT.
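If a project really must ship a default for a standard variable and presets are not an option, one hedged way to follow the "already defined" rule is a guard like the one below. CMAKE_CXX_VISIBILITY_PRESET is picked purely as an illustration, and my_lib and my_lib.cpp are placeholder names.

```cmake
# Only supply a default when nobody (user, preset, toolchain file, or
# parent project) has already chosen a value.
if (NOT DEFINED CMAKE_CXX_VISIBILITY_PRESET)
  set(CMAKE_CXX_VISIBILITY_PRESET hidden)
endif ()

# The variable initializes a property on targets created after this
# point, so the guard has to come before add_library.
add_library(my_lib my_lib.cpp)
```

A user who passes -DCMAKE_CXX_VISIBILITY_PRESET=default at the command line keeps their choice, because the guard sees the variable as already defined.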

In many cases, there are better ways to set a build requirement than through clobbering a reserved variable. For instance, if you want to set the C++ version then you should use target features, rather than setting CMAKE_CXX_STANDARD or (gasp!) editing CMAKE_CXX_FLAGS.

target_compile_features(my_exe PRIVATE cxx_std_14)
target_compile_features(my_lib PUBLIC cxx_std_17)  # PUBLIC so that linkees use >= C++17

Setting the standard requirement as a PUBLIC (really INTERFACE) property on a library will propagate this to linkees even after exporting my_lib for use in a find_package module. We'll talk more about packaging and being a good dependency in a few weeks.
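Within a single build tree, the propagation can be sketched as follows. The project and file names are placeholders, and cxx_std_17 assumes CMake 3.8 or newer.

```cmake
cmake_minimum_required(VERSION 3.8)
project(example LANGUAGES CXX)

add_library(my_lib my_lib.cpp)
target_compile_features(my_lib PUBLIC cxx_std_17)

add_executable(my_exe main.cpp)
# No compile features are set on my_exe directly; it picks up the C++17
# requirement through my_lib's INTERFACE_COMPILE_FEATURES property.
target_link_libraries(my_exe PRIVATE my_lib)
```

Because the requirement is a floor rather than an exact setting, a user can still request a newer standard for the whole build without editing either target.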

Some libraries (like abseil) change their ABI depending on the active standard version. If you have to do this, then you can encode the requirement by checking CMAKE_CXX_STANDARD to pick the correct cxx_std_N feature to act as a usage requirement:

# C++14 or greater is required for my_lib
if (CMAKE_CXX_STANDARD GREATER 14)
  target_compile_features(my_lib PUBLIC cxx_std_${CMAKE_CXX_STANDARD})
else ()
  target_compile_features(my_lib PUBLIC cxx_std_14)
endif ()

Either way, your users can set a higher CMAKE_CXX_STANDARD value at the command line. This empowers your users to ensure ABI compatibility when using experimental support for draft C++ standards when building from source. If you set CMAKE_CXX_STANDARD unconditionally, you take this control away from your users.

Conclusion

This is what you should take away from this post:

  1. Your CMakeLists.txt file should be minimal and include only firm build requirements; everything else should be opt-in (preferably in a preset). Warning flags are not firm requirements.
  2. The configure step of your build should never need to run twice in a row with the same settings, and incremental builds should not require the user to manually re-run CMake. This means using CONFIGURE_DEPENDS on globs or, better yet, avoiding them.
  3. Be careful when setting a cache variable, even without FORCE, as it might remove a normal variable unpredictably. Before CMake 3.21 (unreleased), don't set(CACHE) without confirming the variable does not exist.
  4. Avoid touching standard CMake variables; prefer target properties or move such settings to the presets (at least make your edits opt-in somehow). Stop thinking in terms of flags and start thinking in terms of goals. It's very common for novice (or even adept) CMake programmers to work themselves into an XY problem and try to shoehorn in a compiler-specific setting that has already been abstracted.

Next time, we'll talk about the target model and how to manage dependencies in modern CMake. Until then, join the conversation here on Reddit!

