Here comes my unprovoked attempt at a C++20 coroutines tutorial.
Intro
Let me share a secret to begin with. When I first looked into C++ coroutines, I was terrified. There’s a ton of extra syntactic bloat. There are far too many ways to make a mistake.
Errors, both static and runtime, are not easy to make sense of. With static checks, after a while, I began to understand the compiler’s logic well enough to see why exactly a given piece of code is wrong and should not build. At runtime, it was also not trivial for me to grasp when exactly the promise and the return type object are constructed, from what parameters, and how many times.
In short, for the first several hours, C++ coroutines are about what one would expect from C++ in the worst way possible: a lot of nontrivial things to wrap your head around, no clear picture forming right away, and compiler diagnostic messages that do not seem helpful at first.
But then it hit me.
C++ is always about not paying for what you do not use.
So the Committee will always err on the side of making sure the resulting binary code is not doing extra work unless truly needed. And from this point, it is easy to explain all those constexpr return types such as `suspend_always` and `suspend_never`.
The next step for me was to realize where the true genius of C++ coroutines is. It is in separating the definitions of the coroutines from their runtime.
See, to embed coroutines into the language, there are two whales for it to stand on:
Language support for stackless functions, and
The execution model for these functions.
Language support, (1), is simply about being able to write `await` (or, in C++, `co_await`), and have the code yield execution to some other logic. Yes, eventually it is about fibers and green threads, and yes, this is about making sure the code lives in the user space for as long as possible, since kernel calls are expensive in massively parallel settings. But on the language support level, these are not part of the picture yet.
Another way to put it: language support, (1), is about introducing the way for the code to read linearly, without the callback hell.
The execution model, (2), is the mechanism that makes sure the code implemented within the body of a coroutine does get executed. Eventually, each “elementary piece of coroutine code”, “from one suspend to another” if you wish, would need to be run by some OS/kernel thread.
Here’s what the C++ Committee effectively says here. Or at least that’s the reading of the Standard that brought peace and enlightenment upon me.
C++ should be a language for everyone. Everyone includes ultra-low-power and ultra-low-bandwidth CPUs, also known as MCUs. Take an 8-bit microcontroller with literally under one megahertz clock rate. If you can use C on it — and you can! — then you should be able to use C++ on it. Including using the C++ coroutines mechanism.
I cannot emphasize enough how genius I now believe this is. Coroutines in C++20 are:
100% language constructs support + 0% execution layer.
Seen in this light, the challenge is to introduce stackless (and threadless!) functions so that the core language remains intact. And this is exactly what the `std::coroutine_handle<>` magic does.
This also means that in the example code I wrote I had to implement the “trivial” execution layer. Which I did. Granted, in an overly-safe mutex-locked way, but I believe for [self-]educational purposes it is just fine.
Also, at some point, mostly for myself, I decided to replace the “true” calls to `sleep()` with the notion of “fake time” that the executor keeps track of. And I was pleasantly surprised by how simple this turned out to be.
Last but not least: my long-held dream was to implement the wrapper that would allow synchronous and asynchronous functions to operate interchangeably. In simple terms: if there is a type `Async<int>` for an `int` result value that can be `await`-ed, I wanted to make sure an instance of this type can be created from a plain `int` value, with no worries on the developer’s side about whether a certain piece of `await`-driven logic is guaranteed to be a pass-through no-op, as the value is immediately available. The user may want to check this. This user may even want to check this at compile time, and perhaps add a `static_assert`, or even some SFINAE magic. But the decision to use or to bypass this logic should be on the writer of the code. And this is exactly what I managed to enable.
Nuff talk. Let’s get to the code.
Code
All the code I am referring to can be found at https://github.com/dkorolev/cofizzbuzz.
During my experiments it became clear quite early on that if I structure the commits just right this easy-to-follow text will be a useful byproduct. Here it is.
Most of the code snippets below loosely follow the FizzBuzz example. It’s a toy problem, and it will take under one minute to get familiar with, on the off chance you have not heard of it before.
Part I
Also, the solution to this problem in C fits in a tweet. The old, 2015, tweet. The 140-character tweet. Including the #Ha hashtag at the end.
I took the liberty of modifying the problem so that for 15 it should print “Fizz” followed by “Buzz”, not just “FizzBuzz”. As two distinct output lines, that is, as two distinct “callback” invocations. This is to make the early-stopping illustration even more explicit.
Step 1
Initial FizzBuzz. It does not stop at 15 and prints 16 lines, since one `.Next()` call can yield more than one callback call.
The code for this example builds on the FizzBuzz exercise. With a tiny exception that numbers that are divisible by both three and five result in not one "FizzBuzz", but a "Fizz" followed by a "Buzz".
This step only checks the "stop at fifteen" condition at the very top level. So the code prints sixteen steps, not fifteen. It overruns the boundary by one, because `.Next()` called with `value == 15` results in both “Fizz” and “Buzz” being emitted.
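The per-value behavior can be sketched as follows (function and helper names here are my own illustration, not the repo's): note how value 15 triggers the callback twice, which is exactly why "loop up to 15" yields sixteen output lines.

```cpp
#include <functional>
#include <string>
#include <vector>

// Sketch of the modified FizzBuzz: for a value divisible by both 3 and
// 5, the callback fires twice -- "Fizz" then "Buzz" -- as two separate
// invocations. Names are mine, not necessarily the repo's.
inline void FizzBuzz(int value, const std::function<void(std::string)>& cb) {
  bool divisible = false;
  if (value % 3 == 0) { cb("Fizz"); divisible = true; }
  if (value % 5 == 0) { cb("Buzz"); divisible = true; }
  if (!divisible) { cb(std::to_string(value)); }
}

// Stopping only at the top level: looping values 1..15 overruns the
// "fifteen outputs" boundary by one, since 15 emits two lines.
inline std::vector<std::string> TopLevelLoop(int up_to_value) {
  std::vector<std::string> result;
  for (int i = 1; i <= up_to_value; ++i) {
    FizzBuzz(i, [&](std::string s) { result.push_back(std::move(s)); });
  }
  return result;
}
```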
Step 2
Introduced the "cache for the yet unseen value". It stops at 15 steps now.
This is an exercise in modifying the code so that the user-provided callback is invoked exactly once per output term.
Step 3
Introduced (synchronous) `IsDivisibleBy{Three,Five}`.
This is a trivial yet dedicated commit to branch out the "divisible by {three,five}" checks into their own functions.
These functions will become coroutines soon, but not before we make them slow on purpose, running from the same `main()` thread, then from their own threads, then via the single executor thread, and then ultimately from this single executor thread using C++20 coroutines.
Step 4
Now "async"-style `IsDivisibleBy{Three,Five}`, a.k.a. the "callback hell".
To illustrate the pains of callback-based programming.
Also, introduced artificial `sleep()`-s. They will soon help show how sequential waits become parallel waits.
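The "callback hell" shape of this step can be sketched like this (identifiers are my illustration, not the repo's exact code): each check "responds" via a callback after an artificial sleep, and composing two of them already forces one level of nesting per step.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hedged sketch of callback-style async checks: each one "responds"
// via a callback after an artificial sleep. Names are mine.
inline void IsDivisibleByThree(int value, std::function<void(bool)> cb) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  cb(value % 3 == 0);
}

inline void IsDivisibleByFive(int value, std::function<void(bool)> cb) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  cb(value % 5 == 0);
}

// The "hell" part: nesting grows with every extra async step, and the
// two 10ms waits here run strictly one after the other.
inline void Classify(int value, std::function<void(bool, bool)> done) {
  IsDivisibleByThree(value, [value, done](bool by3) {
    IsDivisibleByFive(value, [by3, done](bool by5) {
      done(by3, by5);
    });
  });
}
```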
Step 5
Measuring the time each step takes, confirming it's 20ms == 10ms + 10ms.
Added time measurements and confirmed two "simulated slow calls" are indeed not parallelized.
Step 6
A [one-off] demo of using futures+promises. Less of a callback hell, but `sleep` is still sync.
Demonstrating how futures and promises could be used. If only to get rid of the callback hell.
Spoiler alert: it will be even better. But it is important to note that in C++20 the implementation of the coroutines has nothing to do with futures and promises, even though the inner type for a coroutine frame/lifetime type must have the `promise_type` sub-type.
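A minimal version of the futures-and-promises idea might look like this (my own sketch, not the repo's code): the callback is gone, but `.get()` still blocks the caller for the full duration of the sleep. Note that this `std::promise` is entirely unrelated to the coroutine `promise_type` mentioned above.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Sketch: an async divisibility check via std::promise / std::future.
// No callback nesting, but .get() still blocks synchronously.
inline std::future<bool> IsDivisibleByThreeAsync(int value) {
  std::promise<bool> p;
  std::future<bool> f = p.get_future();
  std::thread([value, p = std::move(p)]() mutable {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    p.set_value(value % 3 == 0);  // fulfills the future
  }).detach();
  return f;
}
```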
Step 7
Use detached threads to have `10ms + 10ms = 10ms` "total" wait.
Sleep for "10ms" and "10ms" in parallel, so that 10ms + 10ms is still 10ms. Use detached threads for now; this is still an illustrative example.
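The parallel-waits idea can be sketched as follows (my illustration; the repo uses detached threads here, while this sketch uses `std::async` for brevity): both futures are started before either is awaited, so the two 10ms sleeps overlap in time.

```cpp
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Sketch: two slow checks started before either is awaited, so their
// 10ms sleeps overlap and the total wait is ~10ms, not 20ms.
inline bool SlowIsDivisibleBy(int value, int divisor) {
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  return value % divisor == 0;
}

inline std::pair<bool, bool> CheckBothInParallel(int value) {
  auto f3 = std::async(std::launch::async, SlowIsDivisibleBy, value, 3);
  auto f5 = std::async(std::launch::async, SlowIsDivisibleBy, value, 5);
  return {f3.get(), f5.get()};  // both sleeps are already in flight
}
```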
Step 8
Logging thread names, to illustrate the "work" is done by a large number of "dedicated" threads.
Proving that there are indeed plenty of freshly spawned threads. And laying the groundwork to later prove these threads converge into one executing thread.
Step 9
Explicitly verbose, ugly, event-driven logic. Work still initiated and executed by dedicated threads.
Further, deliberately ugly, illustration of how the state machine of the coroutine will work behind the scenes.
This code is ugly by design; it is only here to demonstrate what will eventually happen behind the scenes, automagically, once the coroutines are used properly.
Step 10
The dedicated, single, executor thread in which the work is done.
This commit is starting to put the pieces together.
First, instead of spawning and detaching multiple threads, "one per sleep", introduce a single, "executor" thread.
This thread acts as the scheduler, and it executes all the `sleep()`-s — i.e. the code that is scheduled to run after a delay. This commit also makes the final graceful shutdown logic cleaner.
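The skeleton of such a single-thread executor might look like this (a minimal sketch in the spirit of this step, not the repo's exact code): all scheduled work runs on one worker thread, and the destructor drains the queue before joining, which is the "graceful shutdown" part.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// A minimal single-thread executor sketch: one worker thread runs all
// scheduled tasks; destruction drains the queue, then joins.
class Executor {
 public:
  Executor() : worker_([this] { Run(); }) {}
  ~Executor() {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // graceful shutdown: finish what was scheduled
  }
  void Schedule(std::function<void()> f) {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      queue_.push(std::move(f));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    while (true) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // done_ is set and queue is drained
        task = std::move(queue_.front());
        queue_.pop();
      }
      task();  // runs on the single worker thread
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread worker_;  // declared last: started after the state above
};
```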
Step 11
A thread-local placeholder for the executor, to save on manual "wait 'til done".
Making the executor thread not a global one but a scoped one, accessed via a thread-local singleton. It is now the user's responsibility to only run the "future coroutines" in a scope that is inner to the scope where the thread-local-singleton executor instance lives.
Part II
In this part, after all the prep work, we’re getting to C++20 coroutines for real!
Step 12
Added `#include <coroutine>`! Executing via `resume()`. On an example for now.
Finally, about to work with C++20 coroutines.
For the next few commits, the FizzBuzz example is on hold, the code is "just" printing integers in order, and checking if they are odd or even. The "coroutine handler" is merely the way to invoke `.resume()` once.
So, this code does introduce a stackless function for real. But it's still the first, and largely trivial, example.
Step 13
Introduced `struct Coroutine` to contain the coroutine-related code.
Minor, having this trivial coroutine-related code work with the thread-local-singleton executor introduced before.
Step 14
Leveraging the thread-local-singleton executor in the coroutine-handling code.
This is done via a virtual `ResumeFromExecutorWorkerThread()` call, which ultimately invokes `.resume()` on a `coroutine_handle<>`, and this call is made from the executor thread.
Now the code of the resumable stackless function truly is:
inline CoroutineBool IsEven(int x) {
  // Confirm multiple suspend/resume steps work just fine.
  if ((x % 2) == 0) {
    co_await Sleep(1ms);
    co_await Sleep(1ms);
    co_return true;
  } else {
    co_await Sleep(1ms);
    co_await Sleep(1ms);
    co_await Sleep(1ms);
    co_return false;
  }
  // Just `co_return ((x % 2) == 0);` would work too!
}
Step 15
Made the coroutine return type templated.
Here, I am starting to converge on the `Async<T>` type, which will emerge in a few commits.
For starters, the hardcoded `CoroutineReturningBool` becomes `CoroutineReturning<bool>`. Also there is the `CoroutineReturning<int>` type, to print the squares of integers, in addition to whether they are odd.
Step 16
The return value of the coroutine is now stored in the base `<T>` class.
Further refactoring, branched out the base class to store the return value, so that it can be specialized for `<void>`. The "usual" C++ magic.
Also, it's just `Coro<T>` after this commit. Will become `Async<T>` once we're fully done.
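The "usual magic" can be sketched as follows (names are mine; the repo's differ): a templated base class holds the return value, and a full specialization for `void` simply holds nothing, so the derived coroutine type works uniformly for both cases.

```cpp
#include <string>
#include <utility>

// Sketch: a base class storing the coroutine's return value, with a
// <void> specialization that stores nothing. Names are illustrative.
template <typename T>
struct ReturnValueHolder {
  T value{};
  void SetValue(T v) { value = std::move(v); }
  const T& GetValue() const { return value; }
};

template <>
struct ReturnValueHolder<void> {
  // No stored value: `co_return;` with no argument maps here.
  void SetValue() {}
  void GetValue() const {}
};
```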
Part III
Putting this all together, FizzBuzz + C++20 coroutines.
Step 17
Using the `Async<>` type for real.
Back to running FizzBuzz!
The "old" example code is still there, runnable with --example
.
I will use it later, at the very end, to illustrate the Ctrl+\
SIGQUIT-based reporting of what's currently running.
Step 18
Executor counters and debug logging. And `Coro<T>` is finally `Async<T>`.
Added basic telemetry, and finally introduced `Async<T>`.
Also, I ran the test that if one call to `co_await Sleep(10ms)` is replaced by two `co_await Sleep(5ms)` calls, then the number of "executed steps" grows — as expected.
Step 19
Made immediate values first-class citizens.
This is the commit that I wanted to get to. Making it happen got this whole thing started.
In this commit, a value that is `Async<T>` can be returned from a coroutine or from a regular function. In simple terms, this commit extends the behavior of `Async<T>` so that it can "pretend" to be a coroutine-aware type while in fact holding an already-available value, so that no executor needs to get involved for the user to extract this value — via the "standard" `co_await` technique.
Important note: this commit breaks `g++`, at least in `-O3` mode. Two versions of `g++`, in fact, on my Ubuntu and on the GitHub Actions runner node. macOS is fine, and I've just migrated to `clang++` for the purposes of completing this work.
Step 20
Richer telemetry counters in debug mode.
Introduced more executor-level stats. This commit will be self-explanatory.
Step 21
The time is now simulated.
This is another important commit.
Instead of properly sleeping, the coroutine engine we have now can simulate the time going forward. And this is what this commit does.
It also uses the `std::condition_variable` + `std::unique_lock` combo, instead of the busy wait loop, just as the real code should.
Step 22
Added real `sleep()`-s back to the now-simulated time.
Finally, it can be instructed to truly `sleep()` for a certain number of milliseconds per the specified "time unit". But sleeping is no longer mandatory.
Step 23
Made `cout` thread-safe.
One step before the last: since `Ctrl+\` will be printing real-time stats on the running coroutines, we need a `cout` that is thread-safe.
It's a trivial self-contained C++ exercise.
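One common shape for this exercise (my own sketch, not necessarily the repo's approach): buffer each statement's output locally, then emit the whole line under a single global lock in the destructor, so lines from different threads never interleave mid-line.

```cpp
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Sketch of a thread-safe cout: each statement accumulates into a
// local buffer, and the destructor flushes the whole line atomically.
class SafeCout {
 public:
  ~SafeCout() {
    static std::mutex mutex;
    std::lock_guard<std::mutex> lock(mutex);
    std::cout << oss_.str() << '\n';
  }
  template <typename T>
  SafeCout& operator<<(const T& x) {
    oss_ << x;
    return *this;
  }
  std::string Str() const { return oss_.str(); }  // for inspection

 private:
  std::ostringstream oss_;
};

// Usage: SafeCout() << "value: " << 42;  // printed as one atomic line
```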
Step 24
Catching `Ctrl+\` (SIGQUIT) and printing the coroutines executor state.
This step is truly taking it too far, but it's the last one.
Now `Ctrl+\`, the `SIGQUIT` signal, is intercepted, and instead of terminating the application, real-time stats are printed.
I also go a bit too far, on purpose — since this is a demo, not production code! — and introduce some macros for coroutine-related stuff. These macros inject extra data into the executor, so that this `Ctrl+\`-produced output contains human-readable data not only on how many coroutines are currently running and what their memory addresses are, but also a) the names of those "coroutine function calls" that started the respective coroutine frames (fibers), and b) the exact locations these fibers are currently `co_await`-ing on.
Here’s how it looks when pressing `Ctrl+\`:
And for the non-FizzBuzz code:
Outro
This is not production code. In a way, it’s a tutorial I wanted to create. Primarily because I firmly believe the best way to ensure I understand something is to be able to explain it well 😼
One “wrong” thing I did very deliberately though is a lot of `using`-s. It’s absolutely a red flag in real C++ code. But a friend who I trust once argued that a major advantage of Python is that it maintains a healthily high density of important yet readable language constructs per square inch. Since then, when I’m writing code the purpose of which is to illustrate a concept, I adopt the principle of as little gutter as possible.
Also, we did not cover exceptions and `co_yield`. And the code is excessively thread-safe for a single-threaded executor. Alas, simplicity is key.
So long and see you in the next posts!