Lately, the setups I’m working with involve more and more Docker.
Here’s a brief list of best practices I’ve distilled for myself. Comments are more than welcome: besides the parts I have a fairly good grasp of, there are pieces where I would much appreciate some advice.
Principles
Problem statement first: here are my success criteria, in order of importance. They can be treated as definitions of done, too.
Build Code inside Docker
In addition to being an increasingly common production / deployment technique, Docker is an amazing tool to use as the build environment.
Today, I use literally zero “virtual environments”. The code has to build inside Docker. That’s the rule.
The code, of course, does not have to build only inside Docker. I have just converged to a simple piece of wisdom: whoever clones your code should not be required to have a development environment set up.
In a perfect world, in a team of, say, eight people, two or three may well develop on a fresh machine every single time. Put your keys onto this machine, clone the code, make a change, test it locally, make a pull request.
Personally, I would go as far as saying that it should be the very same Docker-based build that can build production binaries and/or production containers. So that, in theory, a binary built on your local box is identical to the binary that would go to prod from this commit. This is not a hard rule though, just a litmus test.
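As a sketch of what I mean, assuming a Go service whose main package lives under `./cmd/service` (the image tag and paths are illustrative, not a prescription):

```sh
# Build inside a container; the host needs nothing but git and docker.
# The working copy is mounted read-write, so the binary lands in ./bin.
docker run --rm \
  -v "$(pwd)":/src \
  -w /src \
  golang:1.22 \
  go build -o bin/service ./cmd/service
```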
Incremental Builds
Of course, production builds are best run from scratch.
However, I see a lot of value in making intra-Docker builds incremental too.
One reason is that it just flows naturally from the above point: I want to support those developers who choose not to have a development environment set up. We already have everything ready for them to develop on their local box, running builds inside a Docker container. Why not take the natural next step and make the build scripts expose a few directories as volumes to the host machine (`.gitignore`-d, of course), so that consecutive builds don’t need to redo any work? For example, chances are, most rebuilds don’t need to recompile the `.proto` files.
Of course, while it is possible to “mix” locally-built artifacts with the Docker-built ones, I would not recommend it. Building locally? It’s incremental. Building inside Docker? Incremental. Built locally for a week and now want to build inside Docker? You’d have to wait for all the steps.
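Sticking with the hypothetical Go service from above, the incremental intra-Docker build only differs from the plain one by a couple of `.gitignore`-d cache volumes:

```sh
# Expose Go's build and module caches to the host as .gitignore-d
# directories, so consecutive containerized builds skip redone work.
mkdir -p .cache/go-build .cache/go-mod bin
docker run --rm \
  -v "$(pwd)":/src \
  -v "$(pwd)/.cache/go-build":/root/.cache/go-build \
  -v "$(pwd)/.cache/go-mod":/go/pkg/mod \
  -w /src \
  golang:1.22 \
  go build -o bin/service ./cmd/service
```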
Expose Static Binaries
If the language and the toolchain allow for it, I prefer to make it trivial to copy binaries outside the containers where they are built.
Ideally, this will happen automatically, with the binaries copied over to a `.gitignore`-d dir on the host machine, as long as the build scripts are used in the recommended way.
If your prod service is in, say, Golang, and there is a performance test for it in C++ (my case now, btw), it should be straightforward to have these two binaries talk to each other all four ways: via `docker compose`, both run locally, or either of the two binaries run locally and speaking to the other one that is dockerized. Because each of the binaries is there, available and ready to use, as soon as the Docker-based build succeeds.
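Alternatively, if the build happens entirely inside the image rather than via a mounted volume, the copy-out step can be scripted with `docker create` / `docker cp` (the image name and in-image path here are made up):

```sh
# Build the image, then copy the binary out of it into a
# .gitignore-d bin/ directory on the host. For Go, building with
# CGO_ENABLED=0 inside the Dockerfile keeps the binary static.
docker build -t my_code .
CONTAINER_ID="$(docker create my_code)"
docker cp "${CONTAINER_ID}:/app/service" bin/service
docker rm "${CONTAINER_ID}"
```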
In fact, even if our production setup did not use Docker, I would still vote for dockerizing production builds. And I would still vote for putting these builds into GitHub Actions, for what it’s worth. Just out of respect for my fellow colleagues and for my future self. Few things are more annoying than checking out quarter-old code and seeing that it fails to build and run the tests.
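The corresponding workflow can be tiny; a sketch, assuming the repository’s `make test` drives the Docker-based build:

```yaml
# .github/workflows/build.yml -- a minimal sketch.
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Docker is preinstalled on the ubuntu-latest runners, so the
      # same Docker-based build the developers use runs unchanged.
      - run: make test
```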
Implementation Details
If you have thoughts on the above, I’d be curious to hear them; please make yourself comfortable in the comments.
Now, to the part where I’m still struggling to find solid grounds.
This has to do with `make`.
I am a big fan of the good old `make`, and, to my taste, it can play two distinct roles:

1. To manage dependencies and to orchestrate incremental builds, and
2. To simply have a standard set of `make test` / `make clean` / etc. commands.
When it comes to the above Docker-based build setup, these two roles appear to conflict.
On the one hand, for role (1) to be most effective, it should orchestrate builds both on my dev box w/o Docker and within the Docker container itself. Thus, assuming the build itself is make-based, the Makefile for (1) should clearly be copied inside the respective Docker container, and `make` should be run there.

On the other hand, on the outer level, not only do I want to have `make test` and `make clean`, but also some `make build`, to build the very Docker container. And, clearly, the (part of the) Makefile whose role is to invoke `docker build` does not belong inside Docker.
While I’m still conflicted, rationally, as of now, I see nothing wrong with two Makefiles.
Especially given that Docker container builds would need to be called from the root folder, because, for security reasons, Docker does not follow symlinks.
Thus, if, say, the `proto/` directory is shared between two different builds, the idea of a plain, “vanilla” Dockerfile in the root of the respective repository is just wrong. In reality, there will likely be some `src_code/Dockerfile` and `src_test/Dockerfile`. Seen in this light, `src_code/Makefile.impl` and `src_test/Makefile.impl` make perfect sense to me.
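Concretely, both builds are then invoked from the repository root, so that the shared `proto/` directory is part of the build context (the image tags are made up):

```sh
# Run from the repository root; "." is the build context, so proto/
# is visible to both Dockerfiles without any symlinks.
docker build -f src_code/Dockerfile -t my_code .
docker build -f src_test/Dockerfile -t my_test .
```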
There can then be a top-level `Makefile`, and/or inner-level `src_code/Makefile` and `src_test/Makefile`-s, that would just be clean wrappers over calling `make` with the respective `Makefile.impl`-s.
This way we get intuitive (and tab-completed!) `make build` / `make test` / `make clean` commands, plus no redundant code or scripts copied over into the very build Docker containers.
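For illustration, a minimal sketch of the outer wrapper (image and target names are mine; the inner `Makefile.impl` is assumed to have been copied into the image by its Dockerfile):

```make
# Top-level Makefile: the Docker-facing wrapper; none of this is
# copied into the container.
build:
	docker build -f src_code/Dockerfile -t my_code .

test: build
	# Delegate to the inner, Docker-agnostic Makefile.impl.
	docker run --rm my_code make -f Makefile.impl test

clean:
	docker rmi -f my_code

.PHONY: build test clean
```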
What do you think?
Footnotes
1. Of course, the above only makes sense if `make` is an organic part of your build cycle. If you use, say, `gradle`, there is no problem in sight, to my taste: have top-level `make build`-ish targets to build containers and such, and don’t bring `Makefile`-s into the very Docker container at all.

2. It is worth mentioning that I have, a while ago, converted to the religion that it’s best to keep the repository directory clean, free of build artifacts, etc. For instance, because it may be part of a different file system, journaled et al., where large binary blobs do not belong. So, in my perfect world, I vote for some `.bashrc`-set vars for build & artifact destinations, and I vote for the scripts in the repo to respect those vars. Still, the argument about being able to build & run the code on a virgin machine that only has `git` and `docker` holds, and that’s why I don’t think it is overkill to put the “default” “output” directories into `.gitignore`.
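As a sketch of what I mean by respecting those vars (the variable names are made up):

```make
# In the repo's Makefile: honor the user's preferred locations,
# falling back to .gitignore-d in-repo defaults, so that a virgin
# machine with no .bashrc setup still builds out of the box.
BUILD_DIR ?= $(or $(MY_BUILD_DIR),.build)
ARTIFACTS_DIR ?= $(or $(MY_ARTIFACTS_DIR),.artifacts)
```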
3. One worthy addendum that did not make it into the text: use Docker’s cache wisely. For instance, for `npm`-based code, I would first copy `package{-lock}.json` into the container, then run `npm install` inside it, and only then copy over the rest of the code. This way, the resulting container, after `npm i` is run but before the user code is there, is cached on the Docker level, thus dramatically speeding up one-line changes, without having to mount the dreaded `node_modules/` anywhere. Similarly, if there’s a step of `.proto` files compilation, I would copy them over first, and run that build target separately, on the Dockerfile level, just to make sure one-line changes that do not affect `.proto` files are quick. In my case, by the way, this separate command would be a `make` target, so that the build would work just fine without running it manually up front. Can’t comment on how universal this trick is, but to me these minor things are what differentiates good engineers from the great ones.
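For concreteness, a sketch of that layering for a hypothetical Node.js app (the base image and entry point are placeholders):

```dockerfile
FROM node:20
WORKDIR /app

# Manifests first: this layer and the `npm install` layer below stay
# cached until package{-lock}.json actually changes.
COPY package.json package-lock.json ./
RUN npm install

# A one-line change to the source only invalidates the layers from
# here on, so `npm install` is not re-run and node_modules/ is never
# mounted anywhere.
COPY . .

CMD ["node", "index.js"]
```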
4. I’ve heard arguments that Docker not following symlinks is a bug, and that due to this bug one has to run `docker build` from the top-level directory. Personally, I believe it is a feature of Docker. (It may be a bug of *nix, but that’s a different conversation.) Nothing prevents me from creating a symlink to `/etc`, or outright to `/`, in my GitHub repo, so that a seemingly innocent `docker build .` would have legitimate access to my `/etc/passwd`. So, yes, we do need scripts that run `docker build` from the right directory, even if all one needs is just `docker build .` in a way that can access `../proto/`.
5. Part of me is still uneasy that, unlike what’s in the `Dockerfile`, what’s in the `Makefile` is run under my username, “natively”. In other words, while within the realm of Docker I am quite confident about both my personal files and my dev environment, this is far less true about `make`. I have yet to find a good workaround for this technological problem, and/or a mental trick to make peace with it. More and more creative attacks are invented every day, and only building inside Docker (and not as the `root` user!) may well be the safe path here, but I’m just not ready to push such a solution onto my teams yet.