Lately, the setups I’m working with involve more and more Docker.
Here’s a brief list of best practices I’ve distilled for myself. Comments are more than welcome: besides the parts I have a fairly good grasp of, there are pieces where I would much appreciate some advice.
Principles
Problem statement first: here are my success criteria, in order of importance. They can be treated as definitions of done, too.
Build Code inside Docker
In addition to being an increasingly common production / deployment technique, Docker is an amazing tool to use as the build environment.
Today, I use literally zero “virtual environments”. The code has to build inside Docker. That’s the rule.
The code, of course, does not have to build only inside Docker. I have just converged to a simple piece of wisdom: whoever clones your code should not be required to have a development environment set up.
In a perfect world, in a team of, say, eight people, two or three may well develop on a fresh machine every single time. Put your keys onto this machine, clone the code, make a change, test it locally, make a pull request.
Personally, I would go as far as saying that it should be the very same Docker-based build that can build production binaries and/or production containers. So that, in theory, a binary built on your local box is identical to the binary that would go to prod from this commit. This is not a hard rule though, just a litmus test.
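As a sketch of what I mean, assuming a Go service whose main package lives under `./cmd/service` (the image tag and paths are illustrative, not a prescription):

```sh
# Build inside a container; the host needs nothing but git and docker.
# The working copy is mounted read-write, so the binary lands in ./bin.
docker run --rm \
  -v "$(pwd)":/src \
  -w /src \
  golang:1.22 \
  go build -o bin/service ./cmd/service
```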
Incremental Builds
Of course, production builds are best run from scratch.
However, I see a lot of value in making intra-Docker builds incremental too.
One reason is that it just flows naturally from the above point: I want to support those developers who choose not to have a development environment set up. We already have everything ready for them to develop on their local box, running builds inside a Docker container. Why not take the natural next step and make the build scripts expose a few directories as volumes to the host machine (`.gitignore`-d, of course), so that consecutive builds don’t need to redo any work? For example, chances are, most rebuilds don’t need to recompile the `.proto` files.
Of course, while it is possible to “mix” locally-built artifacts with the Docker-built ones, I would not recommend it. Building locally? It’s incremental. Building inside Docker? Incremental. Built locally for a week and now want to build inside Docker? You’d have to wait for all the steps.
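Sticking with the hypothetical Go service from above, the incremental intra-Docker build only differs from the plain one by a couple of `.gitignore`-d cache volumes:

```sh
# Expose Go's build and module caches to the host as .gitignore-d
# directories, so consecutive containerized builds skip redone work.
mkdir -p .cache/go-build .cache/go-mod bin
docker run --rm \
  -v "$(pwd)":/src \
  -v "$(pwd)/.cache/go-build":/root/.cache/go-build \
  -v "$(pwd)/.cache/go-mod":/go/pkg/mod \
  -w /src \
  golang:1.22 \
  go build -o bin/service ./cmd/service
```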
Expose Static Binaries
If the language and the toolchain allow for it, I prefer to make it trivial to copy binaries outside the containers where they are built.
Ideally, this will happen automatically, with the binaries copied over to a `.gitignore`-d dir on the host machine, as long as the build scripts are used in the recommended way.
If your prod service is in, say, Golang, and there is a performance test for it in C++ (my case now, btw), it should be straightforward to have these two binaries talk to each other all four ways: via `docker compose`, both run locally, or either of the two binaries run locally and speaking to the other one that is dockerized. Because each of the binaries is there, available and ready to use, as soon as the Docker-based build succeeds.
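Alternatively, if the build happens entirely inside the image rather than via a mounted volume, the copy-out step can be scripted with `docker create` / `docker cp` (the image name and in-image path here are made up):

```sh
# Build the image, then copy the binary out of it into a
# .gitignore-d bin/ directory on the host. For Go, building with
# CGO_ENABLED=0 inside the Dockerfile keeps the binary static.
docker build -t my_code .
CONTAINER_ID="$(docker create my_code)"
docker cp "${CONTAINER_ID}:/app/service" bin/service
docker rm "${CONTAINER_ID}"
```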
In fact, even if our production setup did not use Docker, I would still vote for dockerizing production builds. And I would still vote for putting these builds into GitHub Actions, for what it’s worth. Just out of respect for my fellow colleagues and for my future self. Few things are more annoying than checking out quarter-old code and seeing that it fails to build and run the tests.
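The corresponding workflow can be tiny; a sketch, assuming the repository’s `make test` drives the Docker-based build:

```yaml
# .github/workflows/build.yml -- a minimal sketch.
name: build
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Docker is preinstalled on the ubuntu-latest runners, so the
      # same Docker-based build the developers use runs unchanged.
      - run: make test
```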
Implementation Details
If you have thoughts on the above, I’d be curious to hear them; please make yourself comfortable in the comments.
Now, to the part where I’m still struggling to find solid grounds.
This has to do with `make`.
I am a big fan of the good old `make`, and, to my taste, it can play two distinct roles:

1. To manage dependencies and to orchestrate incremental builds, and
2. To simply have a standard set of `make test` / `make clean` / etc. commands.
When it comes to the above Docker-based build setup, these two roles appear to conflict.
On the one hand, for role (1) to be most effective, it should orchestrate builds both on my dev box w/o Docker and within the Docker container itself. Thus, assuming the build itself is make-based, the Makefile for (1) should clearly be copied inside the respective Docker container, and `make` should be run there.

On the other hand, on the outer level, not only do I want to have `make test` and `make clean`, but also some `make build`, to build the very Docker container. And, clearly, the (part of the) Makefile whose role is to invoke `docker build` does not belong inside Docker.
While I’m still conflicted, rationally, as of now, I see nothing wrong with two Makefiles.
Especially given that Docker container builds would need to be called from the root folder, because, for security reasons, Docker does not follow symlinks.
Thus, if, say, the `proto/` directory is shared between two different builds, the idea of a plain, “vanilla” Dockerfile in the root of the respective repository is just wrong. In reality, there will likely be some `src_code/Dockerfile` and `src_test/Dockerfile`. Seen in this light, `src_code/Makefile.impl` and `src_test/Makefile.impl` make perfect sense to me.
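Concretely, both builds are then invoked from the repository root, so that the shared `proto/` directory is part of the build context (the image tags are made up):

```sh
# Run from the repository root; "." is the build context, so proto/
# is visible to both Dockerfiles without any symlinks.
docker build -f src_code/Dockerfile -t my_code .
docker build -f src_test/Dockerfile -t my_test .
```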
There can then be a top-level `Makefile`, and/or inner-level `src_code/Makefile` and `src_test/Makefile`-s, that would just be clean wrappers over calling `make` with the respective `Makefile.impl`-s.
This way we get intuitive (and tab-completed!) `make build` / `make test` / `make clean` commands, plus no redundant code or scripts copied over into the very build Docker containers.
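For illustration, a minimal sketch of the outer wrapper (image and target names are mine; the inner `Makefile.impl` is assumed to have been copied into the image by its Dockerfile):

```make
# Top-level Makefile: the Docker-facing wrapper; none of this is
# copied into the container.
build:
	docker build -f src_code/Dockerfile -t my_code .

test: build
	# Delegate to the inner, Docker-agnostic Makefile.impl.
	docker run --rm my_code make -f Makefile.impl test

clean:
	docker rmi -f my_code

.PHONY: build test clean
```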
What do you think?
Footnotes
1. Of course, the above only makes sense if `make` is an organic part of your build cycle. If you use, say, `gradle`, there is no problem in sight, to my taste: have top-level `make build`-ish targets to build containers and such, and don’t bring `Makefile`-s into the very Docker container at all.

2. It is worth mentioning that I have, a while ago, converted to the religion that it’s best to keep the repository directory clean, free of build artifacts, etc. For instance, because it may be part of a different file system, journaled et al., where large binary blobs do not belong. So, in my perfect world, I vote for some `.bashrc`-set vars for build & artifact destinations, and I vote for the scripts in the repo to respect those vars. Still, the argument about being able to build & run the code on a virgin machine that only has `git` and `docker` holds, and that’s why I don’t think it is overkill to put the “default” “output” directories into `.gitignore`.
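As a sketch of what I mean by respecting those vars (the variable names are made up):

```make
# In the repo's Makefile: honor the user's preferred locations,
# falling back to .gitignore-d in-repo defaults, so that a virgin
# machine with no .bashrc setup still builds out of the box.
BUILD_DIR ?= $(or $(MY_BUILD_DIR),.build)
ARTIFACTS_DIR ?= $(or $(MY_ARTIFACTS_DIR),.artifacts)
```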
3. One worthy addendum that did not make it into the text: use Docker’s cache wisely. For instance, for `npm`-based code, I would first copy `package{-lock}.json` into the container, then run `npm install` inside it, and only then copy over the rest of the code. This way, the resulting container, after `npm i` is run but before the user code is there, is cached on the Docker level, thus dramatically speeding up one-line changes, without having to mount the dreaded `node_modules/` anywhere. Similarly, if there’s a step of `.proto` files compilation, I would copy them over first, and run that build target separately, on the Dockerfile level, just to make sure one-line changes that do not affect `.proto` files are quick. In my case, by the way, this separate command would be a `make` target, so that the build would work just fine without running it manually up front. Can’t comment on how universal this trick is, but to me these minor things are what differentiates good engineers from the great ones.
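For concreteness, a sketch of that layering for a hypothetical Node.js app (the base image and entry point are placeholders):

```dockerfile
FROM node:20
WORKDIR /app

# Manifests first: this layer and the `npm install` layer below stay
# cached until package{-lock}.json actually changes.
COPY package.json package-lock.json ./
RUN npm install

# A one-line change to the source only invalidates the layers from
# here on, so `npm install` is not re-run and node_modules/ is never
# mounted anywhere.
COPY . .

CMD ["node", "index.js"]
```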
4. I’ve heard arguments that Docker not following symlinks is a bug, and that due to this bug one has to run `docker build` from the top-level directory. Personally, I believe it is a feature of Docker. (It may be a bug of *nix, but that’s a different conversation.) Nothing prevents me from creating a symlink to `/etc`, or outright to `/`, in my GitHub repo, so that a seemingly innocent `docker build .` would have legitimate access to my `/etc/passwd`. So, yes, we do need scripts that run `docker build` from the right directory, even if all one needs is just `docker build .` in a way that can access `../proto/`.
5. Part of me is still uneasy that, unlike what’s in the `Dockerfile`, what’s in the `Makefile` is run under my username, “natively”. In other words, while within the realm of Docker I am quite confident about both my personal files and my dev environment, this is far less true about `make`. I have yet to find a good workaround for this technological problem, and/or a mental trick to make peace with it. More and more creative attacks are invented every day, and only building inside Docker (and not as the `root` user!) may well be the safe path here, but I’m just not ready to push such a solution onto my teams yet.