docker buildx caching

finally looking at buildx

buildkit

docker's BuildKit is getting some interesting new features that set it apart from other container build tools. Unfortunately, that also means a new CLI subcommand that's not entirely backwards compatible: say hello to docker buildx

tldr

stateless build workers

You get a fresh build environment every time, so use registry caching.

#syntax=docker/dockerfile:1.2
FROM golang:rc-alpine AS build
WORKDIR /workspace
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o app .

FROM scratch
COPY --from=build /workspace/app /app
ENTRYPOINT ["/app"]

You'll probably need to spend some time cleaning up the registry every once in a while. You could make life easier by using a separate registry for caching and nuking it every 2 weeks or so (a rough sketch of that follows the build command below).

docker buildx build \
  --cache-from type=registry,ref=your.registry/image \
  --cache-to   type=registry,ref=your.registry/image,mode=max \
  --tag your.registry/image:tag \
  --push .
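
If you do go the throwaway cache registry route, nuking it can be as simple as recreating the container. A rough sketch, assuming the cache registry is just a plain registry:2 container (the name buildcache is made up here), run from cron or whatever scheduler you have:

# recreate the cache registry container, throwing away everything in it
docker rm -f buildcache
docker run -d --name buildcache -p 5000:5000 registry:2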
stateless with ci cache

Like the above, but use an external system to save/restore the cache.

# use some tool to restore a cache from previous builds
cache-restore /tmp/docker-cache
docker buildx build \
  --cache-from type=local,src=/tmp/docker-cache \
  --cache-to   type=local,dest=/tmp/docker-cache,mode=max \
  --tag your.registry/image:tag \
  --push .
# use some tool to save the cache for future use
cache-save /tmp/docker-cache
stateful build workers

Your workers have a chance to reuse the local, docker-managed cache between builds, and even across builds for different apps, though it's not shared with the host system.

#syntax=docker/dockerfile:1.2
FROM golang:rc-alpine AS build
WORKDIR /workspace
COPY . .
RUN --mount=type=cache,id=gomod,target=/go/pkg/mod \
    --mount=type=cache,id=gobuild,target=/root/.cache/go-build \
    go build -o app .

FROM scratch
COPY --from=build /workspace/app /app
ENTRYPOINT ["/app"]

You could still use --cache-to/--cache-from if your image is more complex and you'd like to reuse layers.

docker buildx build \
  --tag your.registry/image:tag \
  --push .

caching

The fun is in trying to munge together the following caches:

dependencies

If you use any sane language package manager, somewhere on disk there will be a global cache for your dependencies. This should be reproducible, i.e. given the same package list, the resulting cache should always be the same.
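
Go, for example, keeps this global cache in its module cache, and you can fill it from just the manifest files, which is what makes the download step in the first Dockerfile cacheable as a layer:

# the module cache is Go's global dependency cache
go env GOMODCACHE   # /go/pkg/mod inside the golang images
go mod download     # populate it using only go.mod / go.sum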

If you use java/maven, despair as your developers don't properly declare all their deps and dynamically add them at test time.

build

Some tools, like go, have a global build cache with proper keys. Others just use the local directory and timestamp checks to guess if something is out of date. The first can probably be shared across machines; the second probably can't.
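
Go falls in the first camp: its build cache is keyed by content hashes, so in principle it can be shipped between machines. A quick sketch of where it lives (the same directory the cache mount in the Dockerfile above targets):

# Go's build cache is content-addressed, not timestamp based
go env GOCACHE   # /root/.cache/go-build when building as root
go build ./...   # a second run with a warm cache does very little work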

docker

docker caches by layer, keyed by the previous layer and the context copied in so far. Since dependencies change relatively infrequently compared to source code, a good strategy is to copy in just the dependency manifest and download the dependencies, allowing those layers to be cached. However, since layers are cached as a whole, any change in dependencies invalidates the entire layer. Also, since the dependency manifest is part of the cache key, you're unlikely to be able to share layer caches across applications (unless you purposefully construct a dedicated fat caching layer).

--cache-from and --cache-to support several outputs, but the interesting ones are a local directory (so it can be cached by your CI system) and a remote registry. The important flag is mode=max on --cache-to, which includes the layers of all stages in multistage builds. Which is faster, and whether you want to fill a registry with cache images, is up to you, though not all registries support cache manifests, e.g. gcr.

The other fun feature is RUN --mount=type=cache,target=/some/path your command. This creates a directory that's excluded from the image and can be reused across invocations, making it a good place to put the dependency/build cache, assuming your machines build multiple images. Being able to control where this lives on the host (for CI systems to save/restore) is still an open issue.

ci

There are 2 main flavours of CI: stateful and stateless. Stateful systems persist between builds, giving you the ability to share state (e.g. cache directories) between builds on the same machine. Stateless systems give you a clean environment every time, good for ensuring no external factors affect your build, bad if you want to rely on your build/dependency cache. Either way, they will sometimes provide you with tools to save/restore from persistent caches shared by all workers.