Docker BuildKit is getting some interesting new features, setting it apart from the other container building tools. Unfortunately, it means a new CLI subcommand that's not entirely backwards compatible: say hello to docker buildx.
If you get a fresh build environment every time, use registry caching.
#syntax=docker/dockerfile:1.2
FROM golang:rc-alpine AS build
WORKDIR /workspace
# copy only the dependency manifests first so this layer stays cacheable
COPY go.mod go.sum ./
RUN go mod download
# source changes only invalidate the layers from here on
COPY . .
RUN go build -o app .

FROM scratch
COPY --from=build /workspace/app /app
ENTRYPOINT ["/app"]
You'll probably need to spend some time cleaning up the registry every once in a while. You could make life easier by using a separate registry for caching and nuking it every 2 weeks...
docker buildx build \
  --cache-from type=registry,ref=your.registry/image \
  --cache-to type=registry,ref=your.registry/image,mode=max \
  --tag your.registry/image:tag \
  --push .
Like the above, but use an external system to save/restore the cache.
# use some tool to restore a cache from previous builds
cache-restore /tmp/docker-cache
# note: local caches are read with src= and written with dest=
docker buildx build \
  --cache-from type=local,src=/tmp/docker-cache \
  --cache-to type=local,dest=/tmp/docker-cache,mode=max \
  --tag your.registry/image:tag \
  --push .
# use some tool to save the cache for future use
cache-save /tmp/docker-cache
Your workers have a chance to reuse the local, Docker-managed cache between builds, and even across builds for different apps, but that cache can't be shared with the host system.
#syntax=docker/dockerfile:1.2
FROM golang:rc-alpine AS build
WORKDIR /workspace
COPY . .
# cache mounts keep the module and build caches out of the image layers
RUN --mount=type=cache,id=gomod,target=/go/pkg/mod \
  --mount=type=cache,id=gobuild,target=/root/.cache/go-build \
  go build -o app .

FROM scratch
COPY --from=build /workspace/app /app
ENTRYPOINT ["/app"]
You could still use --cache-to/--cache-from if your image is more complex and you'd like to reuse layers.
docker buildx build \
  --tag your.registry/image:tag \
  --push .
The fun of trying to munge together the:
If you use any sane language package manager, somewhere on disk will be a global cache for your dependencies. This should be reproducible, i.e. given the same package list, the resulting cache should always be the same.
If you use Java/Maven, despair, as your developers don't properly declare all their deps and dynamically add them at test time.
Some tools, like Go, have a global build cache with proper keys. Others just use the local directory and some timestamp checks to guess if something is out of date. The first could probably be shared across machines; the second probably can't.
Docker caches by layer, keyed by the previous layer and the context copied in so far. Since dependencies change relatively infrequently compared to source code, a good strategy is to copy the dependency manifest and download the dependencies first, allowing these layers to be cached. However, since layers are cached as a whole, any change in a dependency invalidates the entire layer. Also, since the dependency manifest is part of the cache key, you're unlikely to be able to share layer caches across applications (unless you purposefully construct a dedicated fat caching layer).
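To make the "manifest is part of the cache key" point concrete, here is a rough sketch of the idea in shell. It is not Docker's actual algorithm: dep_cache_key is a made-up helper, and it assumes a Linux box with sha256sum available. It only shows that a key derived from the manifests stays stable while source code changes, and flips as soon as any dependency does.

```shell
#!/bin/sh
# Hypothetical helper, not part of Docker: derive a cache key from the
# dependency manifests only. Source files are deliberately left out.
dep_cache_key() {
    # Hash the contents of every file passed in, in the order given.
    cat "$@" | sha256sum | cut -d' ' -f1
}

# Example (assumes go.mod and go.sum exist in the current directory):
#   dep_cache_key go.mod go.sum
```

Two apps with different manifests get different keys, which is why their dependency layers don't get shared.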
--cache-from and --cache-to support several backends, but the interesting ones are a local directory (so it can be cached by your CI system) and a remote registry. The important flag is mode=max for --cache-to, which includes all layers in multistage builds. Which is faster, and whether you want to fill a registry with cache images, is up to you, though not all registries support cache manifests, e.g. GCR.
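If you end up switching between the two backends, a small wrapper can keep the flags in one place. The sketch below is only an illustration: cache_flags is a hypothetical helper, and the paths and registry names are placeholders. The flags it prints are the regular buildx ones; note that a local cache is read with src= and written with dest=.

```shell
#!/bin/sh
# Hypothetical helper: print buildx cache flags for a given backend.
#   cache_flags local <dir>
#   cache_flags registry <ref>
cache_flags() {
    kind=$1
    location=$2
    case "$kind" in
        local)
            echo "--cache-from type=local,src=$location --cache-to type=local,dest=$location,mode=max"
            ;;
        registry)
            echo "--cache-from type=registry,ref=$location --cache-to type=registry,ref=$location,mode=max"
            ;;
        *)
            echo "unknown cache type: $kind" >&2
            return 1
            ;;
    esac
}

# Example (left unquoted on purpose so the flags split into words):
#   docker buildx build $(cache_flags registry your.registry/cache) \
#     --tag your.registry/image:tag --push .
```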
The other fun feature is RUN --mount=type=cache,target=/some/path your command. This creates a directory that's excluded from the image and can be reused across invocations, making it a good place to put the dependency/build cache, assuming your machines build multiple images. Being able to control where this is on the host (for CI systems to save/restore) is still an open issue.
There are 2 main flavours of CI: stateful and stateless. Stateful systems persist between builds, giving you the ability to share state (e.g. cache directories) between builds on the same machine. Stateless systems give you a clean environment every time: good for ensuring no external factors affect your build, bad if you trust your build/dependency cache. Either way, they will sometimes provide you with tools to save/restore from persistent caches shared by all workers.
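If your CI system doesn't ship such tools, a tarball is about the simplest stand-in. The functions below are hypothetical (cache_save and cache_restore are names made up here, and where the archive is stored between builds is left up to you); they just sketch the save/restore round trip for a local cache directory.

```shell
#!/bin/sh
# Hypothetical stand-ins for a CI system's cache tooling.

# cache_save <dir> <archive>: pack the cache directory into a tarball.
cache_save() {
    tar -czf "$2" -C "$1" .
}

# cache_restore <archive> <dir>: unpack a previous cache if one exists.
cache_restore() {
    mkdir -p "$2"
    # A missing archive just means a cold cache, not a failure.
    [ -f "$1" ] || return 0
    tar -xzf "$1" -C "$2"
}

# Example:
#   cache_restore /persistent/docker-cache.tgz /tmp/docker-cache
#   ... run the build with a type=local cache in /tmp/docker-cache ...
#   cache_save /tmp/docker-cache /persistent/docker-cache.tgz
```

Treating a missing archive as success keeps the first build on a fresh worker from failing; it simply runs without a cache.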