SEAN K.H. LIAO

non root

By default, processes in a container run as root in their own little sandbox, which is nice from a don't-need-to-think-about-permissions perspective, but a bit less nice if you're concerned about privilege escalation.

So you can run as a non-root user instead, either by setting USER xxx in a Dockerfile or at runtime (docker run --user, or runAsUser in Kubernetes). But sometimes you need to do special things, like binding to port 80. Linux gives us capabilities, which grant little slices of elevated permissions.

docker / kubernetes

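Unfortunately for us, granting the capability to the container is not enough when the process runs as a non-root user: the capability is lost when the binary is executed unless the binary itself carries it as a file capability. So you actually need to use setcap to set the capabilities you want on the binary (sometimes necessitating a custom image build), in addition to granting them to the container, before they will work.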
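One way to check what a process actually ended up with is to read its capability masks from /proc/self/status. The following is a rough sketch, assuming a Linux procfs; parseCapSet and hasCap are illustrative helper names, and bit 10 is CAP_NET_BIND_SERVICE as defined in linux/capability.h.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// capNetBindService is the bit index of CAP_NET_BIND_SERVICE.
const capNetBindService = 10

// parseCapSet extracts a capability mask such as "CapEff" from the
// contents of /proc/<pid>/status, where it is printed as a hex number.
func parseCapSet(status, field string) (uint64, error) {
	prefix := field + ":"
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, prefix) {
			value := strings.TrimSpace(strings.TrimPrefix(line, prefix))
			return strconv.ParseUint(value, 16, 64)
		}
	}
	return 0, fmt.Errorf("field %q not found", field)
}

// hasCap reports whether the given capability bit is set in mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(1<<bit) != 0
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	// Permitted, effective, and bounding sets of the current process.
	for _, field := range []string{"CapPrm", "CapEff", "CapBnd"} {
		mask, err := parseCapSet(string(data), field)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s=%016x net_bind_service=%v\n",
			field, mask, hasCap(mask, capNetBindService))
	}
}
```

Run as a non-root user from a binary without file capabilities, this should show the bit in CapBnd but not in CapEff, even when the container was granted the capability.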

Dockerfile for a simple Go HTTP server

FROM golang:1.16-alpine AS build
WORKDIR /workspace

# static binary for scratch container
ENV CGO_ENABLED=0

# get the setcap command
RUN apk add --update --no-cache libcap

# main.go:
#     package main
#
#     import (
#             "log"
#             "net/http"
#     )
#
#     func main() {
#             log.Fatal(http.ListenAndServe(":80", http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
#                     rw.Write([]byte("ok"))
#             })))
#     }
# with multiple sources, the destination must be a directory ending in /
COPY go.mod main.go ./
RUN go build -o /bin/app .

# set cap_net_bind_service in both the [e]ffective and [p]ermitted file capability sets
RUN setcap cap_net_bind_service+ep /bin/app


FROM scratch
COPY --from=build /bin/app /bin/app
ENTRYPOINT ["/bin/app"]

Kubernetes Pod (can also be combined with PodSecurityPolicy, which was removed in Kubernetes 1.25 in favor of Pod Security Admission)

apiVersion: v1
kind: Pod
metadata:
  name: http
spec:
  # run as not root
  securityContext:
    runAsGroup: 65535
    runAsNonRoot: true
    runAsUser: 65535
  containers:
    - name: http
      image: http:latest
      imagePullPolicy: IfNotPresent
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          # capabilities the container may use (must cover what setcap put on the binary)
          add:
            - NET_BIND_SERVICE
          # drop every other capability
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true

bonus

test with KinD

docker build -t http .
kind create cluster
kind load docker-image http
kubectl apply -f pod.yaml