k8s clustered apps

starting clustered applications in k8s


So you want to run a clustered thing in k8s? Likely a database, using Raft or similar.

Use a StatefulSet: each pod gets its own PersistentVolume (via volumeClaimTemplates) and its own stable, addressable hostname. The name can be read from the HOSTNAME env var (possibly unstable?) or set as a custom env var with a fieldRef on metadata.name.
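A minimal sketch of those two StatefulSet features, not taken from any real deployment. The names myapp, myapp-headless, ADVERTISE_ADDR, the image and the 1Gi size are all placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: myapp                          # hypothetical name
spec:
  serviceName: myapp-headless          # headless Service that gives each pod a DNS name
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0   # placeholder image
          env:
            # Pod name (myapp-0, myapp-1, ...) via the downward API
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # Hypothetical variable: the stable per-pod DNS name built from POD_NAME
            - name: ADVERTISE_ADDR
              value: "$(POD_NAME).myapp-headless.default.svc.cluster.local"
          volumeMounts:
            - name: data
              mountPath: /data
  # One PersistentVolumeClaim per pod, named data-myapp-0, data-myapp-1, ...
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

The $(POD_NAME) reference only expands because POD_NAME is defined earlier in the same env list.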

Use cert-manager: who wants to futz around with CSRs?

Use publishNotReadyAddresses: true on a headless Service to get name resolution before pods are ready. Cluster members need to see each other before they can become ready.


People on the internet only talk about running etcd outside the cluster for k8s...

Assumes a CA is available as a cert-manager ClusterIssuer called internal-ca.
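For reference, a CA-backed ClusterIssuer of that name might look roughly like this. It is a sketch: the secret name internal-ca-key-pair is an assumption, and the secret (holding the CA's tls.crt and tls.key) has to live in cert-manager's cluster resource namespace, which is cert-manager by default.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    # Assumed name: a kubernetes.io/tls Secret containing the CA cert and key
    secretName: internal-ca-key-pair
```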

Just works (I think)

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: etcd-certs
spec:
  secretName: etcd-certs
  duration: 2160h
  renewBefore: 360h
  dnsNames:
    - "localhost"
    - "etcd"
    - "etcd.default"
    - "etcd.default.svc"
    - "etcd.default.svc.cluster.local"
    - "*.etcd-headless"
    - "*.etcd-headless.default"
    - "*.etcd-headless.default.svc"
    - "*.etcd-headless.default.svc.cluster.local"
  ipAddresses:
    # This value was blank in the source; IPv4 loopback is assumed, to pair with ::1
    - "127.0.0.1"
    - "::1"
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
---
apiVersion: v1
kind: Secret
metadata:
  name: etcd
  labels:
    app.kubernetes.io/name: etcd
type: Opaque
data:
  etcd-root-password: "eDgzelB1aVlsUQ=="
---
apiVersion: v1
kind: Service
metadata:
  name: etcd-headless
  labels:
    app.kubernetes.io/name: etcd
spec:
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
  ports:
    - name: client
      port: 2379
      targetPort: client
    - name: peer
      port: 2380
      targetPort: peer
  selector:
    app.kubernetes.io/name: etcd
---
apiVersion: v1
kind: Service
metadata:
  name: etcd
  labels:
    app.kubernetes.io/name: etcd
spec:
  type: ClusterIP
  ports:
    - name: client
      port: 2379
      targetPort: client
    - name: peer
      port: 2380
      targetPort: peer
  selector:
    app.kubernetes.io/name: etcd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  labels:
    app.kubernetes.io/name: etcd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: etcd
  serviceName: etcd-headless
  podManagementPolicy: Parallel
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/name: etcd
    spec:
      securityContext:
        fsGroup: 1001
        runAsUser: 1001
      containers:
        - name: etcd
          image: docker.io/bitnami/etcd:3.4.13-debian-10-r22
          imagePullPolicy: "IfNotPresent"
          command:
            - etcd
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: ETCDCTL_API
              value: "3"
            - name: ETCD_NAME
              value: "$(POD_NAME)"
            - name: ETCD_DATA_DIR
              value: /bitnami/etcd/data
            - name: ETCD_ADVERTISE_CLIENT_URLS
              value: "https://$(POD_NAME).etcd-headless.default.svc.cluster.local:2379"
            # The two listen URLs were blank in the source; all-interfaces addresses are assumed
            - name: ETCD_LISTEN_CLIENT_URLS
              value: "https://0.0.0.0:2379"
            # Variable name was missing in the source; assumed from the peer URL value
            - name: ETCD_INITIAL_ADVERTISE_PEER_URLS
              value: "https://$(POD_NAME).etcd-headless.default.svc.cluster.local:2380"
            - name: ETCD_LISTEN_PEER_URLS
              value: "https://0.0.0.0:2380"
            # Not in the source: added (as an assumption) so that something serves
            # plain-HTTP /health on the metrics port the probes below use
            - name: ETCD_LISTEN_METRICS_URLS
              value: "http://0.0.0.0:2381"
            - name: ALLOW_NONE_AUTHENTICATION
              value: "yes"
            - name: ETCD_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: etcd
                  key: etcd-root-password
            - name: ETCD_INITIAL_CLUSTER
              value: "etcd-0=https://etcd-0.etcd-headless.default.svc.cluster.local:2380,etcd-1=https://etcd-1.etcd-headless.default.svc.cluster.local:2380,etcd-2=https://etcd-2.etcd-headless.default.svc.cluster.local:2380"
            - name: ETCD_INITIAL_CLUSTER_STATE
              value: new
            # Client and peer TLS both use the cert-manager secret mounted below
            - name: ETCD_CLIENT_CERT_AUTH
              value: "true"
            - name: ETCD_TRUSTED_CA_FILE
              value: /var/secret/tls/ca.crt
            - name: ETCD_CERT_FILE
              value: /var/secret/tls/tls.crt
            - name: ETCD_KEY_FILE
              value: /var/secret/tls/tls.key
            - name: ETCD_PEER_CLIENT_CERT_AUTH
              value: "true"
            - name: ETCD_PEER_TRUSTED_CA_FILE
              value: /var/secret/tls/ca.crt
            - name: ETCD_PEER_CERT_FILE
              value: /var/secret/tls/tls.crt
            - name: ETCD_PEER_KEY_FILE
              value: /var/secret/tls/tls.key
          ports:
            - name: client
              containerPort: 2379
            - name: peer
              containerPort: 2380
            - name: metrics
              containerPort: 2381
          livenessProbe:
            httpGet:
              path: /health
              port: 2381
          readinessProbe:
            httpGet:
              path: /health
              port: 2381
          volumeMounts:
            - name: certs
              mountPath: /var/secret/tls
            - name: data
              mountPath: /bitnami/etcd
      volumes:
        - name: certs
          secret:
            secretName: etcd-certs
        # emptyDir: data does not survive pod rescheduling
        - name: data
          emptyDir: {}


CockroachDB notes: it will complain if the certs have wider permissions than rwx------, which causes issues when running as non-root in k8s (which uses fsGroup and keeps root as the volume owner). Slightly modified from the official manifests (changed service names, certs, and the data dir for kind). Assumes a CA is available and called internal-ca.

The cluster needs a one-time manual init before the nodes join. This could be a Job, but it has to be timed right (after the certs are signed and the nodes have started):

kubectl exec -it cockroachdb-0 -- /cockroach/cockroach init --certs-dir=/cockroach/cockroach-certs
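An untested sketch of what that Job could look like, reusing the image, client cert secret and subPath mounts from the manifests in this section. The Job name cockroachdb-init and the backoffLimit value are assumptions. Retrying on failure is a crude way to handle the timing: the init keeps failing until the certs exist and cockroachdb-0 is reachable.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cockroachdb-init               # hypothetical name
spec:
  backoffLimit: 10                     # assumed value: retries while certs and nodes come up
  template:
    # No app: cockroachdb label here, so the Services do not select this pod
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cluster-init
          image: cockroachdb/cockroach:v20.1.5
          imagePullPolicy: IfNotPresent
          command:
            - /cockroach/cockroach
            - init
            - --certs-dir=/cockroach/cockroach-certs
            - --host=cockroachdb-0.cockroachdb-headless.default
          volumeMounts:
            # Same file layout as the StatefulSet: CA plus the root client cert and key
            - name: client
              mountPath: /cockroach/cockroach-certs/ca.crt
              subPath: ca.crt
            - name: client
              mountPath: /cockroach/cockroach-certs/client.root.crt
              subPath: tls.crt
            - name: client
              mountPath: /cockroach/cockroach-certs/client.root.key
              subPath: tls.key
      volumes:
        - name: client
          secret:
            secretName: cockroachdb-client-root
            defaultMode: 256           # 0400 in octal
```

Running init against a cluster that is already initialized returns an error, so this is only suitable as a one-off Job on a fresh cluster.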


apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cockroachdb-node
spec:
  secretName: cockroachdb-node
  duration: 2160h
  renewBefore: 360h
  dnsNames:
    - "node"
    - "localhost"
    - "cockroachdb"
    - "cockroachdb.default"
    - "cockroachdb.default.svc"
    - "cockroachdb.default.svc.cluster.local"
    - "*.cockroachdb-headless"
    - "*.cockroachdb-headless.default"
    - "*.cockroachdb-headless.default.svc"
    - "*.cockroachdb-headless.default.svc.cluster.local"
  ipAddresses:
    # This value was blank in the source; IPv4 loopback is assumed, to pair with ::1
    - "127.0.0.1"
    - "::1"
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cockroachdb-client-root
spec:
  secretName: cockroachdb-client-root
  duration: 2160h
  renewBefore: 360h
  commonName: root
  usages:
    - client auth
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
---
# rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; v1 is used here
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
rules:
  - apiGroups:
      - ""
    resources:
      - secrets
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cockroachdb
subjects:
  - kind: ServiceAccount
    name: cockroachdb
    namespace: default
---
apiVersion: v1
kind: Service
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
spec:
  ports:
    - port: 26257
      targetPort: 26257
      name: grpc
    - port: 8080
      targetPort: 8080
      name: http
  selector:
    app: cockroachdb
---
apiVersion: v1
kind: Service
metadata:
  name: cockroachdb-headless
  labels:
    app: cockroachdb
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "_status/vars"
    prometheus.io/port: "8080"
spec:
  ports:
    - port: 26257
      targetPort: 26257
      name: grpc
    - port: 8080
      targetPort: 8080
      name: http
  publishNotReadyAddresses: true
  clusterIP: None
  selector:
    app: cockroachdb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cockroachdb
spec:
  serviceName: "cockroachdb-headless"
  replicas: 3
  selector:
    matchLabels:
      app: cockroachdb
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      serviceAccountName: cockroachdb
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - cockroachdb
                topologyKey: kubernetes.io/hostname
      containers:
        - name: cockroachdb
          image: cockroachdb/cockroach:v20.1.5
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 26257
              name: grpc
            - containerPort: 8080
              name: http
          livenessProbe:
            httpGet:
              path: "/health"
              port: http
              scheme: HTTPS
            initialDelaySeconds: 30
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: "/health?ready=1"
              port: http
              scheme: HTTPS
            initialDelaySeconds: 10
            periodSeconds: 5
            failureThreshold: 2
          volumeMounts:
            - name: datadir
              mountPath: /cockroach/cockroach-data
            # cert-manager's tls.crt/tls.key are mounted under the file names cockroach expects
            - name: certs
              mountPath: /cockroach/cockroach-certs/ca.crt
              subPath: ca.crt
            - name: certs
              mountPath: /cockroach/cockroach-certs/node.crt
              subPath: tls.crt
            - name: certs
              mountPath: /cockroach/cockroach-certs/node.key
              subPath: tls.key
            - name: client
              mountPath: /cockroach/cockroach-certs/client.root.crt
              subPath: tls.crt
            - name: client
              mountPath: /cockroach/cockroach-certs/client.root.key
              subPath: tls.key
          env:
            - name: COCKROACH_CHANNEL
              value: kubernetes-secure
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GOMAXPROCS
              valueFrom:
                resourceFieldRef:
                  resource: limits.cpu
                  divisor: "1"
            - name: MEMORY_LIMIT_MIB
              valueFrom:
                resourceFieldRef:
                  resource: limits.memory
                  divisor: "1Mi"
          command:
            - /bin/sh
            - -exc
            # --http-addr had no value in the source; 0.0.0.0 (all interfaces) is assumed
            - >
              /cockroach/cockroach
              start
              --logtostderr=WARNING
              --certs-dir=/cockroach/cockroach-certs
              --advertise-host=$(POD_NAME).cockroachdb-headless.default
              --http-addr=0.0.0.0
              --join=cockroachdb-0.cockroachdb-headless.default,cockroachdb-1.cockroachdb-headless.default,cockroachdb-2.cockroachdb-headless.default
      terminationGracePeriodSeconds: 60
      volumes:
        # emptyDir: data does not survive pod rescheduling
        - name: datadir
          emptyDir: {}
        - name: certs
          secret:
            secretName: cockroachdb-node
            defaultMode: 256  # 0400 in octal
        - name: client
          secret:
            secretName: cockroachdb-client-root
            defaultMode: 256  # 0400 in octal
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate