YAML engineer reporting in.
note: yaml is long and repetitive, and I'm still not sure if I'm happy I introduced yaml anchors to my team.
tldr: the 2 docs below are equivalent; anchors do not carry across documents (---):
name: &name foo
somewhere:
  else:
    x: *name
---
name: foo
somewhere:
  else:
    x: foo
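Where anchors do still pay off is within a single document, eg deduplicating a Deployment's selector and template labels; a minimal sketch (fragment only, using the recommended labels covered below):

metadata:
  # anchor the labels once...
  labels: &labels
    app.kubernetes.io/name: foo
    app.kubernetes.io/instance: default
spec:
  selector:
    # ...and reference them wherever they repeat in the same doc
    matchLabels: *labels
  template:
    metadata:
      labels: *labels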
Every object has metadata: name, labels, annotations. They even have a recommended set of labels:
metadata:
  name: foo
  annotations:
    # stick values that you don't want to filter by here,
    # such as info for other apps that read service definitions
    # or as a place to store data to make your controller stateless
  labels:
    # sort of duplicates metadata.name
    app.kubernetes.io/name: foo

    # separate multiple instances, not really necessary if you do app-per-namespace
    app.kubernetes.io/instance: default

    # you might not want to add this on everything (eg namespaces, security stuff)
    # since with least privilege you can't change them,
    # and they don't really change that often(?)
    app.kubernetes.io/version: "1.2.3"

    # the hardest part is probably getting it to not say "helm" when you don't actually use helm
    app.kubernetes.io/managed-by: helm

    # these two aren't really necessary for single deployment apps
    #
    # the general purpose of "name", eg name=envoy component=proxy
    app.kubernetes.io/component: server
    # what the entire thing is
    app.kubernetes.io/part-of: website
The hardest part about namespaces is your namespace allocation policy: do you stuff everything in default, or give each team, each app, or each app per environment its own namespace? Hierarchical Namespaces might help a bit, making the latter options more tenable, but still, things to think about.
Currently I'm in the "each app their own namespace" camp, and live with the double names in service addresses (shown below).
apiVersion: v1
kind: Namespace
metadata:
  name: foo
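With app-per-namespace, in-cluster service addresses follow service.namespace.svc.cluster.local (assuming the default cluster domain), hence the double name:

# Service "foo" in Namespace "foo"
foo.foo.svc.cluster.local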
The least common denominator of L4/L7 routing...
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo
spec:
  # for if you run multiple ingress controllers
  ingressClassName: default
  rules:
    # DNS style wildcards only
    - host: "*.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix # or Exact, Prefix uses path segment matching
            backend:
              service:
                name: foo
                port:
                  name: http
                  # number: 80
  tls:
    - secretName: foo-tls
      hosts:
        - "*.example.com"
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  # change as needed
  type: ClusterIP

  # only for type LoadBalancer
  externalTrafficPolicy: Local

  # for statefulsets that need peer discovery,
  # eg. etcd or cockroachdb
  publishNotReadyAddresses: true

  ports:
    - appProtocol: opentelemetry
      name: otlp
      port: 4317
      protocol: TCP
      targetPort: otlp # name or number, defaults to port

  selector:
    # these 2 should be enough to uniquely identify apps
    app.kubernetes.io/name: foo
    app.kubernetes.io/instance: default
note: while it does have a spec.secrets field, it currently doesn't really do anything useful.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: foo
  annotations:
    # workload identity for attaching to GCP service accounts in GKE
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
Use only if your app is truly stateless: no PersistentVolumeClaims unless it's ReadOnlyMany, and even then PVCs still restrict the nodes you can run on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  # don't set if you plan on autoscaling
  replicas: 1

  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3

  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default

  # annoyingly named differently from StatefulSet or DaemonSet
  strategy:
    # prefer maxSurge to keep availability during upgrades / migrations
    rollingUpdate:
      maxSurge: 25% # rounds up
      maxUnavailable: 0

    # Recreate if you want blue-green style
    # or if you're stuck with a PVC
    type: RollingUpdate

  template: # see pod below
If your app has any use for persistent data, use this, even if you only have a single instance. It also gives you nice DNS names per pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: foo
spec:
  # or Parallel for all at once
  podManagementPolicy: OrderedReady
  replicas: 3

  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3

  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default

  # even though they say it must exist, it doesn't have to
  # (but you lose per pod DNS)
  serviceName: foo

  template: # see pod below

  updateStrategy:
    rollingUpdate: # this should only be used by tooling
    type: RollingUpdate

  volumeClaimTemplates: # see pvc below
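The per pod DNS names come from the Service named in serviceName (usually headless); with the default cluster domain they look like:

# pod-name.service-name.namespace.svc.cluster.local
foo-0.foo.foo.svc.cluster.local # pod foo-0, service foo, namespace foo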
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: foo
spec:
  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3

  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default

  template: # see pod below

  updateStrategy:
    rollingUpdate:
      # make it faster for large clusters
      maxUnavailable: 30%
    type: RollingUpdate
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
    - name: foo
      args:
        - -flag1=v1
        - -flag2=v2
      envFrom:
        - prefix: APP_
          configMapRef:
            name: foo-env
            optional: true
      image: docker.example.com/app:v1
      imagePullPolicy: IfNotPresent

      ports:
        - containerPort: 4317
          name: otlp
          protocol: TCP

      # do extra stuff
      lifecycle:
        postStart:
        preStop:

      startupProbe: # allow a longer startup
      livenessProbe: # stay alive to not get killed
      readinessProbe: # stay ready to route traffic (sketch below)

      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          add:
            - NET_ADMIN # no CAP_ prefix in kubernetes
        privileged: false
        readOnlyRootFilesystem: true

      resources:
        # ideally set after running some time and profiling actual usage,
        # prefer to start high and ratchet down
        requests:
          cpu: 500m
          memory: 128Mi
        limits:
          cpu: 1500m
          memory: 512Mi

      volumeMounts: # as needed

  # don't inject env with service addresses/ports,
  # not many things use them, they clutter up the env
  # and may be a performance hit with a large number of services
  enableServiceLinks: false

  # do create PriorityClasses and give every pod one,
  # helps with deciding which pods to kill first
  priorityClassName: critical

  securityContext:
    fsGroup: 65535
    runAsGroup: 65535
    runAsNonRoot: true
    runAsUser: 65535 # may conflict with container setting and need for $HOME

  serviceAccountName: foo

  terminationGracePeriodSeconds: 30

  volumes: # set as needed
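The probes above are left as placeholders; a minimal sketch of one, assuming a hypothetical HTTP health endpoint (the example pod only exposes an otlp port, so both path and port here are made up):

readinessProbe:
  httpGet:
    path: /healthz # hypothetical endpoint
    port: 8080 # hypothetical http port
  periodSeconds: 10
  failureThreshold: 3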
there's some overlap in managing pod scheduling, especially around where pods run:
affinity: these only let you select to either run 0 or unlimited pods per selector
affinity.nodeAffinity: general purpose choose a node
affinity.podAffinity: general purpose choose to schedule next to things
affinity.podAntiAffinity: general purpose choose not to schedule next to things
nodeSelector: shorthand for choosing nodes with labels
tolerations: allow scheduling on nodes with taints (see the sketch after the example below)
topologySpreadConstraints: choose how many to schedule in a single topology domain
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms: # OR
          # has to be pool-0
          - matchExpressions: # AND
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                  - pool-0
      preferredDuringSchedulingIgnoredDuringExecution:
        # prefer zone us-central1-a
        - weight: 25
          preference:
            matchExpressions: # AND
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-central1-a

    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        # prefer to be on the same node as a bar
        - weight: 25
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: bar
                app.kubernetes.io/instance: default
            topologyKey: kubernetes.io/hostname

    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: # AND
        # never schedule in the same region as buzz
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: buzz
              app.kubernetes.io/instance: default
          topologyKey: topology.kubernetes.io/region

  topologySpreadConstraints: # AND
    # limit to 1 instance per node
    - maxSkew: 1
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: foo
          app.kubernetes.io/instance: default
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule # or ScheduleAnyway
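nodeSelector and tolerations from the list above aren't in the example; a minimal sketch (the taint key and value are made up):

spec:
  # shorthand for a nodeAffinity on node labels
  nodeSelector:
    cloud.google.com/gke-nodepool: pool-0
  # allow (but don't require) scheduling onto nodes with a matching taint
  tolerations:
    - key: example.com/dedicated # hypothetical taint key
      operator: Equal
      value: foo
      effect: NoSchedule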
args: Docker entrypoint + container args
command: container command
command and args: container command + container args
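For example, assuming a hypothetical image built with ENTRYPOINT ["/app"], setting only args keeps the entrypoint:

containers:
  - name: foo
    image: docker.example.com/app:v1 # assumes ENTRYPOINT ["/app"]
    args: # container runs: /app -flag1=v1
      - -flag1=v1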
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo
spec:
  accessModes:
    - ReadWriteOnce # ReadOnlyMany or ReadWriteMany (rare)

  dataSource: # prepopulate with data from a VolumeSnapshot or PersistentVolumeClaim

  resources:
    requests:
      storage: 10Gi

  # bind to existing PV
  selector: # matchLabels / matchExpressions

  storageClassName: ssd

  volumeMode: Filesystem # or Block
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  behavior: # fine tune when to scale up / down

  maxReplicas: 5
  minReplicas: 1

  metrics:
    - # TODO

  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
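A sketch of one common metrics entry (scale on average CPU utilization), as one possibility for the TODO above:

metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70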
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: foo
spec:
  # set exactly one of maxUnavailable / minAvailable

  # when you have a low number of replicas
  # ensure you can disrupt them
  maxUnavailable: 1

  # allows for more disruptions
  # minAvailable: 75%

  selector:
    matchLabels:
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default