YAML engineer reporting in.
note: YAML is long and repetitive, and I'm still not sure if I'm happy I introduced YAML anchors to my team.
tldr, the 2 docs below are equivalent; anchors do not carry across documents.
name: &name foo
somewhere:
  else:
    x: *name
---
name: foo
somewhere:
  else:
    x: foo
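The flip side, and the reason to bother: an anchor plus a merge key cuts down repetition within a single document. A sketch (the names here are made up, and merge keys are a YAML 1.1 feature, so check your parser supports them):

defaults: &defaults
  cpu: 500m
  memory: 128Mi
small:
  <<: *defaults # pulls in cpu + memory
  replicas: 1
large:
  <<: *defaults # same defaults, cpu overridden below
  cpu: 1500m

Explicit keys win over merged ones, which is what makes the override in "large" work.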
Every object has them: names, labels, annotations. Kubernetes even has a recommended set of labels:
metadata:
  name: foo
  annotations:
    # stick values that you don't want to filter by here,
    # such as info for other apps that read service definitions
    # or as a place to store data to make your controller stateless
  labels:
    # sort of duplicates metadata.name
    app.kubernetes.io/name: foo
    # separate multiple instances, not really necessary if you do app-per-namespace
    app.kubernetes.io/instance: default
    # you might not want to add this on everything (eg namespaces, security stuff)
    # since with least privilege you can't change them
    # and they don't really change that often(?)
    app.kubernetes.io/version: "1.2.3"
    # the hardest part is probably getting it to not say "helm" when you don't actually use helm
    app.kubernetes.io/managed-by: helm
    # these two aren't really necessary for single deployment apps
    #
    # the general purpose of "name", eg name=envoy component=proxy
    app.kubernetes.io/component: server
    # what the entire thing is
    app.kubernetes.io/part-of: website
The hardest part about namespaces is your namespace allocation policy: do you go per team, per app, per environment, or some combination of those?
Hierarchical Namespaces might help a bit, making the finer-grained policies more tenable, but there are still things to think about.
Currently I'm in the "each app their own namespace" camp, and live with the double names in service addresses.
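Concretely, the double name looks like this: assuming the default cluster.local cluster domain, a Service foo in Namespace foo resolves in-cluster as:

# <service>.<namespace>.svc.<cluster-domain>
foo.foo.svc.cluster.local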
apiVersion: v1
kind: Namespace
metadata:
  name: foo
The least common denominator of L4/L7 routing...
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: foo
spec:
  # for if you run multiple ingress controllers
  ingressClassName: default
  rules:
    # DNS style wildcards only
    - host: "*.example.com"
      http:
        paths:
          - path: /
            pathType: Prefix # or Exact; Prefix uses path segment matching
            backend:
              service:
                name: foo
                port:
                  name: http
                  # number: 80
  tls:
    - secretName: foo-tls
      hosts:
        - "*.example.com"
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  # change as needed
  type: ClusterIP
  # only for type LoadBalancer
  externalTrafficPolicy: Local
  # for statefulsets that need peer discovery,
  # eg. etcd or cockroachdb
  publishNotReadyAddresses: true
  ports:
    - appProtocol: opentelemetry
      name: otlp
      port: 4317
      protocol: TCP
      targetPort: otlp # name or number, defaults to port
  selector:
    # these 2 should be enough to uniquely identify apps
    app.kubernetes.io/name: foo
    app.kubernetes.io/instance: default
note: while it does have a top-level secrets field, it currently doesn't really do anything useful.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: foo
  annotations:
    # workload identity for attaching to GCP service accounts in GKE
    iam.gke.io/gcp-service-account: GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
Use only if your app is truly stateless:
no PersistentVolumeClaims unless they're something like ReadWriteMany,
and even then PVCs still restrict the nodes you can run on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  # don't set if you plan on autoscaling
  replicas: 1
  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default
  # annoyingly named differently from StatefulSet or DaemonSet
  strategy:
    # prefer maxSurge to keep availability during upgrades / migrations
    rollingUpdate:
      maxSurge: 25% # rounds up
      maxUnavailable: 0
    # Recreate if you want blue-green style
    # or if you're stuck with a PVC
    type: RollingUpdate
  template: # see pod below
If your app has any use for persistent data, use this, even if you only have a single instance. Also gives you nice DNS names per pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: foo
spec:
  # or Parallel for all at once
  podManagementPolicy: OrderedReady
  replicas: 3
  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default
  # even though they say it must exist, it doesn't have to
  # (but you lose per pod DNS)
  serviceName: foo
  template: # see pod below
  updateStrategy:
    rollingUpdate: # this should only be used by tooling
    type: RollingUpdate
  volumeClaimTemplates: # see pvc below
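For those per pod DNS names: assuming the StatefulSet above lives in namespace foo with the default cluster.local cluster domain, pod 0 gets:

# <pod>.<serviceName>.<namespace>.svc.<cluster-domain>
foo-0.foo.foo.svc.cluster.local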
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: foo
spec:
  # stop cluttering kubectl get all with old replicasets,
  # your gitops tooling should let you roll back
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      # these 2 should be enough to uniquely identify apps,
      # note this value cannot change once created
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default
  template: # see pod below
  updateStrategy:
    rollingUpdate:
      # make it faster for large clusters
      maxUnavailable: 30%
    type: RollingUpdate
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  containers:
    - name: foo
      args:
        - -flag1=v1
        - -flag2=v2
      envFrom:
        - configMapRef:
            name: foo-env
            optional: true
          prefix: APP_
      image: docker.example.com/app:v1
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 4317
          name: otlp
          protocol: TCP
      # do extra stuff
      lifecycle:
        postStart:
        preStop:
      startupProbe: # allow a longer startup
      livenessProbe: # stay alive to not get killed
      readinessProbe: # stay alive to route traffic
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          add:
            - NET_ADMIN # capabilities drop the CAP_ prefix
        privileged: false
        readOnlyRootFilesystem: true
      resources:
        # ideally set after running some time and profiling actual usage,
        # prefer to start high and ratchet down
        requests:
          cpu: 500m
          memory: 128Mi
        limits:
          cpu: 1500m
          memory: 512Mi
      volumeMounts: # as needed
  # don't inject env with service addresses/ports
  # not many things use them, they clutter up the env
  # and may be a performance hit with large number of services
  enableServiceLinks: false
  # do create PriorityClasses and give every pod one,
  # helps with deciding which pods to kill first
  priorityClassName: critical
  securityContext:
    fsGroup: 65535
    runAsGroup: 65535
    runAsNonRoot: true
    runAsUser: 65535 # may conflict with container setting and need for $HOME
  serviceAccountName: foo
  terminationGracePeriodSeconds: 30
  volumes: # set as needed
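Since the pod above references priorityClassName: critical, something has to create that PriorityClass. A minimal sketch (the name and value here are my assumptions, pick your own scale):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical # must match priorityClassName above
value: 1000000 # only relative order matters, higher = evicted later
globalDefault: false
description: pods that should outlive everything else on the node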
There is some overlap in managing pod scheduling, especially around where pods run:
affinity: these only let you choose between running 0 or unlimited pods per selector
affinity.nodeAffinity: general purpose choose a node
affinity.podAffinity: general purpose choose to schedule next to things
affinity.podAntiAffinity: general purpose choose not to schedule next to things
nodeSelector: shorthand for choosing nodes with labels
tolerations: allow scheduling on nodes with taints
topologySpreadConstraints: choose how many to schedule in a single topology domain
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms: # OR
          # has to be pool-0
          - matchExpressions: # AND
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                  - pool-0
      preferredDuringSchedulingIgnoredDuringExecution:
        # prefer zone us-central1-a
        - weight: 25
          preference:
            matchExpressions: # AND
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-central1-a
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        # prefer to be on the same node as a bar
        - weight: 25
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/name: bar
                app.kubernetes.io/instance: default
            topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: # AND
        # never schedule in the same region as buzz
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: buzz
              app.kubernetes.io/instance: default
          topologyKey: topology.kubernetes.io/region
  topologySpreadConstraints: # AND
    # limit to 1 instance per node
    - maxSkew: 1
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: foo
          app.kubernetes.io/instance: default
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule # or ScheduleAnyway
args: Docker entrypoint + container args
command: container command (Docker entrypoint and cmd are ignored)
command + args: container command + container args
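A sketch of the three combinations, assuming an image whose Dockerfile sets ENTRYPOINT ["/app/server"] and CMD ["-default-flag"] (paths here are made up):

containers:
  - name: args-only
    image: docker.example.com/app:v1
    # runs: /app/server -flag1=v1 (ENTRYPOINT kept, CMD replaced)
    args:
      - -flag1=v1
  - name: command-only
    image: docker.example.com/app:v1
    # runs: /app/debug-shell (ENTRYPOINT and CMD both ignored)
    command:
      - /app/debug-shell
  - name: command-and-args
    image: docker.example.com/app:v1
    # runs: /app/debug-shell -flag1=v1
    command:
      - /app/debug-shell
    args:
      - -flag1=v1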
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo
spec:
  accessModes:
    - ReadWriteOnce # ReadOnlyMany or ReadWriteMany (rare)
  dataSource: # prepopulate with data from a VolumeSnapshot or PersistentVolumeClaim
  resources:
    requests:
      storage: 10Gi
  # bind to existing PV
  selector:
    matchLabels:
  storageClassName: ssd
  volumeMode: Filesystem # or Block
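What a filled-in dataSource could look like, restoring from a snapshot (a sketch: foo-snap is a hypothetical VolumeSnapshot, and your cluster needs the snapshot CRDs and a CSI driver for this to work):

dataSource:
  apiGroup: snapshot.storage.k8s.io
  kind: VolumeSnapshot
  name: foo-snap # hypothetical snapshot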
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: foo
spec:
  behavior: # fine tune when to scale up / down
  maxReplicas: 5
  minReplicas: 1
  metrics:
    - # TODO
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: foo
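For the metrics TODO, a common starting point is a resource metric on CPU; a sketch (the 70% target is an assumption, tune it per app):

metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # assumed target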
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: foo
spec:
  # when you have a low number of replicas
  # ensure you can disrupt them
  maxUnavailable: 1
  # alternatively (only one of the two may be set),
  # allows for more disruptions at higher replica counts
  # minAvailable: 75%
  selector:
    matchLabels:
      app.kubernetes.io/name: foo
      app.kubernetes.io/instance: default