blog

12021-01-18

SEAN K.H. LIAO

kubernetes

You're thinking of running kubernetes. Let's think this through first.

before you start

Where do you want to run it? In the cloud with a managed control plane: GKE, EKS? In the cloud with bare VMs? Or on prem?

If it's managed, do you use terraform, the cloud provider's specific CLI / SDK / config management, or do you have a snowflake cluster where you can use gardener?

If not, you need to at least choose a CNI provider for networking and a CSI provider for storage. Which ones you choose will probably be affected by the features you need later.

Also, you need to decide how you want to spin up / manage your nodes and kubelets: kubeadm, k3s, ... or something more specialized, maybe k0s.
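
If you go the kubeadm route, a minimal cluster config might look something like this (the version and pod subnet are placeholder values, pick your own to match your CNI):

```yaml
# kubeadm-config.yaml: passed to `kubeadm init --config kubeadm-config.yaml`
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v1.20.2
networking:
  # must match what your CNI provider expects
  podSubnet: 10.244.0.0/16
```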

config management

You have a cluster; now you want to install stuff into it. Best to decide now which tool you want to use and standardize on it. helm is the package manager, but it's a shitty one; you're almost certainly better off without it unless you run stock everything. You can always use raw manifests, but kustomize is a worthwhile layer on top, and it even comes built in to kubectl.
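
A kustomization.yaml is just a list of manifests plus the tweaks you layer on top; a sketch with made-up names:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: myapp
resources:
  - deployment.yaml
  - service.yaml
images:
  # override the image tag without touching deployment.yaml
  - name: registry.example.com/myapp
    newTag: v1.2.3
```

Apply it with kubectl apply -k . and you're done.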

Or maybe you just need to be different: grafana tanka uses jsonnet, and terraform has a kubernetes provider.

Whatever you choose, you also have the problem of secrets. sealed-secrets is a write-only solution; a lot of other tools, like ksops, use sops under the hood. Or be like me and hack up some dingy solution with a gitattributes filter and age.
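
If you go the sops route, a .sops.yaml at the repo root tells it which files to encrypt and with which keys; roughly like this (the age recipient is a placeholder):

```yaml
creation_rules:
  - path_regex: secrets/.*\.yaml
    # only encrypt the secret values, keep the metadata readable for diffs
    encrypted_regex: ^(data|stringData)$
    age: age1qqqq...replace-with-your-recipient
```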

cluster basics

It's alive, now what? If you serve web traffic, you'll want an ingress controller to give you more than L4 routing. ingress-nginx is the default (not to be confused with nginx ingress), and traefik is pretty popular if you're not doing weird stuff. Or maybe you already know you need a service mesh: istio has the most mindshare, linkerd is another big one, and the others seem more half-hearted.
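
Whichever controller you pick, the Ingress object itself looks about the same; a sketch with made-up names:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
spec:
  ingressClassName: nginx
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              # route everything under / to the myapp service
              service:
                name: myapp
                port:
                  number: 80
```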

It's 2021, you need to serve TLS; cert-manager is almost mandatory, even if it is still a pain to manage and upgrade. If you do the sane thing and give each service its own subdomain, you'll need to manage your DNS entries too; external-dns can do that for you.
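
With cert-manager that's usually one ClusterIssuer pointing at Let's Encrypt, something like this (the email and issuer name are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
      # answer http-01 challenges through the ingress controller
      - http01:
          ingress:
            class: nginx
```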

Oh, and if you're not in the cloud, you'll need something like MetalLB, hostPorts, or some other way of exposing your ingress controllers to the outside world. envoy with some static config will probably also work; you're only exposing a single service anyway.
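
MetalLB in layer 2 mode just needs a ConfigMap with a range of addresses it's allowed to hand out (the range here is a placeholder for something on your network):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
      - name: default
        protocol: layer2
        addresses:
          - 192.168.1.240-192.168.1.250
```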

observability

What next? You can deploy your application now and it'll serve traffic just fine. But you want to know what it's doing, how well it's performing, and have something to look at when things go wrong. Say hello to the 3 pillars of observability: metrics, logs, tracing.

prometheus is what everyone uses for metrics (unless you use some hosted thing); it scrapes metrics from various services and stores them. If you run at some mind-boggling scale, you can consider thanos or cortex for scaling prometheus, but for most people a beefy prometheus (an HA pair if you care) is enough.

Along with prometheus, you'll want: kube-state-metrics to expose k8s objects to prometheus, node_exporter for metrics about your hosts, pushgateway if you have things that can't be scraped, and alertmanager for alerting.
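
The usual glue between prometheus and all of the above is a scrape config using kubernetes service discovery, keeping only the pods that opt in via annotation; a fragment of prometheus.yml:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # only scrape pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```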

For logs you'll want promtail to scrape the logs, and loki to store them.
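
promtail's config reuses prometheus-style service discovery; it just ships log lines instead of scraping metrics. A sketch (the loki address is a placeholder):

```yaml
positions:
  filename: /run/promtail/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # label each log stream with the pod name
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```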

Don't forget the most important part, pretty dashboards! Do you even have a choice other than grafana?

For tracing, well... until opentelemetry manages to stop making breaking api changes every few weeks, one of jaeger or zipkin will have to do (both have committed to eventually converging on opentelemetry). Maybe you can store the traces in grafana tempo?

If you're fancy, you can run the opentelemetry collector to ingest traces and metrics so you can preprocess / filter / re-export them. You still need to store the data somewhere though, so it doesn't really save you from running the above components.
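
A collector config wires receivers to exporters through pipelines; roughly like this (exact field names shift between collector releases, so check the docs for your version):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  # forward traces on to jaeger for storage
  jaeger:
    endpoint: jaeger-collector:14250
    insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [jaeger]
```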

cicd

Now... you want a fluid way of getting your application code (or any of the previous components) into a running cluster. git push should be all you need!

For CI you may want some solution integrated with your code host, or you may want to run something yourself; tekton is a decent choice.

What about CD? What about it? If you believe in GitOps (code in git describes desired state, controllers make it happen), fluxcd is available but argocd is more mature.
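
In argocd, one Application object maps a path in a git repo to a namespace in a cluster; a sketch with a placeholder repo and names:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy
    path: myapp
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    # keep the cluster in sync with git, pruning what git no longer has
    automated:
      prune: true
```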

security

Ah, better have a good threat model in mind before you start thinking about this.

For your cluster, you'll want to lock down RBAC, and since it's allow-only (no deny rules), hierarchical namespaces make managing permissions less daunting.

LimitRanges can ensure nothing exhausts all your resources, PodSecurityPolicies limit the permissions of your pods, and NetworkPolicies limit the cross talk between your services, though if you use istio you can control that at L7 with AuthorizationPolicies.
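
As an example, a NetworkPolicy that only lets frontend pods talk to backend pods (the labels are made up):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-frontend
spec:
  # applies to pods labeled app: backend
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    # only accept traffic from pods labeled app: frontend
    - from:
        - podSelector:
            matchLabels:
              app: frontend
```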

I'm sure you can dream up various things you want to enforce on your k8s objects; OpenPolicyAgent Gatekeeper makes that possible.

What about protecting your services? pomerium is a decent solution to implement SSO for multiple services with an external provider, integrating with your ingress controller. I'm sure you could cook up something similar with the ory projects.

are we done yet

Maybe? Are we ever done? At least your resume is now filled with exciting new technologies...