The DIY PaaS continues, and today I'm thinking about how to measure CPU and memory in this giant jigsaw puzzle.
The Kubernetes control plane exposes metrics about itself through its apiservers. Additionally, kube-state-metrics generates metrics about the objects inside the cluster.
TODO:
- decide if we need to replace the instance label with a stable name in the case of multiple instances
```yaml
scrape_configs:
  - job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        action: keep
        regex: default;kubernetes;https

  # note kube-state-metrics also has an alternate port with metrics about itself
  - job_name: kube-state-metrics
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        action: keep
        regex: kube-state-metrics;kube-state-metrics;kube-state-metrics
```
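Once kube-state-metrics is being scraped, resource requests become queryable. A minimal recording-rule sketch of the kind of rollups I have in mind (this assumes the kube-state-metrics v2.x metric names; older releases exposed e.g. `kube_pod_container_resource_requests_memory_bytes` instead):

```yaml
groups:
  - name: cluster-requests
    rules:
      # total memory requested per namespace, in bytes
      - record: namespace:kube_pod_container_resource_requests_memory:sum
        expr: sum by (namespace) (kube_pod_container_resource_requests{resource="memory", unit="byte"})
      # total CPU requested per namespace, in cores
      - record: namespace:kube_pod_container_resource_requests_cpu:sum
        expr: sum by (namespace) (kube_pod_container_resource_requests{resource="cpu", unit="core"})
```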
Kubernetes kubelets expose both their own metrics and the metrics of the pods running on their node.
```yaml
scrape_configs:
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: kubernetes-cadvisor
    kubernetes_sd_configs:
      - role: node
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: node-exporter
    kubernetes_sd_configs:
      # remember to create a service to expose this
      # using role: node is also possible if you expose a hostPort
      - role: endpoints
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        action: keep
        regex: node-exporter;node-exporter;node-exporter
      # rename the instance label from the discovered pod IP (we're using endpoints) to the node name
      - action: replace
        target_label: instance
        source_labels:
          - __meta_kubernetes_pod_node_name
```
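With the cadvisor and node-exporter jobs scraping, actual usage (as opposed to requests) can be derived. A sketch of recording rules, assuming the default metric names these exporters ship with:

```yaml
groups:
  - name: cluster-usage
    rules:
      # per-pod memory usage from cadvisor, in bytes
      - record: namespace_pod:container_memory_working_set_bytes:sum
        expr: sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})
      # per-pod CPU usage from cadvisor, in cores
      - record: namespace_pod:container_cpu_usage:sum_rate5m
        expr: sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
      # per-node CPU utilisation from node-exporter, as a 0-1 fraction
      - record: instance:node_cpu_utilisation:avg_rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```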
Prometheus has a useful /federate endpoint you can use to dump out everything after relabelling. Example query:

```
http://localhost:8080/federate?match[]={job=~".*"}
```
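On the consuming side, another Prometheus can scrape that same endpoint. A sketch (the target host is a placeholder, and the catch-all `match[]` should usually be narrowed):

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true  # keep the original job/instance labels instead of overwriting them
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".*"}'
    static_configs:
      - targets:
          - source-prometheus.example.internal:9090
```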
repo: testrepo-cluster-metrics
In the future we might be able to run N+1 (N = number of nodes) instances of opentelemetry-collector, N as agents (DaemonSet) and 1 as gateway, replacing the need for N node-exporter instances, 1 kube-state-metrics, as well as N+1 tracing collectors and N+1 logging collectors.
As it currently stands, it still needs some more work to export the metrics in a stable manner,
and maybe some extra exporters to write directly to storage.