SEANK.H.LIAO

kubelet stats

is it deprecated? maybe

Recently, we switched over to the OpenTelemetry Collector for collecting metrics from Kubernetes. One of the things we collect is container and pod level metrics from kubelet, and it's complicated.

current state

First, what do we have to work with?

kubelet

These are the endpoints exposed by kubelet (test with kubectl get --raw "/api/v1/nodes/$NODE_NAME/proxy/$ENDPOINT"):

otelcol

kubeletstats

The kubeletstats receiver appears to be the primary receiver for collecting metrics from kubelet. It reads data from /stats/summary. Each instance only talks to a single kubelet, which is fine if you run the collector as a daemonset. Alternatively, you can use the receivercreator in combination with the k8sobserver extension to generate one subreceiver per node. Example:

extensions:
  # the extension type is k8s_observer; k8sobserver is only its directory name
  k8s_observer:
    observe_pods: false
    observe_nodes: true

receivers:
  receiver_creator:
    watch_observers:
      - k8s_observer
    receivers:
      # one kubeletstats subreceiver is started per discovered node
      kubeletstats:
        rule: type == "k8s.node"
        config:
          endpoint: "`endpoint`:`kubelet_endpoint_port`"
          extra_metadata_labels:
            - container.id
          metric_groups:
            - container
            - pod
            - node

exporters:
  logging:
    verbosity: detailed

service:
  extensions:
    - k8s_observer
  pipelines:
    metrics:
      receivers:
        - receiver_creator
      exporters:
        - logging
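
For comparison, the daemonset variant needs no observer, since each collector pod only talks to the kubelet on its own node. A minimal sketch, assuming the pod spec injects the node name as a K8S_NODE_NAME environment variable (a naming convention, not something the collector sets for you) and that kubelet listens on its default secure port 10250:

```yaml
receivers:
  kubeletstats:
    # authenticate with the pod's mounted service account token
    auth_type: serviceAccount
    # K8S_NODE_NAME is assumed to come from the downward API (spec.nodeName)
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    extra_metadata_labels:
      - container.id
    metric_groups:
      - container
      - pod
      - node

exporters:
  logging:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers:
        - kubeletstats
      exporters:
        - logging
```

Older collector releases spell the substitution as ${K8S_NODE_NAME}. Depending on how the kubelet serving certificate is issued, you may also need insecure_skip_verify: true on the receiver.
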

prometheus

Of course, you could just fall back to what prometheus has been doing all along:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s
          kubernetes_sd_configs:
            - role: node
          scheme: https
          metrics_path: /metrics/cadvisor
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          authorization:
            credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
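
The same scrape can also go through the API server proxy, which is the path the kubectl get --raw command earlier uses. That can help when the collector cannot reach kubelets directly. A sketch, assuming the API server is reachable in-cluster at kubernetes.default.svc:443 and the service account is allowed to get nodes/proxy; the job name is made up. Note the doubled $$: the collector would otherwise treat $1 as an environment variable reference.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s-cadvisor-proxy # hypothetical job name
          kubernetes_sd_configs:
            - role: node
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          authorization:
            credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          relabel_configs:
            # send every scrape to the API server instead of the node itself
            - target_label: __address__
              replacement: kubernetes.default.svc:443
            # rewrite the path to the per-node proxy endpoint
            - source_labels: [__meta_kubernetes_node_name]
              regex: (.+)
              target_label: __metrics_path__
              replacement: /api/v1/nodes/$$1/proxy/metrics/cadvisor
```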

future changes

Poke around enough on the internet and you might find references to "Reduce the set of metrics exposed by the kubelet" and related proposals.

The current state is probably best described by KEP-2371, "cAdvisor-less, CRI-full Container and Pod Stats". In short, container runtimes still need direct integration with cAdvisor to expose metrics, but cAdvisor doesn't run everywhere (e.g. Windows, virtual machines), and this breaks the CRI abstraction. Instead, the proposal is to add support for extra metrics fields in CRI and reduce the metrics cAdvisor collects to just the node/host. If you need extra metrics, run cAdvisor as a daemonset.

Of course, all this investigation comes from wanting container_cpu_cfs_throttled_seconds_total (or equivalent metric) from /stats/summary, but that doesn't appear to be planned for CRI...

The above appears to still be in Alpha for Kubernetes 1.25, so expect more to change.
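
In the meantime, the throttling counter is still available from /metrics/cadvisor. A minimal sketch, reusing the prometheus receiver config from earlier and adding a metric_relabel_configs rule that keeps only that one metric; the job name is made up:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: k8s-cadvisor-throttling # hypothetical job name
          kubernetes_sd_configs:
            - role: node
          scheme: https
          metrics_path: /metrics/cadvisor
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          authorization:
            credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          metric_relabel_configs:
            # drop every scraped series except the CFS throttling counter
            - source_labels: [__name__]
              regex: container_cpu_cfs_throttled_seconds_total
              action: keep
```

This can run alongside kubeletstats in the same collector, so /stats/summary stays the main source and cAdvisor only fills the gap.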