Reading ServiceMonitors in OpenTelemetry

Finally something that's not Prometheus

SEAN K.H. LIAO

ServiceMonitors with Prometheus

ServiceMonitors are CRDs deployed as part of the Prometheus Operator. They allow each service to define its own scrape config as a dedicated Kubernetes resource, giving it more control than the usual scheme/port/path annotations: TLS settings, scrape interval, relabelling, and so on.
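
As a rough sketch (the service name, labels, and TLS settings here are made up for illustration), a ServiceMonitor might look like:

```yaml
# Hypothetical ServiceMonitor: selects Services labelled app: my-app
# and scrapes their "metrics" port over TLS every 15s.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s
      scheme: https
      tlsConfig:
        insecureSkipVerify: true # example only
      relabelings:
        - sourceLabels: [__meta_kubernetes_pod_label_version]
          targetLabel: version
```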

The Operator then discovers all the ServiceMonitors, injects some standard relabellings, and generates one massive Prometheus config file. Prometheus reloads its config and goes on its merry way.
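
The output is plain Prometheus configuration. Very roughly (a simplified sketch; the real job names and injected relabellings from the Operator are more involved), the ServiceMonitor above becomes something like:

```yaml
scrape_configs:
  - job_name: serviceMonitor/default/my-app/0
    scrape_interval: 15s
    scheme: https
    tls_config:
      insecure_skip_verify: true
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [default]
    relabel_configs:
      # keep only endpoints backing Services with the selected label
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: my-app
        action: keep
      # keep only the named port from the ServiceMonitor endpoint
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
        action: keep
```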

Note that the Operator needs Kubernetes API access to read ServiceMonitors, and Prometheus itself also needs Kubernetes API access to discover the Pods behind the Services that the ServiceMonitors reference.
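
In RBAC terms that amounts to read permissions roughly like the following (a sketch; the actual ClusterRoles shipped with the Operator and Prometheus are broader):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: servicemonitor-reader # hypothetical name
rules:
  # Operator side: read the ServiceMonitor custom resources
  - apiGroups: [monitoring.coreos.com]
    resources: [servicemonitors]
    verbs: [get, list, watch]
  # Prometheus side: discover the targets behind the Services
  - apiGroups: [""]
    resources: [services, endpoints, pods]
    verbs: [get, list, watch]
```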

OpenTelemetry Collector

The OpenTelemetry Collector's prometheus receiver reuses the scraping code directly from Prometheus, so it has access to all the usual Prometheus service discovery options. But that's not enough: Prometheus itself doesn't read ServiceMonitors, and the way the Prometheus Operator interfaces with Prometheus (generate a config file and trigger a reload) isn't compatible with the OpenTelemetry Collector (the prometheus receiver config is inlined in the Collector config, $ needs escaping to $$, and there's no hot reload).
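
For example, a prometheus receiver configured directly embeds the scrape config inline, and any $ in relabel replacements has to be written as $$ to survive the Collector's own variable expansion (a sketch, not a complete Collector config; the job name is made up):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: inline-example
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_name]
              target_label: pod
              # what would be $1 in Prometheus must be escaped as $$1 here
              replacement: $$1
```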

Target Allocator

Instead, the Collector takes a different approach: a target allocator is responsible for all the communication with Kubernetes (reading ServiceMonitors, discovering the actual Pods behind Services). It serves the results to the collector's prometheus receiver over the HTTP SD mechanism (JSON describing each target), and the OpenTelemetry Collector just needs to scrape the targets returned in the list.
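
On the collector side, the prometheus receiver has a target_allocator setting that points it at the allocator instead of doing its own discovery (a sketch; the endpoint assumes a target allocator Service named opentelemetry-targetallocator):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # a local self-scrape job; ServiceMonitor-derived jobs come from the allocator
        - job_name: otel-collector
          static_configs:
            - targets: ["0.0.0.0:8888"]
    target_allocator:
      endpoint: http://opentelemetry-targetallocator:80
      interval: 30s
      collector_id: ${POD_NAME} # identifies this shard to the allocator
```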

Following this approach, sharding becomes possible, and changing the allocation strategy is much easier since it's a separate component. It does come with a slight disadvantage: the cached target list and the Pods that are actually available can more easily drift out of sync.
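
With the opentelemetry-operator, this is enabled on the OpenTelemetryCollector resource, and the allocation strategy is just a field on the target allocator (a sketch with made-up names; statefulset mode shown for sharding across replicas):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: metrics
spec:
  mode: statefulset
  replicas: 3
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing
    prometheusCR:
      enabled: true # read ServiceMonitors / PodMonitors
  # spec.config with the collector pipeline (prometheus receiver with
  # target_allocator, as above, plus exporters) omitted for brevity
```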

And all this works as of: