Things either just don't work out or are not very interesting.

otel metrics

I don't know what the people writing OpenTelemetry use, but it's certainly not prometheus + grafana. Anyway, their runtime and host metrics use an additive counter which looks like an ever increasing graph in grafana. Adding rate() and losing fidelty in the view just doesn't seem right.

otel collector

I know it's beta, but still. v0.10.0 can't scrape prometheus metrics because it doesn't recognize up and v0.11.0 can't scrape kubernetes resources because something broke the parser for kubernetes_sd_configs.

traefik observability

traefik apparently uses some shared observability middleware because it logs the /ping and /metrics requests every time, cluttering up the logs with no way to filter them out.

What's worse is it does the same for tracing, cluttering up jaeger with a bunch of forward ping@internal/ping@internal traces (also the prometheus variant). It for some reason doesn't appear to respond to remote sampling configs. I was hoing to use the otel collector to filter it but that's broken in its own way.


gRPC doesn't actually try to make a connection on Dial, instead deferring that until a call is actually made. ~~Maybe I need to import the health and/or reflection things server side and make the client do an explicit check. I was hoping Dial would fail and I can just set the client to nil...~~

Edit: use WithBlock and rtfm.