job3 6 months

do we really understand mvps?

shifting focuses

Hot on the heels of a setup (not full migration) for a new logging vendor, we ran straight in to the setup of a new metrics vendor.

Was our new logging stack tested for load / stability? No. Was it tested with clients? Not really. Do we know when it breaks? Not unless someone complains.... But who cares, onto the next shiny new thing.

metrics

A lot of my work has been babysitting various OpenTelemetry Collector deployments into something resembling production readiness. It's a 2 layer architecture with them all feeding in to centralized (single point of failure!) Vector instances. Yes, I'm still complaining about having to write VRL rather than proper code.

We officially 3 protocols, with 3 different pain points: OTLP/gRPC: people aren't used to push protocols yet. (Dog)StatsD: "it's a simple protocol"... leading to half-baked implementations everywhere. Prometheus + ServiceMonitors: scaling the Target Allocator has thus far required an internal fork.

Where are we with metrics? Probably the same place we were with logs, but default on instead of default off for more of the pipeline.

Through various accounting tricks, the actual project scope was minimized. Easy to claim "completion" this way, but it doesn't change the reality that the actual migration of services requires a lot more supporting work to be done.

Maybe it's more politically palatable this way? But it does feel like a much worse user experience. Sometimes, I do wonder on how product-minded the team is (or is not). It's something I'll have to explore over the next quarter I guess.