vector.dev experience report

fluentbit in rust

vector dot dev

Vector is a unified observability agent, written in rust by Timber Technologies (timber.io) and acquired by Datadog. It processes logs, metrics, and spans (traces) as events through a DAG (graph) of composable transforms, from and through to a variety a sources and sinks. Its unique selling points include: its own DSL for wrangling events, vector tap to view a live stream of events flowing through the pipeline, and a unit test framework.

does it work

5 months in to implementation, we've used it as both a logging agent and aggregator, and a metrics aggregator. It works, most of the time.

good parts

Runtime taps: the aforementioned vector tap command is a great way to have visibility into the pipeline you've created. You get to see a representation of the event at ~any point within the graph, allowing for quick feedback in locating issues. Unfortunately, it can be buggy with the live reload feature...
Testing framework: the unit tests are a great way to lock in desired behaviour, or maybe confirm no breaking changes in a big refactor, but the way they've been implemented seems somewhat inconsistent, especially in logs (input is a map of VRL statements rather than an actual event).

bad parts

Inconsistent event model: proper namespacing is key to allowing different authors to work together properly, vector's internal event model doesn't care and just dumps key-value pairs into the root of the event, good luck cleaning that up.
Breaking patch releases: major.minor.patch, Semver 2.0 isn't hard to understand, and patch releases should only be fixing bugs in a backwards compatible way, but no! vector makes breaking changes to its DSL in patches.
VRL limitations: VRL, vector's own DSL, has a variety of weird restrictions or inconsistencies in how it determines whether something is fallible, or how it reports errors. Sure the one liners in the demo look nice, but if you want guarantees, your code might as well look like Go.
Code reuse: Vector builds out a processing graph, but the config for it isn't really intuitive or friendly to reuse. Each block stands alone, so without a lot of discipline in file layout, seeing how your config ties together is quite the challenge.
Resource leaks: we've observed vector just slowly breaking over the course of days, showing the classic signs of a resource leak. Interestingly, it's a CPU leak, where it'll just pin a CPU doing.... not much. Of course, it lacks the tools to observe it.
Production (un)readiness: Vector invented their own protocol for communicating between vector instances, as far as I can tell, it's based on gRPC, with no hard numbers to back up claims of "it's faster". These days, the first thing I ask when i hear gRPC is, "how do you load balance it?". Vector's answer appears to be don't bother.