promql histogram_quantile and rate

is the rate necessary?

histogram_quantile

PromQL, the query language for Prometheus, has a histogram_quantile function to calculate φ-quantiles from a histogram. In other words, it estimates where a particular quantile sits, based on your partially aggregated data.

All the examples show it used like: histogram_quantile(0.99, rate(some_metric_query_bucket[5m])), and someone asked: is the rate() necessary?

refreshers

Let's start from the basics.

vector refresher

Prometheus records data in time series. An instant-vector gives you a single data point per timestamp, per time series:

time (min)  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
metric_one  1  1  2  2  4  4  8  8  8  8 16 16 32 32 64 64

A range-vector gives you a slice of data (covering the lookback period up to now), for each point in time. For the above with a range of 2:

time (min)       0     1       2       3       4       5       6       7      8       9        10        11         12         13         14         15
metric_one[2m] [1] [1 1] [1 1 2] [1 2 2] [2 2 4] [2 4 4] [4 4 8] [4 8 8] [8 8 8] [8 8 8] [8 8 16] [8 16 16] [16 16 32] [16 32 32] [32 32 64] [32 64 64]
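As a rough illustration, the windows above can be rebuilt with a few lines of Python. This is only a sketch: `range_vector` is a made-up helper, the samples are assumed to arrive exactly once a minute, and both ends of the window are included because that is what the table shows. Real Prometheus selects by sample timestamp rather than list index, and its handling of the window boundaries may differ between versions.

```python
# Samples of metric_one, one per minute, copied from the table above.
samples = [1, 1, 2, 2, 4, 4, 8, 8, 8, 8, 16, 16, 32, 32, 64, 64]

def range_vector(samples, t, lookback):
    """Return the samples in the window [t - lookback, t], both ends included."""
    start = max(0, t - lookback)  # early windows are shorter: no data before t=0
    return samples[start:t + 1]

# One window per point in time, with a 2 minute lookback.
windows = [range_vector(samples, t, 2) for t in range(len(samples))]
print(windows[0], windows[1], windows[10])  # [1] [1, 1] [8, 8, 16]
```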

rate refresher

rate() takes a range vector and calculates the per-second average increase over the time period. For example, for the range [8 16 16], the increase is 8 over 2m (120s), giving a rate of 8 / 120 = approx. 0.067.

time (min)           0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
rate(metric_one[2m]) 0     0 0.008 0.008 0.017 0.017 0.033 0.033     0     0 0.067 0.067 0.133 0.133 0.267 0.267
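The arithmetic behind that table can be sketched as below. `simple_rate` is a hypothetical helper, not Prometheus's implementation: the real rate() also handles counter resets and extrapolates towards the window edges, and it returns no value at all (rather than the 0 shown in the table) when a window holds fewer than two samples.

```python
def simple_rate(window, range_seconds):
    """Per-second average increase across a window of counter samples."""
    if len(window) < 2:
        return 0.0  # simplification to match the table; see the note above
    return (window[-1] - window[0]) / range_seconds

print(round(simple_rate([8, 16, 16], 120), 3))  # 0.067
print(round(simple_rate([1, 1, 2], 120), 3))    # 0.008
```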

histograms refresher

A Prometheus histogram is a collection of counters, each with an le label giving the upper bound of its bucket. Each counter records how many events of size le or smaller have been seen up to now, so the values are monotonic both over time and across the buckets.

time (min)                   0  1  2  3  4  5
metric_two_bucket{le="10"}   0  1  1  4  4  4
metric_two_bucket{le="50"}   0  1  2  6  6  8
metric_two_bucket{le="100"}  0  2  3  7  7  9
metric_two_bucket{le="+Inf"} 0  2  4  8  8 10
---
represents the following events:
time (min)                   0  1  2  3  4  5
0 - 10                       0  1  0  3  0  0
10 - 50                      0  0  1  1  0  2
50 - 100                     0  1  0  0  0  0
100 - Inf                    0  0  1  0  0  0
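Getting from the counters to the events takes two subtractions: one across buckets (to undo the "up to le" accumulation) and one across time (to undo the "up to now" accumulation). A minimal sketch, where `events_per_interval` is a made-up name and the lowest bucket is assumed to start at 0, as in the table:

```python
# Cumulative bucket counters, copied from the table above (one sample per minute).
buckets = {
    "10":   [0, 1, 1, 4, 4, 4],
    "50":   [0, 1, 2, 6, 6, 8],
    "100":  [0, 2, 3, 7, 7, 9],
    "+Inf": [0, 2, 4, 8, 8, 10],
}

def events_per_interval(buckets):
    """Turn cumulative counters into new events per bucket, per minute."""
    result = {}
    lower = "0"  # assumed lower edge of the first bucket
    prev_series = [0] * len(next(iter(buckets.values())))
    for le, series in buckets.items():  # relies on ascending le order
        # Subtract the next-smaller bucket: events in this bucket only.
        per_bucket = [c - p for c, p in zip(series, prev_series)]
        # Subtract the previous minute: events that arrived in this minute.
        new = [per_bucket[0]] + [b - a for a, b in zip(per_bucket, per_bucket[1:])]
        result[f"{lower} - {le}"] = new
        lower, prev_series = le, series
    return result

events = events_per_interval(buckets)
for name, counts in events.items():
    print(name, counts)
```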

histogram_quantile refresher

For a given instant, histogram_quantile looks at the increases between buckets to get a distribution of events. It then calculates the quantile from this distribution, interpolating if necessary.

ref: quantile.go

metric_two_bucket{le="10"}    40     40
metric_two_bucket{le="50"}    80  -> 40
metric_two_bucket{le="100"}   90     10
metric_two_bucket{le="+Inf"} 100     10
---
quantile = 0.85 sits between bucket 2 (le="50") and bucket 3 (le="100"),
so: bucket_2_bound + (bucket_3_bound - bucket_2_bound) * (quantile * all_events - bucket_2_events) / events_in_bucket_3
so:        50      + (      100      -       50      ) * (  0.85   *    100     -       80       ) / (    90 - 80     ) = 75
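The same calculation as a small Python sketch. This is a simplification for illustration, not a port of quantile.go: `simple_histogram_quantile` is a hypothetical name, it assumes the first bucket starts at 0, it only accepts 0 < q <= 1, and when the quantile lands in the +Inf bucket it falls back to the highest finite bound.

```python
import math

def simple_histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, sorted by
    upper_bound and ending with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total  # how many events sit at or below the quantile
    lower_bound, lower_count = 0.0, 0.0  # assumes the first bucket starts at 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf: use the last finite bound.
                return lower_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count

example = [(10.0, 40), (50.0, 80), (100.0, 90), (math.inf, 100)]
print(simple_histogram_quantile(0.85, example))  # 75.0
```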

From here, we can see that the absolute values of the buckets don't matter, only their relative sizes.
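To spell that out, multiply every bucket count by the same constant c > 0. The symbols here are shorthand introduced for this note, not names from Prometheus: q is the quantile, N the total number of events, B_lo and B_hi the bounds of the bucket containing the quantile, and C_lo and C_hi the cumulative counts at those bounds. The same bucket is still selected (c·C >= q·c·N exactly when C >= q·N), and c cancels out of the interpolation:

```latex
\[
B_{lo} + (B_{hi} - B_{lo}) \cdot \frac{q \cdot cN - c\,C_{lo}}{c\,C_{hi} - c\,C_{lo}}
\;=\;
B_{lo} + (B_{hi} - B_{lo}) \cdot \frac{q N - C_{lo}}{C_{hi} - C_{lo}}
\]
```

With the numbers from the example, halving every count gives 50 + (100 - 50) * (42.5 - 40) / (45 - 40) = 75, the same answer as before.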

is rate necessary?

histogram_quantile has no intrinsic requirement that the argument passed to it has passed through rate; it will happily calculate the quantile for any set of buckets represented by instant vectors. For most histograms, that means a quantile over all of the data recorded so far.

However, in most cases, we'll want rate or increase, bounding our quantile calculations to fresh data. rate(metric_three_bucket[2m]) gives us the per-second increase of each bucket over the last 2 mins. Since only the relative sizes of the buckets matter, that means our quantile calculations are for all the requests in the past 2 mins, rather than for all of time.
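To make the difference concrete, here is a sketch that reuses the metric_two_bucket samples from the histograms refresher (minutes 3 to 5, i.e. a [2m] window ending at minute 5) and computes the median three ways. The `quantile` helper is a compact, hypothetical stand-in for histogram_quantile under the same assumptions as before (first bucket starts at 0, 0 < q <= 1), and the rate and increase are the naive last-minus-first versions, not Prometheus's exact ones.

```python
import math

def quantile(q, buckets):
    """Simplified quantile over sorted (upper_bound, cumulative_count) pairs."""
    rank = q * buckets[-1][1]
    lo_bound, lo_count = 0.0, 0.0  # assumes the first bucket starts at 0
    for hi_bound, hi_count in buckets:
        if hi_count >= rank:
            if math.isinf(hi_bound):
                return lo_bound  # cannot interpolate into +Inf
            return lo_bound + (hi_bound - lo_bound) * (rank - lo_count) / (hi_count - lo_count)
        lo_bound, lo_count = hi_bound, hi_count

# Cumulative counters at minutes 3, 4 and 5 for each le bound.
window = {
    10.0:     [4, 4, 4],
    50.0:     [6, 6, 8],
    100.0:    [7, 7, 9],
    math.inf: [8, 8, 10],
}

raw      = [(le, s[-1]) for le, s in window.items()]                # no rate: all of time
increase = [(le, s[-1] - s[0]) for le, s in window.items()]         # last 2 minutes
rate     = [(le, (s[-1] - s[0]) / 120) for le, s in window.items()] # same, per second

print(quantile(0.5, raw))       # 20.0
print(quantile(0.5, increase))  # 30.0
print(quantile(0.5, rate))      # 30.0
```

Without rate, the median is dominated by the burst of small events at minute 3. With rate or increase, only the two events from the last 2 minutes count, and both give the same answer because they differ only by a constant factor.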