Histograms are back

Histograms are available again to all customers starting today. Several customers have tested the stability of this feature over the last few weeks.

We believe histograms are very important when monitoring response times, so when we temporarily removed this feature in October we knew it would come back soon.

In statistics, the average is only a representative summary when the recorded values are distributed roughly symmetrically, as in a normal distribution. That is not true for response times, where a few slow requests form a long tail that pulls the average up. The average response time is therefore a largely misleading metric. Benchmarks that rely on this value should not be trusted.
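To make this concrete, here is a minimal sketch with made-up numbers (not data from the Profiler): nine requests that take 20 ms and one slow request that takes 2000 ms. The variable names are only illustrative.

```python
from statistics import mean, median

# Hypothetical sample: nine fast requests and one slow outlier.
response_times_ms = [20] * 9 + [2000]

avg = mean(response_times_ms)    # pulled up by the single slow request
mid = median(response_times_ms)  # what a typical request actually looks like

print(f"average: {avg} ms, median: {mid} ms")
```

The average comes out at 218 ms, although nine out of ten requests finished in 20 ms. No single request looks anything like the average.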

For this reason we use the 95th percentile in our charts. The 95th percentile is calculated by sorting all response times in ascending order and picking the value at or below which 95% of all requests fall. Picking the 95th percentile over the 90th or 99th is rather arbitrary, and we are working on better data structures that allow you to compare those values. Efficiently computing percentiles over a huge amount of data is not an easy task.
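The sort-and-pick calculation can be sketched in a few lines. This is the simple "nearest rank" method on an in-memory list, assumed here for illustration only. It is not how our backend computes percentiles, and the function name `percentile` and the sample data are made up.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest recorded value such that
    at least pct percent of all values are lower than or equal to it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the value to pick, rounded up to a whole position.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical response times in milliseconds.
response_times_ms = [12, 15, 18, 20, 22, 25, 30, 35, 40, 900]

p90 = percentile(response_times_ms, 90)
p95 = percentile(response_times_ms, 95)
```

Sorting every value like this works for small samples, but it requires keeping all response times around, which is exactly what becomes expensive with large amounts of data.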

Where do histograms fit in this picture? Code usually has different execution paths depending on its input, which heavily influences response times. This could be due to caching, invalid user input leading to an early exit, or many other causes.

Histograms take a different angle on the data: they remove the time dimension and count the number of occurrences that fall into each pre-defined interval, such as 0-29ms, 30-59ms and so on.
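The counting step could look roughly like the following sketch. It assumes fixed 30 ms wide intervals as in the example above. The function name `histogram`, the `bucket_width_ms` parameter and the sample data are hypothetical and not part of the Profiler.

```python
from collections import Counter

def histogram(response_times_ms, bucket_width_ms=30):
    """Count how many response times fall into each fixed-width interval."""
    # Map every value to the lower bound of its interval, e.g. 45 -> 30.
    counts = Counter(
        (int(t) // bucket_width_ms) * bucket_width_ms
        for t in response_times_ms
    )
    # Label the intervals "0-29ms", "30-59ms", ... in ascending order.
    return {
        f"{start}-{start + bucket_width_ms - 1}ms": counts[start]
        for start in sorted(counts)
    }

# Hypothetical sample with a fast and a slow group of requests.
buckets = histogram([5, 12, 28, 31, 45, 200, 210, 215])
for label, count in buckets.items():
    print(f"{label:>10} {'#' * count}")
```

Intervals without any requests are simply left out of the result in this sketch. A chart would normally draw them as empty bars so that gaps between peaks stay visible.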

The following histogram extracted from the Profiler is an example of how helpful histograms are:

Histogram with two Peaks

There are two peaks in this histogram, suggesting there are two execution paths in the code that handle this transaction.

Compare this to a more boring transaction that has only a single peak:

Histogram with one Peak

Histograms complement charts with time and percentile values, allowing you to better understand how you need to optimize your code to achieve better performance for your users.

Benjamin 31.01.2015