
Tell-Tale Tail Latencies: Pitfalls and Perils in Database Benchmarking

Michael Fruth, Stefanie Scherzinger, Wolfgang Mauerer, Ralf Ramsauer

TL;DR

This paper demonstrates how Java-based benchmarking approaches can substantially distort tail latency observations, and makes a case for purposefully re-designing database benchmarking harnesses based on these observations to arrive at faithful characterisations of database performance from multiple important angles.
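One mechanism by which a Java harness can distort tail latencies can be sketched with a deliberately simplified Python model. The model is our assumption, not the paper's analysis. The client can only take its end timestamp while it is running. If a response arrives during a client-side stop-the-world pause (for instance a garbage-collection pause), the request is therefore recorded as completing when the pause ends. The names `recorded_latency` and `Pause` are hypothetical.

```python
# Simplified model (our assumption, not the paper's analysis): the benchmark
# client can only take its end timestamp while it is running. If the database's
# response arrives while the client JVM is stopped (e.g. by a GC pause), the
# recorded end time is pushed back to the end of that pause.

Pause = tuple[float, float]  # (start_ms, end_ms), assumed non-overlapping


def recorded_latency(start_ms: float, true_end_ms: float, pauses: list[Pause]) -> float:
    """Latency the client would log for one request under this model."""
    observed_end = true_end_ms
    for pause_start, pause_end in pauses:
        if pause_start <= true_end_ms < pause_end:
            # The client only notices the response once the pause is over.
            observed_end = pause_end
            break
    return observed_end - start_ms


if __name__ == "__main__":
    pauses = [(100.0, 150.0)]  # one hypothetical 50 ms client-side pause
    print(recorded_latency(10.0, 10.5, pauses))   # 0.5 (unaffected)
    print(recorded_latency(99.5, 100.5, pauses))  # 50.5 (response lands in pause)
```

Under this model the database-side service time stays at 1 ms, yet the recorded value grows by up to the length of the pause. This is the kind of inflation that ends up in the tail of the measured distribution.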

Abstract

The performance of database systems is usually characterised by their average-case (i.e., throughput) behaviour in standardised or de-facto standard benchmarks like TPC-X or YCSB. While tails of the latency (i.e., response time) distribution receive considerably less attention, they have been identified as a threat to overall system performance: in large-scale systems, even a small fraction of delayed requests can add up to delays perceivable by end users. To eradicate large tail latencies from database systems, the ability to faithfully record them, and to pinpoint their root causes, is urgently required. In this paper, we address the challenge of measuring tail latencies using standard benchmarks, and identify subtle perils and pitfalls. In particular, we demonstrate how Java-based benchmarking approaches can substantially distort tail latency observations, and discuss how the common focus on throughput performance inhibits the discovery of such problems. We make a case for purposefully re-designing database benchmarking harnesses based on these observations to arrive at faithful characterisations of database performance from multiple important angles.
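To make the gap between the average-case view and the tail concrete, here is a small Python sketch. It is our own illustration, not code from the paper. It uses the nearest-rank percentile definition, which is one of several conventions, and the latency values are hypothetical. The `p_any_slow` helper models one common way in which a few slow requests add up to user-visible delays: a request that fans out to many sub-requests. It assumes the sub-requests are independent, which is a simplification.

```python
import math
import statistics
from fractions import Fraction


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it (0 < p <= 100)."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding in the rank (e.g. 99.9).
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[rank - 1]


def summarise(samples: list[float]) -> dict[str, float]:
    """Average-case view (mean) next to tail percentiles and the maximum."""
    return {
        "mean": statistics.fmean(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "p99.9": percentile(samples, 99.9),
        "max": max(samples),
    }


def p_any_slow(p_slow: float, fan_out: int) -> float:
    """Chance that a request fanning out to `fan_out` independent
    sub-requests waits on at least one slow sub-request."""
    return 1 - (1 - p_slow) ** fan_out


if __name__ == "__main__":
    # Hypothetical run: 9,980 requests at 1 ms, 20 (0.2 %) stalled at 100 ms.
    latencies_ms = [1.0] * 9_980 + [100.0] * 20
    print(summarise(latencies_ms))
    # mean ≈ 1.2 ms and p99 = 1 ms look healthy; p99.9 and max expose the stalls.
    print(round(p_any_slow(0.01, 100), 3))  # 0.634
```

In this example, stalls affecting only 0.2 % of requests barely move the mean and are invisible even at p99. With a fan-out of 100 and 1 % slow sub-requests, roughly 63 % of user requests would wait on at least one slow sub-request.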

Paper Structure

This paper contains 24 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Database throughput for MariaDB, in thousand requests per second, for the benchmarks NoOp, YCSB and TPC-C under different JVM/GC configurations of OLTPBench. The choice of JVM affects throughput only marginally; the choice of GC does not affect it.
  • Figure 2: The latency distributions measured by OLTPBench for three benchmarks, visualised as box plots. Key percentiles are highlighted.
  • Figure 3: Latency time series of the NoOp benchmark. Minimum and maximum latencies measured with OLTPBench are marked by red, labelled triangles. Grey dots represent extreme values, ochre dots (down-sampled) represent standard observations. Latencies from the JVM log file are superimposed in black. The red line shows the sliding-window mean.
  • Figure 4: Latency time series of the YCSB benchmark for read (ReadRecord) and write (UpdateRecord) transactions. Labels and colours as in Figure 3.
  • Figure 5: Latency time series of the TPC-C benchmark for read (OrderStatus) and write (NewOrder) transactions. Labels and colours as in Figure 3.
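The latency plots combine two analysis steps: a sliding mean over the latency series, and an overlay of latencies from the JVM log. A minimal Python sketch of both follows, under stated assumptions. The window is a trailing window whose size is a free parameter. The JVM log is assumed to be already parsed into (start_ms, end_ms) pause intervals, which is a hypothetical format. All function names are illustrative. Overlap with a logged pause is only a heuristic for correlation, not proof of cause.

```python
from collections import deque

Sample = tuple[float, float]  # (completion_time_ms, latency_ms)
Pause = tuple[float, float]   # (start_ms, end_ms): hypothetical parsed JVM-log record


def sliding_mean(values: list[float], window: int) -> list[float]:
    """Trailing sliding-window mean; the first window-1 outputs average over
    however many samples have been seen so far."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf: deque = deque()
    total = 0.0
    means = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the sample that left the window
        means.append(total / len(buf))
    return means


def split_extremes(samples: list[Sample], pauses: list[Pause],
                   threshold_ms: float) -> tuple[list[Sample], list[Sample]]:
    """Split samples above threshold_ms into those whose request interval
    [completion - latency, completion] overlaps a logged pause, and the rest."""
    overlapping, other = [], []
    for done, lat in samples:
        if lat <= threshold_ms:
            continue  # standard observation, not an extreme value
        begun = done - lat
        if any(begun < end and start < done for start, end in pauses):
            overlapping.append((done, lat))
        else:
            other.append((done, lat))
    return overlapping, other


if __name__ == "__main__":
    print(sliding_mean([1.0, 2.0, 3.0, 4.0], window=2))  # [1.0, 1.5, 2.5, 3.5]
    samples = [(10.0, 0.5), (150.0, 50.5), (300.0, 20.0)]
    print(split_extremes(samples, [(100.0, 150.0)], threshold_ms=5.0))
    # ([(150.0, 50.5)], [(300.0, 20.0)])
```

A running total keeps the sliding mean linear in the number of samples. Under these assumptions, extreme values that overlap no logged pause would need an explanation outside the harness JVM.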