Blitzcrank: Fast Semantic Compression for In-memory Online Transaction Processing

Yiming Qiao, Yihan Gao, Huanchen Zhang

TL;DR

Blitzcrank tackles the challenge of fast, fine-grained semantic compression for in-memory OLTP by learning column distributions and encoding values with a novel delayed coding scheme. It introduces fast semantic models for discrete and continuous data, plus a fixed-length, mixed-radix encoding method that yields near-entropy compression with sub-microsecond tuple decompression. Integrated into an in-memory DBMS (Silo), Blitzcrank achieves significant memory reduction and sustains throughput in both memory-resident and larger-than-memory datasets, outperforming Zstandard and Raman in compression and latency. The results demonstrate that semantic compression can be practical for OLTP with tight latency requirements and substantial working-set reductions.

Abstract

We present BLITZCRANK, a high-speed semantic compressor designed for OLTP databases. Previous solutions are inadequate for compressing row-stores: they suffer from either a low compression factor due to a coarse compression granularity or suboptimal performance due to inefficiency in handling dynamic data sets. To solve these problems, we first propose novel semantic models that support fast inference and dynamic value sets for both discrete and continuous data types. We then introduce a new entropy encoding algorithm, called delayed coding, that achieves a significant improvement in decoding speed compared to modern arithmetic coding implementations. We evaluate BLITZCRANK in both standalone microbenchmarks and a multicore in-memory row-store using the TPC-C benchmark. Our results show that BLITZCRANK achieves sub-microsecond latency for decompressing a random tuple while obtaining high compression factors. This leads to an 85% memory reduction in the TPC-C evaluation with a moderate (19%) throughput degradation. For data sets larger than the available physical memory, BLITZCRANK helps the database sustain a high throughput for more transactions before the I/O overhead dominates.

Paper Structure (42 sections, 2 theorems, 28 equations, 34 figures, 3 tables, 6 algorithms)

Key Result

Theorem 1

Every probability vector $\pi_1, \cdots, \pi_N$ can be expressed as an equiprobable mixture of $N$ two-point distributions. That is, there are $N$ pairs of integers $(\alpha_1, \beta_1)$, $\cdots$, $(\alpha_N, \beta_N)$ and probabilities $w_1, \cdots, w_N$ such that $\pi_i = \frac{1}{N} \sum_{k=1}^{N} \Pr[Y^{(k)} = i]$ for $1 \leq i \leq N$, where $Y^{(1)}$, $\cdots$, $Y^{(N)}$ are two-point distributions: $Y^{(k)}$ takes the value $\alpha_k$ with probability $w_k$ and the value $\beta_k$ with probability $1 - w_k$.
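
Theorem 1 is the decomposition that underlies the classical alias method for sampling discrete distributions. The following Python sketch shows one standard way to build such a decomposition (a Vose-style pairing of under-full and over-full symbols) and checks that the equiprobable mixture reproduces the input vector. The function names `two_point_decomposition` and `mixture` are hypothetical, the convention that $w_k$ is the weight of $\alpha_k$ is an assumption, and this is not claimed to be the construction Blitzcrank itself uses.

```python
from fractions import Fraction


def two_point_decomposition(pi):
    """Split a probability vector into N two-point distributions.

    Returns N triples (alpha, beta, w) with 1-based symbols. Triple k
    stands for a distribution that yields alpha with probability w and
    beta with probability 1 - w (assumed convention).
    """
    n = len(pi)
    # Scale by N so that every slot has capacity exactly 1.
    scaled = [Fraction(p) * n for p in pi]
    small = [i for i, s in enumerate(scaled) if s < 1]
    large = [i for i, s in enumerate(scaled) if s >= 1]
    pairs = [None] * n
    while small and large:
        s = small.pop()
        l = large.pop()
        # Slot s holds all of symbol s and is topped up by symbol l.
        pairs[s] = (s + 1, l + 1, scaled[s])
        scaled[l] -= 1 - scaled[s]
        (small if scaled[l] < 1 else large).append(l)
    # With exact arithmetic, the remaining symbols fill their slot exactly.
    for i in small + large:
        pairs[i] = (i + 1, i + 1, Fraction(1))
    return pairs


def mixture(pairs):
    """Recombine the two-point distributions with equal weight 1/N."""
    n = len(pairs)
    out = [Fraction(0)] * n
    for alpha, beta, w in pairs:
        out[alpha - 1] += w / n
        out[beta - 1] += (1 - w) / n
    return out


pi = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
pairs = two_point_decomposition(pi)
assert mixture(pairs) == pi
```

Exact `Fraction` arithmetic keeps the check free of rounding error. With floating-point inputs the same pairing works, but the final equality would need a tolerance.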

Figures (34)

  • Figure 1: DB Size vs. Latency - Blitzcrank makes the size-latency trade-offs more attractive compared to other tools in TPC-C.
  • Figure 2: An Example of Arithmetic Coding - Arithmetic coding maps each possible string to a disjoint probability interval.
  • Figure 3: An Example of Column Correlation - The probabilities of the "gender" column depend on the value of the "name" column.
  • Figure 4: Blitzcrank - Semantic Learner (SL), Attribute Encoder (AE), and Tuple Encoder (TE) are three components of Blitzcrank.
  • Figure 5: Interval Allocation By Pairing Symbols - There are three interval pairs $\{Y^{(1)}, Y^{(2)}, Y^{(3)}\}$. In each pair, two symbols $\{(\alpha_N, \beta_N)\}$ and the symbol boundary $\{w_N\}$ are saved.
  • ...and 29 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2