Bounding the Fragmentation of B-Trees Subject to Batched Insertions

Michael A. Bender, Aaron Bernstein, Nairen Cao, Alex Conway, Martín Farach-Colton, Hanna Komlós, Yarin Shechter, Nicole Wein

Abstract

The issue of internal fragmentation in data structures is a fundamental challenge in database design. A seminal result of Yao in this field shows that evenly splitting the leaves of a B-tree against a workload of uniformly random insertions achieves space utilization of around 69%. However, many database applications perform batched insertions, where a small run of consecutive keys is inserted at a single position. We develop a generalization of Yao's analysis to provide rigorous treatment of such batched workloads. Our approach revisits and reformulates the analytical structure underlying Yao's result in a way that enables generalization and is used to argue that even splitting works well for many workloads in our extended class. For the remaining workloads, we develop simple alternative strategies that provably maintain good space utilization.
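The even-splitting baseline described above can be explored with a small simulation. The sketch below is not the authors' experimental code: it models only the leaf level, draws each insertion position uniformly at random, inserts `r` equal keys at that position to stand in for a batch of consecutive keys, and splits an overflowing leaf into two halves. The function name `simulate_fullness`, its parameters, and the definition of fullness as total keys divided by total leaf capacity are all assumptions made for illustration.

```python
import bisect
import random


def simulate_fullness(B=240, n_keys=200_000, r=1, seed=0):
    """Estimate leaf fullness under even splitting (illustrative sketch).

    B      -- leaf capacity in keys (assumed parameter name)
    n_keys -- total number of keys inserted, starting from one empty leaf
    r      -- batch length: keys inserted at one random position per batch
    """
    rng = random.Random(seed)
    bounds = [0.0]  # bounds[i] is the smallest key value routed to leaf i
    leaves = [[]]   # leaves[i] is the sorted list of keys stored in leaf i
    inserted = 0
    while inserted < n_keys:
        x = rng.random()  # uniformly random insertion position in [0, 1)
        batch = min(r, n_keys - inserted)
        for _ in range(batch):
            # Assumption: a batch is modelled as repeated copies of x,
            # so all of its keys are adjacent in sorted order.
            i = bisect.bisect_right(bounds, x) - 1
            leaf = leaves[i]
            bisect.insort(leaf, x)
            if len(leaf) > B:
                # Even split: move the upper half into a new right sibling.
                mid = len(leaf) // 2
                right = leaf[mid:]
                del leaf[mid:]
                bounds.insert(i + 1, right[0])
                leaves.insert(i + 1, right)
        inserted += batch
    # Fullness: keys stored divided by total capacity of all leaves.
    return n_keys / (len(leaves) * B)


if __name__ == "__main__":
    print("r = 1 :", simulate_fullness(B=64, n_keys=50_000, r=1))
    print("r = 16:", simulate_fullness(B=64, n_keys=50_000, r=16))
```

With `r=1` the estimate should land near the roughly 69% figure attributed to Yao; larger `r` gives a rough sense of how batching changes the picture under this particular model, which may differ from the paper's exact definitions.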

Paper Structure (38 sections, 18 theorems, 106 equations, 5 figures, 1 table, 1 algorithm)

Key Result

Lemma 4

Let $A\in\mathbb{R}^{d\times d}$ be an irreducible Metzler matrix, and assume there exists a positive (entry-wise) $w$ such that $w^{T}\cdot A=r\cdot w^{T}$ for some $r>0$, then:

Figures (5)

  • Figure 1: Experimental fullness of deferred and even splitting under batch insertions with blocks of size $B=240$. Starting from an empty structure, 200,000 insertions are made using batch insertions of varying length ($r \in [1,B]$). The mean fullness across 10 independent runs is shown. Bounds from \ref{lem:DeferredEvenSplitting} are shown in red for comparison.
  • Figure 2: Experimental fullness of deferred and even splitting under batch insertions with blocks of size $B=240$. Starting from an empty structure, 200,000 insertions are made using batch insertions of varying length ($r \in [B, 5B]$). The mean fullness across 10 independent runs is shown as a function of $\alpha = r/B$.
  • Figure 3: Our theoretical bounds, presented in \ref{tab:fills}, plotted as a function of $r/B$. The red segments represent the first line in \ref{tab:fills}, while the continuous orange plot represents the remaining entries.
  • Figure 4: TargetSplit
  • Figure 5: Example for \ref{subsec:LargeRegime} illustrating a batch of $r=1.7B$ insertions hitting a block of size $0.7B$.

Theorems & Definitions (39)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Lemma 6
  • proof
  • Definition 7
  • ...and 29 more