Tight Analyses of Ordered and Unordered Linear Probing

Mark Braverman, William Kuszmaul

TL;DR

The paper settles the amortized insertion cost of linear probing with matching upper and lower bounds of $\Theta(x \log^{1.5} x)$. It also obtains tight bounds for the so-called path surplus problem, a problem in combinatorial geometry that has been shown to be closely related to linear probing.

Abstract

Linear-probing hash tables have been classically believed to support insertions in time $\Theta(x^2)$, where $1 - 1/x$ is the load factor of the hash table. Recent work by Bender, Kuszmaul, and Kuszmaul (FOCS'21), however, has added a new twist to this story: in some versions of linear probing, if the maximum load factor is at most $1 - 1/x$, then the amortized expected time per insertion will never exceed $x \log^{O(1)} x$ (even in workloads that operate continuously at a load factor of $1 - 1/x$). Determining the exact asymptotic value for the amortized insertion time remains open. In this paper, we settle the amortized complexity with matching upper and lower bounds of $\Theta(x \log^{1.5} x)$. Along the way, we also obtain tight bounds for the so-called path surplus problem, a problem in combinatorial geometry that has been shown to be closely related to linear probing. We also show how to extend Bender et al.'s bounds to say something not just about ordered linear probing (the version they study) but also about classical linear probing, in the form that is most widely implemented in practice.
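To make the classical $\Theta(x^2)$ picture concrete, here is a minimal simulation sketch, not taken from the paper. It fills a classical (unordered) linear-probing table to load factor $1 - 1/x$ using insertions only, then measures how many slots one further insertion examines. The function names, the table size, and the use of uniformly random start slots in place of a real hash function are all assumptions made for illustration. The sketch does not model deletions, tombstones, or rebuilds, so it says nothing about the amortized bounds discussed above.

```python
import random

def probe_insert(table, start, key):
    """Classical linear probing: scan right from `start` (with wraparound)
    until a free slot is found, store `key` there, and return the number
    of slots examined."""
    n = len(table)
    i = start
    probes = 1
    while table[i] is not None:
        i = (i + 1) % n
        probes += 1
    table[i] = key
    return probes

def mean_final_insert_cost(n, x, trials, seed=0):
    """Fill an n-slot table to load factor 1 - 1/x (insertions only), then
    average the cost of one more insertion over `trials` fresh tables.
    Random start slots stand in for a hash function (an assumption)."""
    rng = random.Random(seed)
    fill = int(n * (1 - 1 / x))
    total = 0
    for _ in range(trials):
        table = [None] * n
        for k in range(fill):
            probe_insert(table, rng.randrange(n), k)
        total += probe_insert(table, rng.randrange(n), fill)
    return total / trials

if __name__ == "__main__":
    for x in (2, 4, 8):
        print(x, mean_final_insert_cost(1024, x, trials=200))
```

Under these assumptions the measured cost should grow roughly quadratically in $x$, which is the behavior the amortized analyses improve on for workloads that mix insertions and deletions.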

Paper Structure

This paper contains 35 sections, 66 theorems, 203 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1.1

Consider an ordered linear probing hash table, where deletions are implemented with tombstones, and where rebuilds are performed every $n / \operatorname{polylog} x$ insertions/deletions. If the load factor of the hash table stays at or below $1 - 1/x$ at all times, then the amortized expected time

Figures (4)

  • Figure 1: An example with $m^2 = 100$. In this case, there are 110 blue dots and 97 red dots. A surplus-maximizing monotonic path is given, and the surplus of the path is 32.
  • Figure 2: The same path as in Figure 1, but in the rotated version of the problem considered in Section \ref{sec:pathsurpluslower}. The constraint that the path is monotonic now becomes a constraint on slope: the slope of the path must always stay in $[-1, 1]$.
  • Figure 3: An example of what the base-case subproblem (i.e., $L = L_0$) would look like if $\text{surplus}(R_L) < 0$ (so $p'_L = (x_L, y_L - q_L)$). The algorithm would then recurse on $A_L$ and $B_L$. Note that $A_L$ and $B_L$ have slopes that are very close (within $O(1/ \sqrt{\log m})$) to that of $L$. This will be important for making sure that the recursion is able to (most likely) get to depth $\Theta(\log m)$ before terminating (i.e., before getting to a line with slope close to $1$).
  • Figure 4: If we look at the implied path $\operatorname{Path}(F)$ for $F$, we take the line segment between diagonals $D^{(i - 1)}_\ell$ and $D^{(i)}_\ell$, and we extend that segment to reach diagonal $D^{(i + 1)}_\ell$, then $\Delta_i$ measures the distance (in multiples of $\sqrt{2} q_\ell$) between where the extended segment hits $D^{(i + 1)}_\ell$ versus where $\operatorname{Path}(F)$ hits $D^{(i + 1)}_\ell$. An example is shown in the figure, where the diagonals $D^{(i - 1)}_\ell, D^{(i)}_\ell, D^{(i + 1)}_\ell$ are in red with ticks every distance $q_\ell$; the path $\operatorname{Path}(F)$ is in black; the extension of the segment between $D^{(i - 1)}_\ell$ and $D^{(i)}_\ell$ is given as a dotted line; and $\Delta_i(F)$ is computed as $3$.
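The Figure 1 caption mentions blue dots, red dots, and a surplus-maximizing monotonic path, but the captions above do not define the surplus. The sketch below is therefore a discretized illustration under stated assumptions, not the paper's construction: it assumes the surplus of a path is the number of blue dots below it minus the number of red dots below it, and it restricts paths to non-decreasing staircases along the unit grid lines of an $m \times m$ square. The function name and the input format are hypothetical.

```python
def max_surplus_staircase(blue, red, m):
    """Return the maximum of (#blue - #red) dots lying below a non-decreasing
    staircase path across the square [0, m) x [0, m).

    Assumptions (not from the source): surplus counts dots below the path,
    and the path follows unit grid lines. `blue` and `red` are lists of
    (x, y) points with 0 <= x, y < m."""
    # weight[c][r] = net count (blue minus red) in the unit cell at column c, row r
    weight = [[0] * m for _ in range(m)]
    for x, y in blue:
        weight[int(x)][int(y)] += 1
    for x, y in red:
        weight[int(x)][int(y)] -= 1

    # below[c][h] = net count of the bottom h cells of column c
    below = []
    for c in range(m):
        prefix = [0]
        for r in range(m):
            prefix.append(prefix[-1] + weight[c][r])
        below.append(prefix)

    # best[h] = best surplus over the columns processed so far, given that
    # the path has height h in the most recent column
    best = list(below[0])
    for c in range(1, m):
        running = float("-inf")
        nxt = []
        for h in range(m + 1):
            running = max(running, best[h])  # best over previous heights <= h
            nxt.append(running + below[c][h])
        best = nxt
    return max(best)
```

For example, `max_surplus_staircase([(0.5, 0.5), (1.5, 1.5)], [(1.5, 0.5)], 2)` returns `1`: the path covers the blue cell in the first column, then must stay at or above that height in the second column, where covering both cells nets zero. The dynamic program runs in $O(m^2)$ time after bucketing the dots; a continuous version of the problem would need a different treatment.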

Theorems & Definitions (110)

  • Theorem 1.1: Upper bound of Bender, Kuszmaul, and Kuszmaul
  • Theorem 1.2: Lower bound of Bender, Kuszmaul, and Kuszmaul
  • Theorem 1.3
  • Theorem 1.4
  • Corollary 1.0
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • ...and 100 more