Efficient $d$-ary Cuckoo Hashing at High Load Factors by Bubbling Up

William Kuszmaul, Michael Mitzenmacher

TL;DR

Bubble-up cuckoo hashing is an online $d$-ary cuckoo hashing scheme for high load factors that essentially matches the offline-optimal number of hash functions while keeping operations fast. By organizing insertions into phases and exploiting a growing core table with a core-independence property, the method uses $d = \lceil \ln \epsilon^{-1} + \alpha \rceil$ hash functions, achieves expected insertion time $O(\delta^{-1})$ at load factor $1 - \delta \le 1 - \epsilon$, and achieves expected positive query time $O(1)$, independent of $d$ and $\epsilon$. The core/bubble-up framework ensures that most probes occur in the core, which keeps insertions efficient and positive queries constant-time even as $d$ grows. The work also supports deletions via tombstones with manageable rebuild costs and discusses practical extensions to bucketized and tabulation-based hashing. Overall, the approach narrows the gap between the online and offline settings for high-load $d$-ary cuckoo hashing.
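To get a feel for how slowly the hash count grows, the formula $d = \lceil \ln \epsilon^{-1} + \alpha \rceil$ can be evaluated directly. The following is a minimal sketch; the function name `num_hashes` and the sample values of $\epsilon$ and $\alpha$ are illustrative assumptions, not taken from the paper.

```python
import math


def num_hashes(eps: float, alpha: float) -> int:
    """Evaluate d = ceil(ln(1/eps) + alpha).

    `eps` is the slack in the target load factor 1 - eps, and `alpha`
    is the small positive constant from the summary. The name and the
    argument checks are illustrative assumptions.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.ceil(-math.log(eps) + alpha)


if __name__ == "__main__":
    # Each tenfold decrease in eps adds only about ln(10) ~ 2.3 hashes.
    for eps in (0.1, 0.01, 0.001):
        print(f"load factor {1 - eps:.3f}: d = {num_hashes(eps, alpha=0.1)}")
```

The logarithmic dependence is the point: pushing the load factor from 90% to 99.9% changes $d$ by only a handful of hash functions.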

Abstract

A $d$-ary cuckoo hash table is an open-addressed hash table that stores each key $x$ in one of $d$ random positions $h_1(x), h_2(x), \ldots, h_d(x)$. In the offline setting, where all items are given and keys need only be matched to locations, it is possible to support a load factor of $1 - \epsilon$ while using $d = \lceil \ln \epsilon^{-1} + o(1) \rceil$ hashes. The online setting, where keys are moved as new keys arrive sequentially, has the additional challenge of the time to insert new keys, and it has not been known whether one can use $d = O(\ln \epsilon^{-1})$ hashes to support $\operatorname{poly}(\epsilon^{-1})$ expected-time insertions. In this paper, we introduce bubble-up cuckoo hashing, an implementation of $d$-ary cuckoo hashing that achieves all of the following properties simultaneously: (1) it uses $d = \lceil \ln \epsilon^{-1} + \alpha \rceil$ hash locations per item for an arbitrarily small positive constant $\alpha$; (2) it achieves expected insertion time $O(\delta^{-1})$ for any insertion taking place at load factor $1 - \delta \le 1 - \epsilon$; (3) it achieves expected positive query time $O(1)$, independent of $d$ and $\epsilon$. The first two properties give an essentially optimal value of $d$ without compromising insertion time. The third property is interesting even in the offline setting: it says that, even though negative queries must take time $d$, positive queries can actually be implemented in $O(1)$ expected time, even when $d$ is large.
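For orientation, the basic data structure described above can be sketched as follows. This is the classic random-walk baseline for $d$-ary cuckoo hashing, not the bubble-up algorithm, and it carries none of the paper's insertion-time or positive-query guarantees. The class name, the salted use of Python's built-in `hash` as a stand-in for $h_1, \ldots, h_d$, and the `max_kicks` cutoff are all assumptions made for illustration.

```python
import random


class DAryCuckooTable:
    """Baseline d-ary cuckoo table with random-walk insertion.

    Illustrative only: this is NOT bubble-up cuckoo hashing. Keys must
    be hashable and must not be None (None marks an empty slot).
    """

    def __init__(self, num_slots: int, d: int, seed: int = 0):
        self.num_slots = num_slots
        self.d = d
        self.slots = [None] * num_slots
        self._rng = random.Random(seed)
        # One random salt per hash function; a hypothetical stand-in
        # for the d random hash functions h_1, ..., h_d.
        self._salts = [self._rng.getrandbits(64) for _ in range(d)]

    def positions(self, key):
        """Return the d candidate slots h_1(key), ..., h_d(key)."""
        return [hash((salt, key)) % self.num_slots for salt in self._salts]

    def contains(self, key) -> bool:
        # A positive query stops at the first match; a negative query
        # has to inspect all d candidate slots before answering.
        return any(self.slots[p] == key for p in self.positions(key))

    def insert(self, key, max_kicks: int = 10_000) -> None:
        if self.contains(key):
            return
        current = key
        for _ in range(max_kicks):
            candidates = self.positions(current)
            for p in candidates:
                if self.slots[p] is None:
                    self.slots[p] = current
                    return
            # All d candidates are full: evict a random occupant and
            # continue the walk with the evicted key.
            victim = self._rng.choice(candidates)
            current, self.slots[victim] = self.slots[victim], current
        # A real implementation would rehash or resize here; this
        # sketch simply gives up, leaving `current` unplaced.
        raise RuntimeError("insertion failed after max_kicks evictions")


if __name__ == "__main__":
    table = DAryCuckooTable(num_slots=1000, d=4, seed=1)
    for k in range(800):  # fill to load factor 0.8
        table.insert(k)
    print(table.contains(5), table.contains(10**9))
```

In this baseline, a positive query may still probe up to $d$ slots in the worst case; the paper's third property concerns doing better than that in expectation.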

Paper Structure (32 sections, 26 theorems, 59 equations)


Key Result

Theorem 1.1

Let $\alpha \in (0, 1)$ be a positive constant. Let $\epsilon \in (n^{-1/4}, 1)$ be sufficiently small as a function of $\alpha$ (i.e., $\epsilon$ is at most a small constant). There exists an implementation of $d$-ary cuckoo hashing that:

Theorems & Definitions (46)

  • Theorem 1.1: Restated later as Theorem \ref{thm:main}
  • Theorem 3.1: $2$-ary cuckoo hashing [pagh2004cuckoo]
  • Theorem 3.2: Random-walk $d$-ary cuckoo hashing [bell20241]
  • Theorem 3.3: McDiarmid's Inequality [mcdiarmid1989method]
  • Corollary 3.0
  • Theorem 4.1
  • Proposition 4.0
  • Lemma 4.1
  • proof
  • Lemma 4.2: The Core Independence Property
  • ...and 36 more