
Optimal Non-Oblivious Open Addressing

Michael A. Bender, William Kuszmaul, Renfei Zhou

TL;DR

This work shows that non-oblivious open-addressed hash tables can defy traditional space-time tradeoffs by achieving constant-time operations at load factor 1, even during dynamic resizing. The core idea, partner hashing, encodes a large RAM inside the relative ordering of elements and relies on a retrieval data structure to access metadata, while maintaining all data in a single array with no external metadata. The paper develops a fixed-size construction with strong high-probability guarantees and then extends it to dynamic resizing, using a multi-tier RAM architecture (dense and sparse) and a backyard for overflow keys, all while requiring only $O(1)$-wise independent hash functions. Together, these results demonstrate that non-oblivious open-addressing can bypass prior lower bounds for oblivious variants and have practical implications for designing fast, space-efficient hash tables with robust worst-case performance. The techniques also highlight a novel RAM-encoding approach inside an implicit data structure, with implications for succinct and implicit data structure design beyond hashing.
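The central trick in the TL;DR, that an array of keys with no metadata can still carry information through the order of its elements, is easiest to see in the classic pair-swapping technique from implicit data structures. The Python sketch below illustrates only that simpler technique, not the paper's partner hashing: each pair of distinct keys stores one bit (ascending order for 0, descending for 1). The names `encode_bits` and `decode_bit` are hypothetical.

```python
from typing import List


def encode_bits(keys: List[int], bits: List[int]) -> List[int]:
    """Reorder distinct keys so that pair k (slots 2k, 2k+1) encodes bits[k].

    Ascending order inside a pair means 0, descending order means 1.
    Only the arrangement changes; the multiset of keys is untouched.
    """
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be distinct")
    if len(keys) < 2 * len(bits):
        raise ValueError("need two keys per bit")
    out = list(keys)
    for k, bit in enumerate(bits):
        lo, hi = sorted((out[2 * k], out[2 * k + 1]))
        out[2 * k], out[2 * k + 1] = (hi, lo) if bit else (lo, hi)
    return out


def decode_bit(arr: List[int], k: int) -> int:
    """Recover bit k with a single comparison inside pair k."""
    return 1 if arr[2 * k] > arr[2 * k + 1] else 0


arr = encode_bits([10, 3, 7, 42, 5, 8], [1, 0, 1])
print(arr)                                     # [10, 3, 7, 42, 8, 5]
print([decode_bit(arr, k) for k in range(3)])  # [1, 0, 1]
```

As we read Figure 2, partner hashing goes further: rather than comparing two keys, it looks at which partner bin the logical address of the key in an index slot falls into, so a single index slot can hold a multi-bit word.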

Abstract

A hash table is said to be open-addressed (or non-obliviously open-addressed) if it stores elements (and free slots) in an array with no additional metadata. Intuitively, open-addressed hash tables must incur a space-time tradeoff: The higher the load factor at which the hash table operates, the longer insertions/deletions/queries should take. In this paper, we show that no such tradeoff exists: It is possible to construct an open-addressed hash table that supports constant-time operations even when the hash table is entirely full. In fact, it is even possible to construct a version of this data structure that: (1) is dynamically resized so that the number of slots in memory that it uses, at any given moment, is the same as the number of elements it contains; (2) supports $O(1)$-time operations, not just in expectation, but with high probability; and (3) requires external access to just $O(1)$ hash functions that are each just $O(1)$-wise independent. Our results complement a recent lower bound by Bender, Kuszmaul, and Zhou showing that oblivious open-addressed hash tables operating at load factor $1 - \varepsilon$ must incur $\Omega(\log \log \varepsilon^{-1})$-time operations. The hash tables in this paper are non-oblivious, which is why they are able to bypass the previous lower bound.

Paper Structure

This paper contains 37 sections, 6 theorems, 11 equations, 5 figures, 11 algorithms.

Key Result

Theorem 3.1

Assuming access to a constant number of fully random hash functions, there is an open-addressing hash table that maintains $n\left(1 - \frac{1}{\log^c n}\right) \pm O(1)$ keys in $n$ slots (for any $c > 100$), supports insertions and deletions in $O(1)$ expected time, and supports queries in $O(1)$ worst-case time.
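As a rough reading of Theorem 3.1's load: with $n$ slots holding $n(1 - 1/\log^c n)$ keys, about $n/\log^c n$ slots stay empty. The sketch below computes that count in log space, assuming base-2 logarithms and ignoring the $\pm O(1)$ term; `log2_empty_slots` is a hypothetical helper, not something from the paper.

```python
import math


def log2_empty_slots(log2_n: float, c: float) -> float:
    """Return log2 of n / (log2 n)^c, the empty-slot count in Theorem 3.1.

    Assumes base-2 logarithms and drops the +/- O(1) term.
    Working in log space avoids float overflow for huge n.
    """
    return log2_n - c * math.log2(log2_n)


# n = 2^64 slots, c = 101 (the theorem requires c > 100)
print(log2_empty_slots(64, 101))    # -542.0, i.e. about 2^-542 empty slots
# n = 2^1024 slots, c = 101
print(log2_empty_slots(1024, 101))  # 14.0, i.e. about 2^14 empty slots
```

Under these assumptions the empty-slot term is below one slot for any practical $n$, so the table is full up to the $\pm O(1)$ term.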

Figures (5)

  • Figure 1: Using any one of $x, y, i, j$ to recover the others. (a) Given $i$ or $j$, we can recover $x$ and $y$, respectively, using the physical layout; (b) given $x$ or $y$, we can recover $j$ and $i$, respectively, using $h$ and the retrieval data structure (in particular $j$ and $i$ are the logical addresses for $x$ and $y$, respectively). Thus $i$ determines $x$, which determines $j$, which determines $y$, which determines $i$.
  • Figure 2: Index bins and partner bins. There are $m/2$ index and partner bins respectively, each containing $B = \operatorname{poly}\log n$ slots. To encode a value $v_i$ in the $i$-th word, we swap the key $x$ stored in slot $\texttt{index}(i)$ with an arbitrary self-loop $y$ in the $\left((v_i \oplus r_i) + 1\right)$-th partner bin.
  • Figure 3: Dependency graph for the procedures.
  • Figure 4: Storing the advanced RAM using two basic RAMs. Each word $v_i$ ($i \in [mB/10]$) in the advanced RAM is either (1) stored in word $i$ of the dense RAM, while word $i$ of the sparse RAM remains empty; or (2) stored in word $i$ of the sparse RAM, while word $i$ of the dense RAM can be arbitrary. The last $mB/40$ words of the sparse RAM maintain the buffer queue, which contains all locations $i \in [mB/10]$ with buffered updates in the sparse RAM.
  • Figure 5: Dependency graph for the advanced RAM. Solid lines show a topological order.
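Figures 1 and 2 together suggest how a single word is written and read. The Python sketch below is a toy model under several assumptions of ours: index slots and partner bins are laid out contiguously with index(i) = i, every key starts as a self-loop (stored at its own logical address), a plain dict stands in for "h plus the retrieval data structure", the number of partner bins is a power of two so that $v_i \oplus r_i$ stays in range, and bins are 0-indexed (bin $v_i \oplus r_i$ here is the $((v_i \oplus r_i) + 1)$-th bin of Figure 2). `ToyPartnerRAM` and its methods are hypothetical names, not the paper's pseudocode.

```python
import random
from typing import Dict, List, Optional


class ToyPartnerRAM:
    """Toy model of Figure 2: word i is encoded by which key sits in slot i.

    Slots [0, num_words) are index slots; the rest form num_bins partner
    bins of bin_size slots each. Key k initially sits in slot k, so every
    key starts as a self-loop.
    """

    def __init__(self, num_words: int, num_bins: int, bin_size: int, seed: int = 0):
        if num_bins & (num_bins - 1):
            raise ValueError("num_bins must be a power of two")
        self.num_words, self.num_bins, self.bin_size = num_words, num_bins, bin_size
        self.slots: List[int] = list(range(num_words + num_bins * bin_size))
        # Stand-in for h + retrieval data structure: key -> logical address.
        self.logical: Dict[int, int] = {k: k for k in self.slots}
        rng = random.Random(seed)
        self.r = [rng.randrange(num_bins) for _ in range(num_words)]  # masks r_i

    def _swap(self, a: int, b: int) -> None:
        self.slots[a], self.slots[b] = self.slots[b], self.slots[a]

    def read(self, i: int) -> Optional[int]:
        """Figure 1's chain: slot i -> key y -> y's logical address -> bin."""
        home = self.logical[self.slots[i]]
        if home == i:  # self-loop: word i has not been written
            return None
        b = (home - self.num_words) // self.bin_size
        return b ^ self.r[i]

    def write(self, i: int, v: int) -> None:
        if not 0 <= v < self.num_bins:
            raise ValueError("value out of range")
        home = self.logical[self.slots[i]]
        if home != i:  # undo the previous swap so both keys are self-loops again
            self._swap(i, home)
        start = self.num_words + (v ^ self.r[i]) * self.bin_size
        for q in range(start, start + self.bin_size):
            if self.logical[self.slots[q]] == q:  # first self-loop in the bin
                self._swap(i, q)
                return
        raise RuntimeError("no self-loop left in the target bin")


ram = ToyPartnerRAM(num_words=4, num_bins=8, bin_size=4, seed=1)
for i, v in enumerate([5, 0, 7, 3]):
    ram.write(i, v)
print([ram.read(i) for i in range(4)])  # [5, 0, 7, 3]
```

Per the TL;DR, the paper keeps all of this inside one array, and Figure 2 uses bins of $\operatorname{poly}\log n$ slots; the external dict and tiny bins here are simplifications for readability.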

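Figure 4's two-RAM layout can be sketched in the same toy style. In the hypothetical `ToyAdvancedRAM` below, a read prefers the sparse RAM and falls back to the dense RAM, matching the caption's invariant. Two further points are our assumptions rather than the paper's: the buffer queue is a separate `collections.deque` instead of living in the last $mB/40$ words of the sparse RAM, and a buffered update is retired by copying it into the dense RAM and clearing the sparse word.

```python
from collections import deque
from typing import Deque, List, Optional


class ToyAdvancedRAM:
    """Word i lives in sparse[i] if that is non-empty, else in dense[i]."""

    def __init__(self, size: int):
        self.dense: List[int] = [0] * size
        self.sparse: List[Optional[int]] = [None] * size
        # Invariant: queue holds exactly the i with sparse[i] non-empty.
        self.queue: Deque[int] = deque()

    def read(self, i: int) -> int:
        v = self.sparse[i]
        return self.dense[i] if v is None else v

    def write(self, i: int, v: int) -> None:
        """Buffer the update in the sparse RAM."""
        if self.sparse[i] is None:
            self.queue.append(i)
        self.sparse[i] = v

    def flush_one(self) -> bool:
        """Move the oldest buffered word into the dense RAM (assumed policy)."""
        if not self.queue:
            return False
        i = self.queue.popleft()
        v = self.sparse[i]
        assert v is not None  # guaranteed by the queue invariant
        self.dense[i] = v
        self.sparse[i] = None
        return True


ram = ToyAdvancedRAM(8)
ram.write(3, 42)
ram.write(3, 7)  # still a single queue entry for location 3
ram.flush_one()
print(ram.read(3), ram.sparse[3], list(ram.queue))  # 7 None []
```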
Theorems & Definitions (22)

  • Theorem 3.1
  • Claim 3.2
  • proof
  • Theorem 3.3 (Demaine et al., 2006)
  • Claim 3.4
  • proof
  • Theorem 3.4
  • proof
  • Theorem 4.1
  • Claim 4.2
  • ...and 12 more