Optimal Non-Oblivious Open Addressing
Michael A. Bender, William Kuszmaul, Renfei Zhou
TL;DR
This work shows that non-oblivious open-addressed hash tables can avoid the traditional space-time tradeoff: they support constant-time operations at load factor 1, even while being dynamically resized. The core idea, partner hashing, encodes a large RAM in the relative ordering of the stored elements and uses a retrieval data structure to access metadata, so that all data lives in a single array with no external metadata. The paper first develops a fixed-size construction with high-probability guarantees and then extends it to dynamic resizing, using a multi-tier RAM architecture (dense and sparse) and a backyard for overflow keys, while requiring only $O(1)$ hash functions that are each $O(1)$-wise independent. Together, these results show that non-oblivious open addressing can bypass the lower bound known for oblivious variants, and they point toward fast, space-efficient hash tables with strong worst-case guarantees. The technique of encoding a RAM inside an implicit data structure may also be useful for succinct and implicit data structure design beyond hashing.
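To give a feel for how information can be stored in the relative ordering of elements, here is a minimal sketch of the classic pair-ordering trick from implicit data structures: two distinct keys stored in adjacent slots encode one bit, depending on which of the two comes first. This illustrates the general idea only and is not the paper's partner-hashing construction; the function names `encode_bits` and `decode_bits` are hypothetical.

```python
def encode_bits(keys, bits):
    """Lay out 2 * len(bits) distinct keys so that pair i stores bits[i].

    Illustrative assumption: keys are grouped into adjacent pairs, and
    ascending order within a pair encodes 0 while descending encodes 1.
    """
    assert len(keys) == 2 * len(bits)
    assert len(set(keys)) == len(keys), "keys must be distinct"
    layout = []
    for i, bit in enumerate(bits):
        lo, hi = sorted(keys[2 * i: 2 * i + 2])
        layout.extend([hi, lo] if bit else [lo, hi])
    return layout


def decode_bits(layout):
    """Recover the stored bits by comparing the two keys in each pair."""
    return [1 if layout[i] > layout[i + 1] else 0
            for i in range(0, len(layout), 2)]


keys = [17, 4, 23, 8, 15, 42]
bits = [1, 0, 1]
layout = encode_bits(keys, bits)

# The array holds exactly the original keys, with no extra metadata,
# yet the bits can be read back from the ordering alone.
assert sorted(layout) == sorted(keys)
assert decode_bits(layout) == bits
```

The array still contains nothing but the keys themselves, so the encoded bits occupy no additional space; the cost is that the keys' positions are now constrained by the data being stored.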
Abstract
A hash table is said to be open-addressed (or non-obliviously open-addressed) if it stores elements (and free slots) in an array with no additional metadata. Intuitively, open-addressed hash tables must incur a space-time tradeoff: The higher the load factor at which the hash table operates, the longer insertions/deletions/queries should take. In this paper, we show that no such tradeoff exists: It is possible to construct an open-addressed hash table that supports constant-time operations even when the hash table is entirely full. In fact, it is even possible to construct a version of this data structure that: (1) is dynamically resized so that the number of slots in memory that it uses, at any given moment, is the same as the number of elements it contains; (2) supports $O(1)$-time operations, not just in expectation, but with high probability; and (3) requires external access to just $O(1)$ hash functions that are each just $O(1)$-wise independent. Our results complement a recent lower bound by Bender, Kuszmaul, and Zhou showing that oblivious open-addressed hash tables operating at load factor $1 - \varepsilon$ must incur $\Omega(\log \log \varepsilon^{-1})$-time operations. The hash tables in this paper are non-oblivious, which is why they are able to bypass the previous lower bound.
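The intuition behind the space-time tradeoff can be seen in a small experiment with classical linear probing, sketched below. This is an illustration of the tradeoff that the paper shows can be avoided, not of the paper's construction. The function name `average_insert_probes` is hypothetical, and the sketch assumes an idealized hash function, modeled by drawing each key's starting slot uniformly at random.

```python
import random


def average_insert_probes(num_slots, load_factor, seed=0):
    """Fill a linear-probing table to the given load factor (< 1) and
    return the average number of slots examined per insertion.

    Assumption: hashing is idealized, so each key's starting slot is
    drawn uniformly at random rather than computed from the key.
    """
    rng = random.Random(seed)
    occupied = [False] * num_slots
    num_keys = int(load_factor * num_slots)
    total_probes = 0
    for _ in range(num_keys):
        slot = rng.randrange(num_slots)
        probes = 1
        # Scan forward (with wraparound) until a free slot is found.
        while occupied[slot]:
            slot = (slot + 1) % num_slots
            probes += 1
        occupied[slot] = True
        total_probes += probes
    return total_probes / num_keys


for alpha in (0.5, 0.8, 0.95, 0.99):
    avg = average_insert_probes(1 << 14, alpha)
    print(f"load factor {alpha:.2f}: {avg:.1f} probes per insertion on average")
```

Running the loop shows the average probe count rising as the load factor approaches 1, which is the behavior one intuitively expects of any scheme that stores only elements and free slots in a single array.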
