Optimal Static Dictionary with Worst-Case Constant Query Time
Yang Hu, Jingxun Liang, Huacheng Yu, Junkai Zhang, Renfei Zhou
TL;DR
The paper builds a succinct static dictionary with worst-case constant query time that uses OPT plus an additive redundancy of $n^{\varepsilon}$ bits, where OPT $= \log\binom{U}{n}+n\log\sigma$ is the information-theoretic minimum for $n$ keys from $[U]$ with values in $[\sigma]$. It introduces augmented retrieval and spillover representations to store variable-length components and per-bucket data, keeping space near OPT while ensuring constant-time queries in the cell-probe model and extending to the Word RAM. A key technical contribution is a sparsification technique via a hierarchical matrix construction and base-conversion-enabled retrieval, which allows many small structures to be concatenated without incurring large redundancy. The work improves on prior results by combining near-optimal space with worst-case constant-time queries, and offers a framework (augmented redundancy) that may influence future static data-structure designs and open problems related to redundancy budgeting and construction complexity.
Abstract
In this paper, we design a new succinct static dictionary with worst-case constant query time. A dictionary data structure stores a set of $n$ key-value pairs with distinct keys in $[U]$ and values in $[\sigma]$, such that given a query $x\in [U]$, it quickly reports whether $x$ is one of the input keys, and if so, also returns its associated value. The textbook solution to dictionaries is hash tables. On the other hand, the information-theoretically optimal space to encode such a set of key-value pairs is only $\text{OPT} := \log\binom{U}{n}+n\log \sigma$ bits. We construct a dictionary that uses $\text{OPT} + n^{\varepsilon}$ bits of space and answers queries in constant time in the worst case. Previously, constant-time dictionaries were known only with $\text{OPT} + n/\text{poly}\log n$ space [Pătraşcu 2008], or with $\text{OPT}+n^{\varepsilon}$ space but only expected constant query time [Yu 2020]. We emphasize that most of the extra $n^{\varepsilon}$ bits are used to store a lookup table that does not depend on the input, and random bits for hash functions. The "main" data structure occupies only $\text{OPT}+\text{poly}\log n$ bits.
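To give a feel for the quantity $\text{OPT}$, the following minimal Python sketch evaluates $\log\binom{U}{n}+n\log\sigma$ numerically and compares it with a naive table that stores every key and value in full. The function names, the load factor, and the parameter values are illustrative assumptions and do not come from the paper; the naive table is a hypothetical baseline, not the construction described above.

```python
import math


def opt_bits(U: int, n: int, sigma: int) -> float:
    """Lower bound log2(C(U, n)) + n * log2(sigma), in bits.

    Uses lgamma so that large U and n stay cheap; the result is a
    floating-point approximation, not an exact value.
    """
    ln_binom = math.lgamma(U + 1) - math.lgamma(n + 1) - math.lgamma(U - n + 1)
    return ln_binom / math.log(2) + n * math.log2(sigma)


def naive_table_bits(U: int, n: int, sigma: int, load_factor: float = 0.5) -> int:
    """Hypothetical baseline (an assumption, not from the paper):
    a table with n / load_factor slots, each holding a full key and value."""
    slots = math.ceil(n / load_factor)
    bits_per_slot = math.ceil(math.log2(U)) + math.ceil(math.log2(sigma))
    return slots * bits_per_slot


if __name__ == "__main__":
    # Illustrative parameters only: 32-bit keys, one million pairs, 8-bit values.
    U, n, sigma = 2**32, 10**6, 256
    print(f"OPT         ~ {opt_bits(U, n, sigma):,.0f} bits")
    print(f"naive table = {naive_table_bits(U, n, sigma):,} bits")
```

Under these assumed parameters the naive table uses several times more bits than $\text{OPT}$, which is the gap a succinct dictionary aims to close.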
