Optimal Static Dictionary with Worst-Case Constant Query Time

Yang Hu, Jingxun Liang, Huacheng Yu, Junkai Zhang, Renfei Zhou

TL;DR

The paper builds a succinct static dictionary with worst-case constant query time that uses $\text{OPT} + n^{\varepsilon}$ bits of space, where $\text{OPT} = \log\binom{U}{n}+n\log\sigma$ is the information-theoretic minimum. It introduces augmented retrieval and spillover representations to store variable-length components and per-bucket data, keeping the space close to OPT while answering queries in constant time in the cell-probe model, and then extends the result to the word RAM. A key technical contribution is a sparsification technique based on a hierarchical matrix construction, combined with retrieval structures that use base conversion, which allows many small structures to be concatenated without incurring large redundancy. Compared with prior work, which achieved either $\text{OPT} + n/\text{poly}\log n$ space with constant query time or $\text{OPT} + n^{\varepsilon}$ space with only expected constant query time, the result combines near-optimal space with worst-case constant-time queries. The framework (augmented redundancy) may influence future static data-structure designs and raises open problems about redundancy budgeting and construction complexity.

Abstract

In this paper, we design a new succinct static dictionary with worst-case constant query time. A dictionary data structure stores a set of key-value pairs with distinct keys in $[U]$ and values in $[\sigma]$, such that given a query $x\in [U]$, it quickly returns whether $x$ is one of the input keys, and if so, also returns its associated value. The textbook solution to dictionaries is hash tables. On the other hand, the (information-theoretically) optimal space to encode such a set of key-value pairs is only $\text{OPT} := \log\binom{U}{n}+n\log \sigma$. We construct a dictionary that uses $\text{OPT} + n^{\varepsilon}$ bits of space and answers queries in constant time in the worst case. Previously, constant-time dictionaries were only known with $\text{OPT} + n/\text{poly}\log n$ space [Pǎtraşcu 2008], or with $\text{OPT}+n^{\varepsilon}$ space but expected constant query time [Yu 2020]. We emphasize that most of the extra $n^{\varepsilon}$ bits are used to store a lookup table that does not depend on the input, and random bits for hash functions. The "main" data structure only occupies $\text{OPT}+\text{poly}\log n$ bits.
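
To make the space bound concrete, the following minimal sketch evaluates the OPT formula from the abstract and compares it with a naive layout that stores every key and value at full width. The function names `opt_bits` and `naive_bits` and the sample parameters are illustrative assumptions, not part of the paper.

```python
import math


def opt_bits(U: int, n: int, sigma: int) -> float:
    """OPT = log2 C(U, n) + n * log2(sigma), in bits (formula from the abstract)."""
    return math.log2(math.comb(U, n)) + n * math.log2(sigma)


def naive_bits(U: int, n: int, sigma: int) -> int:
    """Hypothetical baseline: each key and each value stored at full width."""
    key_width = math.ceil(math.log2(U))
    value_width = math.ceil(math.log2(sigma)) if sigma > 1 else 0
    return n * (key_width + value_width)


if __name__ == "__main__":
    # Sample parameters chosen only for illustration.
    U, n, sigma = 2**20, 1000, 256
    print(f"OPT   = {opt_bits(U, n, sigma):.1f} bits")
    print(f"naive = {naive_bits(U, n, sigma)} bits")
```

The gap between the two numbers is the slack that a succinct dictionary aims to remove; the paper's structure stays within $n^{\varepsilon}$ bits of the first value.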

Paper Structure

This paper contains 30 sections, 21 theorems, 24 equations, 4 figures.

Key Result

Theorem 1.1

In the word RAM model with word size $w = \Theta(\log n)$, there is a static dictionary storing $n$ keys from a universe of size $U \in [2n,\, \mathop{\mathrm{poly}}\nolimits n]$ and values from a universe of size $\sigma \in [1,\, \mathop{\mathrm{poly}}\nolimits n]$, using $\mathbf{OPT} + \mathop{\

Figures (4)

  • Figure 1: Sparsifying a row
  • Figure 2: Storing all spills of type $s$. (1) First, we convert part of $m_{\text{fix}}$ from base $2^w$ to base $P^{(s)}$ using \ref{thm:changing_base}, getting $m_{\text{conv}}^{(s)}$; we also "round up" the universe of spills from $[K^{(s)}]$ to $[P^{(s)}]$. (2) Next, we build the augmented retrieval by \ref{thm:augmented_retrieval} using $m_{\text{conv}}^{(s)}$ as augmented elements, getting the representation of this data structure $m_{\text{retr}}^{(s)}$. (3) Finally, we convert it back to binary words using \ref{thm:changing_base}.
  • Figure 3: Tree of blocks with $h = 4$ levels. Every rectangle represents a block on the tree, in which the hatched area represents the supplementary columns of a block, and the gray area represents the locations of non-zero entries in the matrix.
  • Figure 4: Gaussian elimination on matrix $M$. (a) The matrix is partitioned into $3 \times 3$ blocks, where they correspond to row sets $Q_0, Q_1, Q_2$ from top to bottom, and column sets $C_1, C_2, C_0$ from left to right. The third column set $C_0$ consists of the $2\Delta_i$ supplementary columns of block $u$. Gray area of the matrix represents unknown non-zero entries. (b) Perform elimination within each child's submatrix, obtaining two identity submatrices. (c) Use two children's submatrices to eliminate entries on $Q_0$ above the identity parts. The non-zero entries in the hatched area changed during this step. (d) Permute the columns. We show that $M_{Q_0, T}$ has full rank.
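
The Figure 2 caption describes converting data from base $2^w$ to a base $P^{(s)}$ and back. As a rough illustration of that round trip only, the sketch below uses textbook big-integer conversion. It is not the space-efficient, locally accessible scheme of the referenced base-change theorem, and the helper names and the sample values `w = 8`, `P = 257` are assumptions made for this example.

```python
def to_int(digits, base):
    """Interpret digits (most significant first) as one integer in the given base."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} out of range for base {base}")
        value = value * base + d
    return value


def from_int(value, base, length):
    """Write value as exactly `length` digits in the given base."""
    digits = [0] * length
    for i in range(length - 1, -1, -1):
        value, digits[i] = divmod(value, base)
    if value:
        raise ValueError("value does not fit in the requested length")
    return digits


def convert(digits, src_base, dst_base):
    """Re-encode digits in dst_base using the fewest digits that always suffice."""
    total = src_base ** len(digits)  # number of distinct inputs
    length, capacity = 0, 1
    while capacity < total:
        capacity *= dst_base
        length += 1
    return from_int(to_int(digits, src_base), dst_base, length)


if __name__ == "__main__":
    w, P = 8, 257                      # illustrative word size and prime base
    words = [5, 200, 17]               # three w-bit words
    conv = convert(words, 2**w, P)     # base 2^w -> base P
    back = from_int(to_int(conv, P), 2**w, len(words))  # base P -> base 2^w
    print(conv, back == words)
```

This naive version touches every digit on each conversion, so it only shows that the re-encoding is lossless.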

Theorems & Definitions (31)

  • Theorem 1.1
  • Lemma 3.1
  • Corollary 3.2
  • Lemma 3.2
  • Theorem 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Lemma 4.4: NextPrime
  • Theorem 4.5: Implicit in [Dodis, Pǎtraşcu, Thorup 2010]
  • Proof of \ref{thm:cell_probe_main}
  • ...and 21 more