Simplified Tight Bounds for Monotone Minimal Perfect Hashing
Dmitry Kosolobov
TL;DR
This work resolves the space complexity of monotone minimal perfect hash functions (MMPHFs) across the practical parameter range by establishing a tight lower bound of $\Omega\bigl(n \,\min\{\log\log\log \frac{u}{n}, \log n\}\bigr)$ bits for $u \ge (1+\varepsilon)n$, and by showing that this bound is achievable via a straightforward extension of Belazzougui et al.'s construction. The paper simplifies Assadi et al.'s proof by removing part of its heavy combinatorial machinery and relying primarily on probabilistic coloring arguments; one complicated core part of the argument repeats Assadi et al.'s techniques and is not novel, but is included in detail for completeness. The paper also provides a reduction to very large universe sizes, which extends the bound across a broad range of $u$, and gives matching upper bounds via a bucketed, concatenated-structure construction that achieves the same space bound. The results virtually settle the MMPHF space-usage problem for all reasonable $u$, and the approach offers a clearer probabilistic perspective on the interplay between colorings and data-structure encodings, with potential applicability to related hashing problems.
Abstract
Given an increasing sequence of integers $x_1,\ldots,x_n$ from a universe $\{0,\ldots,u-1\}$, the monotone minimal perfect hash function (MMPHF) for this sequence is a data structure that answers the following rank queries: $\mathrm{rank}(x) = i$ if $x = x_i$, for $i\in \{1,\ldots,n\}$, and $\mathrm{rank}(x)$ is arbitrary otherwise. Assadi, Farach-Colton, and Kuszmaul recently presented at SODA'23 a proof of the lower bound $\Omega(n \min\{\log\log\log u, \log n\})$ on the number of bits of space required by an MMPHF, provided $u \ge n 2^{2^{\sqrt{\log\log n}}}$. This bound is tight, since there is a data structure for MMPHF that attains it (and answers queries in $O(\log u)$ time). In this paper, we close the remaining gap by proving that, for $u \ge (1+\varepsilon)n$, where $\varepsilon > 0$ is any constant, the tight lower bound is $\Omega(n \min\{\log\log\log \frac{u}{n}, \log n\})$, which is also attainable. We observe that, for all reasonable cases when $n < u < (1+\varepsilon)n$, known facts imply tight bounds, which virtually settles the problem. Along the way, we substantially simplify the proof of Assadi et al., replacing a part of their heavy combinatorial machinery by trivial observations. However, an important part of the proof still remains complicated. This part of our paper repeats arguments of Assadi et al. and is not novel. Nevertheless, we include it for completeness, offering a somewhat different perspective on these arguments.
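
To make the query semantics and the shape of the bound concrete, the following is a minimal illustrative sketch and not the construction from the paper. The names `build_naive_mmphf` and `bound_shape` are hypothetical. The dictionary-based structure only demonstrates what a rank query returns and makes no attempt at space efficiency, and `bound_shape` evaluates $n \min\{\log\log\log \frac{u}{n}, \log n\}$ with base-2 logarithms, ignoring the constant hidden in the $\Omega$ notation.

```python
from math import log2


def build_naive_mmphf(xs):
    """Return a rank function for a strictly increasing list of integers.

    Illustration only: a real MMPHF stores far less than the keys themselves.
    """
    if any(a >= b for a, b in zip(xs, xs[1:])):
        raise ValueError("xs must be strictly increasing")
    table = {x: i for i, x in enumerate(xs, start=1)}

    def rank(x):
        # For x outside the sequence an MMPHF may return anything;
        # this sketch arbitrarily returns 0.
        return table.get(x, 0)

    return rank


def bound_shape(n, u):
    """Evaluate n * min(log log log (u/n), log n) with base-2 logarithms.

    Assumption of this sketch: u >= 4 * n, so the triple logarithm is
    defined and non-negative. Constants hidden by Omega are ignored.
    """
    if n < 1 or u < 4 * n:
        raise ValueError("this sketch requires n >= 1 and u >= 4 * n")
    return n * min(log2(log2(log2(u / n))), log2(n))


rank = build_naive_mmphf([3, 7, 20])
print(rank(3), rank(7), rank(20))
print(bound_shape(1024, 1024 * 2**16))
```

For keys in the sequence, `rank` returns their 1-based position; for any other input the returned value carries no meaning, which is exactly the freedom that allows an MMPHF to use fewer bits than storing the keys.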
