On the Relationship Between Several Variants of the Linear Hashing Conjecture

Alek Westover

TL;DR

This paper investigates the worst-case maxload of Linear Hashing (LH) and several natural variants across prime, composite, and real moduli, focusing on the open question of whether the maxload can be bounded by $n^{o(1)}$. It develops a framework of intermediate questions connecting prime, integer, and real modulus models, and proves partial progress including a transfer of Knudsen’s $\tilde{O}(n^{1/3})$ bound to composite moduli via Smart LH, alongside a strong two-bin result. A key contribution is establishing a surprising equivalence between Real LH and Random Modulus LH, suggesting the subpolynomial maxload behavior may be robust to the choice of modulus; the analysis leverages Farey sequences and careful decomposition of collision structures. Together, these results provide evidence that composite-modulus LH may mirror prime-modulus behavior and illustrate concrete, tractable directions (such as the two-bin case) toward resolving the broader open problem on LH maxload. The work lays a foundation for further reductions and cross-modulus insights with potential implications for hash function design in practice.

Abstract

In Linear Hashing ($\mathsf{LH}$) with $\beta$ bins on a size $u$ universe ${\mathcal{U}=\{0,1,\ldots, u-1\}}$, items $\{x_1,x_2,\ldots, x_n\}\subset \mathcal{U}$ are placed in bins by the hash function $$x_i\mapsto ((ax_i+b) \bmod p) \bmod \beta$$ for some prime $p\in [u,2u]$ and randomly chosen integers $a,b \in [1,p]$. The "maxload" of $\mathsf{LH}$ is the number of items assigned to the fullest bin. Expected maxload for a worst-case set of items is a natural measure of how well $\mathsf{LH}$ distributes items amongst the bins. Fix $\beta=n$. Despite $\mathsf{LH}$'s simplicity, bounding $\mathsf{LH}$'s worst-case maxload is extremely challenging. It is well-known that on random inputs $\mathsf{LH}$ achieves maxload $\Omega\left(\frac{\log n}{\log\log n}\right)$; this is currently the best lower bound for $\mathsf{LH}$'s expected maxload. Recently Knudsen established an upper bound of $\widetilde{O}(n^{1 / 3})$. The question "Is the worst-case expected maxload of $\mathsf{LH}$ $n^{o(1)}$?" is one of the most basic open problems in discrete math. In this paper we propose a set of intermediate open questions to help researchers make progress on this problem. We establish the relationship between these intermediate open questions and make some partial progress on them.
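To make the definitions in the abstract concrete, the following is a minimal Python sketch of the hash function and of maxload, with $\beta = n$ bins. The function names (`prime_in_range`, `linear_hash`, `maxload`, `estimate_expected_maxload`), the choice of the smallest prime in $[u, 2u]$, and the Monte Carlo averaging are illustrative assumptions, not part of the paper. The estimate is for one fixed input set, so it says nothing by itself about the worst case.

```python
import random


def is_prime(q):
    """Trial-division primality test; adequate for small illustrative universes."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True


def prime_in_range(u):
    """Return the smallest prime p with u <= p <= 2u (an arbitrary choice of p)."""
    for q in range(u, 2 * u + 1):
        if is_prime(q):
            return q
    raise ValueError("no prime found; u must be at least 1")


def linear_hash(x, a, b, p, n_bins):
    """The map x -> ((a*x + b) mod p) mod n_bins from the abstract."""
    return ((a * x + b) % p) % n_bins


def maxload(items, a, b, p, n_bins):
    """Number of items in the fullest bin for one fixed choice of (a, b)."""
    counts = [0] * n_bins
    for x in items:
        counts[linear_hash(x, a, b, p, n_bins)] += 1
    return max(counts)


def estimate_expected_maxload(items, u, trials=1000, seed=0):
    """Monte Carlo estimate of the expected maxload with n_bins = len(items)."""
    rng = random.Random(seed)
    p = prime_in_range(u)
    n_bins = len(items)
    total = 0
    for _ in range(trials):
        a = rng.randint(1, p)  # a, b drawn uniformly from [1, p]
        b = rng.randint(1, p)
        total += maxload(items, a, b, p, n_bins)
    return total / trials
```

For example, `estimate_expected_maxload(list(range(16)), 64)` averages the maxload of the items $0,\ldots,15$ over random $(a,b)$ in a universe of size $u = 64$.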

Paper Structure (18 sections, 26 theorems, 12 equations)

Key Result

Proposition 10

Let $m=k\cdot n$ for some $k>n$. There exists an $n$-element set $X\subset [m]$ on which $\bigcirc\mathbb{Z}_m\mathsf{H}$ has maxload $n$.
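The definition of $\bigcirc\mathbb{Z}_m\mathsf{H}$ is not reproduced in this summary. As a sketch only, assume it hashes $x \mapsto ((ax+b) \bmod m) \bmod n$ with $a, b$ drawn from $\mathbb{Z}_m$ (taking $b = 0$ if the variant has no shift). Under that assumption, one set consistent with the statement is $X = \{n, 2n, \ldots, n\cdot n\}$: since $n$ divides $m$, reducing a multiple of $n$ plus $b$ modulo $m$ leaves its residue modulo $n$ unchanged, so every item lands in bin $b \bmod n$. The requirement $k > n$ keeps all $n$ elements distinct and below $m$. This need not be the construction used in the paper. The function names below are hypothetical.

```python
def hash_mod_m(x, a, b, m, n):
    """Assumed form of the composite-modulus hash: ((a*x + b) mod m) mod n."""
    return ((a * x + b) % m) % n


def multiples_of_n(n):
    """Candidate bad set {n, 2n, ..., n*n}; all elements are below m = k*n when k > n."""
    return [j * n for j in range(1, n + 1)]


def min_maxload_over_all_seeds(n, k):
    """Smallest maxload over every (a, b) in Z_m x Z_m, by brute force."""
    m = k * n
    items = multiples_of_n(n)
    smallest = n
    for a in range(m):
        for b in range(m):
            counts = [0] * n
            for x in items:
                counts[hash_mod_m(x, a, b, m, n)] += 1
            smallest = min(smallest, max(counts))
    return smallest
```

If the assumed hash form is right, `min_maxload_over_all_seeds(n, k)` returns `n` for any `k > n`, meaning no choice of $(a, b)$ separates the items. The brute force loops over all $m^2$ seeds, so it is only practical for small `n` and `k`.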

Theorems & Definitions (40)

  • Remark 8
  • Definition 9
  • Proposition 10
  • Proposition 11
  • Definition 12
  • Definition 13
  • Theorem 14
  • Remark 15
  • Theorem 16
  • Theorem 20
  • ...and 30 more