Linear Hashing Is Optimal

Michael Jaber, Vinayak M. Kumar, David Zuckerman

TL;DR

This work resolves whether a simple linear-hashing scheme over GF(2) matches the maximum load of fully random hashing. Using potential functions and a careful decomposition into kernel vectors, the authors prove that hashing n balls into n bins via a random linear map achieves an expected maximum load of O(log n / log log n), matching the classic random-function bound up to constants. They extend the result to m balls and n bins and derive a tail bound with quadratic decay: the maximum load exceeds r · log n / log log n with probability at most O(1/r^2). The findings support linear hash families as a conceptually simple, implementable alternative to more complex k-wise independent schemes for load balancing, with asymptotically optimal expected maximum load. The approach has implications for hashing with chaining, incremental cryptography, and streaming algorithms, where simplicity and efficiency are critical.

Abstract

We prove that hashing $n$ balls into $n$ bins via a random matrix over $\mathbf{F}_2$ yields expected maximum load $O(\log n / \log \log n)$. This matches the expected maximum load of a fully random function and resolves an open question posed by Alon, Dietzfelbinger, Miltersen, Petrank, and Tardos (STOC '97, JACM '99). More generally, we show that the maximum load exceeds $r\cdot\log n/\log\log n$ with probability at most $O(1/r^2)$.
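
The setting of the abstract can be explored empirically. The following is a minimal sketch, not the authors' code: it assumes vectors in $\mathbf{F}_2^u$ are encoded as Python integers, and the helper names (`random_linear_map`, `apply_linear_map`, `max_load`) and the parameters `u = 32`, `ell = 10` are illustrative choices. It samples a random $\ell \times u$ matrix over $\mathbf{F}_2$, hashes $n = 2^\ell$ keys, and prints the observed maximum load next to the scale $\log n / \log\log n$.

```python
import math
import random
from collections import Counter


def random_linear_map(u, ell, rng):
    """Sample a uniformly random ell x u matrix over F_2, one u-bit integer per row."""
    return [rng.getrandbits(u) for _ in range(ell)]


def apply_linear_map(rows, x):
    """Matrix-vector product over F_2: output bit i is the inner product <rows[i], x> mod 2."""
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y


def max_load(rows, keys):
    """Number of keys in the fullest bin when every key is hashed with the given map."""
    loads = Counter(apply_linear_map(rows, x) for x in keys)
    return max(loads.values())


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    u, ell = 32, 10                 # hypothetical sizes: 32-bit keys, n = 2**10 bins
    n = 1 << ell
    keys = rng.sample(range(1 << u), n)   # n distinct keys chosen at random
    trials = [max_load(random_linear_map(u, ell, rng), keys) for _ in range(50)]
    scale = math.log(n) / math.log(math.log(n))
    print("mean max load over 50 maps:", sum(trials) / len(trials))
    print("log n / log log n:", round(scale, 2))
```

A random key set is the easy case. The theorem is stated for every set $S$ of $n$ keys, so a simulation like this one only illustrates the quantity being bounded and is not evidence for the worst case.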

Paper Structure

This paper contains 19 sections, 19 theorems, 80 equations, 1 figure.

Key Result

Theorem 1

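Let $u\ge \ell\ge 1$ be integers, $n\coloneqq 2^\ell$, and $\mathcal{H}$ the set of linear maps $\mathbb{F}_2^u\to\mathbb{F}_2^\ell$. For any $S\subseteq \mathbb{F}_2^u$ with cardinality $n$, we have

$$\mathbb{E}_{h\sim\mathcal{H}}\left[\max_{y\in\mathbb{F}_2^\ell}\left|h^{-1}(y)\cap S\right|\right] = O\left(\frac{\log n}{\log\log n}\right).$$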

Theorems & Definitions (49)

  • Theorem 1
  • Theorem 2: Theorem 1 generalized
  • Theorem 3
  • Theorem 4
  • Theorem 5: Dhar and Dvir, Theorem 2.4
  • Definition 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 39 more