Linear Hashing Is Optimal

Michael Jaber; Vinayak M. Kumar; David Zuckerman

TL;DR

This work resolves whether a simple linear hashing scheme over GF(2) matches the maximum load of fully random hashing. Using potential functions and a careful decomposition into kernel vectors, the authors prove that hashing n balls into n bins via a random linear map achieves an expected maximum load of O(log n / log log n), matching the classic bound for a fully random function up to constants. They extend the result to m balls and n bins and derive a tail bound with quadratic decay (the maximum load exceeds r · log n / log log n with probability at most O(1/r^2)), yielding both two-sided and high-probability guarantees. The findings support linear hash families as a practical choice for load balancing: they are conceptually simple and easy to implement compared with more complex k-wise independent schemes, while preserving near-optimal performance. The approach has implications for hashing with chaining, incremental cryptography, and streaming algorithms, where simplicity and efficiency are critical.

Abstract

We prove that hashing $n$ balls into $n$ bins via a random matrix over $\mathbf{F}_2$ yields expected maximum load $O(\log n / \log \log n)$. This matches the expected maximum load of a fully random function and resolves an open question posed by Alon, Dietzfelbinger, Miltersen, Petrank, and Tardos (STOC '97, JACM '99). More generally, we show that the maximum load exceeds $r\cdot\log n/\log\log n$ with probability at most $O(1/r^2)$.
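The hashing scheme in the abstract can be sketched in a few lines of Python. This is an illustrative simulation only, not the authors' code: the function names (`random_linear_hash`, `max_load`), the 32-bit universe, and the choice of 1024 bins are all assumptions made for the example. Note also that the theorem bounds the maximum load for every set of $n$ keys, whereas the sketch samples keys at random, so it shows the mechanism rather than the worst case.

```python
import random
from collections import Counter

def random_linear_hash(universe_bits, bin_bits, rng):
    """Return h(x) = M x over F_2 for a uniformly random bin_bits x universe_bits matrix M.

    Each matrix row is stored as an integer bitmask (hypothetical representation).
    """
    rows = [rng.getrandbits(universe_bits) for _ in range(bin_bits)]

    def h(x):
        out = 0
        for row in rows:
            # Inner product over F_2: parity of the bitwise AND.
            parity = bin(row & x).count("1") & 1
            out = (out << 1) | parity
        return out

    return h

def max_load(keys, h):
    """Largest number of keys that land in a single bin."""
    return max(Counter(h(x) for x in keys).values())

rng = random.Random(0)
universe_bits, bin_bits = 32, 10      # assumed sizes: 2^32 keys, n = 2^10 bins
n = 1 << bin_bits
keys = rng.sample(range(1 << universe_bits), n)  # n distinct balls
h = random_linear_hash(universe_bits, bin_bits, rng)
print("max load:", max_load(keys, h))
```

Storing each row as a bitmask keeps the evaluation of `h` to one AND and one parity computation per output bit, which is the simplicity that makes linear hash families attractive in practice.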
