Linear Hashing Is Optimal
Michael Jaber, Vinayak M. Kumar, David Zuckerman
TL;DR
This work resolves whether simple linear hashing over GF(2) matches the maximum load of fully random hashing. Using potential functions and a careful decomposition into kernel vectors, the authors prove that hashing n balls into n bins via a random linear map yields an expected maximum load of O(log n / log log n), which matches the classic bound for a fully random function up to constant factors. They extend the result to m balls and n bins and derive a tail bound with quadratic decay: the maximum load exceeds r · log n / log log n with probability at most O(1/r^2). The results support linear hash families as a conceptually simple, easily implemented alternative to more complex k-wise independent schemes for load balancing, with asymptotically optimal maximum load. Potential applications include hashing with chaining, incremental cryptography, and streaming algorithms, where simplicity and efficiency are critical.
Abstract
We prove that hashing $n$ balls into $n$ bins via a random matrix over $\mathbf{F}_2$ yields expected maximum load $O(\log n / \log \log n)$. This matches the expected maximum load of a fully random function and resolves an open question posed by Alon, Dietzfelbinger, Miltersen, Petrank, and Tardos (STOC '97, JACM '99). More generally, we show that the maximum load exceeds $r\cdot\log n/\log\log n$ with probability at most $O(1/r^2)$.
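To make the setup concrete, the following is a minimal simulation sketch, not the authors' construction or proof technique. It samples a random matrix over $\mathbf{F}_2$, hashes $n$ keys into $n = 2^k$ bins, and reports the maximum load next to that of a fully random assignment. The function names, the 32-bit key length, and the randomly sampled key set are all illustrative assumptions; a single run on one key set is only an empirical illustration and says nothing about the worst case.

```python
import random
from collections import Counter

def random_linear_hash(key_bits, bin_bits, rng):
    """Sample a uniform bin_bits x key_bits matrix over F_2 (one int per row)."""
    return [rng.getrandbits(key_bits) for _ in range(bin_bits)]

def apply_hash(rows, x):
    """Compute M x over F_2: output bit i is the parity of (row_i AND x)."""
    out = 0
    for i, row in enumerate(rows):
        out |= (bin(row & x).count("1") & 1) << i
    return out

def max_load(rows, keys):
    """Largest number of keys that the linear map sends to a single bin."""
    counts = Counter(apply_hash(rows, x) for x in keys)
    return max(counts.values())

def random_function_max_load(num_keys, num_bins, rng):
    """Baseline: maximum load when each key picks a bin uniformly at random."""
    counts = Counter(rng.randrange(num_bins) for _ in range(num_keys))
    return max(counts.values())

if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, for repeatability only
    key_bits, bin_bits = 32, 10      # assumed sizes: 32-bit keys, n = 2^10 bins
    n = 1 << bin_bits
    keys = rng.sample(range(1 << key_bits), n)  # n distinct illustrative keys
    rows = random_linear_hash(key_bits, bin_bits, rng)
    print("linear hashing max load: ", max_load(rows, keys))
    print("random function max load:", random_function_max_load(n, n, rng))
```

Storing each matrix row as an integer lets one output bit be computed with a single AND followed by a parity check, which is part of what makes linear hashing cheap to implement.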
