Adaptive Hashing: Faster Hash Functions with Fewer Collisions
Gábor Melis
TL;DR
This work argues that fixing a hash function for the lifetime of a hash table is suboptimal and introduces online adaptive hashing, which tunes the function to the evolving key set with minimal overhead and no API changes. It formalizes a cost framework and uses rehashing as the mechanism for switching between hash function variants (Constant, Arithmetic, Pointer-based mixes), guided by observed collisions and maximum chain length. The paper demonstrates substantial gains for string and integer/pointer keys, including cases where the adaptive approach acts like a perfect hash while remaining robust in worst-case scenarios. Empirical results from SBCL show practical improvements in both microbenchmarks and macrobenchmarks, and the open-source implementation enables reproducibility and further exploration of adaptive strategies.
Abstract
Hash tables are ubiquitous, and the choice of hash function, which maps a key to a bucket, is key to their performance. We argue that the predominant approach of fixing the hash function for the lifetime of the hash table is suboptimal and propose adapting it to the current set of keys. In the prevailing view, good hash functions spread the keys "randomly" and are fast to evaluate. General-purpose ones (e.g. Murmur) are designed to do both while remaining agnostic to the distribution of the keys, which limits their bucketing ability and wastes computation. When these shortcomings are recognized, one may specify a hash function more tailored to some assumed key distribution, but doing so almost always introduces an unbounded risk in case this assumption does not bear out in practice. At the other, fully key-aware end of the spectrum, Perfect Hashing algorithms can discover hash functions to bucket a given set of keys optimally, but they are costly to run and require the keys to be known and fixed ahead of time. Our main conceptual contribution is that adapting the hash table's hash function to the keys online is necessary for the best performance, as adaptivity allows for both better bucketing of keys and faster hash functions. We instantiate the idea of online adaptation with minimal overhead and no change to the hash table API. The experiments show that the adaptive approach marries the common-case performance of weak hash functions with the robustness of general-purpose ones.
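To make the idea of online adaptation concrete, here is a minimal Python sketch of a chained hash table that starts with a cheap, weak hash and switches to a stronger mixing function when a chain grows too long. It is an illustration under stated assumptions, not the SBCL implementation: the class name AdaptiveTable, the helper names weak_hash and strong_hash, the MAX_CHAIN threshold, and the choice of the Murmur3 32-bit finalizer as the fallback are all hypothetical, and keys are assumed to be non-negative integers.

```python
def weak_hash(key):
    """Identity hash: free to compute, ideal for dense integer keys."""
    return key


def strong_hash(key):
    """Murmur3 32-bit finalizer, used here as a stand-in general-purpose mix.

    Only the low 32 bits of the key are mixed, which is enough for a sketch.
    """
    h = key & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


class AdaptiveTable:
    """Chained hash table for non-negative int keys with an adaptive hash."""

    MAX_CHAIN = 4  # hypothetical threshold, not a value from the paper

    def __init__(self, n_buckets=8):
        # n_buckets must be a power of two so that masking selects a bucket.
        self.buckets = [[] for _ in range(n_buckets)]
        self.count = 0
        self.hash_fn = weak_hash  # start optimistic

    def _index(self, key):
        return self.hash_fn(key) & (len(self.buckets) - 1)

    def _rehash(self, n_buckets, hash_fn):
        # Rehashing is the single point where the hash function may change.
        old = self.buckets
        self.buckets = [[] for _ in range(n_buckets)]
        self.hash_fn = hash_fn
        for chain in old:
            for k, v in chain:
                self.buckets[self._index(k)].append((k, v))

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.count += 1
        if len(chain) > self.MAX_CHAIN and self.hash_fn is weak_hash:
            # The weak hash is bucketing poorly: fall back to the strong one.
            self._rehash(len(self.buckets), strong_hash)
        elif self.count > len(self.buckets):
            # Ordinary growth: double the table, keep the current hash.
            self._rehash(2 * len(self.buckets), self.hash_fn)

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default
```

With dense keys such as 0, 1, 2, ... the table in this sketch never leaves the identity hash, so lookups pay no mixing cost. With keys that are all multiples of 1024, the identity hash piles every key into one bucket, the chain-length check fires, and the table rehashes with the stronger function. Callers see the same put and get interface in both cases.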
