JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets
Otmar Ertl
TL;DR
JumpBackHash is a consistent hash algorithm that avoids floating-point arithmetic and runs in expected constant time. It builds on active indices, a concept from consistent weighted sampling, and generates them in reverse order. This yields a uniform and monotonic distribution of keys as the number of buckets changes, while requiring only a standard pseudorandom generator (PRG). The paper's runtime analysis shows that the expected number of consumed random values stays within a small, tight range, and experiments, including consistency tests and performance benchmarks, confirm these findings. With a production-ready Java implementation in the Hash4j library, JumpBackHash offers a practical, fast replacement for modulo-based bucketing in distributed systems, reducing reassignments and improving stability.
Abstract
Introduction. Distributed data processing and storage systems require efficient methods to distribute keys across buckets. The traditional modulo-based mapping is simple and fast, but it is unstable when the number of buckets changes: most keys are reassigned, which causes spikes in system resource utilization, such as network or database requests. Consistent hash algorithms minimize remappings, but existing ones are significantly slower, require floating-point arithmetic, or rely on a family of hash functions rarely available in standard libraries. This work introduces JumpBackHash, a consistent hash algorithm that overcomes those shortcomings.
Methodology. JumpBackHash applies the concept of active indices borrowed from consistent weighted sampling, which inherently leads to consistency. It generates the active indices in reverse order. This avoids floating-point operations, minimizes the number of consumed random values, allows the use of a standard pseudorandom generator, and results in a very efficient algorithm.
Results. Theoretical analysis shows that JumpBackHash has an expected constant runtime. The predicted expected value and variance of the number of consumed random values agree perfectly with the experiments. Empirical tests also confirm the consistency.
Conclusion. JumpBackHash offers a fast and efficient solution for uniformly distributing keys across buckets in distributed systems. Its simplicity, its performance, and the availability of a production-ready Java implementation in the Hash4j open source library make it a viable replacement for the modulo-based approach, improving the stability of both key assignments and the overall system.
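To make the instability of the modulo approach concrete, the following minimal Java sketch counts how many random 64-bit hashes change bucket when the bucket count grows from 10 to 11. For comparison it uses jump consistent hash (Lamping and Veach), a consistent hash that relies on floating-point arithmetic, purely as a stand-in. It is not JumpBackHash and does not use the Hash4j API. The class name RemapDemo, the Bucketer interface, and the helper methods are hypothetical names chosen for this illustration.

```java
import java.util.SplittableRandom;

class RemapDemo {

    /** Hypothetical helper type: maps a 64-bit hash to a bucket in [0, numBuckets). */
    interface Bucketer {
        int bucket(long hash, int numBuckets);
    }

    /** Modulo-based mapping, treating the hash as an unsigned 64-bit value. */
    static int moduloBucket(long hash, int numBuckets) {
        return (int) Long.remainderUnsigned(hash, numBuckets);
    }

    /**
     * Jump consistent hash (Lamping and Veach), used only as a stand-in
     * for a consistent hash. NOT JumpBackHash; note the floating-point division.
     */
    static int jumpBucket(long key, int numBuckets) {
        long b = -1;
        long j = 0;
        while (j < numBuckets) {
            b = j;
            // Linear congruential step that produces the next pseudorandom value.
            key = key * 2862933555777941757L + 1;
            j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
        }
        return (int) b;
    }

    /** Fraction of random hashes whose bucket changes when going from oldN to newN buckets. */
    static double movedFraction(Bucketer f, int numKeys, int oldN, int newN, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        int moved = 0;
        for (int i = 0; i < numKeys; i++) {
            long h = rng.nextLong();
            if (f.bucket(h, oldN) != f.bucket(h, newN)) {
                moved++;
            }
        }
        return (double) moved / numKeys;
    }

    public static void main(String[] args) {
        int keys = 1_000_000;
        System.out.printf("modulo: %.4f%n",
                movedFraction(RemapDemo::moduloBucket, keys, 10, 11, 42L));
        System.out.printf("jump:   %.4f%n",
                movedFraction(RemapDemo::jumpBucket, keys, 10, 11, 42L));
    }
}
```

For uniformly distributed hashes, the modulo mapping is expected to move about 10/11 of the keys in this scenario, whereas a consistent hash moves only about 1/11 of them, and every moved key lands in the newly added bucket.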
