JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets

Otmar Ertl

TL;DR

JumpBackHash is a consistent hashing algorithm that needs no floating-point arithmetic and runs in expected constant time. It borrows the concept of active indices from consistent weighted sampling and generates them in reverse order, which yields a uniform and monotonic distribution of keys when the number of buckets changes and requires only a standard pseudorandom generator (PRG) interface. The paper analyzes the runtime rigorously, showing that the expected number of consumed random values is small and tightly bounded, and validates the analysis with experiments that include consistency tests and performance benchmarks. A production-ready Java implementation is available in the Hash4j library, making JumpBackHash a practical, fast replacement for modulo-based bucketing in distributed systems that reduces reassignments and improves stability.

Abstract

Introduction. Distributed data processing and storage systems require efficient methods to distribute keys across buckets. While simple and fast, the traditional modulo-based mapping is unstable when the number of buckets changes, leading to spikes in system resource utilization, such as network or database requests. Consistent hash algorithms minimize remappings but are either significantly slower, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This work introduces JumpBackHash, a consistent hash algorithm that overcomes those shortcomings.

Methodology. JumpBackHash applies the concept of active indices borrowed from consistent weighted sampling, which inherently leads to consistency. It generates the active indices in reverse order, which avoids floating-point operations, enables the minimization of consumed random values and the use of a standard pseudorandom generator, and finally leads to a very efficient algorithm.

Results. Theoretical analysis shows that JumpBackHash has an expected constant runtime. The expected value and the variance of the number of consumed random values perfectly agree with the experiments. Empirical tests also confirm the consistency.

Conclusion. JumpBackHash offers a fast and efficient solution for uniformly distributing keys across buckets in distributed systems. Its simplicity, performance, and the availability of a production-ready Java implementation as part of the Hash4j open source library make it a viable replacement for the modulo-based approach for improving assignment and system stability.
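
The instability of the modulo-based mapping mentioned in the abstract can be illustrated with a small experiment. The following is a minimal sketch, not code from the paper or from Hash4j: the class `ModuloRemapDemo` and its methods are hypothetical names, and the 64-bit key hashes are simulated with `java.util.SplittableRandom`. It counts how many keys land in a different bucket when the bucket count grows from n to n + 1.

```java
import java.util.SplittableRandom;

// Hypothetical demo class, not part of the paper or of Hash4j.
final class ModuloRemapDemo {

    /** Counts the hashes whose bucket changes when going from n to n + 1 buckets under modulo mapping. */
    static int countMovedByModulo(long[] hashes, int n) {
        int moved = 0;
        for (long h : hashes) {
            // Treat the hash as an unsigned 64-bit value so the bucket index is never negative.
            int before = (int) Long.remainderUnsigned(h, n);
            int after = (int) Long.remainderUnsigned(h, n + 1);
            if (before != after) {
                moved++;
            }
        }
        return moved;
    }

    /** Simulates 64-bit key hashes; a real system would hash its actual keys instead (assumption). */
    static long[] randomHashes(int count, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        long[] hashes = new long[count];
        for (int i = 0; i < count; i++) {
            hashes[i] = rng.nextLong();
        }
        return hashes;
    }

    public static void main(String[] args) {
        long[] hashes = randomHashes(100_000, 42L);
        int n = 10;
        int moved = countMovedByModulo(hashes, n);
        System.out.printf("modulo: %d of %d keys moved when going from %d to %d buckets%n",
                moved, hashes.length, n, n + 1);
        System.out.printf("a consistent hash would move only about %d keys%n",
                hashes.length / (n + 1));
    }
}
```

With uniformly distributed hashes, a key keeps its bucket under modulo mapping only with probability 1/(n + 1), so roughly n/(n + 1) of all keys move. A consistent hash moves only the roughly 1/(n + 1) of keys that the new bucket needs to restore balance.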

Paper Structure

This paper contains 18 sections, 29 equations, 9 figures, 1 table, 6 algorithms.

Figures (9)

  • Figure 1: When changing the number of buckets, some keys (letters) must be mapped to different buckets to restore balance. In contrast to the modulo operation, which puts almost all keys into different buckets, consistent hashing minimizes the expected number of reassignments (arrows).
  • Figure 2: The measured mean and variance of the number of consumed random values for JumpHash and JumpBackHash. JumpBackHash* corresponds to the case when a single random value is split into two as in \ref{alg:jump_back_hash_java}.
  • Figure 3: Benchmark results on an Amazon EC2 c5.metal instance with Intel Xeon Platinum 8275CL CPU and processor P-state set to 1. Unlike \ref{fig:time_complexity}, a logarithmic scale is used for the y-axis.
  • Figure: Specialization of improved consistent weighted sampling (ICWS) [Ioffe2010] to map a key $k$ consistently to $\llbracket n\rrbracket := \lbrace 0,1,2,\ldots,n-1\rbrace$.
  • Figure: The JumpHash algorithm [Lamping2014] consistently maps $k$ to the integer range $\llbracket n\rrbracket := \lbrace 0,1,2,\ldots,n-1\rbrace$.
  • ...and 4 more figures
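
For orientation, the JumpHash baseline named in the figure list can be sketched in Java. This is a port of the published reference implementation by Lamping and Veach, not JumpBackHash itself and not the Hash4j API. The class name `JumpHashSketch` and the helper `countMoved` are hypothetical, and the keys are simulated random 64-bit values. Note the floating-point division in the loop, which is the kind of operation JumpBackHash is designed to avoid.

```java
import java.util.SplittableRandom;

// Hypothetical sketch class; a Java port of the JumpHash reference code, not JumpBackHash.
final class JumpHashSketch {

    /** Maps a 64-bit key consistently to a bucket index in {0, 1, ..., numBuckets - 1}. */
    static int jumpHash(long key, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        long b = -1;
        long j = 0;
        while (j < numBuckets) {
            b = j;
            // Linear congruential step: the key doubles as the state of the random generator.
            key = key * 2862933555777941757L + 1;
            // Jump to the next candidate bucket; this step relies on floating-point arithmetic.
            j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
        }
        return (int) b;
    }

    /** Counts the keys whose bucket changes when going from n to n + 1 buckets. */
    static int countMoved(long[] keys, int n) {
        int moved = 0;
        for (long k : keys) {
            if (jumpHash(k, n) != jumpHash(k, n + 1)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        SplittableRandom rng = new SplittableRandom(42L);
        long[] keys = new long[100_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = rng.nextLong();
        }
        int n = 10;
        System.out.printf("JumpHash: %d of %d keys moved when going from %d to %d buckets%n",
                countMoved(keys, n), keys.length, n, n + 1);
    }
}
```

Because the sequence of jumps depends only on the key, increasing the bucket count from n to n + 1 either leaves a key where it was or moves it to the new bucket n. This is the monotonicity property that consistent hashing requires.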