Hash & Adjust: Competitive Demand-Aware Consistent Hashing

Arash Pourdamghani, Chen Avin, Robert Sama, Maryam Shiran, Stefan Schmid

Abstract

Distributed systems often serve dynamic workloads, and their resource demands evolve over time. Such temporal behavior stands in contrast to the static and demand-oblivious nature of most data structures used by these systems. In this paper, we are particularly interested in consistent hashing, a fundamental building block in many large distributed systems. Our work is motivated by the hypothesis that a more adaptive approach to consistent hashing can leverage structure in the demand, and hence improve storage utilization and reduce access time. We initiate the study of demand-aware consistent hashing. Our main contribution is H&A, a constant-competitive online algorithm (i.e., it comes with provable performance guarantees over time). H&A is demand-aware and optimizes its internal structure to enable faster access times while offering high storage utilization. We further evaluate H&A empirically.

Paper Structure

This paper contains 19 sections, 5 theorems, 3 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Algorithm MTHC is $2 \cdot (1+\omega)$-competitive, where $\omega$ is the cost of moving items between two adjacent servers.
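
To make the bound concrete, the block below spells out what such a guarantee usually means. It uses the standard textbook notion of competitiveness; the paper's own Definition 1 is not reproduced on this page, so the exact form (in particular the additive constant $\beta$) is an assumption, as are the symbols $\sigma$, $\mathrm{ALG}$, and $\mathrm{OPT}$. The value $\omega = 1$ is a purely illustrative choice.

```latex
% Standard definition (assumed form): an online algorithm ALG is
% c-competitive if there is a constant \beta, independent of the input,
% such that for every request sequence \sigma
\[
  \mathrm{cost}_{\mathrm{ALG}}(\sigma) \;\le\; c \cdot \mathrm{cost}_{\mathrm{OPT}}(\sigma) + \beta ,
\]
% where OPT is an optimal offline algorithm that knows \sigma in advance.
% Theorem 1 states this with
\[
  c \;=\; 2 \cdot (1 + \omega).
\]
% Illustrative arithmetic: if moving an item between adjacent servers
% costs \omega = 1, then c = 2 \cdot (1 + 1) = 4.
```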

Figures (5)

  • Figure 1: The first panel shows an example of our model with $8$ servers, each with capacity $4$, and $24$ items. In this example, item $v$ was initially inserted into server $head(v)$ (the server with the closest hash value); however, because $head(v)$ and $head(v)^+$ were full, it was moved to $host(v)$. The second panel depicts the decomposition of the same example into ServerLists. Each ServerList is shown in a different color, and the servers within a ServerList are drawn in different shades of that color because each is a head in its own right. Items are colored by the color of their host.
  • Figure 2: An example of the ServerList access problem with three heads and a capacity of four. Items share a color with their original host. Consider an access to item $7$, whose head is $Head_2$. Because it is a green item, the access starts from $Head_2$, the head for green items (first panel). We then search $Head_2$ and the servers that come after it for item $7$ (second panel). After accessing item $7$, we swap it with the oldest items in the servers between its current server and its host server (third panel), reaching the final configuration (fourth panel).
  • Figure 3: The first panel compares the average cost and memory utilization of H&A, the WBL algorithm [MirrokniTZ18], and the traditional method [KargerClassic1]. The access cost is normalized by the number of items. This panel considers $100,000$ requests, $10,000$ items, and $20$ servers, and uses an instance generated with temporal locality $0.75$. The second panel compares the access cost of WBL with that of H&A as the temporal locality of the input varies, using the same setup.
  • Figure 4: Comparing the average access cost of different algorithms when all algorithms use the same capacity. The first panel shows how the cost changes for an increasing number of items; it considers $50$ servers and $150,000$ requests. Here, the traditional algorithm stops when one of its servers becomes full, and for other items, we run the experiment from scratch. The second panel considers $5,000$ items and is otherwise similar to the first. Both panels use the CAIDA dataset.
  • Figure 5: Comparing the average cost of H&A and the "With Bounded Load" (WBL) algorithm [MirrokniTZ18] based on two parameters of our benchmarking tool. The access cost is normalized by the number of items. The colored dots in the "stale" panel show the average server capacity. The first panel considers $1029$ items and $100,000$ requests, and the second considers $20$ servers. Both panels use the click dataset.
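
The placement and lookup behavior described in the captions of Figures 1 and 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's H&A algorithm: the class name `BoundedRing`, the use of SHA-256, and the choice of the clockwise successor as the "closest" server are all hypothetical, and the reconfiguration step (swapping the accessed item toward its head) is deliberately left out because the captions do not specify it fully.

```python
import hashlib
from bisect import bisect_left


def ring_hash(key: str) -> int:
    """Map a key to a ring position (SHA-256 is an arbitrary, assumed choice)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class BoundedRing:
    """Hypothetical capacity-bounded consistent-hashing ring (illustration only)."""

    def __init__(self, server_names, capacity):
        self.capacity = capacity
        # Servers ordered by their position on the ring.
        self.servers = sorted(server_names, key=ring_hash)
        self.positions = [ring_hash(s) for s in self.servers]
        self.items = {s: [] for s in self.servers}

    def head_index(self, item):
        """Index of the item's head: here, the first server clockwise (assumption)."""
        i = bisect_left(self.positions, ring_hash(item))
        return i % len(self.servers)

    def insert(self, item):
        """Place the item on its head, or on the next server with free capacity."""
        start, n = self.head_index(item), len(self.servers)
        for step in range(n):
            server = self.servers[(start + step) % n]
            if len(self.items[server]) < self.capacity:
                self.items[server].append(item)
                return server  # this server is the item's host
        raise RuntimeError("all servers are full")

    def access(self, item):
        """Search from the head onward; return (host, number of servers probed)."""
        start, n = self.head_index(item), len(self.servers)
        for step in range(n):
            server = self.servers[(start + step) % n]
            if item in self.items[server]:
                return server, step + 1
        raise KeyError(item)


if __name__ == "__main__":
    # Same sizes as the Figure 1 example: 8 servers, capacity 4, 24 items.
    ring = BoundedRing([f"server-{i}" for i in range(8)], capacity=4)
    for i in range(24):
        ring.insert(f"item-{i}")
    host, probes = ring.access("item-7")
    print(host, probes)
```

In this sketch the number of servers probed stands in for the access cost: an item stored on its head costs one probe, while an item pushed further along the ring costs more, which is the gap a demand-aware scheme tries to shrink for frequently accessed items.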

Theorems & Definitions (10)

  • Definition 1: Competitive ratio
  • Definition 2
  • Theorem 1
  • Definition 3: Inside position
  • Definition 4: Headed relative position
  • Definition 5: Inversion
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2