Rafture: Erasure-coded Raft with Post-Dissemination Pruning

Rithwik Kerur, Divyakant Agrawal, Michael K. Reiter, Dahlia Malkhi

Abstract

Spreading and storing erasure-coded data in distributed systems effectively is challenging in real settings. Practical deployments must contend with unpredictable network latencies, particularly when information dispersal is integrated into consensus protocols, a prominent and latency-sensitive use case. Existing approaches address this challenge through timeout-based dissemination and adaptive communication or storage decisions driven by acknowledgments during dissemination. However, these designs focus almost exclusively on dissemination-time efficiency, complicate recovery with reconstruction procedures that require metadata that can differ per consensus value, and rely on a centralized leader to make storage decisions for all nodes. This paper introduces Rafture, a novel information dispersal algorithm, and its integration in a consensus protocol, that overcomes these limitations. Rafture is the first information dispersal solution to incorporate post-dissemination pruning, allowing systems to adapt storage costs after dissemination completes. It employs a simple, fixed-threshold erasure code while varying distinct fragment assignment along a second dimension. This ensures that reconstruction is always possible from $F+1$ fragments using the same interpolation method and no additional metadata. Rafture further enables nodes to adapt autonomously based on locally observed information, eliminating the need for global coordination. We evaluate Rafture in highly dynamic network settings and show that it simplifies recovery while significantly improving long-term storage consumption under variable network conditions.

Paper Structure

This paper contains 20 sections, 2 theorems, 6 figures.

Key Result

Lemma 1

For a system of $N$ nodes tolerating $F$ failures (where $N \ge 2F+1$), sufficient fragments persist to reconstruct a log entry despite $F$ failures.
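The counting behind this statement can be illustrated with a minimal sketch. It is not the paper's construction or proof: it assumes a generic polynomial threshold code over a prime field, one fragment per node, and hypothetical names (`P`, `encode`, `reconstruct`). With $N \ge 2F+1$, at least $N - F \ge F+1$ nodes survive $F$ failures, and any $F+1$ fragments determine a degree-$F$ polynomial by interpolation.

```python
from itertools import combinations

P = 2**31 - 1  # prime modulus of the illustrative field (an assumption, not from the paper)


def encode(data, n):
    """Treat the data symbols as polynomial coefficients; fragment x is the value at point x."""
    return [
        (x, sum(c * pow(x, k, P) for k, c in enumerate(data)) % P)
        for x in range(1, n + 1)
    ]


def reconstruct(fragments, k):
    """Recover the k coefficients from any k fragments by Lagrange interpolation."""
    pts = list(fragments)[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(pts):
        basis = [1]  # coefficients of the running product of (x - xj)
        denom = 1    # running product of (xi - xj)
        for j, (xj, _) in enumerate(pts):
            if j == i:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, b in enumerate(basis):
                nxt[d] = (nxt[d] - xj * b) % P      # contribution of the -xj term
                nxt[d + 1] = (nxt[d + 1] + b) % P   # contribution of the x term
            basis = nxt
            denom = denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for d, b in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * b) % P
    return coeffs


F = 2
N = 2 * F + 1                 # smallest system size allowed by N >= 2F+1
data = [7, 11, 13]            # F+1 data symbols, each smaller than P
fragments = encode(data, N)   # one fragment per node in this sketch

# Every choice of N-F surviving nodes still yields the original data.
all_ok = all(
    reconstruct(survivors, F + 1) == data
    for survivors in combinations(fragments, N - F)
)
print("recoverable after any", F, "failures:", all_ok)
```

The same `reconstruct` routine is used for every subset of survivors, which mirrors the abstract's point that a fixed threshold needs no per-entry metadata. How Rafture assigns multiple fragments per node and prunes them afterwards is not modeled here.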

Figures (6)

  • Figure 1: Scenario illustrating a single node failure (S3). In CRaft (top row), since the leader does not receive a response from S3, it reverts to full replication. In HRaft (2nd row), the leader sends S3's fragment to $F=2$ randomly picked nodes. In FlexRaft (3rd row), the leader re-encodes information using a $(F+1-1, F) = (2, 2)$ encoding scheme and disseminates new, larger fragments $\{F1', F2', F4'\}$ to the responsive nodes. In Rafture, the leader initially disseminates $2$ distinct fragments to each node and no retransmission is required.
  • Figure 2: Analytical evaluation of storage utilization under network partitions.
  • Figure 3: Empirical evaluation of storage utilization under a network partition that recovers at 2000 entries.
  • Figure 4: Storage cost for $N=99$ and $f$ partitioned nodes.
  • Figure 5: Network hops for CRaft, HRaft, FlexRaft, Rafture during unstable network conditions: a leader receives $N$ responses 60% of the time, with an exponentially decaying probability down to $F + 1$ responses.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Lemma 2
  • proof