
On Orchestrating Parallel Broadcasts for Distributed Ledgers

Peiyao Sheng, Chenyuan Wu, Dahlia Malkhi, Michael K. Reiter, Chrysoula Stathakopoulou, Michael Wei, Maofan Yin

TL;DR

This work tackles the challenge of efficiently parallelizing Byzantine fault-tolerant atomic broadcast in partially synchronous networks by introducing ticketing to allocate slot-proposal rights. It defines managed, unmanaged, and hybrid adaptive ticketing regimes, and provides an epoch-based protocol with formal properties that characterize the tradeoff between adaptivity and resilience. The authors prove bounds on slot utilization and chain quality, and show that the hybrid regime enhances throughput and robustness under heterogeneous conditions. Empirical evaluation on CloudLab demonstrates that the hybrid regime achieves high throughput and resilience in both static and dynamic heterogeneous environments, offering a practical approach for robust, scalable distributed ledgers.

Abstract

This paper introduces and develops the concept of "ticketing", through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme that combines the managed and unmanaged ticketing regimes, striking a balance between adaptivity and resilience. The performance evaluation demonstrates how the managed and unmanaged ticketing regimes benefit throughput in systems with heterogeneous resources, in both static and dynamic scenarios, with the managed ticketing regime performing the better of the two because it adapts more readily. Finally, it demonstrates how the hybrid ticketing regime can enjoy both the adaptivity of the managed regime and the liveness guarantees of the unmanaged regime.

Paper Structure

This paper contains 15 sections, 4 theorems, 4 figures, 2 tables.

Key Result

Lemma 2

For any epoch $i$ and any two correct nodes $p$ and $q$, let $C_p, C_q$ and $TR_p, TR_q$ denote their local candidate sets and ticketing regimes, respectively. Then $C_p[i] = C_q[i]$ and $TR_p[i] = TR_q[i]$.

Figures (4)

  • Figure 1: A hybrid managed/unmanaged ticketing regime.
  • Figure 2: An example with six epochs, $n=4, L=4, K=2$. Log slots with $\bot$ are skipped slots and others are committed with non-empty values. The example shows four possible updating rules: (a) epoch 3 uses a ticketing-server since no slots are skipped in epoch 1; (b) epoch 4 keeps using round-robin since one slot is skipped in epoch 2 and the candidate set is updated to exclude node 0; (c) though all slots in epoch 3 are skipped, the candidate set remains the same since epoch 3 uses the managed ticketing regime; (d) the candidate set is reset to the full group of nodes in epoch 5 since epoch 4 adopts the unmanaged ticketing regime and has only 2 active senders.
  • Figure 3: Throughput and latency of different ticketing regimes under dynamic slowness. Each phase in the experiment lasts $30$ seconds where certain nodes are slowed down. MTR achieves the best performance in all phases, demonstrating meritocracy and adaptivity to dynamic conditions.
  • Figure 4: Throughput and latency of different ticketing regimes under dynamic faults. Each phase in the experiment lasts $30$ seconds where a certain node is faulty and creates skipped slots. HTR achieves superior performance in all phases, demonstrating fault resilience.
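
The four updating rules named in the Figure 2 caption can be sketched as a small decision function. This is only one possible reading of the caption, not the paper's protocol: the names `EpochOutcome`, `next_config`, and `min_senders` are hypothetical, the threshold of 3 active senders is an assumption chosen to match the "only 2 active senders" case for $n=4$, and the fallback to round-robin after a skipped managed epoch is also an assumption.

```python
from dataclasses import dataclass

MANAGED = "managed"      # a ticketing server hands out slots
UNMANAGED = "unmanaged"  # round-robin over the candidate set


@dataclass(frozen=True)
class EpochOutcome:
    """What was observed in one finished epoch (hypothetical structure)."""
    regime: str
    candidates: frozenset
    skipped_by: frozenset  # nodes that owned at least one skipped slot


def next_config(outcome, all_nodes, min_senders):
    """Return (regime, candidates) derived from an observed epoch."""
    if not outcome.skipped_by:
        # Rule (a): no skipped slots, so a later epoch may use the server.
        return MANAGED, outcome.candidates
    if outcome.regime == MANAGED:
        # Rule (c): skips under the managed regime leave the candidates as is.
        # Falling back to round-robin here is an assumption of this sketch.
        return UNMANAGED, outcome.candidates
    active = outcome.candidates - outcome.skipped_by
    if len(active) < min_senders:
        # Rule (d): too few active senders, reset to the full group of nodes.
        return UNMANAGED, frozenset(all_nodes)
    # Rule (b): keep round-robin and exclude nodes whose slots were skipped.
    return UNMANAGED, active


nodes = frozenset(range(4))  # n = 4, as in the Figure 2 example
clean = EpochOutcome(UNMANAGED, nodes, frozenset())
one_skip = EpochOutcome(UNMANAGED, nodes, frozenset({0}))
print(next_config(clean, nodes, min_senders=3))
print(next_config(one_skip, nodes, min_senders=3))
```

Because the function depends only on the observed outcome of a committed epoch, any two correct nodes that agree on that outcome derive the same candidate set and regime, which is the flavor of agreement that Lemma 2 states.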

Theorems & Definitions (5)

  • Definition 1: Synchronized slot
  • Lemma 2: Epoch consistency
  • Lemma 3: Non-skipping epoch
  • Theorem 4: Slot utilization
  • Theorem 5: Chain quality