
Stabl: Blockchain Fault Tolerance

Vincent Gramoli, Rachid Guerraoui, Andrei Lebedev, Gauthier Voron

TL;DR

This paper introduces Stabl, a sensitivity-based benchmark for quantifying blockchain fault tolerance by comparing latency distributions under baseline and adversarial conditions, using a new sensitivity score defined as the absolute difference between the super-cumulatives (integrals of the cumulative distribution functions) of the two latency distributions. It evaluates five modern blockchains (Algorand, Aptos, Avalanche, Redbelly, and Solana) across resilience, recoverability, partition tolerance, and Byzantine fault tolerance, employing observer nodes and a secure client to simulate faults and Byzantine scenarios. Key findings show that all except Redbelly are highly affected by small-scale failures, with Avalanche and Solana failing to recover from localized transient faults, while Redbelly’s leaderless DBFT design largely eliminates leader-induced disruption and enables faster recovery. The work highlights the tradeoffs between performance optimization (throttling, leader schedules, speculative execution) and dependability, providing practical insights for deploying robust geo-distributed blockchain services and guiding future dependability research in blockchain systems.
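The super-cumulative of a latency distribution is the integral of its CDF. Writing the score out makes it easier to interpret. The notation below is an assumption of this sketch, not taken from the paper: $X$ is a transaction's latency ($X = \infty$ if it never commits), $F$ its CDF, and $T$ a common observation horizon over which both CDFs are integrated.

```latex
% Assumed notation: X = latency (infinite if never committed), F = CDF of X, T = horizon.
S(T) = \int_0^T F(x)\,dx
     = \int_0^T \Pr[X \le x]\,dx
     = \mathbb{E}\!\left[\int_0^T \mathbf{1}\{X \le x\}\,dx\right]
     = \mathbb{E}\big[(T - X)^+\big]
     = T - \mathbb{E}\big[\min(X, T)\big]

\sigma = \big|S_{\mathrm{base}}(T) - S_{\mathrm{adv}}(T)\big|
       = \big|\mathbb{E}[\min(X_{\mathrm{adv}}, T)] - \mathbb{E}[\min(X_{\mathrm{base}}, T)]\big|
```

Under these assumptions, the score is the shift in mean latency truncated at $T$, with a transaction that never commits counted as taking the whole horizon. This reading follows from the assumed definition rather than from a statement in the paper.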

Abstract

Blockchains promise to make online services more fault tolerant due to their inherently distributed nature. Their ability to execute arbitrary programs in different geo-distributed regions and on diverse operating systems makes them an attractive alternative to our dependence on a single piece of software, whose recent failure affected 8.5 million machines. As of today, however, it remains unclear whether blockchains can truly tolerate failures. In this paper, we assess the fault tolerance of blockchains. To this end, we inject failures into controlled deployments of five modern blockchain systems, namely Algorand, Aptos, Avalanche, Redbelly and Solana. We introduce a novel sensitivity metric, interesting in its own right, defined as the difference between the integrals of two cumulative distribution functions, one obtained in a baseline environment and one obtained in an adversarial environment. Our results indicate that (i) all blockchains except Redbelly are highly impacted by the failure of a small part of their network, (ii) Avalanche and Redbelly benefit from the redundant information needed for Byzantine fault tolerance while the others are hampered by it, and, more dramatically, (iii) Avalanche and Solana cannot recover from localised transient failures.
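The metric in the abstract can be computed directly from measured latencies. Below is a minimal sketch, assuming empirical CDFs over all submitted transactions (those that never commit are recorded as `math.inf`, so they keep the CDF below 1) and a common integration horizon. The names `super_cumulative` and `sensitivity` are hypothetical and do not come from Stabl's code.

```python
import math
from typing import Sequence


def super_cumulative(latencies: Sequence[float], horizon: float) -> float:
    """Integrate the empirical latency CDF over [0, horizon].

    Every submitted transaction appears in `latencies`; those that never
    committed are recorded as math.inf, so they keep the CDF below 1.
    """
    n = len(latencies)
    if n == 0:
        raise ValueError("need at least one submitted transaction")
    if any(x < 0 for x in latencies):
        raise ValueError("latencies must be non-negative")
    # Only latencies inside the horizon move the step function.
    steps = sorted(x for x in latencies if x <= horizon)
    area, prev, committed = 0.0, 0.0, 0
    for x in steps:
        # Rectangle under the CDF between the previous jump and this one.
        area += (committed / n) * (x - prev)
        prev, committed = x, committed + 1
    # Final rectangle from the last jump up to the horizon.
    area += (committed / n) * (horizon - prev)
    return area


def sensitivity(baseline: Sequence[float], adversarial: Sequence[float],
                horizon: float) -> float:
    """Absolute difference between the two super-cumulatives (latency units)."""
    return abs(super_cumulative(baseline, horizon)
               - super_cumulative(adversarial, horizon))


if __name__ == "__main__":
    baseline = [1.0, 1.0, 1.0]            # all three transactions commit after 1 s
    adversarial = [1.0, 1.0, math.inf]    # one transaction never commits
    print(sensitivity(baseline, adversarial, horizon=10.0))  # 3.0
```

In this toy example the baseline super-cumulative is 9 and the adversarial one is 6, because the lost transaction contributes nothing to the CDF. Integrating the step function explicitly keeps the code close to the "integral of a CDF" wording. Equivalently, one could compute the horizon minus the mean of `min(x, horizon)`.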

Paper Structure

This paper contains 43 sections, 2 equations, 6 figures and 1 table.

Figures (6)

  • Figure 1: The sensitivity of Aptos to failures as the difference in latency distributions between a baseline environment without failure and the altered environment with failures.
  • Figure 2: Sensitivity score of the 5 blockchains under $f=t$ crashes, $f=t+1$ transient node failures, a transient network partition isolating $f=t+1$ nodes, and the redundant requests needed for Byzantine fault tolerance.
  • Figure 3: Throughput of the 5 blockchains over time as we simultaneously crash $f=t$ nodes at time 133, indicated by the red dashed line.
  • Figure 4: Throughput of the 5 blockchains over time as we transiently stop $f>t$ nodes at time 133 (dashed red line) and recover them at time 233 (dotted red line).
  • Figure 5: Throughput of the 5 blockchains over time as we transiently partition $f>t$ nodes at time 133 (dashed red line) and lift the partition at time 233 (dotted red line).
  • ...and 1 more figure
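The captions describe two fault budgets: $f=t$ crashes, and $f=t+1$ (the smallest $f>t$) stopped or partitioned nodes. In each case faults are injected at time 133 and, in the transient cases, lifted at time 233. The sketch below makes both ideas concrete under two assumptions that are not taken from the paper. First, $t$ follows the classical Byzantine bound $n \ge 3t+1$, although individual blockchains may define their thresholds differently, for instance by stake rather than node count. Second, a blockchain counts as "recovered" when its mean throughput after the faults are lifted regains a chosen fraction of its pre-fault mean. The helpers `max_tolerated_faults` and `recovered`, and the 0.9 default, are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def max_tolerated_faults(n: int) -> int:
    """Largest t with n >= 3t + 1 (classical Byzantine bound, assumed here)."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1) // 3


def recovered(throughput: Sequence[float], fault_at: int, recover_at: int,
              fraction: float = 0.9) -> bool:
    """Hypothetical recovery test on a throughput time series.

    throughput[i] is the number of transactions committed in time unit i.
    Returns True if the mean throughput from recover_at onwards reaches
    `fraction` of the mean throughput before fault_at.
    """
    if not 0 < fault_at <= recover_at < len(throughput):
        raise ValueError("need 0 < fault_at <= recover_at < len(throughput)")
    before = throughput[:fault_at]
    after = throughput[recover_at:]
    return fmean(after) >= fraction * fmean(before)


if __name__ == "__main__":
    n = 10  # hypothetical network size, not taken from the paper
    t = max_tolerated_faults(n)
    print(f"f = t = {t} crashes; f = t + 1 = {t + 1} transient failures")
```

With this criterion, a trace that drops to zero at time 133 and stays there, as the abstract reports for Avalanche and Solana after transient failures, would fail the check for any positive `fraction`. The threshold itself is a tunable assumption.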