Stabl: Blockchain Fault Tolerance
Vincent Gramoli, Rachid Guerraoui, Andrei Lebedev, Gauthier Voron
TL;DR
This paper introduces Stabl, a sensitivity-based benchmark that quantifies blockchain fault tolerance by comparing the latency distribution observed in a baseline environment with the one observed in an adversarial environment. Its sensitivity score is the absolute difference between the integrals of the two cumulative latency distributions. Stabl evaluates five modern blockchains (Algorand, Aptos, Avalanche, Redbelly, and Solana) across resilience, recoverability, partition tolerance, and Byzantine fault tolerance, using observer nodes and a secure client to simulate faults and Byzantine scenarios. Key findings show that all blockchains except Redbelly are highly affected by small-scale failures, and that Avalanche and Solana fail to recover from localized transient faults, while Redbelly's leaderless DBFT design largely eliminates leader-induced disruption and enables faster recovery. The work highlights the tradeoffs between performance optimizations (throttling, leader schedules, speculative execution) and dependability, offering practical insights for deploying robust geo-distributed blockchain services and guiding future dependability research on blockchain systems.
Abstract
Blockchains promise to make online services more fault tolerant thanks to their inherently distributed nature. Their ability to execute arbitrary programs in different geo-distributed regions and on diverse operating systems makes them an alternative of choice to our dependence on unique software whose recent failure affected 8.5 million machines. It remains unclear, however, whether blockchains can truly tolerate failures. In this paper, we assess the fault tolerance of blockchains. To this end, we inject failures in controlled deployments of five modern blockchain systems, namely Algorand, Aptos, Avalanche, Redbelly and Solana. We introduce a novel sensitivity metric, interesting in its own right, defined as the difference between the integrals of two cumulative distribution functions, one obtained in a baseline environment and one obtained in an adversarial environment. Our results indicate that (i) all blockchains except Redbelly are highly impacted by the failure of a small part of their network, (ii) Avalanche and Redbelly benefit from the redundant information needed for Byzantine fault tolerance while the others are hampered by it, and, more dramatically, (iii) Avalanche and Solana cannot recover from localised transient failures.
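The sensitivity metric described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that latencies are per-transaction commit times in seconds, that both cumulative distribution functions are integrated over the same window [0, horizon], and that transactions which never commit are counted in the denominator only. The names cdf_integral, sensitivity, horizon, and submitted are hypothetical, and the paper may normalise or bound the integrals differently.

```python
from typing import Optional, Sequence


def cdf_integral(latencies: Sequence[float], horizon: float,
                 submitted: Optional[int] = None) -> float:
    """Integral over [0, horizon] of the empirical CDF of the latencies.

    Assumption (not from the paper): a transaction that never commits is
    absent from `latencies` but still counted in `submitted`, so it lowers
    the CDF without ever contributing area.
    """
    n = len(latencies) if submitted is None else submitted
    if n <= 0:
        raise ValueError("at least one submitted transaction is required")
    # The empirical CDF is a step function that rises by 1/n at each latency
    # x, so each committed transaction contributes (horizon - x) / n of area.
    return sum(max(0.0, horizon - x) for x in latencies) / n


def sensitivity(baseline: Sequence[float], adversarial: Sequence[float],
                horizon: float, submitted: Optional[int] = None) -> float:
    """Absolute difference between the two CDF integrals."""
    return abs(cdf_integral(baseline, horizon, submitted)
               - cdf_integral(adversarial, horizon, submitted))


# Illustrative, made-up latencies in seconds (not measurements from the paper).
baseline = [1.0, 1.0, 2.0, 2.0]
adversarial = [1.0, 3.0, 4.0, 6.0]
print(sensitivity(baseline, adversarial, horizon=10.0))  # 2.0
```

When every transaction commits within the window, this score equals the gap between the two mean latencies (3.5 s versus 1.5 s in the example). When some transactions never commit, passing submitted makes the adversarial curve plateau below 1, which increases the score.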
