Message Size Matters: AlterBFT's Approach to Practical Synchronous BFT in Public Clouds
Nenad Milošević, Daniel Cason, Zarko Milošević, Robert Soulé, Fernando Pedone
TL;DR
The paper addresses the latency-safety trade-off in Byzantine fault-tolerant (BFT) consensus for public clouds by introducing a hybrid synchronous system model that treats small and large messages differently. AlterBFT relies on small coordination messages, which respect a known delivery bound, to guarantee safety. It uses large messages, which carry proposed values and are timely only after the Global Stabilization Time (GST), to preserve liveness. This achieves up to 15× lower latency than synchronous contenders with comparable throughput and the same fault tolerance. A fast commit path, a refined equivocation-detection mechanism, and careful epoch-change timing further improve performance, especially in failure-free runs after GST. Experiments across geo-distributed cloud deployments show substantial latency reductions for large blocks and competitive throughput relative to synchronous baselines, along with robust behavior under equivocation attacks and scalable certificate management. The work has practical impact for blockchains in public clouds: it enables responsive consensus without sacrificing safety or requiring more replicas than the traditional synchronous threshold.
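To make the hybrid model concrete, the Python sketch below encodes its delivery assumption as a toy class. The class name, the byte cut-off between small and large messages, and all parameter values are hypothetical. For large messages it adopts the standard GST formulation (a message sent at time t arrives by max(t, GST) + Δ). This is our assumption about how "eventually timely" is formalized, not necessarily the paper's exact definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridSyncModel:
    """Toy encoding of the hybrid synchronous delivery assumption.

    All names and parameter values are hypothetical illustrations.
    """
    small_limit_bytes: int  # hypothetical cut-off: messages up to this size are "small"
    delta_small: float      # bound that small messages respect at all times (seconds)
    delta_large: float      # bound that large messages respect only after GST (seconds)
    gst: float              # Global Stabilization Time; replicas do not know its value

    def is_small(self, size_bytes: int) -> bool:
        return size_bytes <= self.small_limit_bytes

    def latest_delivery(self, size_bytes: int, send_time: float) -> float:
        """Latest time by which the model guarantees delivery."""
        if self.is_small(size_bytes):
            # Synchronous part: a known bound holds from the start,
            # so safety-critical timeouts may rely on delta_small.
            return send_time + self.delta_small
        # Partially synchronous part: a large message may be delayed
        # arbitrarily before GST, but arrives by max(send_time, GST) + delta_large.
        # Because GST is unknown, only liveness may depend on this bound.
        return max(send_time, self.gst) + self.delta_large


model = HybridSyncModel(small_limit_bytes=1024, delta_small=0.5, delta_large=2.0, gst=10.0)
print(model.latest_delivery(200, 1.0))         # small message: 1.5
print(model.latest_delivery(1_000_000, 3.0))   # large message sent before GST: 12.0
print(model.latest_delivery(1_000_000, 11.0))  # large message sent after GST: 13.0
```

The split mirrors the design choice described above. Safety-relevant waits only need `delta_small`, which can be tight because small messages are fast. The much larger `delta_large` only affects how quickly progress resumes after GST.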
Abstract
Synchronous consensus protocols offer a significant advantage over their asynchronous and partially synchronous counterparts by providing higher fault tolerance. This is an essential benefit in distributed systems such as blockchains, where participants may have incentives to act maliciously. Despite this advantage, synchronous protocols are often met with skepticism about their performance, because their latency is tightly linked to a conservative time bound for message delivery. This paper introduces AlterBFT, a new Byzantine fault-tolerant consensus protocol. The key idea behind AlterBFT is a new model we propose, the hybrid synchronous system model. Inspired by empirical observations of network behavior in the public cloud, the model combines elements of the synchronous and partially synchronous models. Namely, it distinguishes between small messages, which respect time bounds, and large messages, which may violate bounds but are eventually timely. Leveraging this observation, AlterBFT achieves up to 15× lower latency than state-of-the-art synchronous protocols while maintaining similar throughput and the same fault tolerance. Compared to partially synchronous protocols, AlterBFT provides higher fault tolerance, higher throughput, and comparable latency.
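The fault-tolerance comparison can be made concrete with the textbook resilience thresholds. BFT consensus under synchrony (with signatures) needs n ≥ 2f + 1 replicas, while partial synchrony needs n ≥ 3f + 1. The sketch below assumes, as the abstract states, that AlterBFT matches the synchronous threshold. The function names are hypothetical.

```python
def max_faults_synchronous(n: int) -> int:
    """Largest f with n >= 2f + 1, i.e. f < n/2 (synchronous BFT threshold)."""
    return (n - 1) // 2


def max_faults_partially_synchronous(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. f < n/3 (partially synchronous BFT threshold)."""
    return (n - 1) // 3


def replicas_needed(f: int, synchronous: bool) -> int:
    """Minimum number of replicas needed to tolerate f Byzantine faults."""
    return 2 * f + 1 if synchronous else 3 * f + 1


for n in (4, 7, 10):
    print(n, max_faults_synchronous(n), max_faults_partially_synchronous(n))
# 4 1 1
# 7 3 2
# 10 4 3
```

With 7 replicas, for instance, a protocol at the synchronous threshold tolerates 3 Byzantine replicas, while a partially synchronous protocol tolerates only 2.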
