uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]
Marcos K. Aguilera, Naama Ben-David, Rachid Guerraoui, Antoine Murat, Athanasios Xygkis, Igor Zablotchi
TL;DR
The paper presents uBFT, a Byzantine fault-tolerant SMR system that achieves microsecond-scale latency with only $2f+1$ replicas by leveraging disaggregated memory as a small trusted computing base and a novel Consistent Tail Broadcast (CTBcast) primitive. A two-path consensus design (fast path without signatures and slow path with signatures) enables end-to-end latency as low as $10\mu s$ in synchronous conditions, and the system demonstrates practical performance on Memcached, Redis, and Liquibook with bounded memory (<1 MiB per memory pool). The authors implement a full RDMA-based prototype, analyze latency components, compare against MinBFT and Mu, and provide correctness arguments and view-change mechanisms to ensure safety and liveness. The work significantly advances practical, low-latency Byzantine tolerance in data centers and highlights disaggregated memory as a viable, minimal-trust hardware foundation for future BFT systems.
Abstract
We propose uBFT, the first State-Machine Replication (SMR) system to achieve microsecond-scale latency in data centers, while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential as pure crashes appear to be a mere illusion, with real-life systems reportedly failing in many unexpected ways. uBFT relies on a small non-tailored trusted computing base -- disaggregated memory -- and consumes a practically bounded amount of memory (both local and disaggregated). uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10 $\mu$s. This is at least 50$\times$ faster than MinBFT, a state-of-the-art $2f{+}1$ BFT SMR based on Intel's SGX. We use uBFT to replicate two key-value stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20 $\mu$s) and become Byzantine tolerant with as little as 10 $\mu$s more. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance.
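To make the $2f{+}1$ replica bound concrete, the following minimal sketch compares it with the $3f{+}1$ bound that BFT protocols without a non-equivocation mechanism classically require. The function names `replicas_required` and `max_faults` and the `non_equivocation` flag are hypothetical and chosen for illustration only; they are not part of uBFT.

```python
def replicas_required(f: int, non_equivocation: bool) -> int:
    """Replicas needed to tolerate f Byzantine failures.

    Assumption: with a mechanism that prevents equivocation (as the
    abstract describes for uBFT), 2f+1 replicas suffice; without one,
    the classical bound of 3f+1 applies.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1 if non_equivocation else 3 * f + 1


def max_faults(n: int, non_equivocation: bool) -> int:
    """Largest f that n replicas can tolerate under the same assumption."""
    if n < 1:
        raise ValueError("n must be at least 1")
    divisor = 2 if non_equivocation else 3
    return (n - 1) // divisor


if __name__ == "__main__":
    # Print the replica count for small f under both bounds.
    for f in (1, 2, 3):
        with_ne = replicas_required(f, non_equivocation=True)
        without_ne = replicas_required(f, non_equivocation=False)
        print(f"f={f}: 2f+1 -> {with_ne} replicas, 3f+1 -> {without_ne} replicas")
```

Under these assumptions, tolerating a single Byzantine failure takes three replicas rather than four, and the saving grows by one replica for every additional failure tolerated.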
