It's not a lie if you don't get caught: simplifying reconfiguration in SMR through dirty logs
Allen Clement, Natacha Crooks, Neil Giridharan, Alex Shamis
TL;DR
The paper tackles the problem of rigid reconfiguration in state-machine replication (SMR) by introducing Gauss, a modular reconfiguration engine that decouples membership changes and protocol upgrades from the core SMR protocol. It achieves this by separating each consensus protocol's inner log from an outer log, which enables transitions between epochs with different memberships, failure thresholds, and consensus protocols while maintaining safety and liveness. The design comprises a three-phase reconfiguration workflow (Prepare, Handover, Shutdown) and a log sanitizer that translates the inner logs into a single sanitized outer log. An evaluation on Rialo shows promising latency results and scalable transition behavior. Practically, Gauss offers a path to evolvable SMR stacks with minimal downtime, allowing the data dissemination, ordering, execution, and reconfiguration components to be upgraded independently. The work combines formal system modeling, architectural design, correctness proofs, and empirical evaluation to support modular, long-lived production deployments.
Abstract
Production state-machine replication (SMR) implementations are complex, multi-layered architectures comprising data dissemination, ordering, execution, and reconfiguration components. Research consensus protocols rarely discuss reconfiguration, and those that do tightly couple membership changes to a specific algorithm. This coupling prevents the independent upgrade of individual building blocks and forces expensive downtime when transitioning to new protocol implementations. Yet modularity is essential for maintainability and system evolution in production deployments. We present Gauss, a reconfiguration engine designed to treat consensus protocols as interchangeable modules. By distinguishing between a consensus protocol's inner log and a sanitized outer log exposed to the RSM node, Gauss allows engineers to upgrade membership, failure thresholds, and the consensus protocol itself independently and with minimal global downtime. Our initial evaluation on the Rialo blockchain shows that this separation of concerns enables a seamless evolution of the SMR stack across a sequence of diverse protocol implementations.
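To make the inner/outer log distinction concrete, the following is a minimal illustrative sketch, not Gauss's actual algorithm. It assumes that each epoch runs one consensus instance with its own inner log, that an inner log may contain non-application entries, and that anything ordered after a hypothetical "close" marker is discarded. The names InnerEntry, OuterEntry, and sanitize, as well as the "cmd" and "close" entry kinds, are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InnerEntry:
    """One slot of a consensus protocol's inner log (hypothetical shape)."""
    kind: str                      # assumed kinds: "cmd" (application command), "close" (epoch ends)
    payload: Optional[str] = None  # only meaningful for "cmd" entries


@dataclass(frozen=True)
class OuterEntry:
    """One slot of the sanitized outer log exposed to the RSM node."""
    index: int    # position in the single outer log
    epoch: int    # epoch whose inner log produced this entry
    payload: str


def sanitize(inner_logs: List[List[InnerEntry]]) -> List[OuterEntry]:
    """Merge per-epoch inner logs, given in epoch order, into one outer log.

    Assumption: entries ordered after an epoch's "close" marker are dropped,
    and only "cmd" entries reach the RSM node.
    """
    outer: List[OuterEntry] = []
    for epoch, log in enumerate(inner_logs):
        for entry in log:
            if entry.kind == "close":
                break  # ignore the tail of this epoch's inner log
            if entry.kind == "cmd" and entry.payload is not None:
                outer.append(OuterEntry(len(outer), epoch, entry.payload))
    return outer
```

Under these assumptions, the RSM node only ever sees a gap-free sequence of application commands, regardless of which protocol or membership produced each one, which is the property that would let the consensus module be swapped between epochs.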
