ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol
Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin
TL;DR
ReCraft is a self-contained reconfiguration protocol for Raft that supports split, merge, and membership changes without external coordinators, thereby removing a major single point of failure. It combines epoch-based configurations, targeted quorum management, and a pull-based catch-up mechanism to preserve safety while enabling concurrent reconfiguration. The approach is formalized with safety and liveness proofs and mechanized in Rocq, and it is implemented and evaluated within etcd, showing negligible overhead and favorable performance during splits and merges compared to emulated alternatives. The result is a practical alternative to external cluster management for scalable, fault-tolerant multi-cluster Raft in large-scale deployments.
Abstract
Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration can invalidate the correctness of the protocol. As a result, most systems that embed consensus protocols implement reconfiguration conservatively rather than developing an efficient scheme. Existing implementations often stop the entire system during reconfiguration and rely on a centralized coordinator, which can become a single point of failure. We present ReCraft, a novel reconfiguration protocol for Raft that supports both multi-cluster and single-cluster reconfigurations. ReCraft does not rely on external coordinators and blocks minimally. It enables the sharding of Raft clusters through split and merge reconfigurations and adds a membership change scheme that improves on Raft's. We prove the safety and liveness of ReCraft and demonstrate its efficiency through an implementation in etcd.
