
CFT-Forensics: High-Performance Byzantine Accountability for Crash Fault Tolerant Protocols

Weizhao Tang, Peiyao Sheng, Ronghao Ni, Pronoy Roy, Xuechao Wang, Giulia Fanti, Pramod Viswanath

TL;DR

This work addresses the vulnerability of crash fault tolerant (CFT) consensus to Byzantine faults by adding accountability through CFT-Forensics. It defines a family of forensics-compliant protocols (which includes Raft and Paxos) and augments them with Commitment Certificates (CC) and Leader Certificates (LC), which let an auditor identify the culpable nodes when safety is violated, without replacing the underlying CFT protocol. The authors provide both theoretical overhead analyses and empirical evaluations, showing substantially lower storage and communication overhead than general-purpose approaches like PeerReview. They implement Raft-Forensics as an extension of the nuRaft library, where it reaches 87.8% of vanilla Raft's peak throughput with 256-byte messages, and integrate it into OpenCBDC, where wide-area experiments show 97.8% of Raft's throughput at 14.5% higher latency. These results suggest that accountable CFT protocols are practical for critical infrastructure. Overall, accountability is presented as a complementary security property that lightweight protocol augmentations can provide, improving governance and fault attribution in distributed systems.
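Certificate-based culprit identification typically relies on quorum intersection: in a cluster of n nodes, any two majorities share at least one member, so two certificates for conflicting blocks always name some common nodes. The Python sketch below illustrates only that step, under the assumption that a CC can be reduced to the set of node IDs it contains. The helpers `majority` and `shared_signers` are hypothetical names, and deciding which shared node is actually culpable (for example, by comparing the terms involved) requires the protocol-specific analysis in the paper.

```python
# Minimal sketch of the quorum-intersection step behind certificate-based
# accountability. ASSUMPTION: a commitment certificate (CC) is modeled as the
# set of node IDs whose acknowledgments it contains; real CCs carry
# signatures, which this sketch omits.
from typing import FrozenSet


def majority(n: int) -> int:
    """Smallest quorum size for which any two quorums of n nodes overlap."""
    return n // 2 + 1


def shared_signers(cc_a: FrozenSet[int], cc_b: FrozenSet[int], n: int) -> FrozenSet[int]:
    """Nodes that appear in both of two conflicting certificates.

    Two majority quorums over the same n nodes always intersect, so the
    result is never empty for valid certificates.
    """
    q = majority(n)
    if len(cc_a) < q or len(cc_b) < q:
        raise ValueError("certificate does not contain a majority quorum")
    return cc_a & cc_b


# Two certificates for conflicting blocks at the same log index, 5-node cluster.
cc_x = frozenset({0, 1, 2})
cc_y = frozenset({2, 3, 4})
print(sorted(shared_signers(cc_x, cc_y, n=5)))  # [2]
```

Rejecting undersized certificates matters: without a majority in each, the intersection guarantee disappears and the auditor could end up with no one to examine.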

Abstract

Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted, e.g., enterprise settings and government infrastructure. However, CFT consensus can be broken by even a single corrupt node. A desirable property in the face of such potential Byzantine faults is accountability: if a corrupt node breaks protocol and affects consensus safety, it should be possible to identify the culpable components with cryptographic integrity from the node states. Today, the best-known protocol for providing accountability to CFT protocols is PeerReview, which essentially records a signed transcript of all messages sent during the CFT protocol. Because PeerReview is agnostic to the underlying CFT protocol, it incurs high communication and storage overhead. We propose CFT-Forensics, an accountability framework for CFT protocols. We show that for a special family of forensics-compliant CFT protocols (which includes widely used CFT protocols like Raft and multi-Paxos), CFT-Forensics gives provable accountability guarantees. Under realistic deployment settings, we show theoretically that CFT-Forensics operates at a fraction of the cost of PeerReview. We subsequently instantiate CFT-Forensics for Raft and implement Raft-Forensics as an extension to the popular nuRaft library. In extensive experiments, we demonstrate that Raft-Forensics adds low overhead to vanilla Raft. With 256-byte messages, Raft-Forensics achieves a peak throughput equal to 87.8% of vanilla Raft's, at 46% higher latency (+44 ms). We finally integrate Raft-Forensics into the open-source central bank digital currency system OpenCBDC and show that in wide-area network experiments, Raft-Forensics achieves 97.8% of the throughput of Raft, with 14.5% higher latency (+326 ms).
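The abstract reports each latency overhead both as a percentage and in milliseconds, which together imply the approximate latency of vanilla Raft in each setting. The short Python calculation below makes that arithmetic explicit; `implied_baseline_ms` is a hypothetical helper, and because the published percentages are rounded, the derived figures are estimates rather than measured values.

```python
# Back-of-envelope check: if an overhead of extra_ms equals extra_fraction of
# the baseline latency, then baseline = extra_ms / extra_fraction.
# Inputs are the abstract's rounded figures, so outputs are approximate.


def implied_baseline_ms(extra_ms: float, extra_fraction: float) -> float:
    return extra_ms / extra_fraction


settings = {
    "nuRaft, 256-byte messages": (44.0, 0.46),
    "OpenCBDC, wide-area network": (326.0, 0.145),
}
for name, (extra_ms, fraction) in settings.items():
    base = implied_baseline_ms(extra_ms, fraction)
    print(f"{name}: Raft ~{base:.0f} ms, Raft-Forensics ~{base + extra_ms:.0f} ms")
# nuRaft, 256-byte messages: Raft ~96 ms, Raft-Forensics ~140 ms
# OpenCBDC, wide-area network: Raft ~2248 ms, Raft-Forensics ~2574 ms
```

Read this way, the +44 ms overhead sits on an implied baseline of roughly 96 ms, whereas the larger +326 ms wide-area overhead sits on an implied baseline of over two seconds, which is why it is the smaller relative increase.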

Paper Structure

This paper contains 35 sections, 2 theorems, 2 equations, 12 figures, 5 tables, and 8 algorithms.

Key Result

Proposition 8

Both Raft [raft] and Paxos [originalpaxos, heidivs] are forensics-compliant.

Figures (12)

  • Figure 1: Bandwidth-latency tradeoffs of Raft vs Raft-Forensics over 4 nodes at message size of 256 Bytes.
  • Figure 2: Examples of node freshness. Each box represents a log entry labeled with its term. In all worlds, $x$'s log list is fresher than $y$'s. In World A, $x$ is as fresh as $y$; only in Worlds B and C is $x$ strictly fresher than $y$.
  • Figure 3: Log entry attributes with and without CFT-Forensics; committed blocks are shown with a double gold outline. Our basic (unoptimized) CFT-Forensics (top) adds a hash pointer, a proposer stamp, and a leader certificate $\mathtt{LC}$, all shown in red. We also store a $\mathtt{CC}$ only for the latest committed block. Our optimized CFT-Forensics (see the state-optimization section) reduces storage costs by storing three hash maps: (1) one containing pointers only for the last committed block and later uncommitted blocks, (2) one storing a single leader certificate $\mathtt{LC}$ for every term, and (3) one storing a proposer stamp only for the latest proposed block in the current term.
  • Figure 4: Overhead complexities of CFT-Forensics and PeerReview in log replication, with hash size $\Pi = 32$ bytes and digital signature size $\Sigma = 65$ bytes.
  • Figure 5: Overhead complexities of CFT-Forensics and PeerReview in leader election.
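Figure 2 and Definition 4 concern log freshness. The paper's exact definition is not reproduced in this summary, so the Python sketch below assumes freshness follows Raft's familiar "at least as up-to-date" rule: compare the term of the last entry, then break ties by log length. Under that assumption, "fresher" is a non-strict comparison and "strictly fresher" is its strict counterpart, matching the distinction the Figure 2 caption draws. The functions `is_fresher` and `is_strictly_fresher` are hypothetical names.

```python
# Sketch of a freshness comparison between two logs.
# ASSUMPTION: freshness is modeled on Raft's up-to-date rule (last entry's
# term first, then log length); the paper's Definition 4 may differ in detail.
from typing import List, Tuple

Log = List[int]  # each element is the term of one log entry


def freshness_key(log: Log) -> Tuple[int, int]:
    last_term = log[-1] if log else 0  # an empty log is the least fresh
    return (last_term, len(log))


def is_fresher(x: Log, y: Log) -> bool:
    """Non-strict: True if x is at least as fresh as y."""
    return freshness_key(x) >= freshness_key(y)


def is_strictly_fresher(x: Log, y: Log) -> bool:
    return freshness_key(x) > freshness_key(y)


print(is_fresher([1, 1, 2], [1, 1, 2]), is_strictly_fresher([1, 1, 2], [1, 1, 2]))  # True False
print(is_strictly_fresher([1, 2], [1, 1, 1]))  # True: higher last term wins
print(is_strictly_fresher([1, 2, 2], [1, 2]))  # True: same last term, longer log
```

Python compares tuples lexicographically, so a single key captures the two-step rule without nested conditionals.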

Theorems & Definitions (8)

  • Definition 1: CFT(-SMR) Protocol
  • Definition 2: Accountability
  • Definition 4: Freshness
  • Definition 7: Forensics-Compliant Protocols
  • Proposition 8: Instances of Forensics-Compliant Protocols
  • Example 9: Proposer's Attack, or Split-Brains
  • Example 10: Voter's Attack
  • Theorem 11
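Example 9 names the proposer's attack, or split brain, in which a leader hands conflicting blocks to different parts of the cluster. Figure 3 lists per-entry attributes that CFT-Forensics adds, including a hash pointer and a proposer stamp, and the Python sketch below shows one way such attributes could expose this attack to an auditor comparing two nodes' logs. It is illustrative only: the entry layout, the helper names (`propose`, `stamp_valid`, `find_equivocation`), and the use of HMAC-SHA256 as a stand-in for a real digital signature are all assumptions. Unlike a signature, an HMAC can only be checked by someone holding the leader's key.

```python
# Illustrative sketch: exposing a leader's equivocation (split brain) from two
# nodes' logs. ASSUMPTIONS: entries carry a hash pointer to their predecessor,
# and the "proposer stamp" is modeled as HMAC-SHA256 under the leader's key,
# standing in for a real digital signature.
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional

GENESIS_HASH = b"\x00" * 32


def entry_digest(term: int, index: int, payload: bytes, prev_hash: bytes) -> bytes:
    """SHA-256 over the entry's fields; prev_hash is the hash pointer."""
    h = hashlib.sha256()
    h.update(term.to_bytes(8, "big"))
    h.update(index.to_bytes(8, "big"))
    h.update(prev_hash)
    h.update(payload)
    return h.digest()


@dataclass(frozen=True)
class Entry:
    term: int
    index: int
    payload: bytes
    prev_hash: bytes
    stamp: bytes  # proposer stamp over this entry's digest

    @property
    def digest(self) -> bytes:
        return entry_digest(self.term, self.index, self.payload, self.prev_hash)


def propose(key: bytes, term: int, prev: Optional[Entry], payload: bytes) -> Entry:
    """Leader appends a stamped entry that points at its predecessor."""
    prev_hash = prev.digest if prev is not None else GENESIS_HASH
    index = prev.index + 1 if prev is not None else 1
    digest = entry_digest(term, index, payload, prev_hash)
    stamp = hmac.new(key, digest, hashlib.sha256).digest()
    return Entry(term, index, payload, prev_hash, stamp)


def stamp_valid(key: bytes, entry: Entry) -> bool:
    expected = hmac.new(key, entry.digest, hashlib.sha256).digest()
    return hmac.compare_digest(expected, entry.stamp)


def find_equivocation(log_a: List[Entry], log_b: List[Entry],
                      leader_keys: Dict[int, bytes]) -> Optional[int]:
    """Return the term whose leader validly stamped two different entries at
    the same index, or None. Assumes both logs start at index 1."""
    for a, b in zip(log_a, log_b):
        if a.term == b.term and a.index == b.index and a.digest != b.digest:
            key = leader_keys[a.term]
            if stamp_valid(key, a) and stamp_valid(key, b):
                return a.term
    return None


keys = {1: b"term-1-leader-key"}  # hypothetical key material
first = propose(keys[1], 1, None, b"tx0")
fork_a = [first, propose(keys[1], 1, first, b"pay Alice")]
fork_b = [first, propose(keys[1], 1, first, b"pay Bob")]
print(find_equivocation(fork_a, fork_b, keys))  # 1
```

Both conflicting entries carry valid stamps from the same term's leader, so the stamps themselves are the evidence. A deployment would instead use public-key signatures (Figure 4 assumes 65-byte signatures), so that any auditor, not only key holders, can verify them.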