RFCAudit: An LLM Agent for Functional Bug Detection in Network Protocols
Mingwei Zheng, Chengpeng Wang, Xuwei Liu, Jinyao Guo, Shiwei Feng, Xiangyu Zhang
TL;DR
RFCAudit detects functional bugs in network protocol implementations by bridging informal RFC specifications with code semantics through a two-agent LLM workflow. The indexing agent builds hierarchical semantic indexes of the codebase, while the detection agent performs retrieval-guided, demand-driven analysis, using tools for context expansion and self-critique to validate RFC conformance. Across six real-world protocols, RFCAudit identifies 47 unique bugs at 81.9% precision, 20 of which have been confirmed or fixed by developers, and it outperforms baseline LLM-based approaches and fuzzing-based methods in both precision and semantic coverage. The work demonstrates scalability to industry-scale stacks and suggests broad applicability to domains where formal specifications are available, offering a practical path toward automating high-level semantic correctness checks in software systems.
Abstract
Functional correctness is critical for ensuring the reliability and security of network protocol implementations. Functional bugs, instances where implementations diverge from behaviors specified in RFC documents, can lead to severe consequences, including faulty routing, authentication bypasses, and service disruptions. Detecting these bugs requires deep semantic analysis across specification documents and source code, a task beyond the capabilities of traditional static analysis tools. This paper introduces RFCAudit, an autonomous agent that leverages large language models (LLMs) to detect functional bugs by checking conformance between network protocol implementations and their RFC specifications. Inspired by the human auditing procedure, RFCAudit comprises two key components: an indexing agent and a detection agent. The former hierarchically summarizes protocol code semantics, generating semantic indexes that enable the detection agent to narrow down the scanning scope. The latter employs demand-driven retrieval to iteratively collect additional relevant data structures and functions, and ultimately identifies potential inconsistencies with the RFC specifications. We evaluate RFCAudit across six real-world network protocol implementations. RFCAudit identifies 47 functional bugs with 81.9% precision, of which 20 bugs have been confirmed or fixed by developers.
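The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not the RFCAudit implementation: the names `IndexNode`, `summarize`, `build_index`, `retrieve`, and `expand_context` are hypothetical, the LLM summarization step is replaced by a trivial first-sentence placeholder, and relevance is approximated by keyword overlap, all purely to show the shape of "index first, then retrieve and expand on demand".

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """One entry in a hypothetical two-level semantic index (module -> functions)."""
    name: str
    summary: str
    children: list["IndexNode"] = field(default_factory=list)


def summarize(text: str) -> str:
    """Placeholder for an LLM summarization call: keeps only the first sentence."""
    return text.split(".")[0].strip()


def build_index(module: str, functions: dict[str, str]) -> IndexNode:
    """Indexing stage: summarize each function, then roll the summaries up to the module."""
    leaves = [IndexNode(name, summarize(doc)) for name, doc in functions.items()]
    module_summary = "; ".join(leaf.summary for leaf in leaves)
    return IndexNode(module, module_summary, leaves)


def retrieve(index: IndexNode, rfc_clause: str, limit: int = 2) -> list[str]:
    """Detection stage, step 1: rank functions by keyword overlap with an RFC clause."""
    terms = set(rfc_clause.lower().split())
    scored = []
    for leaf in index.children:
        overlap = len(terms & set(leaf.summary.lower().split()))
        if overlap:
            scored.append((overlap, leaf.name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


def expand_context(start: list[str], callees: dict[str, list[str]], budget: int) -> list[str]:
    """Detection stage, step 2: pull in callees breadth-first until the budget is reached."""
    seen = list(start)
    frontier = list(start)
    while frontier and len(seen) < budget:
        current = frontier.pop(0)
        for callee in callees.get(current, []):
            if callee not in seen and len(seen) < budget:
                seen.append(callee)
                frontier.append(callee)
    return seen


# Illustrative data only: function names and descriptions are made up.
functions = {
    "handle_hello": "Process a received hello packet and update neighbor state. Drops malformed input.",
    "send_lsa": "Flood a link state advertisement to neighbors.",
}
index = build_index("ospf", functions)
candidates = retrieve(index, "hello packet must update neighbor state", limit=1)
context = expand_context(
    candidates,
    {"handle_hello": ["lookup_neighbor", "reset_timer"], "lookup_neighbor": ["hash_id"]},
    budget=3,
)
print(candidates, context)
```

In this sketch the index narrows the search to a single candidate function, and the expansion step then adds only as many callees as the budget allows. A real agent would presumably replace both the placeholder summarizer and the keyword ranking with model calls and decide when to stop expanding based on the analysis itself.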
