Large Language Models for Validating Network Protocol Parsers
Mingwei Zheng, Danning Xie, Xiangyu Zhang
TL;DR
This paper tackles the challenge of validating network protocol parsers against their official standards by bridging the gap between natural-language RFC descriptions and source-code implementations. It introduces PARVAL, a multi-agent framework that uses large language models to extract a CodeSpec from parser code and a DocSpec from RFCs, then performs differential analysis to identify semantic inconsistencies between the two. Evaluated on the BFD protocol against RFC 5880, PARVAL achieves a low false positive rate of 5.6% and uncovers seven unique bugs, five of which were previously unknown. Although it does not provide formal guarantees, PARVAL demonstrates a practical, scalable approach to parser validation that complements traditional static analysis and formal methods, with room for broader validation and further automation.
Abstract
Network protocol parsers are essential for enabling correct and secure communication between devices. Bugs in these parsers can introduce critical vulnerabilities, including memory corruption, information leakage, and denial-of-service attacks. An intuitive way to assess parser correctness is to compare the implementation with its official protocol standard. However, this comparison is challenging because protocol standards are typically written in natural language, whereas implementations are in source code. Existing methods like model checking, fuzzing, and differential testing have been used to find parsing bugs, but they either require significant manual effort or ignore the protocol standards, limiting their ability to detect semantic violations. To enable more automated validation of parser implementations against protocol standards, we propose PARVAL, a multi-agent framework built on large language models (LLMs). PARVAL leverages the capabilities of LLMs to understand both natural language and code. It transforms both protocol standards and their implementations into a unified intermediate representation, referred to as format specifications, and performs a differential comparison to uncover inconsistencies. We evaluate PARVAL on the Bidirectional Forwarding Detection (BFD) protocol. Our experiments demonstrate that PARVAL successfully identifies inconsistencies between the implementation and its RFC standard, achieving a low false positive rate of 5.6%. PARVAL uncovers seven unique bugs, including five previously unknown issues.
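To make the differential comparison concrete, the following is a minimal sketch of how two format specifications might be compared field by field. It is not PARVAL's actual intermediate representation or algorithm: the FieldSpec type, the diff_specs function, the string-encoded constraints, and the example CodeSpec are all hypothetical and chosen only for illustration. The three DocSpec entries follow the BFD control packet checks in RFC 5880 (version must be 1, Detect Mult and My Discriminator must be nonzero), written in a simplified form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field of a hypothetical format specification (illustrative only)."""
    bits: int         # field width in bits
    constraint: str   # validity check as a normalized string, "" if none


def diff_specs(doc_spec, code_spec):
    """Return human-readable inconsistencies between a DocSpec and a CodeSpec."""
    findings = []
    # Walk the union of field names so that missing fields on either side show up.
    for name in sorted(doc_spec.keys() | code_spec.keys()):
        doc = doc_spec.get(name)
        code = code_spec.get(name)
        if code is None:
            findings.append(f"{name}: defined in the standard but not parsed")
        elif doc is None:
            findings.append(f"{name}: parsed but not defined in the standard")
        else:
            if doc.bits != code.bits:
                findings.append(
                    f"{name}: width is {code.bits} bits, standard says {doc.bits}"
                )
            if doc.constraint != code.constraint:
                findings.append(
                    f"{name}: constraint {code.constraint!r}, "
                    f"standard says {doc.constraint!r}"
                )
    return findings


# Simplified DocSpec for three BFD control packet fields (assumed encoding).
DOC_SPEC = {
    "version": FieldSpec(3, "== 1"),
    "detect_mult": FieldSpec(8, "!= 0"),
    "my_discriminator": FieldSpec(32, "!= 0"),
}

# Made-up CodeSpec for an imaginary parser that never rejects Detect Mult == 0.
CODE_SPEC = {
    "version": FieldSpec(3, "== 1"),
    "detect_mult": FieldSpec(8, ""),
    "my_discriminator": FieldSpec(32, "!= 0"),
}

if __name__ == "__main__":
    for finding in diff_specs(DOC_SPEC, CODE_SPEC):
        print(finding)
```

In this toy setting the only reported inconsistency is the missing check on detect_mult. Comparing constraints as exact strings is a deliberate simplification; a real comparison would likely need to treat semantically equivalent constraints as equal, which is part of what makes the problem hard.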
