Synthesizing Precise Protocol Specs from Natural Language for Effective Test Generation
Kuangxiangzi Liu, Dhiman Chakraborty, Alexander Liggesmeyer, Andreas Zeller
TL;DR
This work tackles the challenge of turning natural-language protocol specifications into executable formal artifacts for automated test generation in safety- and security-critical systems. It introduces AUTOSPEC, a two-stage LLM-driven pipeline that first extracts protocol elements from natural-language RFCs and then synthesizes a formal I/O grammar, refined through an execution-guided repair loop that runs against real protocol implementations. On five internet protocols, the approach recovers on average 92.8% of client and 80.2% of server message types and achieves 81.5% message acceptance, while preserving traceability to the source text and producing grammars that can be reused for future testing. The methodology reduces dependence on end-to-end LLM test generation, improves reproducibility and auditability, and opens the way to a corpus of natural-language-to-formal-specification mappings that can bootstrap further automation and tooling in protocol testing.
Abstract
Safety- and security-critical systems have to be thoroughly tested against their specifications. The state of practice is to have _natural language_ specifications, from which test cases are derived manually, a process that is slow, error-prone, and difficult to scale. _Formal_ specifications, on the other hand, are well-suited for automated test generation, but are tedious to write and maintain. In this work, we propose a two-stage pipeline that uses large language models (LLMs) to bridge the gap: First, we extract _protocol elements_ from natural-language specifications; second, leveraging a protocol implementation, we synthesize and refine a formal _protocol specification_ from these elements, which we can then use to test further implementations at scale. We consider this two-stage approach superior to end-to-end LLM-based test generation, because (1) it produces an _inspectable specification_ that preserves traceability to the original text; (2) the generation of actual test cases _no longer requires an LLM_; (3) the resulting formal specs are _human-readable_, and can be reviewed, version-controlled, and incrementally refined; and (4) over time, we can build a _corpus_ of natural-language-to-formal-specification mappings that can be used to further train and refine LLMs for more automatic translations. Our prototype, AUTOSPEC, demonstrates the feasibility of this approach on five widely used _internet protocols_ (SMTP, POP3, IMAP, FTP, and ManageSieve), taking their natural-language _RFC specifications_ as input and targeting the recent _I/O grammar_ formalism for protocol specification and fuzzing. In its evaluation, AUTOSPEC recovers on average 92.8% of client and 80.2% of server message types, and achieves 81.5% message acceptance across diverse, real-world systems.
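
The claim that test generation from a formal spec no longer requires an LLM can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy grammar covers only a few SMTP client commands and is written as a plain dictionary rather than in the I/O grammar formalism, `mock_server_accepts` is a hypothetical regex stand-in for a real server implementation, and `acceptance_rate` is only a plausible reading of the message acceptance metric, not AUTOSPEC's actual measurement.

```python
import random
import re

# Hypothetical toy grammar for a few SMTP client commands.
# Keys are nonterminals; each maps to a list of alternatives,
# and each alternative is a sequence of symbols.
GRAMMAR = {
    "<command>": [["<helo>"], ["<mail>"], ["<rcpt>"], ["<quit>"]],
    "<helo>": [["HELO ", "<domain>", "\r\n"]],
    "<mail>": [["MAIL FROM:<", "<mailbox>", ">\r\n"]],
    "<rcpt>": [["RCPT TO:<", "<mailbox>", ">\r\n"]],
    "<quit>": [["QUIT\r\n"]],
    "<mailbox>": [["<local>", "@", "<domain>"]],
    "<local>": [["alice"], ["bob"]],
    "<domain>": [["example.com"], ["mail.example.org"]],
}


def expand(symbol, rng):
    """Randomly expand a symbol; anything that is not a grammar key is a terminal."""
    if symbol not in GRAMMAR:
        return symbol
    alternative = rng.choice(GRAMMAR[symbol])
    return "".join(expand(part, rng) for part in alternative)


# Assumed stand-in for a system under test: accepts the four command shapes above.
_MAILBOX = r"[^<>@\s]+@[A-Za-z0-9.-]+"
_ACCEPTED = re.compile(
    rf"(HELO [A-Za-z0-9.-]+|MAIL FROM:<{_MAILBOX}>|RCPT TO:<{_MAILBOX}>|QUIT)\r\n"
)


def mock_server_accepts(message):
    """Return True if the mock server would accept the whole message."""
    return _ACCEPTED.fullmatch(message) is not None


def acceptance_rate(messages, accepts):
    """Fraction of messages accepted by the given acceptance predicate."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if accepts(m)) / len(messages)


# Generate test inputs from the grammar alone; no LLM is involved at this stage.
rng = random.Random(0)  # fixed seed so runs are reproducible
messages = [expand("<command>", rng) for _ in range(20)]
rate = acceptance_rate(messages, mock_server_accepts)
```

In a real setup, the predicate would wrap a network exchange with an actual server, and a grammar rule that produced rejected messages would be a candidate for repair. The sketch only shows that, once a grammar exists, generation and measurement are ordinary deterministic code that can be reviewed and version-controlled alongside the spec.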
