Effective Command-line Interface Fuzzing with Path-Aware Large Language Model Orchestration
Momoko Shiraishi, Yinzhi Cao, Takahiro Shinagawa
TL;DR
PILOT is a Path-guided, Iterative LLM-Orchestrated Testing framework that improves CLI fuzzing by giving LLMs the call paths to target functions as context and by automatically generating semantically valid input files with native tools. It combines coverage-based input prioritization, path-guided context prompting, and efficient iterative refinement to reach deep target functions, achieving significantly higher code coverage and discovering 51 zero-day vulnerabilities (41 confirmed, 33 fixed, 3 assigned CVEs) across 43 real-world programs. The results show substantial improvements over state-of-the-art fuzzers in both vulnerability discovery and reach depth, which supports PILOT’s practical value for automated, scalable CLI security testing. The work also offers insights into adaptive target selection via function centrality and suggests avenues for reducing LLM token costs and validating robustness across models.
Abstract
Command-line interface (CLI) fuzzing tests programs by mutating both command-line options and input file contents, thus enabling discovery of vulnerabilities that only manifest under specific option-input combinations. Prior CLI fuzzing work struggles to generate semantically rich option strings and input files, and therefore often fails to reach deeply embedded target functions. As a result, existing CLI fuzzing techniques frequently miss vulnerabilities located deep in a program. In this paper, we design a novel Path-guided, Iterative LLM-Orchestrated Testing framework, called PILOT, to fuzz CLI applications. The key insight is to provide potential call paths to target functions as context to an LLM so that it can better generate CLI option strings and input files. PILOT then repeats this process iteratively, feeding back the functions reached so far as additional context so that the target functions are eventually reached. Our evaluation on real-world CLI applications demonstrates that PILOT achieves higher coverage than state-of-the-art fuzzing approaches and discovers 51 zero-day vulnerabilities. We responsibly disclosed all the vulnerabilities to their developers; so far, 41 have been confirmed, 33 fixed, and three assigned CVE identifiers.
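The path-guided, iterative idea described in the abstract can be illustrated with a minimal sketch. This is not PILOT's implementation: the function names (`find_call_path`, `build_prompt`, `refine`), the prompt wording, the toy call graph, and the stubbed LLM and tracer are all hypothetical and chosen only for illustration. The sketch assumes a static call graph is available, finds one call path to the target, and repeatedly asks an LLM for a command line while feeding back the functions reached so far.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set

CallGraph = Dict[str, List[str]]  # caller -> list of callees (assumed given)


def find_call_path(graph: CallGraph, entry: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for a shortest call path from entry to target."""
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


def build_prompt(program: str, path: List[str], reached: Set[str]) -> str:
    """Assemble LLM context: the call path plus the functions reached so far."""
    remaining = [f for f in path if f not in reached]
    return (
        f"Program: {program}\n"
        f"Call path to target: {' -> '.join(path)}\n"
        f"Already reached: {', '.join(sorted(reached)) or 'none'}\n"
        f"Not yet reached: {', '.join(remaining) or 'none'}\n"
        "Propose command-line options and an input file that reach the target."
    )


def refine(
    program: str,
    graph: CallGraph,
    entry: str,
    target: str,
    ask_llm: Callable[[str], str],             # hypothetical LLM wrapper
    run_and_trace: Callable[[str], Set[str]],  # hypothetical coverage tracer
    max_rounds: int = 5,
) -> Optional[str]:
    """Iterate until a candidate command line reaches the target function."""
    path = find_call_path(graph, entry, target)
    if path is None:
        return None
    reached: Set[str] = set()
    for _ in range(max_rounds):
        candidate = ask_llm(build_prompt(program, path, reached))
        reached |= run_and_trace(candidate)  # feed reached functions back
        if target in reached:
            return candidate
    return None


# Toy setup: a made-up image tool whose filter code needs a specific option.
TOY_GRAPH: CallGraph = {
    "main": ["parse_args"],
    "parse_args": ["decode_image"],
    "decode_image": ["apply_filter"],
}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: adds an option once feedback is present."""
    if "Already reached: none" in prompt:
        return "in.png"
    return "--filter blur in.png"


def stub_tracer(command: str) -> Set[str]:
    """Stand-in for executing the program under coverage instrumentation."""
    reached = {"main", "parse_args", "decode_image"}
    if "--filter" in command:
        reached.add("apply_filter")
    return reached


result = refine("imgtool", TOY_GRAPH, "main", "apply_filter", stub_llm, stub_tracer)
```

In this toy run the first candidate stops at `decode_image`; the second prompt lists the reached functions, and the stub responds with an option string that reaches `apply_filter`. A real system would replace the stubs with an actual model call and an instrumented execution of the target binary.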
