Cottontail: Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation
Haoxin Tu, Seongmin Lee, Yuxian Li, Peng Chen, Lingxiao Jiang, Marcel Böhme
TL;DR
This work addresses the challenge of generating highly structured test inputs for parsing programs by integrating large language models (LLMs) with concolic execution. Cottontail introduces three innovations: an Expressive Structural Coverage Tree (ESCT) that enables structure-aware path constraint selection, a Solve-Complete paradigm in which an LLM first solves path constraints for satisfiability and then completes the inputs for syntactic validity, and a history-guided seed acquisition strategy that keeps testing going once coverage saturates. In experiments on eight libraries across four input formats, Cottontail outperforms baseline approaches by 30.73% and 41.32% on average in line and branch coverage, respectively, and uncovers six new vulnerabilities, four of which have been fixed. The results demonstrate the practical value of combining program analysis with LLM reasoning for automated testing of highly structured inputs, and the authors provide an open-source artifact for reproducibility. Overall, Cottontail advances white-box fuzzing by aligning constraint solving with input syntax and by refreshing seeds based on the testing history.
Abstract
How can we perform concolic execution to generate highly structured test inputs for systematically testing parsing programs? Existing concolic execution engines are significantly restricted by (1) input structure-agnostic path constraint selection, which wastes testing effort or misses coverage; (2) limited constraint-solving capability, which yields many syntactically invalid test inputs; and (3) reliance on manual acquisition of highly structured seed inputs, which results in non-continuous testing. This paper proposes Cottontail, a new Large Language Model (LLM)-driven concolic execution engine, to mitigate these limitations. First, a more complete program path representation, named Expressive Structural Coverage Tree (ESCT), is constructed to select structure-aware path constraints. Next, an LLM-driven constraint solver based on a Solve-Complete paradigm solves the path constraints to produce test inputs that not only satisfy the constraints but also conform to the input syntax. Finally, a history-guided seed acquisition is employed to obtain new highly structured test inputs either before testing starts or after testing is saturated. We implemented Cottontail on top of SymCC and evaluated it on eight extensively tested open-source libraries across four different formats (XML, SQL, JavaScript, and JSON). Cottontail significantly outperforms baseline approaches by 30.73% and 41.32% on average in terms of line and branch coverage, respectively. In addition, Cottontail found six previously unknown vulnerabilities (six CVEs assigned). We have reported these issues to the developers, and four of them have been fixed so far.
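To make the Solve-Complete idea more concrete, here is a minimal sketch in Python. It is not Cottontail's implementation: the constraint representation (a mapping from byte offset to required character), the function names, and the two stub functions standing in for LLM calls are all assumptions made for illustration, and JSON is used as the target format only because the standard library can check its validity.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical constraint format: byte offset -> character required at that offset.
Constraint = Dict[int, str]


def satisfies(data: str, constraint: Constraint) -> bool:
    """Check that every constrained offset holds the required character."""
    return all(i < len(data) and data[i] == ch for i, ch in constraint.items())


def is_valid_json(data: str) -> bool:
    """Syntax check for the target format (JSON in this sketch)."""
    try:
        json.loads(data)
        return True
    except json.JSONDecodeError:
        return False


def solve_complete(
    constraint: Constraint,
    solve: Callable[[Constraint], str],
    complete: Callable[[str], str],
) -> Optional[str]:
    """Solve step first (satisfiability), then Complete step (syntactic validity)."""
    partial = solve(constraint)
    if not satisfies(partial, constraint):
        return None
    candidate = complete(partial)
    # The completed input must still satisfy the constraint and must parse.
    if satisfies(candidate, constraint) and is_valid_json(candidate):
        return candidate
    return None


def stub_solve(constraint: Constraint) -> str:
    """Stand-in for an LLM 'solve' call: place required characters, pad with spaces."""
    length = max(constraint, default=-1) + 1
    return "".join(constraint.get(i, " ") for i in range(length))


def stub_complete(partial: str) -> str:
    """Stand-in for an LLM 'complete' call: close any opened brackets or braces."""
    closers = {"{": "}", "[": "]"}
    pending = [closers[c] for c in partial if c in closers]
    return partial + "".join(reversed(pending))


if __name__ == "__main__":
    print(solve_complete({0: "[", 1: "{"}, stub_solve, stub_complete))
```

In this sketch, the Solve step alone would produce `[{`, which satisfies the constraint but is rejected by a JSON parser; the Complete step is what turns it into a parseable input. A real engine would replace both stubs with LLM prompts and would check satisfiability against the actual symbolic path constraint rather than a character map.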
