High-Performance Generation of Constrained Inputs
Addison Crump, Alexi Turcotte, José Antonio Zamudio Amaya, Andreas Zeller
TL;DR
This paper addresses the problem of generating semantically valid inputs from context-free grammars with complex constraints. It introduces FANDANGO-RS, a Rust-based approach that transpiles grammars into types and searches for solutions with multi-objective evolutionary algorithms. By compiling grammars into Rust types, using opaque node representations, and driving the search with NSGA-II, the authors achieve speedups of 3–4 orders of magnitude over the prior state of the art and solve constraint sets that were previously intractable. They validate the approach with a case study on a C subset, in which FANDANGO-RS generates 401 diverse, valid inputs per minute, demonstrating practical viability for compiler testing and for specification-based input generation more broadly. The work points to practical impact for software testing through rapid, scalable, and semantically aware input generation across complex domains. As future work, the authors outline integration with coverage-guided testing and possible hybrid symbolic approaches.
Abstract
Language-based testing combines context-free grammar definitions with semantic constraints over grammar elements to generate test inputs. By pairing context-free grammars with constraints, users gain the expressiveness of unrestricted grammars while retaining a simple structure. However, producing inputs in the presence of such constraints can be challenging. In past approaches, SMT solvers have been found to be very slow at finding string solutions; evolutionary algorithms are faster and more general, but current implementations still struggle with the complex constraints required for domains such as compiler testing. In this paper, we present a novel approach for evolutionary language-based testing that improves performance by 3–4 orders of magnitude over the current state of the art, reducing hours of generation and constraint-solving time to seconds. We accomplish this by (1) carefully transforming grammar definitions into Rust types and trait implementations, ensuring that the compiler may near-maximally optimize arbitrary operations on arbitrary grammars; and (2) using better evolutionary algorithms that improve the ability of language-based testing to solve complex constraint systems. These performance and algorithmic improvements allow our prototype, FANDANGO-RS, to solve constraints that previous strategies simply cannot handle. We demonstrate this with a case study on a C subset, in which FANDANGO-RS is able to generate 401 diverse, complex, and valid test inputs per minute for a C compiler.
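To make step (1) more concrete, the following is a minimal hand-written sketch of what mapping a grammar onto Rust types and trait implementations could look like. It is an illustration under stated assumptions, not output of FANDANGO-RS: the toy grammar, the `Node` trait, the type names, and the `distance` fitness function are all hypothetical and chosen only for this example. Each nonterminal becomes a struct or enum, so serialization and constraint evaluation are statically dispatched and can be optimized by the compiler.

```rust
// Hypothetical toy grammar (not taken from the paper):
//   <sum>    ::= <number> "+" <number>
//   <number> ::= <digit> | <digit> <number>
//   <digit>  ::= "0" | "1" | ... | "9"

/// Hypothetical trait: every nonterminal type can serialize itself.
trait Node {
    fn write_to(&self, out: &mut String);
}

/// <digit>; the wrapped value is always in 0..=9.
struct Digit(u8);

/// <number>; one enum variant per grammar alternative.
enum Number {
    Single(Digit),
    Cons(Digit, Box<Number>),
}

/// <sum>; one field per nonterminal in the rule, the "+" terminal is implicit.
struct Sum {
    lhs: Number,
    rhs: Number,
}

impl Digit {
    /// Returns `None` if `d` is not a decimal digit.
    fn new(d: u8) -> Option<Digit> {
        if d <= 9 { Some(Digit(d)) } else { None }
    }
}

impl Number {
    /// Builds a <number> from digits; `None` for empty or invalid input.
    fn from_digits(digits: &[u8]) -> Option<Number> {
        match digits {
            [] => None,
            [d] => Some(Number::Single(Digit::new(*d)?)),
            [d, rest @ ..] => Some(Number::Cons(
                Digit::new(*d)?,
                Box::new(Number::from_digits(rest)?),
            )),
        }
    }

    /// Numeric value of the digit sequence, read left to right.
    fn value(&self) -> u64 {
        let mut cur = self;
        let mut acc = 0u64;
        loop {
            match cur {
                Number::Single(d) => return acc * 10 + u64::from(d.0),
                Number::Cons(d, rest) => {
                    acc = acc * 10 + u64::from(d.0);
                    cur = &**rest;
                }
            }
        }
    }
}

impl Node for Digit {
    fn write_to(&self, out: &mut String) {
        out.push((b'0' + self.0) as char);
    }
}

impl Node for Number {
    fn write_to(&self, out: &mut String) {
        match self {
            Number::Single(d) => d.write_to(out),
            Number::Cons(d, rest) => {
                d.write_to(out);
                rest.write_to(out);
            }
        }
    }
}

impl Node for Sum {
    fn write_to(&self, out: &mut String) {
        self.lhs.write_to(out);
        out.push('+');
        self.rhs.write_to(out);
    }
}

/// Serializes any node to a string.
fn render(node: &impl Node) -> String {
    let mut s = String::new();
    node.write_to(&mut s);
    s
}

/// Hypothetical fitness for the constraint `lhs + rhs == target`:
/// 0 means the constraint holds; larger values are further away.
fn distance(sum: &Sum, target: u64) -> u64 {
    (sum.lhs.value() + sum.rhs.value()).abs_diff(target)
}
```

In this sketch, a candidate such as `Sum { lhs: 42, rhs: 58 }` renders to `42+58` and has distance 0 from the target 100. A numeric distance of this kind is one common way to give an evolutionary search a gradient toward satisfying a constraint, rather than a plain pass/fail signal; whether FANDANGO-RS uses this exact formulation is not stated in the abstract.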
