High-Performance Generation of Constrained Inputs

Addison Crump, Alexi Turcotte, José Antonio Zamudio Amaya, Andreas Zeller

TL;DR

This paper tackles the challenge of generating semantically valid inputs from context-free grammars with complex constraints. It introduces FANDANGO-RS, a Rust-based approach that transpiles grammars into types and combines them with multi-objective evolutionary algorithms. By compiling grammars into Rust types, employing opaque node representations, and using NSGA-II-driven search, the authors achieve speedups of 3–4 orders of magnitude over the prior state of the art and solve constraint sets that earlier approaches could not handle. They validate the approach with a case study on a C subset, generating 401 diverse, valid inputs per minute and demonstrating practical viability for compiler testing and broader specification-based input generation. The work suggests practical impact for software testing through rapid, scalable, and semantically aware input generation across complex domains, and it outlines integration with coverage-guided testing and hybrid symbolic approaches as future work.
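
To make the "multi-objective" part concrete, here is a minimal sketch of the Pareto-dominance test and the first non-dominated front that NSGA-II's sorting step is built on. It is not the full algorithm (no crowding distance, selection, or variation) and not FANDANGO-RS's code. It assumes each constraint contributes one "lower is better" objective, and the names `dominates` and `first_front` are hypothetical.

```rust
// Illustrative only: objective vectors are "lower is better" and the
// function names are hypothetical, not taken from FANDANGO-RS.

/// `a` dominates `b` if it is no worse in every objective and
/// strictly better in at least one.
pub fn dominates(a: &[f64], b: &[f64]) -> bool {
    assert_eq!(a.len(), b.len(), "objective vectors must have equal length");
    let no_worse = a.iter().zip(b).all(|(x, y)| x <= y);
    let strictly_better = a.iter().zip(b).any(|(x, y)| x < y);
    no_worse && strictly_better
}

/// Indices of the individuals that no other individual dominates,
/// i.e. the first front produced by non-dominated sorting.
pub fn first_front(population: &[Vec<f64>]) -> Vec<usize> {
    (0..population.len())
        .filter(|&i| {
            !population
                .iter()
                .enumerate()
                .any(|(j, other)| j != i && dominates(other, &population[i]))
        })
        .collect()
}
```

With objective vectors `[0, 2]`, `[1, 1]`, `[2, 2]`, and `[2, 0]`, only `[2, 2]` is dominated, so the first front keeps the other three as incomparable trade-offs.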

Abstract

Language-based testing combines context-free grammar definitions with semantic constraints over grammar elements to generate test inputs. By pairing context-free grammars with constraints, users have the expressiveness of unrestricted grammars while retaining simple structure. However, producing inputs in the presence of such constraints can be challenging. In past approaches, SMT solvers have been found to be very slow at finding string solutions; evolutionary algorithms are faster and more general, but current implementations still struggle with complex constraints that would be required for domains such as compiler testing. In this paper, we present a novel approach for evolutionary language-based testing that improves performance by 3-4 orders of magnitude over the current state of the art, reducing hours of generation and constraint solving time to seconds. We accomplish this by (1) carefully transforming grammar definitions into Rust types and trait implementations, ensuring that the compiler may near-maximally optimize arbitrary operations on arbitrary grammars; and (2) using better evolutionary algorithms that improve the ability of language-based testing to solve complex constraint systems. These performance and algorithmic improvements allow our prototype, FANDANGO-RS, to solve constraints that previous strategies simply cannot handle. We demonstrate this by a case study for a C subset, in which FANDANGO-RS is able to generate 401 diverse, complex, and valid test inputs for a C compiler per minute.
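
As a rough illustration of point (1), the sketch below hand-writes what a grammar-to-type translation could look like for a tiny, made-up grammar. The grammar, the `Node` trait, the type names, and the `Rng` helper are all assumptions for illustration and do not reproduce FANDANGO-RS's generated code. The point is only that once each nonterminal is a concrete type, generation and unparsing become statically dispatched functions that the compiler can inline and optimize.

```rust
// Illustrative only: a hypothetical grammar and hypothetical names.
//   <expr>  ::= <digit> | <digit> "+" <expr>
//   <digit> ::= "0" | ... | "9"

/// Tiny xorshift PRNG so the sketch needs no external crates.
pub struct Rng(u64);

impl Rng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Rng(seed.max(1))
    }
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Operations every grammar-derived node type supports in this sketch.
pub trait Node: Sized {
    fn generate(rng: &mut Rng, depth: usize) -> Self;
    fn unparse(&self, out: &mut String);
}

/// <digit>: a single decimal digit, stored as its value 0..=9.
#[derive(Debug, Clone, PartialEq)]
pub struct Digit(pub u8);

/// <expr>: one enum variant per grammar alternative.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Single(Digit),
    Sum(Digit, Box<Expr>),
}

impl Node for Digit {
    fn generate(rng: &mut Rng, _depth: usize) -> Self {
        Digit((rng.next_u64() % 10) as u8)
    }
    fn unparse(&self, out: &mut String) {
        out.push((b'0' + self.0) as char);
    }
}

impl Node for Expr {
    fn generate(rng: &mut Rng, depth: usize) -> Self {
        // Once the depth budget is spent, force the non-recursive alternative.
        if depth == 0 || rng.next_u64() % 2 == 0 {
            Expr::Single(Digit::generate(rng, depth))
        } else {
            Expr::Sum(
                Digit::generate(rng, depth),
                Box::new(Expr::generate(rng, depth - 1)),
            )
        }
    }
    fn unparse(&self, out: &mut String) {
        match self {
            Expr::Single(d) => d.unparse(out),
            Expr::Sum(d, rest) => {
                d.unparse(out);
                out.push('+');
                rest.unparse(out);
            }
        }
    }
}

/// Serialize any node back into its textual form.
pub fn to_string<N: Node>(node: &N) -> String {
    let mut s = String::new();
    node.unparse(&mut s);
    s
}
```

Calling `Expr::generate(&mut Rng::new(42), 5)` and passing the result to `to_string` yields a string of digits joined by `+`, with at most six digits because the depth budget bounds the recursion.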

Paper Structure

This paper contains 40 sections, 3 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: A snippet from the CSV grammar for FANDANGO.
  • Figure 2: A FANDANGO-style constraint over CSV, from the artifact of the original FANDANGO paper [zamudio2025fandango]. The cardinality operator measures the number of nodes matching the selected expression. One may read this as "the number of $\langle$raw_field$\rangle$ nodes within the recursive expansions of $r_1$'s $\langle$csv_string_list$\rangle$ equals the number within $r_2$'s."
  • Figure 3: A pair of constraints which force exactly three columns and three rows, respectively.
  • Figure 4: Definition of the expr grammar.
  • Figure 5: Graph representing $\langle$expr$\rangle$.
  • ...and 10 more figures
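
The cardinality constraints described in the captions of Figures 2 and 3 can be sketched over a generic labelled tree as follows. This is a sketch under stated assumptions, not FANDANGO's or FANDANGO-RS's data model: the `Tree` type, the `csv_record` label, and all function names are hypothetical, and scoring a candidate by its distance from the target shape is just one plausible way to give an evolutionary search a numeric objective.

```rust
// Illustrative only: the tree shape, the `csv_record` label, and the
// function names are assumptions, not FANDANGO's real data model.

/// A derivation-tree node labelled with its grammar symbol.
pub struct Tree {
    pub symbol: &'static str,
    pub children: Vec<Tree>,
}

impl Tree {
    pub fn new(symbol: &'static str, children: Vec<Tree>) -> Self {
        Tree { symbol, children }
    }
}

/// Helper for examples: a record holding `fields` raw_field leaves.
pub fn record(fields: usize) -> Tree {
    let leaves = (0..fields).map(|_| Tree::new("raw_field", vec![])).collect();
    Tree::new("csv_record", leaves)
}

/// Cardinality operator: how many nodes labelled `symbol` occur in the
/// subtree rooted at `tree` (the root itself included).
pub fn cardinality(tree: &Tree, symbol: &str) -> usize {
    let own = usize::from(tree.symbol == symbol);
    own + tree
        .children
        .iter()
        .map(|c| cardinality(c, symbol))
        .sum::<usize>()
}

/// Collect references to every node labelled `symbol`, in pre-order.
fn find_all<'a>(tree: &'a Tree, symbol: &str, out: &mut Vec<&'a Tree>) {
    if tree.symbol == symbol {
        out.push(tree);
    }
    for child in &tree.children {
        find_all(child, symbol, out);
    }
}

/// Figure 2 style: two records contain the same number of raw fields.
pub fn same_field_count(r1: &Tree, r2: &Tree) -> bool {
    cardinality(r1, "raw_field") == cardinality(r2, "raw_field")
}

/// Figure 3 style: distance from "exactly `rows` records with `cols`
/// fields each"; 0 means both constraints are satisfied.
pub fn distance_to_shape(csv: &Tree, rows: usize, cols: usize) -> usize {
    let mut records = Vec::new();
    find_all(csv, "csv_record", &mut records);
    let row_gap = records.len().abs_diff(rows);
    let col_gap: usize = records
        .iter()
        .map(|r| cardinality(r, "raw_field").abs_diff(cols))
        .sum();
    row_gap + col_gap
}
```

For a tree with three records of three fields each, `distance_to_shape(&csv, 3, 3)` is 0. For two records with two and four fields it is 3: one missing row plus one field too few and one field too many.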