SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Jaid Monwar Chowdhury; Chi-An Fu; Reyhaneh Jabbarvand

TL;DR

SPARC targets the persistent semantic gap in automated C unit test generation by combining CFG-based path enumeration with a retrieval-augmented operation map and per-path synthesis. The four-stage pipeline (pre-processing, operation map construction, per-path synthesis, and iterative validation) ensures tests are semantically grounded, compilable, and traceable to specific execution paths, with iterative repairs improving correctness. Empirical evaluation over 59 C projects shows SPARC outperforms vanilla prompting (≈31% higher line and 26% higher branch coverage) and matches or surpasses KLEE on complex subjects, while achieving 94.3% test retention after repair and a 20.78% mutation-score gain. A human study reports superior developer-perceived quality in readability, correctness, completeness, and maintainability, and cost analyses demonstrate that cost-efficient LLMs can match frontier models when used within SPARC’s structured pipeline. Overall, SPARC provides a scalable, industrial-grade approach to automated C test generation that aligns LLM reasoning with program structure and supports deployment across diverse codebases and LLM resources.
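To make "CFG-based path enumeration" concrete, here is a minimal C sketch that lists every entry-to-exit path of a small, hand-written control-flow graph by depth-first search. The graph, the `CfgNode` type, and all function names are hypothetical illustrations, not SPARC's implementation. The sketch also assumes an acyclic CFG; loops would need a bound on how often each back edge is taken.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES 8   /* assumed upper bound on CFG size for this sketch */
#define MAX_SUCC  2   /* each basic block has at most two successors */

/* Hypothetical CFG node: a basic block and its successor block ids. */
typedef struct {
    int succ[MAX_SUCC];
    int nsucc;
} CfgNode;

/* Depth-first walk that counts (and optionally prints) every path
 * from `node` to `exit_node`. Assumes the graph has no cycles. */
static int walk(const CfgNode *cfg, int node, int exit_node,
                int *path, int depth, int print)
{
    assert(depth < MAX_NODES);          /* holds for acyclic graphs of this size */
    path[depth] = node;

    if (node == exit_node) {
        if (print) {
            for (int i = 0; i <= depth; i++)
                printf(i == 0 ? "%d" : " -> %d", path[i]);
            printf("\n");
        }
        return 1;                       /* one complete path found */
    }

    int total = 0;
    for (int i = 0; i < cfg[node].nsucc; i++)
        total += walk(cfg, cfg[node].succ[i], exit_node, path, depth + 1, print);
    return total;
}

/* Returns the number of entry-to-exit paths in the CFG. */
int enumerate_paths(const CfgNode *cfg, int entry, int exit_node, int print)
{
    int path[MAX_NODES];
    return walk(cfg, entry, exit_node, path, 0, print);
}

/* Example CFG for an if / else-if / else shape:
 *   0 -> 1, 2    1 -> 5    2 -> 3, 4    3 -> 5    4 -> 5    5 = exit */
int demo_cfg_path_count(void)
{
    const CfgNode cfg[6] = {
        {{1, 2}, 2}, {{5}, 1}, {{3, 4}, 2}, {{5}, 1}, {{5}, 1}, {{0}, 0}
    };
    return enumerate_paths(cfg, 0, 5, 1);
}
```

Calling `demo_cfg_path_count()` from a `main` function prints the three paths `0 -> 1 -> 5`, `0 -> 2 -> 3 -> 5`, and `0 -> 2 -> 4 -> 5`, and returns 3. In a per-path pipeline, each such path would become one test scenario.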

Abstract

Automated unit test generation for C remains a formidable challenge due to the semantic gap between high-level program intent and the rigid syntactic constraints of pointer arithmetic and manual memory management. While Large Language Models (LLMs) exhibit strong generative capabilities, direct intent-to-code synthesis frequently suffers from the leap-to-code failure mode, where models prematurely emit code without grounding in program structure, constraints, and semantics. This results in non-compilable tests, hallucinated function signatures, low branch coverage, and semantically irrelevant assertions that fail to expose bugs. We introduce SPARC, a neuro-symbolic, scenario-based framework that bridges this gap through four stages: (1) Control Flow Graph (CFG) analysis, (2) an Operation Map that grounds LLM reasoning in validated utility helpers, (3) path-targeted test synthesis, and (4) an iterative, self-correcting validation loop driven by compiler and runtime feedback. We evaluate SPARC on 59 real-world and algorithmic subjects, where it outperforms the vanilla prompt generation baseline by 31.36% in line coverage, 26.01% in branch coverage, and 20.78% in mutation score, matching or exceeding the symbolic execution tool KLEE on complex subjects. SPARC retains 94.3% of tests through iterative repair and produces code with significantly higher developer-rated readability and maintainability. By aligning LLM reasoning with program structure, SPARC provides a scalable path for industrial-grade testing of legacy C codebases.
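As an illustration of path-targeted synthesis, the following C sketch pairs a simple BST insert with one test per execution path, so that each test is traceable to exactly one branch outcome. The function under test, the `bst_free` helper (standing in for a validated utility helper), and the test names are all hypothetical and written for this example; they are not taken from the paper's artifacts or its Figure 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical function under test: a plain recursive BST insert. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

Node *bst_insert(Node *root, int key)
{
    if (root == NULL) {                     /* path A: empty (sub)tree */
        Node *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)                    /* path B: descend left */
        root->left = bst_insert(root->left, key);
    else if (key > root->key)               /* path C: descend right */
        root->right = bst_insert(root->right, key);
    return root;                            /* path D: duplicate key, no change */
}

/* Hypothetical utility helper: releases every node of a tree. */
void bst_free(Node *root)
{
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}

/* Path A: inserting into an empty tree creates a leaf root. */
void test_path_a_empty_tree(void)
{
    Node *root = bst_insert(NULL, 10);
    assert(root != NULL && root->key == 10);
    assert(root->left == NULL && root->right == NULL);
    bst_free(root);
}

/* Path B: a smaller key becomes the left child. */
void test_path_b_left_child(void)
{
    Node *root = bst_insert(NULL, 10);
    root = bst_insert(root, 5);
    assert(root->left != NULL && root->left->key == 5);
    assert(root->right == NULL);
    bst_free(root);
}

/* Path C: a larger key becomes the right child. */
void test_path_c_right_child(void)
{
    Node *root = bst_insert(NULL, 10);
    root = bst_insert(root, 15);
    assert(root->right != NULL && root->right->key == 15);
    assert(root->left == NULL);
    bst_free(root);
}

/* Path D: a duplicate key leaves the tree unchanged. */
void test_path_d_duplicate(void)
{
    Node *root = bst_insert(NULL, 10);
    Node *same = bst_insert(root, 10);
    assert(same == root);
    assert(root->left == NULL && root->right == NULL);
    bst_free(root);
}

/* Runs every path test and returns how many were executed. */
int run_path_tests(void)
{
    test_path_a_empty_tree();
    test_path_b_left_child();
    test_path_c_right_child();
    test_path_d_duplicate();
    return 4;
}
```

Calling `run_path_tests()` from a `main` function exercises all four paths. Keeping one test per path is a design choice of this sketch: a failing assertion then points directly at the branch that misbehaved, and cleanup goes through a shared helper instead of ad hoc `free` calls in each test.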

Paper Structure (31 sections, 9 equations, 6 figures, 4 tables)

Introduction
Related Works
Methodology
Architectural Overview
Preliminaries and Formal Definitions
SPARC Multi-Stage Pipeline
Pre-processing from Source Code to Structured Metadata
Operation Map Construction and Helper Synthesis
Per-Path Test Generation
Iterative Validation and Test Merging
Evaluation
RQ1: Coverage Effectiveness
Experimental Setup
Coverage Comparison Against Existing Approaches
Coverage vs. Subject Complexity
...and 16 more sections

Figures (6)

Figure 1: An extracted execution path and its corresponding generated unit test for a BST insert function.
Figure 2: Three failure modes of vanilla prompt generation compared with SPARC on kohonen_som_trace.c.
Figure 3: The four-stage SPARC pipeline. (1) Pre-processing extracts functions, their dependencies, and control-flow paths. (2) RAG-augmented operation map construction retrieves and assigns helper functions per path. (3) Path-targeted synthesis generates a test case for each scenario. (4) Iterative validation compiles, executes, and repairs failing tests.
Figure 4: Root-cause analysis of 304 permanently dropped tests. (a) Failure categories. (b) Repair success rate.
Figure 5: Perceived unit test survey results: SPARC vs. vanilla DS (95% confidence intervals).
...and 1 more figure
