
PALM: Path-aware LLM-based Test Generation with Comprehension

Yaoxuan Wu, Xiaojie Zhou, Ahmad Humayun, Muhammad Ali Gulzar, Miryung Kim

TL;DR

PALM addresses the gap between symbolic path enumeration and LLM-based test generation by constructing path-specific executable variants with embedded assertions and using LLMs to generate targeted inputs. It combines AST-level path extraction, function inlining, variable renaming, and constant propagation to create self-contained path variants, then validates LLM-generated tests by runtime execution on those variants. An interactive frontend visualizes the symbolic execution tree and path coverage, enabling users to inspect and refine tests for specific paths. Evaluations on 124 HumanEval-Java programs show PALM achieves substantial gains in path coverage over direct LLM prompting, while outperforming Symbolic PathFinder in scenarios with external API calls; a within-subject user study indicates PALM improves users' understanding of coverage and path-to-test alignment. The work demonstrates that integrating symbolic path enumeration with LLM-driven test generation and interactive visualization can enhance path-aware testing and reduce coverage gaps caused by API modeling limits.
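To make the idea of a self-contained path variant concrete, here is a hand-written Java sketch. The program, the method names, and the `check` helper are all hypothetical, and this is not PALM's actual output. It shows an original method that calls a helper, and a variant for one path in which the helper is inlined, locals are renamed, a local constant is propagated into the branch condition, and each branch decision becomes an explicit runtime check. Explicit checks are used instead of Java `assert` because assertions are disabled unless the JVM runs with `-ea`.

```java
// Hypothetical sketch of a path-specific variant; not PALM's actual output.
final class PathVariantDemo {

    // Signals that an input left the target path. Explicit checks replace Java
    // `assert`, which is a no-op unless the JVM is started with -ea.
    static final class PathMismatch extends RuntimeException {
        PathMismatch(String msg) { super(msg); }
    }

    static void check(boolean cond, String msg) {
        if (!cond) throw new PathMismatch(msg);
    }

    // ---- Original program (hypothetical) ----
    static int scale(int v) { return v * 2; }

    static String classify(int x) {
        int limit = 10;
        int y = scale(x);
        if (y > limit) {
            if (x % 2 == 0) return "big-even";
            return "big-odd";
        }
        return "small";
    }

    // ---- Variant for path [y > limit: TRUE, x % 2 == 0: FALSE] ----
    // scale(x) is inlined, locals are renamed (x0, y0), and the constant
    // limit = 10 is propagated into the first branch condition.
    static String classifyPathTF(int x0) {
        int y0 = x0 * 2;
        check(y0 > 10, "branch 1 expected TRUE");
        check(!(x0 % 2 == 0), "branch 2 expected FALSE");
        return "big-odd";
    }

    // True iff the candidate input drives execution down the target path.
    static boolean followsTargetPath(int input) {
        try {
            classifyPathTF(input);
            return true;
        } catch (PathMismatch e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int input : new int[] {7, 8, 3}) {
            System.out.println(input + ": on path = " + followsTargetPath(input)
                    + ", original returns " + classify(input));
        }
    }
}
```

Only input 7 stays on the target path: 8 takes the even branch and 3 fails the first condition. Because the variant is ordinary executable code, checking a candidate test is just running it, with no SMT translation involved.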

Abstract

Symbolic execution is a widely used technique for test generation, offering systematic exploration of program paths through constraint solving. However, it is fundamentally constrained by its ability to model the target code, including library functions, as symbolic constraints, and by the capabilities of the underlying constraint solvers. As a result, many paths involving complex features remain unanalyzed or insufficiently modeled. Recent advances in large language models (LLMs) have shown promise in generating diverse and valid test inputs. Yet LLMs lack mechanisms for systematically enumerating program paths and often fail to cover subtle corner cases. We observe that directly prompting an LLM with the full program leads to missed coverage of interesting paths. In this paper, we present PALM, a test generation system that combines symbolic path enumeration with LLM-assisted test generation. PALM statically enumerates possible paths through AST-level analysis and transforms each into an executable variant with embedded assertions that specify the target path. Rather than translating path constraints into SMT formulas, PALM constructs program variants that an LLM can interpret. Importantly, PALM provides an interactive frontend that visualizes path coverage alongside generated tests, organizing tests by the specific paths they exercise. A user study with 12 participants demonstrates that PALM's frontend helps users better understand path coverage and, by verifying and visualizing the tests' path profiles, identify which paths PALM-generated tests actually exercise.
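The static path-enumeration step can be illustrated with a toy enumerator over a miniature AST. This is a sketch under stated assumptions, not PALM's analysis: the node types (`Stmt`, `Seq`, `If`) and the decision-string format are hypothetical, loops are assumed to have already been unrolled into nested conditionals, and no feasibility pruning is done, so some enumerated paths may have no satisfying input. Each path is recorded as the ordered list of branch decisions that a path variant would then encode as assertions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy AST-level path enumerator (hypothetical; loops assumed already unrolled).
final class PathEnumerator {
    interface Node {}
    record Stmt(String text) implements Node {}
    record Seq(List<Node> body) implements Node {}
    record If(String cond, Node thenBranch, Node elseBranch) implements Node {}

    // Returns every path as an ordered list of branch decisions, e.g. ["a>0=T", "b>0=F"].
    static List<List<String>> paths(Node n) {
        List<List<String>> out = new ArrayList<>();
        if (n instanceof Stmt) {
            out.add(new ArrayList<>()); // straight-line code contributes no decisions
        } else if (n instanceof If i) {
            for (List<String> p : paths(i.thenBranch())) out.add(prepend(i.cond() + "=T", p));
            for (List<String> p : paths(i.elseBranch())) out.add(prepend(i.cond() + "=F", p));
        } else if (n instanceof Seq s) {
            out.add(new ArrayList<>());
            for (Node child : s.body()) {
                List<List<String>> childPaths = paths(child);
                List<List<String>> next = new ArrayList<>();
                for (List<String> prefix : out) {
                    for (List<String> suffix : childPaths) {
                        List<String> joined = new ArrayList<>(prefix);
                        joined.addAll(suffix);
                        next.add(joined);
                    }
                }
                out = next; // cross product: each prefix extended by each child path
            }
        }
        return out;
    }

    static List<String> prepend(String decision, List<String> rest) {
        List<String> q = new ArrayList<>();
        q.add(decision);
        q.addAll(rest);
        return q;
    }

    // if (a > 0) {s1} else {s2};  if (b > 0) { if (a == b) {s3} else {s4} }
    static Node example() {
        return new Seq(List.<Node>of(
                new If("a>0", new Stmt("s1"), new Stmt("s2")),
                new If("b>0",
                        new If("a==b", new Stmt("s3"), new Stmt("s4")),
                        new Stmt("skip"))));
    }

    public static void main(String[] args) {
        paths(example()).forEach(System.out::println);
    }
}
```

The example yields 2 × 3 = 6 paths, the first being `[a>0=T, b>0=T, a==b=T]`. Sequential composition multiplies path counts, which is why enumeration typically needs bounds such as a fixed loop-unrolling depth.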

Paper Structure

This paper contains 36 sections, 12 figures, 7 tables, 3 algorithms.

Figures (12)

  • Figure 1: This code snippet parses the argument following "-f" as a file path. While GPT-4o generates tests covering typical cases, it misses the edge case {"-f","-v"}, where "-v" is mistakenly interpreted as the file path due to the lack of validation. This path is also challenging for symbolic execution-based testing as SMT solvers like Z3 and CVC5 do not support string operations such as equalsIgnoreCase.
  • Figure 2: PALM user interface. (1) Code editor and symbolic-execution settings. (2) Built-in example selector. (3) Start symbolic execution and test generation. (4) Symbolic execution tree (leaf nodes show coverage). (5) Select a path (click a leaf). (6) Path-specific program variant (assertions encode branch decisions). (7) Iterative test-generation history. (8) Prompt for the selected test. (9) Test editor. (A) Verify whether a test exercises the selected path. (B) Locate the corresponding path for a given test.
  • Figure 3: PALM has two phases: (1) enumerate paths and synthesize executable path-specific variants via loop unrolling, inlining, renaming, and constant propagation/folding; (2) traverse the path tree, call an LLM to generate inputs, and iteratively validate them by execution, regenerating with feedback when a test misses the intended path.
  • Figure 5: Code snippet from any_int. The condition (int)x == x checks whether x is an integer (i.e., has no fractional part). The highlighted else-branch corresponds to inputs where at least one of x, y, or z is not an integer. LLM-generated tests (GPT-4o-mini) fail to cover this branch, whereas PALM covers it with any_int(3.0, 1.1, 2.0).
  • Figure 6: Test coverage progress of PALM with $k$ trial rounds per program path, using three LLM backends.
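The second phase in the Figure 3 caption (generate inputs, validate by execution, regenerate with feedback on a miss) can be sketched for the any_int else-branch from Figure 5. Several assumptions apply: `anyInt` is a simplified stand-in taking doubles, not the benchmark's exact signature; `InputGenerator` is a hypothetical interface standing in for an LLM call; and the scripted stub simply replays fixed candidates and ignores the feedback text that a real prompt would include.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a generate-validate-regenerate loop; the LLM is replaced by a stub.
final class FeedbackLoopDemo {

    static final class PathMismatch extends RuntimeException {
        PathMismatch(String msg) { super(msg); }
    }

    // Simplified stand-in for any_int (assumption: doubles instead of the benchmark's signature).
    static boolean anyInt(double x, double y, double z) {
        if ((int) x == x && (int) y == y && (int) z == z) {
            return x + y == z || x + z == y || y + z == x;
        }
        return false;
    }

    // Variant for the else-branch path: the explicit check encodes the branch decision.
    static boolean anyIntElsePath(double x, double y, double z) {
        if ((int) x == x && (int) y == y && (int) z == z) {
            throw new PathMismatch("expected ELSE branch, but all inputs are integers");
        }
        return false;
    }

    // Stand-in for an LLM call: receives feedback text, returns candidate inputs.
    interface InputGenerator {
        double[] propose(String feedback);
    }

    // Tries up to k rounds; returns the first input that stays on the path, or null.
    static double[] generateForPath(InputGenerator gen, int k) {
        String feedback = "";
        for (int round = 1; round <= k; round++) {
            double[] in = gen.propose(feedback);
            try {
                anyIntElsePath(in[0], in[1], in[2]);
                return in; // validated by execution: the input exercises the target path
            } catch (PathMismatch e) {
                feedback = "Round " + round + ": " + Arrays.toString(in)
                        + " missed the path (" + e.getMessage() + ")";
            }
        }
        return null;
    }

    // Scripted stub: first proposes all-integer inputs, then a non-integer one.
    static InputGenerator scripted() {
        Iterator<double[]> it = List.of(new double[] {3, 1, 2}, new double[] {3.0, 1.1, 2.0}).iterator();
        return feedback -> it.hasNext() ? it.next() : new double[] {0, 0, 0};
    }

    public static void main(String[] args) {
        double[] found = generateForPath(scripted(), 3);
        System.out.println("test input: " + Arrays.toString(found));
    }
}
```

The first candidate (3, 1, 2) triggers the path check, so a second round runs and accepts (3.0, 1.1, 2.0), the same input the Figure 5 caption reports. Capping the loop at `k` rounds mirrors the per-path trial budget varied in Figure 6.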