Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray

TL;DR

The paper tackles low-coverage test generation in regression testing by introducing SymPrompt, a code-aware, multi-stage prompting framework for LLMs. It statically analyzes the focal method to extract approximate path constraints and relevant code context, then constructs path-specific prompts that guide the LLM to generate tests targeting distinct execution paths. By combining path constraint prompts, type and dependency context, and iterative prompting, SymPrompt achieves substantial gains in correct test generations and coverage on Python benchmarks, with pronounced benefits for larger models such as GPT-4. The approach bridges symbolic-style reasoning with LLM capabilities, offering a practical route to higher-quality automated tests without additional training.

Abstract

Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result, LLM-generated testsuites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work that demonstrates LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the testsuite generation process into a multi-stage sequence, each stage of which is driven by a specific prompt aligned with the execution paths of the method under test, and by exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate on a benchmark of challenging methods from open-source Python projects. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
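
To make the idea of prompts "aligned with the execution paths of the method under test" concrete, here is a minimal sketch. It is not the authors' implementation: it swaps in Python's standard ast module for TreeSitter, handles only if and return statements, and uses a toy focal method modeled on the caption of Figure 2 rather than the real flutils source. The names walk, collect_paths, and path_prompt, as well as the prompt wording, are hypothetical.

```python
import ast  # stdlib stand-in for TreeSitter (assumption); ast.unparse needs Python 3.9+

# Toy focal method modeled on the Figure 2 caption, not the real flutils code.
FOCAL_SOURCE = '''
def exists_as(path):
    path = normalize_path(path)
    if path.is_dir():
        return 'directory'
    if path.is_file():
        return 'file'
    return ''
'''


def walk(stmts, constraints):
    """Visit statements in order.

    Returns (finished, open_paths): finished is a list of
    (constraints, return_expression) pairs; open_paths holds the constraint
    tuples of executions that fall through the block without returning.
    """
    finished = []
    open_paths = [constraints]
    for stmt in stmts:
        if not open_paths:
            break  # every path has already returned
        if isinstance(stmt, ast.Return):
            value = ast.unparse(stmt.value) if stmt.value else "None"
            finished += [(c, value) for c in open_paths]
            open_paths = []
        elif isinstance(stmt, ast.If):
            cond = ast.unparse(stmt.test)
            next_open = []
            for c in open_paths:
                done_t, open_t = walk(stmt.body, c + (cond,))
                done_f, open_f = walk(stmt.orelse, c + (f"not ({cond})",))
                finished += done_t + done_f
                next_open += open_t + open_f
            open_paths = next_open
        # Other statements (assignments, calls) add no branch constraint.
        # Loops and try blocks are ignored in this sketch.
    return finished, open_paths


def collect_paths(source):
    """Collect approximate (constraints, return value) pairs for one function."""
    func = ast.parse(source).body[0]
    finished, open_paths = walk(func.body, ())
    # Paths that reach the end of the body return None implicitly.
    return finished + [(c, "None") for c in open_paths]


def path_prompt(name, constraints, returns):
    """Render one path as a comment-style prompt (hypothetical format)."""
    conds = " and ".join(constraints) if constraints else "True"
    return f"# Test {name} where {conds}, returns {returns}"


if __name__ == "__main__":
    for constraints, returns in collect_paths(FOCAL_SOURCE):
        print(path_prompt("exists_as", constraints, returns))
```

Each printed line would then seed one generation request, so the model is asked for one test per execution path instead of an unguided test suite.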

Paper Structure (20 sections, 12 figures, 2 tables)

Figures (12)

  • Figure 1: Example test generations from an SBST tool (Pynguin), a zero-shot LLM (CodeGen2), and SymPrompt prompts for the focal method exists_as in the flutils open-source Python project. The SBST approach cannot generate full-coverage tests for this method without special configuration because it cannot generate strings that represent specific types of filesystem objects (e.g., block devices). An LLM, conversely, can generate input strings associated with filesystem objects such as block devices, but in practice tests only a small subset of use cases based on the most likely usage scenarios, such as paths to files and directories. SymPrompt constructs path-specific prompts to guide the model to generate high-coverage testsuites.
  • Figure 2: Workflow for generating path constraint prompts. The focal method of the working example is first parsed and its abstract syntax tree is traversed in preorder. In step ①, the traversal visits the first method statement, normalize_path(path), but does not record any information since it is not a branch constraint. In step ②, it traverses to the first if statement and records that there is a constraint path.is_dir() that must be satisfied to execute the current path on the AST. It then reaches the return 'directory' under the first if check and records that there is an execution path where 'directory' is returned when path.is_dir() is true. The preorder traversal next visits the if path.is_file(), return 'file' branch of the AST in step ③ and records a second path where path.is_dir() is false and path.is_file() is true, and the return behavior is 'file'. This traversal continues until, in step ④, the final return statement is reached, on an execution path where none of the branch constraints are true. Each collected execution path and return value is then used to construct a test generation prompt that specifies both the path constraints and the return behavior for the target test case.
  • Figure 3: Example of a generation prompt used by SymPrompt. The prompt exposes both the type and dependency context of the focal method to the model in addition to the path constraint prompt.
  • Figure 4: Overview of SymPrompt's framework for test generation. In Step-I, path constraint collection is performed on the focal method. In Step-II, the type and dependency context from the focal method are parsed from the focal file along with the focal method itself. Finally, in Step-III, prompts for each set of path constraints are then constructed and iteratively passed to the model to generate test cases.
  • Figure 5: Path minimization algorithmic definition and illustration of how path minimization prevents the number of paths from growing exponentially in the number of branches. A method with $n=3$ if conditions will have $2^n = 8$ possible execution paths, but applying the path minimization algorithm reduces the number of paths that need to be tested to at most $n+1 = 4$, each of which covers a unique branch condition.
  • ...and 7 more figures
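
The path count in the Figure 5 caption can be checked with a small sketch. This is a greedy reading of the caption, not necessarily the paper's exact algorithm: a path is kept only if it exercises at least one branch outcome that no earlier kept path covers. The helper names all_paths and minimize_paths are hypothetical, and the enumeration assumes n independent, sequential if conditions.

```python
from itertools import product


def all_paths(conditions):
    """Enumerate every true/false combination of independent if conditions."""
    return [
        tuple(c if taken else f"not ({c})" for c, taken in zip(conditions, outcome))
        for outcome in product((True, False), repeat=len(conditions))
    ]


def minimize_paths(paths):
    """Keep a path only if it covers a branch outcome not seen so far."""
    seen = set()
    kept = []
    for path in paths:
        if any(branch not in seen for branch in path):
            kept.append(path)
            seen.update(path)
    return kept


if __name__ == "__main__":
    paths = all_paths(["a", "b", "c"])   # n = 3 conditions
    kept = minimize_paths(paths)
    print(len(paths), "paths before minimization")
    print(len(kept), "paths after minimization")
    for path in kept:
        print("  ", " and ".join(path))
```

The first kept path covers n branch outcomes at once, and each of the remaining n outcomes needs at most one more path, which is where the n+1 bound comes from under these assumptions.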