Advancing Code Coverage: Incorporating Program Analysis with Large Language Models

Chen Yang, Junjie Chen, Bin Lin, Ziqi Wang, Jianyi Zhou

TL;DR

TELPA targets branches that automatic test generation struggles to cover because reaching them requires complex object construction and resolving inter-procedural dependencies. It combines program-analysis-driven prompting (object construction and branch dependency analyses) with a counter-example-guided, feedback-based LLM workflow to generate diverse, high-coverage tests. Experiments on 27 Python projects and four Java projects show that TELPA outperforms state-of-the-art SBST and LLM-based methods, especially on hard-to-cover branches, and ablations confirm the value of each component. The work demonstrates practical generalizability and provides a scalable framework for integrating static analyses with LLMs in software testing.

Abstract

Automatic test generation plays a critical role in software quality assurance. While recent advances in Search-Based Software Testing (SBST) and Large Language Models (LLMs) have shown promise in generating useful tests, these techniques still struggle to cover certain branches. Reaching these hard-to-cover branches usually requires constructing complex objects and resolving intricate inter-procedural dependencies in branch conditions, which poses significant challenges for existing test generation techniques. In this work, we propose TELPA, a novel technique aimed at addressing these challenges. Its key insight is twofold: it extracts real usage scenarios of the target method under test to learn how to construct complex objects, and it extracts the methods that entail inter-procedural dependencies with hard-to-cover branches to learn the semantics of branch constraints. To enhance efficiency and effectiveness, TELPA identifies a set of ineffective tests as counter-examples for LLMs and employs a feedback-based process to iteratively refine these counter-examples. TELPA then integrates the program analysis results and counter-examples into the prompt, guiding LLMs to gain a deeper understanding of the semantics of the target method and to generate diverse tests that can reach the hard-to-cover branches. Our experimental results on 27 open-source Python projects demonstrate that TELPA significantly outperforms the state-of-the-art SBST and LLM-based techniques, achieving average branch-coverage improvements of 31.39% and 22.22%, respectively.
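The workflow described in the abstract can be pictured with a minimal Python sketch. It is not TELPA's implementation: `BranchContext`, `build_prompt`, `generate_until_covered`, the prompt wording, and the `keep` limit on counter-examples are all hypothetical names and choices made for illustration. The LLM call (`generate`) and the coverage check (`covers`) are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BranchContext:
    """Analysis results for one hard-to-cover branch (hypothetical structure)."""
    target_source: str             # source of the method under test
    usage_snippets: list[str]      # call sites showing how its arguments are built
    dependency_sources: list[str]  # methods the branch condition depends on
    counter_examples: list[str] = field(default_factory=list)  # tests that missed


def build_prompt(ctx: BranchContext, branch_desc: str) -> str:
    """Assemble one prompt from the analysis results and the counter-examples."""
    sections = [
        ("Method under test", [ctx.target_source]),
        ("How callers construct its arguments", ctx.usage_snippets),
        ("Methods the branch condition depends on", ctx.dependency_sources),
        ("Tests that did NOT reach the branch", ctx.counter_examples),
    ]
    lines = [f"Write a test that covers this branch: {branch_desc}"]
    for title, items in sections:
        if items:  # skip sections with nothing to show
            lines.append(f"\n## {title}")
            lines.extend(items)
    return "\n".join(lines)


def generate_until_covered(
    ctx: BranchContext,
    branch_desc: str,
    generate: Callable[[str], str],  # stand-in for an LLM call
    covers: Callable[[str], bool],   # stand-in for running the test with coverage
    max_rounds: int = 3,
    keep: int = 2,                   # assumed cap on counter-examples in the prompt
) -> Optional[str]:
    """Feed failed tests back as counter-examples until the branch is reached."""
    for _ in range(max_rounds):
        test_code = generate(build_prompt(ctx, branch_desc))
        if covers(test_code):
            return test_code
        # Keep only the most recent misses so the prompt stays short.
        ctx.counter_examples = (ctx.counter_examples + [test_code])[-keep:]
    return None
```

In use, `generate` would wrap a real model client and `covers` would execute the candidate test under a coverage tool; both are left abstract here because the abstract does not specify them.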

Paper Structure (27 sections, 11 figures, 9 tables)

Figures (11)

  • Figure 1: Target method set_definitions
  • Figure 2: Target method _is_public_family
  • Figure 3: Overview of TELPA
  • Figure 4: Illustrative Example
  • Figure 5: Example of object construction analysis
  • ...and 6 more figures