When Automated Program Repair Meets Regression Testing -- An Extensive Study on 2 Million Patches

Yiling Lou, Jun Yang, Samuel Benton, Dan Hao, Lin Tan, Zhenpeng Chen, Lu Zhang, Lingming Zhang

TL;DR

The paper tackles the high cost of automated program repair by studying regression test selection as a means to reduce patch validation effort. It presents the first extensive, controlled analysis of RTS across 12 APR systems on over 2M patches using the Defects4J benchmark, examining class/method/statement granularities and interactions with test prioritization. Key findings show that RTS substantially lowers test executions, with method- and statement-level RTS outperforming class-level RTS, and that combining RTS with prioritization can yield further gains. The work provides actionable guidelines for integrating RTS into APR practice and highlights directions for future research, including extending analyses to more languages and novel APR approaches.

Abstract

In recent years, Automated Program Repair (APR) has been extensively studied in academia and has even drawn wide attention from industry. However, APR techniques can be extremely time-consuming since (1) a large number of patches can be generated for a given bug, and (2) each patch needs to be executed on the original tests to ensure its correctness. In the literature, various techniques (e.g., based on learning, mining, and constraint solving) have been proposed and studied to reduce the number of patches. Intuitively, every patch can be treated as a software revision during regression testing; thus, traditional Regression Test Selection (RTS) techniques can be leveraged to execute only the tests affected by each patch (as the other tests would keep the same outcomes) to further reduce patch execution time. However, few APR systems actually adopt RTS, and there is still a lack of systematic studies demonstrating the benefits of RTS and the impact of different RTS strategies on APR. To this end, this paper presents the first extensive study of widely used RTS techniques at different levels (i.e., class/method/statement levels) for 12 state-of-the-art APR systems on over 2M patches. Our study reveals various practical guidelines for bridging the gap between APR and regression testing.
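
The idea of treating each patch as a revision and running only the affected tests can be illustrated with a small sketch. This is not the paper's implementation: the coverage map, the test names, the `project` and `select_tests` helpers, and the `(class, method, line)` element encoding are all hypothetical, and the sketch assumes per-test coverage has already been collected on the original buggy program.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of a covered or patched code element: (class, method, line).
Element = Tuple[str, str, int]


def project(element: Element, granularity: str) -> Tuple:
    """Coarsen an element to the chosen RTS granularity."""
    cls, method, line = element
    if granularity == "class":
        return (cls,)
    if granularity == "method":
        return (cls, method)
    if granularity == "statement":
        return (cls, method, line)
    raise ValueError(f"unknown granularity: {granularity}")


def select_tests(
    coverage: Dict[str, Set[Element]],
    patched: Set[Element],
    granularity: str,
) -> Set[str]:
    """Return the tests whose coverage overlaps the patched elements.

    Tests that never touch a patched element keep their original outcome,
    so they can be skipped when validating this patch.
    """
    changed = {project(e, granularity) for e in patched}
    return {
        test
        for test, covered in coverage.items()
        if any(project(e, granularity) in changed for e in covered)
    }


# Illustrative (made-up) coverage data for four tests.
coverage = {
    "testAdd": {("Calc", "add", 10), ("Calc", "add", 11)},
    "testAddEdge": {("Calc", "add", 10), ("Calc", "add", 13)},
    "testSub": {("Calc", "sub", 20)},
    "testParse": {("Parser", "parse", 5)},
}

# A patch that modifies a single statement in Calc.add.
patched = {("Calc", "add", 11)}

for level in ("class", "method", "statement"):
    print(level, sorted(select_tests(coverage, patched, level)))
```

In this toy setting, class-level selection keeps every test that touches `Calc`, method-level selection keeps only the two tests that execute `Calc.add`, and statement-level selection keeps the single test that executes the patched line. This mirrors the intuition that finer granularities select fewer tests per patch.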

Paper Structure

This paper contains 26 sections, 2 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Ratio of patches of different fixing capabilities
  • Figure 2: Ratio of patches of different fixing code scopes
  • Figure 3: Reduction with full/partial matrices
  • Figure 4: Reduction in test time vs. # of test executions
  • Figure 5: Plausible Patches with different RTS strategies

Theorems & Definitions (4)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4