Scalable Similarity-Aware Test Suite Minimization with Reinforcement Learning

Sijia Gu, Ali Mesbah

TL;DR

TripRL tackles the NP-hard Multi-Criteria Test Suite Minimization (MCTSM) problem by combining an ILP formulation, which minimizes reduced test suite size and pairwise test similarity while preserving statement coverage and known-fault detection, with a bipartite-graph embedding and a reinforcement learning (PPO) framework. By representing the problem as a bipartite graph and training in a vectorized RL environment fed with GEBE^p embeddings, TripRL achieves scalable, near-optimal reductions on large Defects4J suites, with runtime growing linearly, $O(n)$, in problem size. Empirical results show that TripRL preserves statement coverage, retains 100% known-fault coverage, and yields higher mutation scores than baselines, while keeping runtimes under 47 minutes on large cases. The approach makes minimization more practical for large industrial test suites and uses diversity to improve detection of unknown faults; its solutions could also serve as warm starts for ILP solvers, and the method may generalize to other software engineering minimization tasks.

Abstract

The Multi-Criteria Test Suite Minimization (MCTSM) problem aims to remove redundant test cases, guided by adequacy criteria such as code coverage or fault detection capability. However, current techniques either exhibit a high loss of fault detection ability or face scalability challenges due to the NP-hard nature of the problem, which limits their practical utility. We propose TripRL, a novel technique that integrates traditional criteria such as statement coverage and fault detection ability with test coverage similarity into an Integer Linear Program (ILP), to produce a diverse reduced test suite with high test effectiveness. TripRL leverages bipartite graph representation and its embedding for concise ILP formulation and combines ILP with effective reinforcement learning (RL) training. This combination renders large-scale test suite minimization more scalable and enhances test effectiveness. Our empirical evaluations demonstrate that TripRL's runtime scales linearly with the magnitude of the MCTSM problem. Notably, for large test suites from the Defects4J dataset where existing approaches fail to provide solutions within a reasonable time frame, our technique consistently delivers solutions in less than 47 minutes. The reduced test suites produced by TripRL also maintain the original statement coverage and fault detection ability while having a higher potential to detect unknown faults.
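
To make the optimization target concrete, the following is a minimal sketch of a similarity-aware minimization objective on a toy instance. It is not TripRL's formulation or solver: the test data, the use of Jaccard similarity over covered statements, the weights `alpha` and `beta`, and the function name `minimize` are all assumptions made for illustration, and exhaustive search stands in for the ILP solver and RL training, so it is only usable on very small inputs.

```python
from itertools import combinations

# Hypothetical toy data: statements covered and known faults detected per test.
coverage = {
    "t1": {"s1", "s2"},
    "t2": {"s2", "s3"},
    "t3": {"s1", "s2", "s3"},
    "t4": {"s4"},
}
faults = {"t1": {"f1"}, "t2": set(), "t3": set(), "t4": set()}


def jaccard(a, b):
    """Jaccard similarity of two coverage sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minimize(coverage, faults, alpha=1.0, beta=1.0):
    """Exhaustively search for the subset with the lowest cost
    alpha * size + beta * (sum of pairwise similarities),
    subject to keeping all covered statements and all known faults."""
    tests = sorted(coverage)
    all_stmts = set().union(*coverage.values())
    all_faults = set().union(*faults.values())
    best, best_cost = None, float("inf")
    for k in range(1, len(tests) + 1):
        for subset in combinations(tests, k):
            covered = set().union(*(coverage[t] for t in subset))
            detected = set().union(*(faults[t] for t in subset))
            # Constraints: no loss of statement coverage or known-fault detection.
            if covered != all_stmts or detected != all_faults:
                continue
            similarity = sum(
                jaccard(coverage[a], coverage[b])
                for a, b in combinations(subset, 2)
            )
            cost = alpha * len(subset) + beta * similarity
            if cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost


selected, cost = minimize(coverage, faults)
print(selected, round(cost, 3))
```

In this toy instance both `{t1, t2, t4}` and `{t1, t3, t4}` satisfy the constraints with three tests, and the similarity term breaks the tie in favor of the less overlapping suite. The search enumerates every subset, which is exponential in the number of tests and illustrates why the problem needs a more scalable approach at realistic sizes.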

Paper Structure (18 sections, 2 equations, 6 figures, 6 tables, 1 algorithm)

Figures (6)

  • Figure 1: Bipartite graph representation for Table \ref{tab:example}
  • Figure 2: Overview of TripRL.
  • Figure 3: Reduced test suite size
  • Figure 4: Fault detection rate comparison
  • Figure 5: Bar plot with error bars of mutation scores among different techniques
  • ...and 1 more figure

Theorems & Definitions (1)

  • definition 1: Set Covering Problem