LTM: Scalable and Black-box Similarity-based Test Suite Minimization based on Language Models

Rongqi Pan; Taher A. Ghaleb; Lionel Briand

TL;DR

This work proposes LTM, a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM, and investigates five different pre-trained language models to support similarity measurement using test method embeddings.

Abstract

Test suites tend to grow as software evolves, often making it infeasible to execute all test cases within the allocated testing budget, especially for large software systems. Test suite minimization (TSM) improves the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box), such as ATM and FAST-R, have been proposed. To address the scalability limitations of such approaches, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama. On the embeddings they produce, we compute two similarity measures: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA) in its search for optimal minimized test suites, thus reducing the overall search time. Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
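To make the two measures named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each test method has already been turned into a fixed-length embedding vector; the toy two-dimensional vectors and the function names (`cosine_similarity`, `euclidean_distance`, `pairwise_matrix`) are hypothetical. Following the Figure 2 caption, only values above the diagonal are filled in, and the rest are left at 0. How LTM converts Euclidean distance into a similarity score is not stated in this summary, so the sketch returns the raw distance.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: treat a zero vector as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_matrix(
    embeddings: List[Vector],
    measure: Callable[[Vector, Vector], float],
) -> List[List[float]]:
    """Fill only the cells above the diagonal; all other cells stay 0."""
    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = measure(embeddings[i], embeddings[j])
    return matrix


# Toy embeddings standing in for three test methods (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cosine_matrix = pairwise_matrix(toy_embeddings, cosine_similarity)
distance_matrix = pairwise_matrix(toy_embeddings, euclidean_distance)
```

In a setup like this, a search algorithm such as a GA could read pairwise values from the matrix to score how redundant a candidate subset of test cases is; the fitness function itself is outside the scope of this sketch.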

Paper Structure (28 sections, 8 equations, 4 figures, 5 tables)

Introduction
Related Work
LTM: Language Model-based Test Suite Minimization
Language Models for Test Method Representation
Tokenizing Test Methods
Generating Test Method Embeddings
Similarity Measurement of Test Method Embeddings
Search-based Test Suite Minimization
Validation
Research Questions
Experimental Design and Dataset
Baseline approach (ATM)
Dataset
Evaluation metrics
Results
...and 13 more sections

Figures (4)

Figure 1: Main steps of LTM to perform test suite minimization
Figure 2: An example of how similarity values of pairs of test cases are represented using a matrix. Values on and below the diagonal are set to 0 as they are either useless or duplicates.
Figure 3: FDR across projects for each generation of LTM and ATM
Figure 4: Scatter plots of the number of test cases versus MT, preparation time, and search time (in min) for LTM (UniXcoder/Cosine) and ATM, across all 661 project versions for the 50% minimization budget
