Towards Explainable Test Case Prioritisation with Learning-to-Rank Models

Aurora Ramírez, Mario Berrios, José Raúl Romero, Robert Feldt

TL;DR

The paper addresses the explainability gap in learning-to-rank based test case prioritisation (TCP) for regression testing. It argues for both global and local explanations, extending to cross-build temporal analyses, and proposes concrete scenarios and methods (e.g., Break Down, contrastive and counterfactual explanations) to illuminate why rankings arise and how they evolve. Through a preliminary study on the angel dataset using LambdaMART and Break Down, it shows that global feature contributions align with prior findings and that local explanations can reveal similar drivers for top-ranked failures, while also highlighting time-dependent variations across builds. The work outlines open issues and motivates future research on interpretable LTR and industrial evaluation to make TCP explanations actionable in practice.

Abstract

Test case prioritisation (TCP) is a critical task in regression testing to ensure quality as software evolves. Machine learning has become a common way to achieve it. In particular, learning-to-rank (LTR) algorithms provide an effective method of ordering and prioritising test cases. However, their use poses a challenge in terms of explainability, both globally at the model level and locally for particular results. Here, we present and discuss scenarios that require different explanations and how the particularities of TCP (multiple builds over time, test case and test suite variations, etc.) could influence them. We include a preliminary experiment to analyse the similarity of explanations, showing that they vary not only with test case-specific predictions but also with the relative ranks.

Paper Structure (11 sections, 2 figures, 1 table)

Figures (2)

  • Figure 1: Relative ranking positions of test cases across builds (angel system).
  • Figure 2: Feature contributions for the test case at top of the predicted ranking.