On Rank Aggregating Test Prioritizations

Shouvick Mondal, Tse-Hsun Chen

TL;DR

The paper addresses the challenge of robust test-case prioritization by introducing Ensemble Test Prioritization (EnTP), a three-stage pipeline that combines diversity-based ensemble selection with social-choice based rank aggregation to derive consensus prioritizations for system-level regression tests. It leverages 16 standalone heuristics to form a 25-permutation ensemble, uses Kendall-tau distance to select diverse subsets, and aggregates them via Kemeny-Young, Borda-count, or mean/median methods to schedule tests. Empirical evaluation on 20 open-source C projects (694,512 SLOC, 280 versions, 69,305 test-cases) shows that EnTP with a top-75% diversity budget often outperforms standalone heuristics and state-of-the-art approaches, particularly in cost-aware metrics like $APFD_c$ under highly imbalanced test costs. The work demonstrates the practical value of consensus-based TCP and provides public artifacts to support replication, with future directions including broader benchmarks, CI integration, and deeper exploration of domain-aware diversity.
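The selection and aggregation stages summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy test identifiers, and in particular the selection rule (rank each permutation by its total Kendall-tau distance to the others and keep the top fraction) are assumptions made for illustration. Only Borda count is shown; Kemeny-Young and mean/median aggregation are omitted.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Number of test pairs that the two permutations order differently."""
    pos_a = {t: i for i, t in enumerate(a)}
    pos_b = {t: i for i, t in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def select_diverse(ensemble, budget):
    """Keep the top `budget` fraction of permutations.

    ASSUMPTION: diversity of a permutation is scored as its summed
    Kendall-tau distance to every other permutation in the ensemble.
    """
    k = max(1, int(budget * len(ensemble)))
    scores = [sum(kendall_tau_distance(p, q) for q in ensemble) for p in ensemble]
    ranked = sorted(range(len(ensemble)), key=lambda i: -scores[i])
    # Preserve the original ensemble order among the selected permutations.
    return [ensemble[i] for i in sorted(ranked[:k])]


def borda_count(rankings):
    """Aggregate rankings: a test at position p among n tests earns n-1-p points."""
    n = len(rankings[0])
    points = {t: 0 for t in rankings[0]}
    for ranking in rankings:
        for pos, t in enumerate(ranking):
            points[t] += n - 1 - pos
    # Higher score runs earlier; ties are broken by test name.
    return sorted(points, key=lambda t: (-points[t], t))


# Hypothetical four-permutation ensemble over four tests.
ensemble = [
    ["t1", "t2", "t3", "t4"],
    ["t1", "t2", "t4", "t3"],
    ["t4", "t3", "t2", "t1"],
    ["t2", "t1", "t3", "t4"],
]
selected = select_diverse(ensemble, budget=0.75)
consensus = borda_count(selected)
print(consensus)  # ['t1', 't2', 't4', 't3']
```

With a 75% budget, three of the four toy permutations are kept, and the consensus order is the schedule in which the tests would be executed.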

Abstract

Test case prioritization (TCP) has been an effective strategy to optimize regression testing. Traditionally, test cases are ordered based on some heuristic and rerun against the version under test with the goal of yielding a high failure throughput. Almost four decades of TCP research have produced extensive contributions on individual prioritization strategies. However, test case prioritization via preference aggregation has remained largely unexplored. We envision this methodology as an opportunity to obtain robust prioritizations by consolidating multiple standalone ranked lists, i.e., by reaching a consensus. In this work, we propose Ensemble Test Prioritization (EnTP) as a three-stage pipeline involving: (i) ensemble selection, (ii) rank aggregation, and (iii) test case execution. We evaluate EnTP on 20 open-source C projects from the Software-artifact Infrastructure Repository and GitHub (totaling 694,512 SLOC, 280 versions, and 69,305 system-level test cases). We employ an ensemble of 16 standalone prioritization plans, four of which come from the respective state-of-the-art approaches. We build EnTP on the foundations of Hansie, an existing framework for consensus prioritization, and show that EnTP's diversity-based ensemble selection with a top-75% budget, followed by rank aggregation, can outperform both Hansie and the employed standalone prioritization approaches.

Paper Structure

This paper contains 26 sections, 5 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Original C source (source.c from hansie_2020) and its modified revision (source-v1.c). Test-suite description appears below the code.
  • Figure 2: Conventional prioritization ranking (top), versus consensus prioritization by rank aggregation (bottom). Failed test cases are marked in red.
  • Figure 3: Workflow of EnTP.
  • Figure 4: Execution of consensus prioritization as per aggregated preferences.
  • Figure 5: Distribution of test case costs within the full test-suite at base version $v_0$ for our dataset (boxplot; each data point represents a test case).
  • ...and 8 more figures