Practical Pipeline-Aware Regression Test Optimization for Continuous Integration

Daniel Schwendner, Maximilian Jungwirth, Martin Gruber, Martin Knoche, Daniel Merget, Gordon Fraser

TL;DR

This work tackles the efficiency of continuous integration in massive monorepos by introducing a pipeline-aware, language-agnostic regression test optimization method based on reinforcement learning. The approach uses a deep Q-network with lightweight, history-based features and two pipeline-specific reward functions (CostRank for pre-submit and CostChangeRank for post-submit) to prioritize or select tests under resource budgets. Evaluations on two 20-week industrial datasets from BMW show that the method (PR-DQL) outperforms state-of-the-art baselines: it places the first failing test earlier in pre-submit pipelines and reliably identifies developer-relevant transitions in post-submit pipelines while saving a substantial share of the test budget. The results indicate reduced feedback latency and CI resource consumption, highlighting the practical impact for large-scale CI in fast-evolving, multi-language monorepos.

Abstract

Massive, multi-language, monolithic repositories form the backbone of many modern, complex software systems. To ensure consistent code quality while still allowing fast development cycles, Continuous Integration (CI) is commonly applied. However, operating CI at such scale not only leads to a single point of failure for many developers, but also requires computational resources that may reach feasibility limits and cause long feedback latencies. To address these issues, developers commonly split test executions across multiple pipelines, running small and fast tests in pre-submit stages while executing long-running and flaky tests in post-submit pipelines. Given the long runtimes of many pipelines and the substantial proportion of passing test executions (98% in our pre-submit pipelines), there is not only a need but also potential for further improvements by prioritizing and selecting tests. However, many previously proposed regression optimization techniques are unfit for an industrial context, because they (1) rely on complex and difficult-to-obtain features like per-test code coverage that are not feasible in large, multi-language environments, (2) do not automatically adapt to rapidly changing systems where new tests are continuously added or modified, and (3) are not designed to distinguish the different objectives of pre- and post-submit pipelines: While pre-submit testing should prioritize failing tests, post-submit pipelines should prioritize tests that indicate non-flaky changes by transitioning from pass to fail outcomes or vice versa. To overcome these issues, we developed a lightweight and pipeline-aware regression test optimization approach that employs Reinforcement Learning models trained on language-agnostic features. We evaluated our approach on a large industry dataset collected over a span of 20 weeks of CI test executions. When predicting...
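To make the post-submit objective in the abstract concrete, the following is a minimal Python sketch of how pass/fail transitions could be located in one test's result history. It is an illustration only and not the authors' implementation: the function name `transitions`, the string verdicts, and the `min_stable` parameter (a crude stand-in for filtering flaky flips by requiring the new verdict to persist) are all assumptions introduced here.

```python
from typing import List


def transitions(history: List[str], min_stable: int = 1) -> List[int]:
    """Return indices of CI cycles where the verdict flips.

    history    -- verdicts of one test over consecutive cycles, oldest first
    min_stable -- hypothetical flakiness filter: a flip only counts if the new
                  verdict is observed for this many consecutive cycles
    """
    found = []
    for i in range(1, len(history)):
        if history[i] == history[i - 1]:
            continue  # same verdict as before, nothing new for developers
        window = history[i:i + min_stable]
        # Require a full window in which the new verdict persists.
        if len(window) == min_stable and all(v == history[i] for v in window):
            found.append(i)
    return found


history = ["pass", "pass", "fail", "fail", "pass"]
print(transitions(history))                # [2, 4]
print(transitions(history, min_stable=2))  # [2]
```

With `min_stable=1` every flip counts, so both the pass-to-fail change at cycle 2 and the fail-to-pass change at cycle 4 are reported. Raising `min_stable` drops the final flip because its persistence cannot yet be confirmed, which is one simple way to approximate "non-flaky" transitions.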

Paper Structure

This paper contains 29 sections, 5 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Comparison of the test target execution duration (seconds) between our pre- and post-submit pipelines.
  • Figure 2: Overview of the executed CI pipelines in the software development process. For each code change, Zuul executes three pipelines. The check and gate pipelines are executed pre-submit, whereas the post pipeline is executed post-submit.
  • Figure 3: Test execution results over consecutive CI cycles (post-submit pipelines). Only the blue marked test results are relevant for developers, as they indicate changes in code quality.
  • Figure 4: Test target prioritization and selection process.
  • Figure 5: Overview of our regression test prioritization and selection infrastructure: The RL agent estimates a priority score being the state-action values using a deep Q-network. The RL environment schedules the test suite $\mathcal{T}_t'$ and provides a reward based on the test results. Historical CI execution results are stored in a database.
  • ...and 3 more figures
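
The prioritization and selection step named in the captions of Figures 4 and 5 can be illustrated with a short Python sketch. It assumes that each test target already has a priority score (in the paper this comes from the deep Q-network; here it is just a given number) and a known duration. The `Target` class, the `schedule` function, and the greedy skip-and-continue rule are hypothetical choices made for this example, not the scheduling logic described by the authors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    priority: float    # assumed score; higher means run earlier
    duration_s: float  # expected execution time in seconds


def schedule(targets: List[Target], budget_s: float) -> List[Target]:
    """Rank targets by priority, then greedily select those fitting the budget."""
    ranked = sorted(targets, key=lambda t: t.priority, reverse=True)
    selected, used = [], 0.0
    for t in ranked:
        # Skip a target that would exceed the budget, but keep looking
        # for cheaper, lower-priority targets that still fit.
        if used + t.duration_s <= budget_s:
            selected.append(t)
            used += t.duration_s
    return selected


targets = [
    Target("a", priority=0.9, duration_s=30),
    Target("b", priority=0.2, duration_s=10),
    Target("c", priority=0.7, duration_s=50),
]
print([t.name for t in schedule(targets, budget_s=60)])  # ['a', 'b']
```

Here target `c` ranks second but is skipped because running it after `a` would exceed the 60-second budget, while the cheaper target `b` still fits. Passing a budget at least as large as the total duration turns the selection into pure prioritization, since every target is kept in ranked order.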