Practical Pipeline-Aware Regression Test Optimization for Continuous Integration
Daniel Schwendner, Maximilian Jungwirth, Martin Gruber, Martin Knoche, Daniel Merget, Gordon Fraser
TL;DR
This work tackles the efficiency of continuous integration (CI) in massive monorepos by introducing a pipeline-aware, language-agnostic regression test optimization method based on reinforcement learning. The approach uses a deep Q-network with lightweight, history-based features and two pipeline-specific reward functions (CostRank for pre-submit and CostChangeRank for post-submit) to prioritize or select tests under resource budgets. Evaluations on two 20-week industrial datasets from BMW show that the method (PR-DQL) outperforms state-of-the-art baselines. In pre-submit, it places the first failing test earlier. In post-submit, it reliably identifies developer-relevant transitions while saving a considerable share of the test budget. The results show substantial reductions in feedback latency and CI resource consumption, highlighting the practical value of the approach for large-scale CI in fast-evolving multi-language monorepos.
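The two pipeline objectives can be made concrete with a small sketch. It assumes that per-test outcomes are recorded as plain "pass"/"fail" strings keyed by test name. This is a hypothetical schema, not the format of the BMW datasets. The functions below do three things. `first_failure_position` computes the position of the first failing test in a prioritized order, which is the pre-submit quantity PR-DQL is described as improving. `transitions` lists tests whose outcome flipped between two consecutive runs, which is the kind of signal post-submit prioritization targets. `select_under_budget` greedily picks tests within a time budget. All names are illustrative. They are not the paper's CostRank or CostChangeRank rewards, whose exact definitions are not given here, and they do not reproduce its deep Q-network.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical outcome labels; the real CI data schema is not described here.
PASS, FAIL = "pass", "fail"


def first_failure_position(order: Sequence[str], outcomes: Dict[str, str]) -> Optional[int]:
    """Return the 1-based position of the first failing test in a prioritized
    order, or None if no test in the order fails (pre-submit view)."""
    for position, test in enumerate(order, start=1):
        if outcomes.get(test) == FAIL:
            return position
    return None


def transitions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return tests whose outcome flipped pass->fail or fail->pass between two
    consecutive runs (post-submit view). Tests missing from the previous run
    are skipped; flaky flips are NOT filtered out by this sketch."""
    verdicts = {PASS, FAIL}
    return sorted(
        test
        for test, outcome in current.items()
        if outcome in verdicts
        and previous.get(test) in verdicts
        and previous[test] != outcome
    )


def select_under_budget(order: Sequence[str], durations: Dict[str, float], budget: float) -> List[str]:
    """Walk the prioritized order and keep every test that still fits in the
    remaining time budget; tests that do not fit are skipped."""
    selected: List[str] = []
    used = 0.0
    for test in order:
        if used + durations[test] <= budget:
            selected.append(test)
            used += durations[test]
    return selected


if __name__ == "__main__":
    order = ["t3", "t1", "t2"]  # hypothetical output of some prioritization model
    print(first_failure_position(order, {"t1": FAIL, "t2": PASS, "t3": PASS}))  # 2
    print(transitions({"a": PASS, "b": FAIL, "c": PASS},
                      {"a": FAIL, "b": PASS, "c": PASS, "d": FAIL}))  # ['a', 'b']
    print(select_under_budget(order, {"t1": 5.0, "t2": 1.0, "t3": 3.0}, 5.0))  # ['t3', 't2']
```

Greedy selection in priority order is only one simple way to respect a budget. The paper's selection mechanism may differ. Separating genuine transitions from flaky ones would also require additional information, such as reruns, which this sketch does not model.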
Abstract
Massive, multi-language, monolithic repositories form the backbone of many modern, complex software systems. To ensure consistent code quality while still allowing fast development cycles, Continuous Integration (CI) is commonly applied. However, operating CI at such scale not only creates a single point of failure for many developers but also requires computational resources that may reach feasibility limits and cause long feedback latencies. To address these issues, developers commonly split test executions across multiple pipelines. They run small and fast tests in pre-submit stages and execute long-running and flaky tests in post-submit pipelines. Given the long runtimes of many pipelines and the large proportion of passing test executions (98% in our pre-submit pipelines), there is both a need and an opportunity for further improvement by prioritizing and selecting tests. However, many previously proposed regression test optimization techniques are unfit for an industrial context because they (1) rely on complex and difficult-to-obtain features, such as per-test code coverage, that are not feasible to collect in large, multi-language environments, (2) do not automatically adapt to rapidly changing systems in which new tests are continuously added or modified, and (3) are not designed to distinguish the different objectives of pre- and post-submit pipelines. Pre-submit testing should prioritize failing tests, whereas post-submit pipelines should prioritize tests that indicate non-flaky changes by transitioning from passing to failing outcomes or vice versa. To overcome these issues, we developed a lightweight and pipeline-aware regression test optimization approach that employs Reinforcement Learning models trained on language-agnostic features. We evaluated our approach on a large industry dataset collected over 20 weeks of CI test executions. When predicting...
