FlakeRanker: Automated Identification and Prioritization of Flaky Job Failure Categories

Henri Aïdasso

TL;DR

The paper addresses wasteful CI/CD reruns caused by flaky job failures. It identifies 46 failure categories and prioritizes them using Recency ($R$), Frequency ($F$), and Monetary ($M$) measures, then applies an $RFM$ clustering approach to highlight the most costly and persistent categories. The artifact introduces FlakeRanker, a CLI that streamlines automated labeling, $RFM$ analysis, and clustering-based prioritization, and it provides the complete analysis results for all 46 categories rather than only the top 20 discussed in the paper. It includes the full notebooks, the labeling tool, and a Veloren-based open dataset to demonstrate replication and reuse in both industry and research contexts. Overall, the work offers a practical, reusable workflow to diagnose, quantify, and prioritize flaky job failures and thereby reduce infrastructure waste in CI/CD pipelines.

Abstract

This document presents the artifact associated with the ICSE SEIP 2025 paper titled "On the Diagnosis of Flaky Job Failures: Understanding and Prioritizing Failure Categories". The original paper identifies 46 distinct categories of flaky job failures that developers encounter and analyzes them using Recency (R), Frequency (F), and Monetary (M) measures. It also uses an RFM clustering model to identify and prioritize the most wasteful and persistent categories. The original paper discusses the rankings and evolution of only the top 20 categories in its results. This artifact contains (1) the regular expressions and scripts used to automate the labeling process for RQ1, (2) the complete analysis results, including the ranking of all 46 categories by cost in RQ2 and the evolution of these categories over time in RQ3, and (3) the RFM dataset and scripts used to create the RFM clustering model for prioritization in RQ4. In addition, we packaged the labeling tool and the RFM-based prioritization methodology as a command-line interface (CLI) called FlakeRanker to facilitate reuse and repurposing in future studies.
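The labeling and RFM steps described above can be illustrated with a minimal Python sketch. It is not FlakeRanker's code or interface: the regex patterns, the category names, and the functions `label` and `rfm` are hypothetical. It also assumes that Recency is the number of days since a category last occurred, Frequency is its number of occurrences, and Monetary is the total duration of the affected jobs as a cost proxy; the paper's exact definitions may differ.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical patterns and category names, for illustration only.
PATTERNS = {
    "network_timeout": re.compile(r"connection timed out|i/o timeout", re.IGNORECASE),
    "runner_out_of_disk": re.compile(r"no space left on device", re.IGNORECASE),
}


def label(log):
    """Return the first category whose pattern matches the log, else None."""
    for category, pattern in PATTERNS.items():
        if pattern.search(log):
            return category
    return None


def rfm(jobs, now):
    """Aggregate (recency_days, frequency, monetary) per category.

    jobs: iterable of (finished_at, duration_seconds, log) tuples.
    Assumed definitions: recency = days since last occurrence,
    frequency = number of occurrences, monetary = summed job duration.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for finished_at, duration, log in jobs:
        category = label(log)
        if category is None:
            continue  # job log matches no known flaky category
        frequency[category] += 1
        monetary[category] += duration
        if category not in last_seen or finished_at > last_seen[category]:
            last_seen[category] = finished_at
    return {
        c: ((now - last_seen[c]).days, frequency[c], monetary[c])
        for c in frequency
    }


# Made-up job records, used only to exercise the functions.
jobs = [
    (datetime(2024, 1, 1), 120.0, "ERROR: connection timed out"),
    (datetime(2024, 1, 5), 60.0, "dial tcp: i/o timeout"),
    (datetime(2024, 1, 3), 300.0, "write failed: No space left on device"),
    (datetime(2024, 1, 4), 30.0, "all tests passed"),
]
scores = rfm(jobs, now=datetime(2024, 1, 10))
```

Each category ends up with one `(recency, frequency, monetary)` tuple. A clustering step such as the one the artifact provides for RQ4 would take these tuples as input; it is omitted here.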
