Practitioners' Challenges and Perceptions of CI Build Failure Predictions at Atlassian

Yang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit, Patanamon Thongtanunam, Arik Friedman, Xing Zhao, Anton Krasikov

TL;DR

This study investigates CI build failures at Atlassian and evaluates CI build failure prediction within Bitbucket through a mixed-methods design that combines large-scale analysis of internal data with practitioner surveys. An analysis of $350{,}037$ PRs across $1{,}630$ projects shows that a logistic regression model achieves an AUC of $0.82$, with repository-history signals identified as the strongest predictors. Qualitative insights from 53 practitioners reveal that predictions offer proactive value but raise concerns about accuracy and over-reliance, underscoring the need for context-aware explanations. The work provides industry-grounded guidance for integrating CI build failure predictions into CI workflows, emphasizing human-in-the-loop design and explainability to improve adoption and impact.
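The AUC reported above is the area under the ROC curve, which equals the probability that a model scores a randomly chosen failing build higher than a randomly chosen passing build. As a minimal sketch (not the authors' evaluation code), the function below computes it with the rank-sum (Mann-Whitney U) formulation. The name `roc_auc`, the 0/1 label encoding (1 = failed build), and the toy inputs are illustrative assumptions.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 = failed build, 0 = passed build (assumed encoding).
    scores: predicted failure probabilities (higher = more likely to fail).
    Tied scores receive their average rank, so a tie counts as half a "win".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n = len(scores)
    # Indices of builds sorted by ascending predicted score.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Find the run order[i..j] of equal scores; each gets the average 1-based rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos  # assumes every label is 0 or 1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one failed and one passed build")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U counts (failing, passing) pairs where the failing build scored higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` compares the four failing/passing pairs and finds three in which the failing build scored higher. Sorting keeps the cost at O(n log n), which matters at the scale of hundreds of thousands of PRs; comparing every failing/passing pair directly gives the same value but is quadratic.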

Abstract

Continuous Integration (CI) build failures can significantly affect the software development process and teams, for example by delaying the release of new features and reducing developers' productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that CI build predictions can not only provide proactive insight into CI build failures but also facilitate the team's decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.

Paper Structure

This paper contains 17 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The potential usage scenario of the CI build predictions within the CI build process at Atlassian.
  • Figure 2: The relationship between the explanatory variable (x-axis) and the likelihood of a build failing (y-axis). The larger the odds value, the higher the likelihood that a build will fail. The gray area represents the 95% confidence interval.
  • Figure 3: A UI prototype with mocked data and the prediction of the CI build outcome. (1) shows the likelihood that the build will fail. (2) suggests the factors that are associated with the CI build failure. (3) explains the factor that may result in CI build failure. (4) provides a suggestion that may reduce the likelihood of CI build failure.
  • Figure 4: Practitioners' agreement on the explanations and suggestions based on PR-dimension factors.
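The Figure 2 caption relates each explanatory variable to the odds of a build failing. In a logistic regression, the log-odds of failure are a linear function of the explanatory variables, so the odds are $\exp(\beta_0 + \sum_i \beta_i x_i)$ and the failure probability is $\text{odds}/(1 + \text{odds})$. The sketch below illustrates that mapping under stated assumptions: the function names, the feature name `recent_failure_rate`, and every coefficient value are hypothetical placeholders, not the fitted model from the study.

```python
import math
from typing import Mapping


def failure_log_odds(intercept: float, coefs: Mapping[str, float],
                     features: Mapping[str, float]) -> float:
    """Linear predictor of a logistic regression: b0 + sum_i b_i * x_i."""
    return intercept + sum(b * features[name] for name, b in coefs.items())


def failure_odds(intercept: float, coefs: Mapping[str, float],
                 features: Mapping[str, float]) -> float:
    """Odds of failure, P(fail) / P(pass), which equals exp(log-odds)."""
    return math.exp(failure_log_odds(intercept, coefs, features))


def failure_probability(intercept: float, coefs: Mapping[str, float],
                        features: Mapping[str, float]) -> float:
    """Logistic link: P(fail) = odds / (1 + odds) = 1 / (1 + exp(-z))."""
    z = failure_log_odds(intercept, coefs, features)
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only (not fitted values).
    b0 = -2.0
    coefs = {"recent_failure_rate": 3.0}
    for rate in (0.0, 0.5, 1.0):
        x = {"recent_failure_rate": rate}
        print(rate, failure_odds(b0, coefs, x), failure_probability(b0, coefs, x))
```

Because the odds are an exponential of a linear term, raising one variable by a single unit multiplies the odds by $\exp(\beta_i)$ while the others stay fixed; this is why a larger odds value on the y-axis of such a plot always corresponds to a higher likelihood of failure.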