What Happened in This Pipeline? Diffing Build Logs with CiDiff

Nicolas Hubner, Jean-Rémy Falleri, Raluca Uricaru, Thomas Degueule, Thomas Durieux

TL;DR

This paper addresses the diagnosis of CI pipeline regressions by diffing a failing build log against a passing one. It introduces CiDiff, a log-diff tool that combines a log-line similarity metric with a genome-inspired seed-and-extend strategy to detect updated and moved lines, producing concise edit scripts with six action types. In a large-scale evaluation on 17,906 regression pairs against baselines including LCS-diff, bigrams, and keywords, CiDiff yields substantially shorter diffs and higher precision while maintaining strong recall. In the user study, the majority of participants prefer it in 70% of the regression cases, versus 5% for LCS-diff. In practice, it reduces the number of lines to inspect by about 60% in the median case, and it ships as an open-source tool together with the dataset. Future work includes extending CiDiff to more use cases and integrating parsers or LLM-based similarity to further improve accuracy and coverage.
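To make the notion of an "updated" line concrete, here is a minimal Python sketch. It is not CiDiff's algorithm: the similarity function (fraction of identical whitespace-separated tokens), the 0.5 threshold, the positional pairing of lines, and the names `line_similarity` and `diff_with_updates` are all assumptions made for illustration. It covers only three action types and does not attempt moved-line detection or seed-and-extend. It runs a standard sequence diff from Python's `difflib` and then re-labels a deleted/added pair as a single update when the two lines are similar enough.

```python
import difflib
import re


def tokenize(line):
    """Split a log line into whitespace-separated tokens."""
    return re.findall(r"\S+", line)


def line_similarity(a, b):
    """Hypothetical metric: share of positions holding identical tokens.

    Lines with different token counts are treated as dissimilar (0.0).
    """
    ta, tb = tokenize(a), tokenize(b)
    if not ta or len(ta) != len(tb):
        return 0.0
    same = sum(1 for x, y in zip(ta, tb) if x == y)
    return same / len(ta)


def diff_with_updates(ref, mod, threshold=0.5):
    """Diff two logs (lists of lines) and merge similar delete/add pairs.

    Returns a list of tuples: ("updated", old, new), ("deleted", old)
    or ("added", new). Unchanged lines are omitted.
    """
    actions = []
    matcher = difflib.SequenceMatcher(a=ref, b=mod, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        deleted, added = ref[i1:i2], mod[j1:j2]
        # Pair deleted and added lines positionally (a simplification).
        paired = min(len(deleted), len(added))
        for old, new in zip(deleted, added):
            if line_similarity(old, new) >= threshold:
                actions.append(("updated", old, new))
            else:
                actions.append(("deleted", old))
                actions.append(("added", new))
        actions.extend(("deleted", old) for old in deleted[paired:])
        actions.extend(("added", new) for new in added[paired:])
    return actions


if __name__ == "__main__":
    passing = ["Compiling module core", "Test run took 12 ms", "Build done"]
    failing = [
        "Compiling module core",
        "Test run took 15 ms",
        "ERROR: assertion failed",
        "Build done",
    ]
    for action in diff_with_updates(passing, failing):
        print(action)
```

On this toy pair, a plain line diff would report one deletion and two additions; the sketch instead reports the timing line as a single update, leaving the error line as the only addition to inspect.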

Abstract

Continuous integration (CI) is widely used by developers to ensure the quality and reliability of their software projects. However, diagnosing a CI regression is a tedious process that involves the manual analysis of lengthy build logs. In this paper, we explore how textual differencing can support the debugging of CI regressions. As off-the-shelf diff algorithms produce suboptimal results, in this work we introduce a new diff algorithm specifically tailored to build logs called CiDiff. We evaluate CiDiff against several baselines on a novel dataset of 17 906 CI regressions, performing an accuracy study, a quantitative study and a user-study. Notably, our algorithm reduces the number of lines to inspect by about 60 % in the median case, with reasonable overhead compared to the state-of-practice LCS-diff. Finally, our algorithm is preferred by the majority of participants in 70 % of the regression cases, whereas LCS-diff is preferred in only 5 % of the cases.

Paper Structure

This paper contains 28 sections, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Output of a classical diff between two simplified logs. Added (resp. deleted) lines are displayed in green (resp. red). It displays a moved line as being removed from the reference log (line 3) and added at a different position in the modified log (line 4); and three "equivalent" lines as being removed from the reference log (lines 6, 7, and 8) and added in the modified log (lines 6, 8, and 9).
  • Figure 2: Desired output of a diff algorithm for the build logs of Figure 1. Added (resp. deleted) lines are displayed in green (resp. red) while updated (resp. moved) lines are displayed in orange (resp. purple). Note that only the variable part is highlighted in orange in the case of updated lines.
  • Figure 3: Example of a log statement
  • Figure 4: Pair of logs, with the reference passing log on the left and the modified failing log on the right.
  • Figure 5: Steps of the seed-and-extend algorithm applied to the pair of logs in Figure 4. Orange dashes denote every possible match between lines. Linked black squares depict the initial seeds and the blue parts correspond to the extensions. The final seeds, in black, are those remaining after overlap removal. The purple seeds are found during the selection of additional seeds.
  • ...and 5 more figures