
An Exploratory Study of Bug-Introducing Changes: Exploring Relationships in Bug-Introducing Changes Towards Causal Understanding

Lukas Schulte, Anamaria Mojica-Hanke, Mario Linares-Vásquez, Steffen Herbold

TL;DR

This exploratory study assembles a rich, manually coded dataset of 71 bug-introducing commits and 71 non-bug-introducing commits from two open-source projects to map a broad range of software engineering practices. By modeling 81 variables across 19 logical groups and conducting extensive pairwise analyses (numeric, nominal, and mixed) using groupings that account for data sparsity, the work reveals numerous interdependencies and potential confounders linked to bug introduction. Although the findings are strictly correlational at this stage, they provide a concrete framework (G2/G3 relations and 19 logical groups) to guide future causal modeling, causal discovery, and causal inference that quantify and control for intermediate effects. The study also documents deviations from its preregistration and discusses limitations in data availability and generalizability, while offering concrete directions for applying DAG-based causal methods in subsequent research. Overall, the work lays a foundation for moving beyond correlations toward a causal understanding of how development practices influence bug introduction in software engineering.

Abstract

Context: Many studies consider the relation between individual aspects of the software engineering process and bug introduction, e.g., software testing and code review. These studies typically identify only correlations within their own set of variables, without accounting for interactions with external variables such as confounding factors. Objective: In this study, we provide a broad empirical view of software development practices and their relation to bug-introducing changes to enable future work on causal relations between those aspects. Method: We consider the bugs, the type of change that introduced each bug, aspects of the build process, code review, software tests, and any other discussion related to the bug that we can identify. We use a qualitative approach that first describes variables of the development process and then groups the variables based on their relations. From these groups, we deduce how their (pairwise) interactions affect bug-introducing changes. Results: We found multiple relevant relations within the development process of bug-introducing changes. Logical groups of variables and their relations provide a framework for discovering areas of interest regarding intermediate effects in the process and confounders of bug introduction. Conclusion: Software engineering practices applied during the development of bug-introducing changes are interdependent. This work lays the foundation for understanding why bugs are introduced, using causal modeling, discovery, and inference.

Paper Structure

This paper contains 41 sections, 14 figures, 9 tables.

Figures (14)

  • Figure 1: Abstract overview of the visual groups. Bug group (BG) in red, control group (CG) in green.
  • Figure 2: G1: C19 - Design changes / R3 - # Reviewers. The left panel shows the bug group (BG, 38 valid pairs), the right panel shows the control group (CG, 39 valid pairs). Outliers are hidden.
  • Figure 3: G2: C21 - Change of external dependencies / C3 - # Files changed. The left panel shows the bug group (BG, 71 valid pairs), the right panel shows the control group (CG, 71 valid pairs). Outliers are hidden.
  • Figure 4: G2 (one versus rest): I2 - Introducing issue types / R3 - # Reviewers. The left panel shows the bug group (BG, 22 valid pairs), the right panel shows the control group (CG, 21 valid pairs). Outliers are hidden.
  • Figure 5: G3 (one versus rest): R4 - Reviewer types / I7 - # Introducing issue comments. The left panel shows the bug group (BG, 32 valid pairs), the right panel shows the control group (CG, 21 valid pairs). Outliers are hidden.
  • ...and 9 more figures
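Several figures use a "one versus rest" view of a nominal variable against a numeric one, reported separately for the bug group (BG) and the control group (CG) together with a count of valid pairs. One plausible reading, sketched below in Python under that assumption, is that each category is compared against all remaining categories pooled together. The function names, the median summary, and the toy data are hypothetical and are not taken from the study.

```python
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: "one versus rest" = category c vs. all other categories pooled;
# this is an illustrative reading, not the study's documented procedure.


def one_versus_rest(
    categories: Sequence[Optional[str]], values: Sequence[Optional[float]]
) -> Tuple[Dict[str, Tuple[List[float], List[float]]], int]:
    """Split numeric values into 'category c' vs. 'all other categories'.

    Observations with a missing category or value are dropped first, so the
    second return value is the number of valid pairs.
    """
    pairs = [(c, v) for c, v in zip(categories, values) if c is not None and v is not None]
    splits = {}
    for cat in sorted({c for c, _ in pairs}):
        inside = [v for c, v in pairs if c == cat]
        rest = [v for c, v in pairs if c != cat]
        splits[cat] = (inside, rest)
    return splits, len(pairs)


def median_contrast(splits):
    """Median of 'c' vs. median of 'rest' per category (None if 'rest' is empty)."""
    return {
        cat: (median(inside), median(rest) if rest else None)
        for cat, (inside, rest) in splits.items()
    }


# Hypothetical toy data: introducing issue type vs. number of reviewers,
# kept separately for the bug group (BG) and the control group (CG).
groups = {
    "BG": (["bug", "feature", "bug", None], [1, 3, 2, 5]),
    "CG": (["feature", "feature", "task"], [2, None, 1]),
}
for name, (cats, vals) in groups.items():
    splits, n_valid = one_versus_rest(cats, vals)
    print(name, n_valid, median_contrast(splits))
```

Because missing values are dropped per variable pair, the valid-pair count can differ between BG and CG and between variable pairs, as it does across the figure captions above.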