
Code Ownership: The Principles, Differences, and Their Associations with Software Quality

Patanamon Thongtanunam, Chakkrit Tantithamthavorn

TL;DR

This paper investigates how two prevalent code ownership approximations (commit-based and line-based) differ in the sets of developers they identify, the ownership values they assign, and the expertise levels they infer, and how these metrics relate to defect-proneness. Using 25 releases across seven large open-source systems, the authors build cross-release defect-prediction models with six ownership metrics and assess the importance of each metric via permutation, NPSK ranking, and LIME explanations. The models achieve a median defect-prediction AUC of $0.80$. The authors find that commit-based ownership has a stronger association with software quality than line-based ownership, while line-based ownership is more suitable for accountability tasks. The work provides guidance on when to apply each approximation and demonstrates the value of explainable AI techniques for interpreting defect-proneness in relation to ownership metrics. Replication data is available on GitHub.

Abstract

Code ownership, an approximation of the degree of ownership of a software component, is one of the important software measures used in quality improvement plans. However, prior studies have proposed different variants of code ownership approximations, and little is known about how these approximations differ or how they are associated with software quality. In this paper, we investigate the differences between the commonly used ownership approximations (i.e., commit-based and line-based) in terms of the set of developers, the approximated code ownership values, and the expertise level. Then, we analyze the association of each code ownership approximation with defect-proneness. Through an empirical study of 25 releases that span real-world open-source software systems, we find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers. In addition, we find that the commit-based approximation has a stronger association with software quality than the line-based approximation. Based on our analysis, we recommend that line-based code ownership be used for accountability purposes (e.g., authorship attribution, intellectual property), while commit-based code ownership should be used for rapid bug-fixing and charting quality improvement plans.
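
To make the contrast between the two approximations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that commit-based ownership is a developer's share of the commits touching a file and that line-based ownership is a developer's share of the file's current lines (as a blame-style attribution would report). The function names, the toy data, and the 5% cutoff for "major" developers are assumptions made for illustration only.

```python
from collections import Counter


def ownership_shares(authors):
    """Return each developer's share of the given attribution list.

    `authors` holds one entry per commit (commit-based view) or one entry
    per surviving line (line-based view). Hypothetical helper.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dev: n / total for dev, n in counts.items()}


def major_developers(shares, threshold=0.05):
    """Developers whose share meets the cutoff (5% is an assumed default)."""
    return {dev for dev, value in shares.items() if value >= threshold}


# Toy history of one file (made-up data):
# alice wrote most of the file in a single commit, bob made many small
# commits, and carol committed once but none of her lines survive.
commit_authors = ["alice"] + ["bob"] * 8 + ["carol"]
line_authors = ["alice"] * 90 + ["bob"] * 10

commit_based = ownership_shares(commit_authors)
line_based = ownership_shares(line_authors)

# Developers seen by both views, or by only one of them.
common = set(commit_based) & set(line_based)
commit_only = set(commit_based) - set(line_based)
line_only = set(line_based) - set(commit_based)

print("commit-based:", commit_based)
print("line-based:  ", line_based)
print("common:", sorted(common), "commit_only:", sorted(commit_only))
```

In this toy history the two views disagree on both the set of developers (carol appears only in the commit-based view) and the top owner (bob by commits, alice by lines), which is the kind of divergence the abstract describes.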

Paper Structure (19 sections, 10 figures, 1 table)

This paper contains 19 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: Illustrative scenarios where the commit-based and line-based code ownership approaches will approximate different ownership values.
  • Figure 2: An overview diagram of our data preparation and analysis approaches for RQ1-RQ3.
  • Figure 3: The proportion of developers that are identified by both approaches (common), by only the commit-based approach (commit_only), and by only the line-based approach (line_only).
  • Figure 4: The Spearman's correlation coefficient between the ownership values of the commit-based and line-based approaches.
  • Figure 5: The distribution of ownership values for the commit_only and line_only developers.
  • ...and 5 more figures