Code Ownership: The Principles, Differences, and Their Associations with Software Quality
Patanamon Thongtanunam, Chakkrit Tantithamthavorn
TL;DR
This paper investigates how two prevalent code ownership approximations (commit-based and line-based) differ in the sets of developers they identify, the ownership values they assign, and the expertise levels they infer, and how these metrics relate to defect-proneness. Using 25 releases across seven large open-source systems, the authors build cross-release defect-prediction models with six ownership metrics and assess the importance of each metric via permutation, NPSK ranking, and LIME explanations; the models achieve a median AUC of $0.80$. They find that commit-based ownership has a stronger association with software quality than line-based ownership, while line-based ownership is better suited to accountability tasks. The work provides guidance on when to apply each approximation and demonstrates the value of explainable AI techniques for interpreting defect-proneness in relation to ownership metrics. Replication data is available on GitHub.
Abstract
Code ownership -- an approximation of the degree of ownership of a software component -- is one of the important software measures used in quality improvement plans. Prior studies have proposed different variants of code ownership approximations, yet little is known about how these approximations differ and how each is associated with software quality. In this paper, we investigate the differences between the commonly used ownership approximations (i.e., commit-based and line-based) in terms of the set of developers, the approximated code ownership values, and the expertise level. Then, we analyze the association of each code ownership approximation with defect-proneness. Through an empirical study of 25 releases that span real-world open-source software systems, we find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers. In addition, we find that the commit-based approximation has a stronger association with software quality than the line-based approximation. Based on our analysis, we recommend that line-based code ownership be used for accountability purposes (e.g., authorship attribution, intellectual property), while commit-based code ownership should be used for rapid bug-fixing and charting quality improvement plans.
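
To make the distinction concrete, the following is a minimal sketch of how the two approximations could diverge for a single module. It is not the authors' implementation. It assumes that ownership is a developer's share of contributions (commits touching the module for the commit-based variant, surviving lines for the line-based variant) and that a "major" developer is one whose share is at least 5%; the threshold, the function names, and the sample data are all illustrative assumptions.

```python
from collections import Counter


def ownership(contributions):
    """Return each developer's share of the given contributions.

    `contributions` holds one author name per unit of work: one entry
    per commit (commit-based) or one entry per line (line-based).
    """
    counts = Counter(contributions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {dev: n / total for dev, n in counts.items()}


def major_developers(shares, threshold=0.05):
    """Developers whose share meets the (assumed) 5% threshold."""
    return {dev for dev, share in shares.items() if share >= threshold}


# Hypothetical history of one module: who authored each commit ...
commit_authors = ["alice", "alice", "alice", "bob", "carol"]
# ... and who authored each line that survives in the release.
line_authors = ["bob"] * 60 + ["alice"] * 38 + ["dave"] * 2

commit_based = ownership(commit_authors)
line_based = ownership(line_authors)

# The developer with the largest share under each approximation.
top_commit_owner = max(commit_based, key=commit_based.get)
top_line_owner = max(line_based, key=line_based.get)

print("commit-based:", commit_based, "top owner:", top_commit_owner)
print("line-based:  ", line_based, "top owner:", top_line_owner)
print("major (commit):", sorted(major_developers(commit_based)))
print("major (line):  ", sorted(major_developers(line_based)))
```

In this made-up example the two approximations disagree on all three dimensions named above: the sets of developers differ (carol appears only in the commit history, dave only in the surviving lines), the ownership values differ, and the sets of major developers differ.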
