
Software Code Quality Measurement: Implications from Metric Distributions

Siyuan Jin, Mianmian Zhang, Yekai Guo, Yuejiang He, Ziyuan Li, Bichao Chen, Bing Zhu, Yong Xia

TL;DR

The paper tackles inconsistent standards in code quality measurement by separating metrics into monotonic and non-monotonic groups and introducing a distribution-based scoring method. Monotonic metrics are modeled with an exponential distribution while non-monotonic metrics use an asymmetric Gaussian, producing scores in $[0,100]$; an overall score is a weighted sum with weights learned via a Gradient Boosting Classifier using GitHub stars as the adoption target. An empirical study on $36{,}460$ OSS repositories across four languages demonstrates that these distribution-based scores can explain OSS adoption, with Java metrics showing the strongest explanatory power. The work provides a practical framework for consistent, multidimensional code quality assessment and highlights areas for validation and data expansion in future work.
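The scoring idea summarized above can be illustrated with a small sketch. This summary does not give the paper's exact formulas, so everything below is an assumption made for illustration: the monotonic score is taken to be the exponential survival function scaled to $[0,100]$ (treating lower raw values as better), the non-monotonic score is taken to be an unnormalized Gaussian with a different spread on each side of its peak, and the weights are supplied by hand rather than learned. The function and parameter names are hypothetical.

```python
import math


def monotonic_score(x, mean):
    """Score a 'lower is better' metric under an assumed exponential model.

    Uses the survival function exp(-x / mean), scaled to [0, 100].
    `mean` is the assumed mean of the metric across the reference repositories.
    """
    if x < 0 or mean <= 0:
        raise ValueError("x must be >= 0 and mean must be > 0")
    return 100.0 * math.exp(-x / mean)


def non_monotonic_score(x, peak, sigma_left, sigma_right):
    """Score a metric whose best value lies at an interior peak.

    Uses an asymmetric Gaussian shape: one spread below the peak and
    another above it, scaled so the peak itself scores 100.
    """
    if sigma_left <= 0 or sigma_right <= 0:
        raise ValueError("both sigmas must be > 0")
    sigma = sigma_left if x < peak else sigma_right
    z = (x - peak) / sigma
    return 100.0 * math.exp(-0.5 * z * z)


def overall_score(scores, weights):
    """Weighted sum of per-metric scores, with weights normalized to sum to 1."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] / total for name in scores)


if __name__ == "__main__":
    # Hypothetical metric values and weights, not taken from the paper.
    scores = {
        "bugs": monotonic_score(2.0, mean=5.0),
        "comment_density": non_monotonic_score(18.0, peak=20.0,
                                               sigma_left=8.0, sigma_right=15.0),
    }
    weights = {"bugs": 0.6, "comment_density": 0.4}
    print(scores, overall_score(scores, weights))
```

In the paper the weights come from a Gradient Boosting Classifier trained with GitHub stars as the target; the hand-picked weights here only stand in for that step.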

Abstract

Software code quality is a construct with three dimensions: maintainability, reliability, and functionality. Although many firms have incorporated code quality metrics in their operations, evaluating these metrics still lacks consistent standards. We categorized distinct metrics into two types: 1) monotonic metrics that consistently influence code quality; and 2) non-monotonic metrics that lack a consistent relationship with code quality. To consistently evaluate them, we proposed a distribution-based method to get metric scores. Our empirical analysis includes 36,460 high-quality open-source software (OSS) repositories and their raw metrics from SonarQube and CK. The evaluated scores demonstrate great explainability on software adoption. Our work contributes to the multi-dimensional construct of code quality and its metric measurements, which provides practical implications for consistent measurements on both monotonic and non-monotonic metrics.

Paper Structure (12 sections, 5 equations, 4 figures, 7 tables, 1 algorithm)

Figures (4)

  • Figure 1: Multi-Dimensional Construct
  • Figure 2: Examples of Non-Monotonic Metric Distribution and Monotonic Metric Distribution
  • Figure 3: Workflow for Code Quality Scoring with GitHub Stars as the Target Variable
  • Figure 4: Overall Scores for Four Languages