Towards Understanding the Impact of Code Modifications on Software Quality Metrics
Thomas Karanikiotis, Andreas L. Symeonidis
TL;DR
This study addresses how specific code modifications influence software quality metrics by constructing a commit-level dataset from 700 popular GitHub repositories and computing static quality metrics before and after each modification. It clusters modifications based on the induced changes in metrics using K-means and employs GPT-4 to generate concise, human-readable summaries of each modification, enabling interpretable patterns of quality impact. The results reveal distinct, cross-repository clusters describing how different changes affect metrics such as complexity, size, and documentation, offering practical insights for code reviews and maintenance. Overall, the approach advances understanding of the relationship between code changes and software quality and paves the way for data-guided quality prediction and maintenance strategies.
Abstract
Context: Maintaining high software quality is a persistent challenge in software development, and it is often hampered by a limited understanding of how specific code modifications influence quality metrics. Objective: This study aims to bridge this gap with an approach for assessing and interpreting the impact of code modifications. The underlying hypothesis is that code modifications inducing similar changes in software quality metrics can be grouped into distinct clusters, and that these clusters can be effectively described using an AI language model, thus providing a simple understanding of code changes and their quality implications. Method: To validate this hypothesis, we built and analyzed a dataset from popular GitHub repositories, segmented into individual code modifications. Each project was evaluated against software quality metrics before and after each modification was applied. Machine learning techniques were used to cluster these modifications based on the induced changes in the metrics. In parallel, an AI language model was employed to generate descriptions of each modification's function. Results: The results reveal distinct clusters of code modifications, each accompanied by a concise description that conveys the cluster's collective impact on software quality metrics. Conclusions: The findings suggest that this research is a significant step towards a comprehensive understanding of the complex relationship between code changes and software quality, with the potential to transform software maintenance strategies and enable the development of more accurate quality prediction models.
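The clustering step of the method can be illustrated with a minimal sketch. It is not the authors' pipeline: the metric names, the sample values, the standardization step, and the plain Lloyd's-algorithm K-means with farthest-point initialization are all assumptions chosen for illustration, and the language-model summarization step is omitted.

```python
from math import dist
from statistics import fmean, pstdev

# Hypothetical metric names; the actual metric set is not listed in this section.
METRICS = ("complexity", "loc", "comment_density")


def metric_deltas(before, after):
    """Return the per-metric change (after - before) induced by one modification."""
    return [after[m] - before[m] for m in METRICS]


def standardize(rows):
    """Scale each column to zero mean and unit variance so no metric dominates."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from its nearest already-chosen centroid.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [fmean(col) for col in zip(*members)]
    return labels


# Made-up (before, after) metric snapshots for six modifications.
modifications = [
    ({"complexity": 12, "loc": 200, "comment_density": 0.10},
     {"complexity": 8, "loc": 190, "comment_density": 0.10}),
    ({"complexity": 15, "loc": 300, "comment_density": 0.12},
     {"complexity": 10, "loc": 288, "comment_density": 0.12}),
    ({"complexity": 5, "loc": 100, "comment_density": 0.20},
     {"complexity": 9, "loc": 160, "comment_density": 0.18}),
    ({"complexity": 7, "loc": 150, "comment_density": 0.15},
     {"complexity": 12, "loc": 215, "comment_density": 0.13}),
    ({"complexity": 6, "loc": 120, "comment_density": 0.05},
     {"complexity": 6, "loc": 120, "comment_density": 0.25}),
    ({"complexity": 9, "loc": 180, "comment_density": 0.08},
     {"complexity": 9, "loc": 181, "comment_density": 0.30}),
]

deltas = [metric_deltas(before, after) for before, after in modifications]
labels = kmeans(standardize(deltas), k=3)
print(labels)
```

In this toy data the first two modifications reduce complexity and size, the next two increase both, and the last two only raise comment density, so each pair should land in its own cluster. A real analysis would use many more modifications and metrics, and would likely rely on an established K-means implementation with a principled choice of the number of clusters.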
