
A Purpose-oriented Study on Open-source Software Commits and Their Impacts on Software Quality

Jincheng He, Zhongheng He

TL;DR

This study examines how different types of open-source commits affect software quality. It defines a purpose-oriented taxonomy and empirically analyzes each type's impact using tool-based quality metrics and compilability. The authors build a labeled dataset of 1914 commits from SQUAAD, refine the maintenance category into three subtypes, and train multiple classifiers (including an ExtraTrees model on bag-of-words features) to automate commit-type tagging. They report statistically significant associations between commit types and quality metrics and identify commit types that raise the risk of breaking compilability, which leads to practical guidelines for developers. The work supports open-source maintenance by enabling automated tagging, linking change types to quality outcomes, and providing a dataset and baseline models for future research.
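As a rough illustration of the bag-of-words step mentioned in the summary, the sketch below turns commit messages into word-count vectors using only the Python standard library. The commit messages, the tokenizer, and the function names are hypothetical and chosen for illustration; they are not taken from the paper's dataset or code. A tree-ensemble classifier such as ExtraTrees would then be trained on vectors of this kind.

```python
import re
from collections import Counter


def tokenize(message):
    """Lowercase a commit message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def build_vocabulary(messages):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for msg in messages for tok in tokenize(msg)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(message, vocab):
    """Count how often each vocabulary token occurs in one message.

    Tokens that are not in the vocabulary are ignored.
    """
    counts = Counter(tokenize(message))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


# Hypothetical commit messages, invented purely for this example.
messages = [
    "Fix null pointer bug in parser",
    "Add new export feature",
    "Fix typo in parser docs",
]

vocab = build_vocabulary(messages)
features = [bow_vector(msg, vocab) for msg in messages]
```

Each row of `features` has one column per vocabulary token, so messages of different lengths map to vectors of the same size, which is the shape a classifier expects as input.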

Abstract

Developing software with the source code open to the public is prevalent; however, like its closed-source counterpart, open-source software has quality problems, which cause functional failures, such as program breakdowns, and non-functional failures, such as long response times. Previous researchers have revealed when, where, how, and what developers contribute to projects and how these aspects impact software quality. However, there has been little work on how different categories of commits impact software quality. To improve open-source software, we conducted this preliminary study to categorize commits, train prediction models to automate the classification, and investigate how commit quality is impacted by commits of different purposes. By identifying these impacts, we will establish a new set of guidelines for committing changes that will improve software quality.

Paper Structure

This paper contains 30 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Commit Type Distribution for Neutrals and Breakers
  • Figure 2: Keywords for Commit Types