Source Code Hotspots: A Diagnostic Method for Quality Issues

Saleha Muzammil; Mughees Ur Rehman; Zoe Kotti; Diomidis Spinellis

TL;DR

This paper introduces source code hotspots, line-level loci of frequent change, as a means of diagnosing maintainability issues in evolving software. A line-level hotspot mining method applied to 91 open-source repositories yields 15 hotspot types organized into four categories and shows that bots account for roughly 74% of hotspot edits. The resulting taxonomy links each type to concrete refactoring and CI-based mitigations; over half of the hotspots occur in administrative files, and many of the changes are mechanical noise. The findings offer actionable guidance for reducing avoidable churn and improving configurability, stability, and changeability, and the accompanying dataset and tooling support future research in software maintenance.

Abstract

Software source code often harbours "hotspots": small portions of the code that change far more often than the rest of the project and thus concentrate maintenance activity. We mine the complete version histories of 91 evolving, actively developed GitHub repositories and identify 15 recurring line-level hotspot patterns that explain why these hotspots emerge. The three most prevalent patterns are Pinned Version Bump (26%), revealing brittle release practices; Long Line Change (17%), signalling deficient layout; and Formatting Ping-Pong (9%), indicating missing or inconsistent style automation. Surprisingly, automated accounts generate 74% of all hotspot edits, suggesting that bot activity is a dominant but largely avoidable source of noise in change histories. By mapping each pattern to concrete refactoring guidelines and continuous integration checks, our taxonomy equips practitioners with actionable steps to curb hotspots and systematically improve software quality in terms of configurability, stability, and changeability.
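To make the notion of a line-level hotspot concrete, the following is a minimal sketch of counting how often each surviving line of a file has been modified across successive versions. It is not the authors' mining method: the function name `line_change_counts`, the use of Python's `difflib.SequenceMatcher`, and the rule that a replaced line inherits its predecessor's count are all assumptions made for illustration, and moved lines are not tracked.

```python
from difflib import SequenceMatcher


def line_change_counts(versions):
    """Return (line, modification_count) pairs for the last version.

    `versions` is a chronological list of file snapshots, each a list of
    lines. Illustrative assumption: a line in a 'replace' block is treated
    as an edit of the line at the same offset in the old block.
    """
    if not versions:
        return []
    prev = versions[0]
    counts = [0] * len(prev)
    for curr in versions[1:]:
        new_counts = [0] * len(curr)
        matcher = SequenceMatcher(a=prev, b=curr, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their history, possibly at a new offset.
                new_counts[j1:j2] = counts[i1:i2]
            elif tag == "replace":
                # Pair old and new lines by position; each pair is one edit.
                for k in range(min(i2 - i1, j2 - j1)):
                    new_counts[j1 + k] = counts[i1 + k] + 1
            # 'insert': new lines start at 0; 'delete': history is dropped.
        prev, counts = curr, new_counts
    return list(zip(prev, counts))
```

Under these assumptions, a version string that is bumped in every snapshot accumulates a high count while its neighbours stay at zero, which is the kind of signal the Pinned Version Bump pattern describes.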

Paper Structure (34 sections, 1 equation, 5 figures, 3 tables)

Introduction
Related Work
Code Churn and Fault Prediction
Software Smells, Anti-Patterns, and Change Taxonomies
Techniques and Tools for Detecting Code Churn
Automation and Bots
Methodology
Repository Selection and Mining
File-Level Analysis
Line-Level Analysis
Line-Modification Counting Method
Line-identity model and offset adjustment
Handling moved lines
Taxonomy Development
Manual Labeling Process
...and 19 more sections

Figures (5)

Figure 1: Lifetime (top left), commits (top right), size (bottom left), and contributors (bottom right) of analyzed repositories. We use a log$_{10}$ scale for commit count, lines of code, and contributor count to visualize heavy-tailed distributions.
Figure 2: Language Distribution of Projects
Figure 3: Chao1 Estimator vs. Number of Label Assignments
Figure 4: Distribution of Occurrences per Hotspot Type
Figure 5: Bot vs. Human Commit Ratio
