Investigating the Impact of Code Comment Inconsistency on Bug Introducing

Shiva Radmanesh; Aaron Imani; Iftekhar Ahmed; Mohammad Moshirpour

TL;DR

This study examines how code-comment inconsistency contributes to bug introduction. It evaluates GPT-3.5 against baselines for detecting code-comment mismatches on the CUP2 dataset and finds that fine-tuned GPT-3.5 achieves superior precision, recall, and F1. It then conducts a large-scale, time-windowed analysis of Apache Java projects, using GitCProc and the SZZ algorithm to link inconsistent changes to bug-introducing commits. Inconsistent changes are about $1.5\times$ more likely to lead to a bug within a $7$-day window, and the effect diminishes over longer windows ($\text{OR} \approx 1.14$ in a $14$-day window). These results highlight the importance of keeping comments aligned with code and demonstrate the potential of LLM-based tooling to improve software quality and maintenance efficiency. The work also provides practical guidance on timely comment updates and motivates future cross-language and longitudinal investigations.

Abstract

Code comments are essential for clarifying code functionality, improving readability, and facilitating collaboration among developers. Despite their importance, comments often become outdated, leading to inconsistencies with the corresponding code. This can mislead developers and potentially introduce bugs. Our research investigates the impact of code-comment inconsistency on bug introduction using large language models, specifically GPT-3.5. We first compare the performance of the GPT-3.5 model with other state-of-the-art methods in detecting these inconsistencies, demonstrating the superiority of GPT-3.5 in this domain. Additionally, we analyze the temporal evolution of code-comment inconsistencies and their effect on bug proneness over various timeframes using GPT-3.5 and odds ratio analysis. Our findings reveal that inconsistent changes are around 1.5 times more likely to lead to a bug-introducing commit than consistent changes, highlighting the necessity of maintaining consistent and up-to-date comments in software development. This study provides new insights into the relationship between code-comment inconsistency and software quality and offers a comprehensive analysis of its impact over time. The impact of code-comment inconsistency on bug introduction is highest immediately after the inconsistency is introduced and diminishes over time.
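The odds ratio analysis mentioned above can be illustrated with a small sketch. It assumes a 2x2 table of changes (inconsistent or consistent) against outcomes (followed by a bug-introducing commit within the window, or not). The function name `odds_ratio`, the Woolf log-based confidence interval, and the counts are all hypothetical choices made for illustration; they are not the paper's implementation or data.

```python
import math


def odds_ratio(incons_bug, incons_clean, cons_bug, cons_clean, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table.

    Rows: inconsistent vs. consistent changes.
    Columns: followed by a bug-introducing commit vs. not.
    The interval is the Woolf (log) approximation, an assumption of
    this sketch rather than something taken from the paper.
    """
    cells = (incons_bug, incons_clean, cons_bug, cons_clean)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive")
    # Odds of a bug after an inconsistent change divided by
    # odds of a bug after a consistent change.
    ratio = (incons_bug * cons_clean) / (incons_clean * cons_bug)
    # Standard error of log(OR).
    se = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts for one time window (not the paper's data).
ratio, lower, upper = odds_ratio(30, 70, 20, 80)
print(f"OR = {ratio:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

An odds ratio above 1 means bug-introducing commits are more likely after inconsistent changes than after consistent ones. Repeating the computation per time window (for example 7 days and 14 days) is one way to see whether the effect weakens as the window grows.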

Paper Structure (29 sections, 2 figures, 8 tables)

This paper contains 29 sections, 2 figures, 8 tables.

Introduction
Related Work
Code Comments Analysis
Code Comment Quality Evaluation Tools
Code Comment Update and Generation Tools
Impact of Code-Comment Inconsistency
Methodology
Evaluating GPT-3.5 against other models
Prompting the LLM
Zero-Shot
Few-Shot
Finetuning
Impact of Code-Comment Inconsistency
Dataset
Project Selection
...and 14 more sections

Figures (2)

Figure 1: Training and Validation Loss Over Steps
Figure 2: Our pipeline for collecting data for RQ2 and RQ3
