Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation

Parvez Mahbub; Ohiduzzaman Shuvo; Mohammad Masudur Rahman

TL;DR

Bugsplainer generates natural-language explanations for software bugs by learning from a large corpus of bug-fix commits and by leveraging code structure through a diff-aware AST traversal. It introduces discriminatory pre-training and a two-stage training regime on approximately 150K bug-fix commits, and uses a transformer-based architecture so that the generated explanations reflect both buggy and bug-free code structures. Empirical results show that Bugsplainer outperforms the baselines on BLEU, semantic similarity, and exact-match metrics, and a developer study finds its explanations more accurate, more precise, more concise, and more useful than those of the baselines. The work also provides a replication package and a benchmark to advance explainable bug localization and remediation.

Abstract

Software bugs claim approximately 50% of development time and cost the global economy billions of dollars. Once a bug is reported, the assigned developer attempts to identify and understand the source code responsible for the bug and then corrects the code. Over the last five decades, there has been significant research on automatically finding or correcting software bugs. However, there has been little research on automatically explaining bugs to developers, which is an essential but highly challenging task. In this paper, we propose Bugsplainer, a transformer-based generative model that generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits. Bugsplainer can leverage structural information and buggy patterns from the source code to generate an explanation for a bug. Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard, and can outperform multiple baselines from the literature. We also conduct a developer study involving 20 participants, in which the explanations from Bugsplainer were found to be more accurate, more precise, more concise, and more useful than those of the baselines.

Paper Structure (29 sections, 2 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Motivating Example
Background
Neural Machine Translation
Structure-Based Traversal
Bugsplainer
Extract Buggy and Bug-free AST Nodes from Commit
Generate diffSBT Sequence
Train Bugsplainer
Discriminatory Pre-training
Fine-tuning
Generate Explanation
Experiment
Dataset Construction
Repository Selection
...and 14 more sections

Figures (6)

Figure 1: An example of buggy source code
Figure 2: Generated explanations for buggy code
Figure 3: Structure-based traversal (SBT) -- (a) an example tree, and (b) corresponding SBT sequence
Figure 4: Schematic diagram of Bugsplainer
Figure 5: An example of diffSBT sequence generation from buggy code and commit diff
...and 1 more figure
