Table of Contents
Fetching ...

Fault Localization from the Semantic Code Search Perspective

Yihao Qin, Shangwen Wang, Yan Lei, Zhuo Zhang, Bo Lin, Xin Peng, Liqian Chen, Xiaoguang Mao

TL;DR

This work proposes \(\mathtt{CosFL},'' an FL approach that decomposes the FL task into two steps: query generation, which describes the functionality of the problematic code in natural language, and fault retrieval, which uses CS to find program elements semantically related to the query, allowing for finishing the FL task from a CS perspective.

Abstract

The software development process is characterized by an iterative cycle of continuous functionality implementation and debugging, essential for the enhancement of software quality and adaptability to changing requirements. This process incorporates two isolatedly studied tasks: Code Search (CS), which retrieves reference code from a code corpus to aid in code implementation, and Fault Localization (FL), which identifies code entities responsible for bugs within the software project to boost software debugging. These two tasks exhibit similarities since they both address search problems. Notably, CS techniques have demonstrated greater effectiveness than FL ones, possibly because of the precise semantic details of the required code offered by natural language queries, which are not readily accessible to FL methods. Drawing inspiration from this, we hypothesize that a fault localizer could achieve greater proficiency if semantic information about the buggy methods were made available. Based on this idea, we propose CosFL, an FL approach that decomposes the FL task into two steps: query generation, which describes the functionality of the problematic code in natural language, and fault retrieval, which uses CS to find program elements semantically related to the query. Specifically, to depict the buggy functionalities and generate high-quality queries, CosFL extensively harnesses the code analysis, semantic comprehension, and decision-making capabilities of LLMs. Moreover, to enhance the accuracy of CS, CosFL captures varying levels of context information and employs a multi-granularity code search strategy, which facilitates a more precise identification of buggy methods from a holistic view. The evaluation on 835 real bugs from 23 Java projects shows that CosFL successfully localizes 324 bugs within Top-1, which significantly outperforms the state-of-the-art approaches by 26.6%-57.3%.

Fault Localization from the Semantic Code Search Perspective

TL;DR

This work proposes \(\mathtt{CosFL},'' an FL approach that decomposes the FL task into two steps: query generation, which describes the functionality of the problematic code in natural language, and fault retrieval, which uses CS to find program elements semantically related to the query, allowing for finishing the FL task from a CS perspective.

Abstract

The software development process is characterized by an iterative cycle of continuous functionality implementation and debugging, essential for the enhancement of software quality and adaptability to changing requirements. This process incorporates two isolatedly studied tasks: Code Search (CS), which retrieves reference code from a code corpus to aid in code implementation, and Fault Localization (FL), which identifies code entities responsible for bugs within the software project to boost software debugging. These two tasks exhibit similarities since they both address search problems. Notably, CS techniques have demonstrated greater effectiveness than FL ones, possibly because of the precise semantic details of the required code offered by natural language queries, which are not readily accessible to FL methods. Drawing inspiration from this, we hypothesize that a fault localizer could achieve greater proficiency if semantic information about the buggy methods were made available. Based on this idea, we propose CosFL, an FL approach that decomposes the FL task into two steps: query generation, which describes the functionality of the problematic code in natural language, and fault retrieval, which uses CS to find program elements semantically related to the query. Specifically, to depict the buggy functionalities and generate high-quality queries, CosFL extensively harnesses the code analysis, semantic comprehension, and decision-making capabilities of LLMs. Moreover, to enhance the accuracy of CS, CosFL captures varying levels of context information and employs a multi-granularity code search strategy, which facilitates a more precise identification of buggy methods from a holistic view. The evaluation on 835 real bugs from 23 Java projects shows that CosFL successfully localizes 324 bugs within Top-1, which significantly outperforms the state-of-the-art approaches by 26.6%-57.3%.

Paper Structure

This paper contains 22 sections, 3 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Illustration of $\mathtt{CosFL}$ and other FL pipelines. The rationale is generated by LLMs.
  • Figure 2: Concept diagram of localizing Closure-1 bug from the perspective of semantic code search.
  • Figure 3: Overview of $\mathtt{CosFL}$.
  • Figure 4: Overlap of Top5 results.
  • Figure 5: The impact of different hyper-parameters on $\mathtt{CosFL}$.
  • ...and 2 more figures