A Study on the Impact of Fault localization Granularity for Repository-Scale Code Repair Tasks

Joseph Townsend, Chandresh Pravin, Kwun Ho Ngan, Matthieu Parizy

Abstract

Automatic program repair can be a challenging task, especially when resolving complex issues at the repository level, which often involves issue reproduction, fault localization, code repair, testing and validation. Issues of this scale are commonly found in popular GitHub repositories and in datasets derived from them. Some repository-level approaches separate localization and repair into distinct phases. Where this is the case, the fault localization approaches vary in the granularity at which they localize. Although the impact of granularity has been explored to some degree on smaller datasets, not all such studies isolate it from the separate question of localization accuracy by testing code repair under the assumption of perfect fault localization. To the best of the authors' knowledge, no repository-scale study has explicitly investigated granularity under this assumption, nor conducted a systematic empirical comparison of granularity levels in isolation. We propose a framework for performing such tests by modifying the localization phase of the Agentless framework to retrieve ground-truth localization data and include it as context in the prompt fed to the repair phase. We show that under this configuration, and as a generalization over the SWE-Bench-Mini dataset, function-level granularity yields a higher repair rate than line-level or file-level granularity. However, a closer analysis suggests that the ideal granularity may in fact be task-dependent. This study is not intended to improve on the state of the art, nor do we intend for results to be compared against any complete agentic frameworks. Rather, we present a proof of concept for investigating how fault localization may impact automatic code repair in repository-scale scenarios. We present preliminary findings to this end and encourage further research into the relationship between the two phases.

Paper Structure

This paper contains 19 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: A modified version of the Agentless framework (Xia et al., 2025) was designed to conduct the experiments. The file localization and patch validation phases of the original work were bypassed: a replacement script derives perfect localization data from the ground-truth patches, and the raw generated patches are passed to SWE-Bench evaluation, so that the study concentrates on the effect of localization granularity.
  • Figure 2: Fault localization at various levels of granularity, demonstrated in a hypothetical repository containing two sorting methods. The actual fault is in the mergeSort method of merge_sort.py, where n should be replaced with m. At line-level granularity, only these two code lines are provided as context for the repair phase, indicated by the yellow highlight. At function-level granularity, the entire function is provided, and at file-level granularity, the entire merge_sort.py file is provided.
  • Figure 3: Box plot of the resolve rate achieved for different granularities of perfect fault localization, as reported in Table 1.
  • Figure 4: Mean resolution rate by problem difficulty tier and localization granularity. Difficulty is inferred from human annotator resolution time as provided in the SWE-Bench Verified metadata. Results are averaged over 10 trials per granularity.
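
The three granularity levels described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not the authors' replacement script: the function name `context_at_granularity`, the toy `SOURCE` file, and the fault line numbers are all hypothetical, and the sketch assumes the faulty lines are already known (the perfect-localization assumption) and that the file is valid Python.

```python
import ast

# Hypothetical stand-in for merge_sort.py; lines 7 and 8 use n where m is intended.
SOURCE = '''def helper(xs):
    return list(xs)

def mergeSort(xs):
    n = len(xs)
    m = n // 2
    left = xs[:n]
    right = xs[n:]
    return left, right
'''


def context_at_granularity(source: str, fault_lines: set[int], granularity: str) -> str:
    """Return the code shown to the repair phase at the requested granularity.

    fault_lines holds 1-indexed line numbers assumed to come from a ground-truth patch.
    """
    lines = source.splitlines()
    if granularity == "file":
        # File-level: the whole file is the context.
        return source
    if granularity == "line":
        # Line-level: only the faulty lines themselves.
        return "\n".join(lines[n - 1] for n in sorted(fault_lines))
    if granularity == "function":
        # Function-level: every function whose span contains a faulty line.
        spans = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if any(node.lineno <= n <= node.end_lineno for n in fault_lines):
                    spans.append((node.lineno, node.end_lineno))
        kept = []
        for start, end in sorted(spans):
            if kept and start <= kept[-1][1]:
                continue  # nested inside a function that is already kept
            kept.append((start, end))
        return "\n".join("\n".join(lines[s - 1:e]) for s, e in kept)
    raise ValueError(f"unknown granularity: {granularity!r}")


if __name__ == "__main__":
    for level in ("line", "function", "file"):
        print(f"--- {level} ---")
        print(context_at_granularity(SOURCE, {7, 8}, level))
```

Each level returns a strict superset of the previous one here, which is the trade-off the study varies: more surrounding context for the repair model versus more code unrelated to the fault.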