Extracting Fix Ingredients using Language Models

Julian Aron Prenner, Romain Robbes

TL;DR

This work investigates the role of identifier ingredients in neural program repair (NPR) and introduces ScanFix, a two-model system that combines a dedicated identifier scanner with a repair model to leverage wider-context information. Empirical analyses on Defects4J and TSSB-3M show that identifier ingredients are common and often lie outside the model's context, and that their presence correlates with repair success. The scanner achieves modest extraction performance (F1 ≈ 0.27) and, when integrated with the repair model, provides measurable gains, particularly for far-away ingredients, though a very large input window can yield larger improvements. The results highlight the potential of ingredient scanning as a subtask in NPR while underscoring Sutton’s bitter lesson that expanding context windows can sometimes outperform targeted extraction approaches.

Abstract

Deep learning and language models are increasingly dominating automated program repair research. While previous generate-and-validate approaches were able to find and use fix ingredients on a file or even project level, neural language models are limited to the code that fits their input window. In this work we investigate how important identifier ingredients are in neural program repair and present ScanFix, an approach that leverages an additional scanner model to extract identifiers from a bug's file and potentially project-level context. We find that lack of knowledge of far-away identifiers is an important cause of failed repairs. Augmenting repair model input with scanner-extracted identifiers yields relative improvements of up to 31%. However, ScanFix is outperformed by a model with a large input window (> 5k tokens). When passing ingredients from the ground-truth fix, improvements are even higher. This shows that, with refined extraction techniques, ingredient scanning, similar to fix candidate ranking, could have the potential to become an important subtask of future automated repair systems. At the same time, it also demonstrates that this idea is subject to Sutton's bitter lesson and may be rendered unnecessary by new code models with ever-increasing context windows.

Paper Structure

This paper contains 21 sections, 1 equation, 11 figures, 2 tables.

Figures (11)

  • Figure 1: ScanFix during inference: the scanner model receives the buggy code location with a local context (e.g. 10 lines before and after) as well as code snippets from the file under repair from which it extracts relevant identifier ingredients. These identifiers are passed on to the actual repair model.
  • Figure 2: Bug Chart#10 from Defects4J with ground-truth fix.
  • Figure 3: Percentage of fix ingredients covered at the method/function, input window, file, and project level. For TSSB-3M we estimate the project-level on a sample of 500 bugs with out-of-file ingredients; for Defects4J, we use all relevant files belonging to the bug.
  • Figure 4: Example input for the repair model. The bug location is marked with special <BUGSTART> and <BUGEND> tokens. Note that here parts of the local contexts are omitted.
  • Figure 5: Repair success of APR tools as a function of fix ingredient count for single-change bugs in Defects4J. The gray line is the mean over all tools with a 95% CI band. For most tools, repair success decreases as the number of required fix ingredients increases.
  • ...and 6 more figures