Enhancing Automated Program Repair via Faulty Token Localization and Quality-Aware Patch Refinement

Jiaolong Kong, Xiaofei Xie, Yiheng Xiong, Yuekun Wang, Jian Wang

TL;DR

This work addresses the inefficiency of LLM-based automated program repair (APR) techniques that rely on coarse external feedback by introducing TokenRepair, a two-level refinement framework. It combines internal token-level reflection, which uses context-aware uncertainty to localize faulty tokens and token-guided Chain-of-Thought decoding to refine them, with a quality-aware external feedback loop that filters candidates before further refinement. The approach also measures trace quality to prefer repair trajectories with decreasing uncertainty, which stabilizes the iterative search. Empirical results on Defects4J 1.2 and HumanEval-Java across five LLMs demonstrate state-of-the-art repair performance, and ablations support the effectiveness of both token-level localization and external quality gating.

Abstract

Large language models (LLMs) have recently demonstrated strong potential for automated program repair (APR). However, existing LLM-based techniques primarily rely on coarse-grained external feedback (e.g., test results) to guide iterative patch generation, while lacking fine-grained internal signals that reveal why a patch fails or which parts of the generated code are likely incorrect. This limitation often leads to inefficient refinement, error propagation, and suboptimal repair performance. In this work, we propose TokenRepair, a novel two-level refinement framework that enhances APR by integrating internal reflection for localizing potentially faulty tokens with external feedback for quality-aware patch refinement. Specifically, TokenRepair first performs internal reflection by analyzing context-aware token-level uncertainty fluctuations to identify suspicious or low-confidence tokens within a patch. It then applies Chain-of-Thought guided rewriting to refine only these localized tokens, enabling targeted and fine-grained correction. To further stabilize the iterative repair loop, TokenRepair incorporates a quality-aware external feedback mechanism that evaluates patch quality and filters out low-quality candidates before refinement. Experimental results show that TokenRepair achieves new state-of-the-art repair performance, correctly fixing 88 bugs on Defects4J 1.2 and 139 bugs on HumanEval-Java, demonstrating substantial improvements ranging from 8.2% to 34.9% across all models on Defects4J 1.2 and from 3.3% to 16.1% on HumanEval-Java.
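
To make the idea of token-level uncertainty localization concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes that each generated token comes with a next-token probability distribution, uses Shannon entropy as the uncertainty measure, and flags a token when its entropy exceeds the mean entropy of its neighbours by a fixed margin. The function names, the `window` and `margin` parameters, and the thresholding rule are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def localize_suspicious_tokens(
    distributions: Sequence[Sequence[float]],
    window: int = 2,      # assumed context radius, not taken from the paper
    margin: float = 0.5,  # assumed entropy gap threshold, not taken from the paper
) -> List[int]:
    """Return indices of tokens whose entropy spikes above their local context.

    A token is flagged when its entropy exceeds the mean entropy of the
    surrounding `window` tokens on each side by more than `margin`.
    """
    entropies = [token_entropy(d) for d in distributions]
    flagged: List[int] = []
    for i, h in enumerate(entropies):
        lo = max(0, i - window)
        hi = min(len(entropies), i + window + 1)
        neighbours = entropies[lo:i] + entropies[i + 1:hi]
        if not neighbours:
            continue  # a single-token patch has no context to compare against
        baseline = sum(neighbours) / len(neighbours)
        if h - baseline > margin:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    confident = [0.97, 0.01, 0.01, 0.01]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    patch = [confident, confident, uncertain, confident, confident]
    print(localize_suspicious_tokens(patch))
```

In this toy input only the third token has a flat distribution, so it is the only candidate for targeted rewriting. Comparing each token against its neighbours rather than a global threshold is one simple way to make the signal context-aware; the paper's own uncertainty measure may differ.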

Paper Structure

This paper contains 27 sections, 8 equations, 5 figures, 6 tables, 1 algorithm.

Figures (5)

  • Figure 1: Motivating example of human problem solving
  • Figure 2: The motivation example
  • Figure 3: Overview of TokenRepair
  • Figure 4: Bug fix Venn diagram in two benchmarks
  • Figure 5: An illustrative example of a bug uniquely fixed by TokenRepair in Defects4J 1.2