Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces

André Silva, Gustav Thorén, Martin Monperrus

TL;DR

This work reframes automatic program repair as continuous optimization: symbolic programs are compiled into differentiable numerical representations, which are then optimized with gradient descent to minimize a correctness loss. It introduces RaspBugs, a benchmark of 1,466 buggy RASP programs with paired numerical representations, to evaluate repairs in the numerical program space. Empirical results show that Gradient-Based Program Repair (GBPR) repairs a majority of bugs across multiple base programs, often achieving near-perfect correctness, and that it outperforms symbolic baselines on complex, multi-location bugs. The study demonstrates a new paradigm bridging continuous optimization and program behavior, with implications for scalable repair and verification in differentiable systems.

Abstract

Automatic program repair seeks to generate correct code from buggy programs, with most approaches searching for the correct program in a discrete, symbolic space of source code tokens. This symbolic search is fundamentally limited by its inability to directly reason about program behavior. We introduce Gradient-Based Program Repair (GBPR), a new paradigm that reframes program repair as continuous optimization in a differentiable numerical program space. Our core insight is to compile symbolic programs into differentiable numerical representations, enabling search in the numerical program space directly guided by program behavior. To evaluate GBPR, we present RaspBugs, a new benchmark of 1,466 buggy symbolic RASP programs and their respective numerical representations. Our experiments demonstrate that GBPR can effectively repair buggy symbolic programs by gradient-based optimization in the numerical program space, with convincing repair trajectories. To our knowledge, we are the first to formulate program repair as continuous optimization in a numerical program space. Our work establishes a new direction for program repair research, bridging two rich worlds: continuous optimization and program behavior.
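
The idea of repairing a program by descending a correctness loss can be illustrated with a deliberately small sketch. It is not the authors' implementation: here the "numerical program" is assumed to be a plain weight matrix `W` rather than a compiled Transformer, the correctness loss is assumed to be a mean squared error over input/output samples, and the names `reverse_spec`, `run`, `correctness_loss`, `loss_gradient`, and `repair` are hypothetical. The buggy program is the reverse function with two output positions swapped.

```python
import random

def reverse_spec(xs):
    """Expected behavior: the reversed input sequence."""
    return xs[::-1]

def run(W, xs):
    """Execute the toy numerical program, a linear map y = W x."""
    return [sum(w * x for w, x in zip(row, xs)) for row in W]

def correctness_loss(W, samples):
    """Assumed loss: mean over samples of the squared output error."""
    total = 0.0
    for xs, ys in samples:
        total += sum((p - y) ** 2 for p, y in zip(run(W, xs), ys))
    return total / len(samples)

def loss_gradient(W, samples):
    """Analytic gradient of the loss with respect to every weight."""
    n = len(W)
    grad = [[0.0] * n for _ in range(n)]
    for xs, ys in samples:
        preds = run(W, xs)
        for i in range(n):
            err = 2.0 * (preds[i] - ys[i]) / len(samples)
            for j in range(n):
                grad[i][j] += err * xs[j]
    return grad

def repair(W, samples, lr=0.5, steps=300):
    """Plain gradient descent, starting from the buggy weights."""
    W = [row[:] for row in W]
    for _ in range(steps):
        grad = loss_gradient(W, samples)
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                W[i][j] -= lr * g
    return W

random.seed(0)
n = 4
samples = []
for _ in range(32):
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    samples.append((xs, reverse_spec(xs)))

# Correct program: the anti-diagonal permutation matrix.
correct = [[1.0 if j == n - 1 - i else 0.0 for j in range(n)] for i in range(n)]
# Buggy program: the first two output positions are swapped.
buggy = [row[:] for row in correct]
buggy[0], buggy[1] = buggy[1], buggy[0]

loss_before = correctness_loss(buggy, samples)
repaired = repair(buggy, samples)
loss_after = correctness_loss(repaired, samples)
print("loss before repair:", loss_before)
print("loss after repair:", loss_after)
```

In this toy setting the loss is convex, so descent from the buggy weights recovers the correct matrix. That guarantee does not carry over to compiled Transformers, where the loss landscape is non-convex.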

Paper Structure

This paper contains 18 sections, 6 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: The key insight of Gradient-Based Program Repair is that program search can be done in a numerical space by employing gradient-based optimization. a) Symbolic program computing the reverse function, written in RASP, and the difference between the expected and buggy behavior; b) Compilation of the symbolic program into a numerical program, encoded as a Transformer; c) Numerical program, equivalent to the symbolic program; d) GBPR optimizes the numerical program via the correctness loss, starting from the buggy program. The program is iteratively optimized, moving towards correct behavior. As the correctness loss decreases, the program correctness increases, with some incorrect behavior now corrected. At the end of the optimization, the repaired program correctly implements the reverse function. As opposed to LLM-based bug fixing, GBPR directly reasons about the expected behavior as a first-class optimizable concept.
  • Figure 2: Example of a buggy RASP program in RaspBugs, synthesized from the reference hist program using mutation. The reference program selects only equal tokens, while the mutated program selects tokens using a greater-than-or-equal comparison, resulting in buggy program behavior.
  • Figure 3: Accuracy distribution before (red) and after (green) Gradient-Based Program Repair for each program in RaspBugs. The majority of buggy variants for five programs can be repaired with GBPR (as demonstrated by the rightmost green bars).
  • Figure 4: Repair success rates over evaluation steps, stratified by mutation order. In GBPR, an epoch is a full forward+backward pass over the I/O samples of the training dataset; in symbolic baselines, an epoch is the symbolic evaluation of one mutated program against the same I/O samples. This exposes all three approaches to the same amount of information per epoch to drive the search. Each panel shows accuracy trajectories for GBPR, GP, and BFS (higher is better). For simple bugs (orders 1-2), symbolic methods achieve higher final performance since the number of potential repairs to explore is small. For complex bugs (orders 4-5), GBPR surpasses symbolic baselines, showing how gradient-based optimization can traverse complex program spaces more efficiently than symbolic search methods.
  • Figure 5: Repair trajectory for a buggy sort program, in the numerical program space. The red cross marks the initial buggy program, and the trajectory shows the path taken by gradient descent towards a repaired program. Gradient-Based Program Repair iteratively updates the numerical representation of the program using the gradient defined by the correctness loss landscape, until the program behavior is repaired ($L \approx 0$). Left: Surface plot of the correctness loss landscape along the two principal components of the numerical program space. Right: Contour plot of the same landscape, with the input-output behavior changing along the trajectory.
  • ...and 2 more figures
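
The mutation described in the Figure 2 caption can be made concrete with a small plain-Python emulation. This is a sketch under assumptions, not the RASP library or the RaspBugs tooling: `select`, `selector_width`, and `hist` are hypothetical helpers that only mimic the described behavior, and the orientation of the comparison (key compared against query) is assumed.

```python
import operator

def select(keys, queries, predicate):
    """Boolean matrix: entry [q][k] is true when predicate(key, query) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]

def selector_width(selector):
    """Number of selected keys for each query position."""
    return [sum(row) for row in selector]

def hist(tokens, predicate=operator.eq):
    """For each token, count the tokens selected by the predicate."""
    return selector_width(select(tokens, tokens, predicate))

tokens = [1, 2, 1, 3]
expected = hist(tokens)              # reference: select equal tokens
buggy = hist(tokens, operator.ge)    # mutant: select greater-or-equal tokens

# Per-position accuracy of the buggy program on this single input.
accuracy = sum(b == e for b, e in zip(buggy, expected)) / len(tokens)
print("expected:", expected)
print("buggy:   ", buggy)
print("accuracy:", accuracy)
```

A single changed comparison operator leaves the program syntactically valid but alters its output at most positions, which is the kind of behavioral gap a correctness loss measures.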