Semantic Code Repair using Neuro-Symbolic Transformation Networks

Jacob Devlin, Jonathan Uesato, Rishabh Singh, Pushmeet Kohli

TL;DR

This work tackles semantic code repair without unit tests by introducing the Share, Specialize, and Compete (SSC) architecture. SSC first encodes the code with a shared AST-based representation, then scores repair candidates with specialized per-type modules, and finally normalizes those scores jointly so that the candidates compete against one another. Trained on a large corpus of Python snippets with synthetically injected bugs and evaluated on commits that fix real bugs, SSC substantially outperforms a baseline attentional seq2seq model, achieving up to 87% exact repair accuracy in single-repair settings on synthetic data and 41% on real bugs. The results indicate that the model generalizes well and benefits from AST structure, and a human evaluation provides further insight. Future work aims to expand the set of bug types and to apply SSC to broader transformation tasks.

Abstract

We study the problem of semantic code repair, which can be broadly defined as automatically fixing non-syntactic bugs in source code. The majority of past work in semantic code repair assumed access to unit tests against which candidate repairs could be validated. In contrast, the goal here is to develop a strong statistical model that accurately predicts both bug locations and exact fixes without access to information about the intended correct behavior of the program. Achieving such a goal requires a robust contextual repair model, which we train on a large corpus of real-world source code that has been augmented with synthetically injected bugs. Our framework adopts a two-stage approach in which a large set of repair candidates is first generated by rule-based processors, and these candidates are then scored by a statistical model using a novel neural network architecture which we refer to as Share, Specialize, and Compete. Specifically, the architecture (1) generates a shared encoding of the source code using an RNN over the abstract syntax tree, (2) scores each candidate repair using specialized network modules, and (3) normalizes these scores together so they can compete against one another in a comparable probability space. We evaluate our model on a real-world test set gathered from GitHub containing four common categories of bugs. Our model is able to predict the exact correct repair 41% of the time with a single guess, compared to 13% accuracy for an attentional sequence-to-sequence model.
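
Step (3) of the architecture, the joint normalization, can be illustrated with a minimal Python sketch. It assumes the "compete" step is a single softmax over the raw scores of every candidate from every module, which is one plausible reading of the abstract. The function name `compete` and the repair-type names (`var_replace`, `comparison_swap`, `no_repair`) are hypothetical and chosen only for illustration.

```python
import math

def compete(module_scores):
    """Jointly normalize raw scores from several specialized modules.

    module_scores maps a (hypothetical) repair-type name to a list of raw
    scores, one per candidate proposed for that type. A single softmax over
    all candidates places them in one shared probability space.
    """
    # Flatten to (repair_type, candidate_index, raw_score) triples.
    flat = [(rtype, idx, score)
            for rtype, scores in module_scores.items()
            for idx, score in enumerate(scores)]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(score for _, _, score in flat)
    exps = [math.exp(score - max_score) for _, _, score in flat]
    total = sum(exps)
    return [(rtype, idx, e / total) for (rtype, idx, _), e in zip(flat, exps)]

# Illustrative raw scores only; these are not values from the paper.
raw = {"var_replace": [2.0, 0.5], "comparison_swap": [1.0], "no_repair": [0.0]}
probs = compete(raw)
best = max(probs, key=lambda t: t[2])  # highest-probability candidate overall
```

Because all candidates share one normalizer, a candidate from one module can only gain probability at the expense of candidates from every other module, which is what makes the scores directly comparable.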

Paper Structure

This paper contains 18 sections, 4 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Model Visualization: A visualization of the Share, Specialize, and Compete architecture for neural program repair.
  • Figure 2: Pooled Pointer Module: The application of a pooled pointer module at a single time step, to predict the variable replacement scores for each potential replacement of the token fname. The input here is the per-token representation computed by the Share module. Representations for variable names are passed through a pooling module which outputs per-variable pooled representations. These representations are then passed through a similarity module, as in standard pointer networks, to yield a (dynamically-sized) output dictionary containing one score for each unique variable.
  • Figure 3: Results binned by number of repair candidates in the snippet.
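
The pooled pointer module described in the Figure 2 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the caption does not specify the pooling or similarity functions, so element-wise max pooling and a dot product are used here as stand-ins, and the function name `pooled_pointer_scores` and the toy vectors are hypothetical.

```python
def pooled_pointer_scores(names, reps, query):
    """Return one score per unique variable name.

    names: variable name at each token position
    reps:  per-token representation vectors (same length as names)
    query: representation of the position being considered for replacement
    """
    pooled = {}
    for name, vec in zip(names, reps):
        if name not in pooled:
            pooled[name] = list(vec)
        else:
            # Assumption: element-wise max pooling over all occurrences.
            pooled[name] = [max(a, b) for a, b in zip(pooled[name], vec)]
    # Assumption: dot-product similarity, as in a basic pointer network.
    return {name: sum(q * p for q, p in zip(query, vec))
            for name, vec in pooled.items()}

# Toy inputs: "fname" occurs twice, so its two vectors are pooled into one.
names = ["fname", "path", "fname", "mode"]
reps = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
scores = pooled_pointer_scores(names, reps, query=[1.0, 1.0])
```

The output dictionary has one entry per unique variable rather than one per token, so its size varies with the snippet, matching the "dynamically-sized" output dictionary in the caption.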