Neural Program Repair by Jointly Learning to Localize and Repair
Marko Vasic, Aditya Kanade, Petros Maniatis, David Bieber, Rishabh Singh
TL;DR
This work targets variable-misuse bugs with a single model that jointly learns to classify, localize, and repair them, using multi-headed pointer networks over program tokens. Because the model predicts both the bug location and the repair variable, it avoids the limitation of prior enumerative methods, which treat localization and repair separately. On the ETH-Py150 and MSR-VarMisuse datasets, the joint model achieves higher localization and repair accuracy while maintaining high classification accuracy and true-positive rates, and an evaluation on industrial, real-world data indicates that the approach is robust and practically relevant. The results suggest that combining token-level pointers with end-to-end optimization is a promising direction for neural program repair, and that graph-based approaches could extend it with richer semantic information.
Abstract
Due to its potential to improve programmer productivity and software quality, automated program repair has been an active topic of research. Newer techniques harness neural networks to learn directly from examples of buggy programs and their fixes. In this work, we consider a recently identified class of bugs called variable-misuse bugs. The state-of-the-art solution for variable misuse enumerates potential fixes for all possible bug locations in a program, before selecting the best prediction. We show that it is beneficial to train a model that jointly and directly localizes and repairs variable-misuse bugs. We present multi-headed pointer networks for this purpose, with one head each for localization and repair. The experimental results show that the joint model significantly outperforms an enumerative solution that uses a pointer-based model for repair alone.
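To make the idea of one pointer head each for localization and repair concrete, the following is a minimal sketch, not the authors' implementation. It assumes that per-token hidden states are already available from some encoder, that each head is a single weight vector scored against every token, that position 0 holds a special "no bug" token the localization head can point to, and that the repair head's probability mass is summed over all occurrences of the same variable name. All names, masks, and toy values here (`two_head_pointer`, `predict`, `loc_head`, `rep_head`) are hypothetical.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over positions where mask is True; masked positions get 0."""
    top = max(s for s, m in zip(scores, mask) if m)  # for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def two_head_pointer(states, loc_head, rep_head, loc_mask, rep_mask):
    """Return one distribution over token positions per head."""
    loc_probs = masked_softmax([dot(loc_head, h) for h in states], loc_mask)
    rep_probs = masked_softmax([dot(rep_head, h) for h in states], rep_mask)
    return loc_probs, rep_probs


def predict(tokens, loc_probs, rep_probs):
    """Return None for 'no bug', else (bug position, repair variable name)."""
    loc = max(range(len(tokens)), key=loc_probs.__getitem__)
    if loc == 0:  # assumption: position 0 is the special no-bug token
        return None
    by_name = {}  # sum repair probability over occurrences of each name
    for tok, p in zip(tokens, rep_probs):
        if p > 0.0:
            by_name[tok] = by_name.get(tok, 0.0) + p
    return loc, max(by_name, key=by_name.get)


# Toy input: hand-written 2-d states stand in for learned encoder outputs.
tokens = ["<no-bug>", "a", "b", "a"]
states = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
loc_mask = [True, True, True, True]    # no-bug slot plus variable uses
rep_mask = [False, True, True, True]   # only variable tokens can be repairs

loc_probs, rep_probs = two_head_pointer(
    states, [1.0, 0.0], [0.0, 2.0], loc_mask, rep_mask)
buggy = predict(tokens, loc_probs, rep_probs)

clean_loc, clean_rep = two_head_pointer(
    states, [-1.0, -1.0], [0.0, 2.0], loc_mask, rep_mask)
clean = predict(tokens, clean_loc, clean_rep)
```

With these toy values, `buggy` is `(3, "b")`: the localization head points at the second use of `a`, and the repair head proposes replacing it with `b`. `clean` is `None`, because the localization head puts most of its mass on the no-bug slot. Both decisions come from a single forward pass, in contrast to an enumerative approach that would query a repair model once per candidate location.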
