Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis
Fengjie Li, Jiajun Jiang, Jiajun Sun, Hongyu Zhang
TL;DR
This work tackles the limitations of LLM-based automated program repair (APR) by using LLM-generated patches as guidance rather than as final fixes. It introduces GIANTREPAIR, a two-stage framework that first constructs patch skeletons from LLM-generated patches via AST differencing and then instantiates these skeletons through static analysis and context-aware selection to produce high-quality, program-specific patches. Large-scale experiments on Defects4J show that GIANTREPAIR repairs significantly more bugs than the LLM-generated patches used directly and outperforms state-of-the-art APR methods under both perfect and automated fault localization, with a notable number of unique fixes. By using LLM guidance to shrink the patch search space, the approach is better suited to real-world repair, and the authors release open-source code and data for reproducibility.
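The skeleton-construction stage can be illustrated with a toy sketch. This is not GIANTREPAIR's implementation, which operates on real Java ASTs. The sketch makes two assumptions. First, it uses a hypothetical tuple encoding for AST nodes. Second, a simplified parallel tree walk stands in for a real AST differencing tool. Where the buggy code and the LLM patch agree, the code is kept. Inside subtrees the patch changes, newly introduced variables become holes, while code that already exists in the buggy version stays concrete.

```python
from typing import Optional, Tuple

# Hypothetical toy AST encoding: an inner node is (kind, *children);
# leaves are ("var", name) or ("lit", value).
Node = Tuple
HOLE = ("hole",)  # hypothetical placeholder for an abstracted variable


def subtrees(node: Node) -> set:
    """Collect every subtree of the buggy code; these count as existing context."""
    found = {node}
    if node[0] not in ("var", "lit"):
        for child in node[1:]:
            found |= subtrees(child)
    return found


def abstract(node: Node, context: set) -> Node:
    """Turn variables in newly introduced code into holes.

    Subtrees that already occur in the buggy code are kept concrete,
    and literals are kept as-is.
    """
    if node in context:
        return node
    if node[0] == "var":
        return HOLE
    if node[0] == "lit":
        return node
    return (node[0],) + tuple(abstract(c, context) for c in node[1:])


def make_skeleton(buggy: Node, patched: Node, context: Optional[set] = None) -> Node:
    """Walk both trees in parallel and abstract only the subtrees that differ.

    This simplified walk stands in for real AST differencing.
    """
    if context is None:
        context = subtrees(buggy)
    if buggy == patched:
        return buggy
    same_shape = (
        buggy[0] == patched[0]
        and buggy[0] not in ("var", "lit")
        and len(buggy) == len(patched)
    )
    if same_shape:
        # Same node kind and arity: descend and compare children pairwise.
        return (buggy[0],) + tuple(
            make_skeleton(b, p, context) for b, p in zip(buggy[1:], patched[1:])
        )
    # The patch replaced this subtree: keep its shape, abstract new variables.
    return abstract(patched, context)


if __name__ == "__main__":
    # Buggy: if (i < n)   LLM patch: if (i < n && arr != null)
    buggy = ("if", ("<", ("var", "i"), ("var", "n")))
    patched = ("if", ("&&", ("<", ("var", "i"), ("var", "n")),
                      ("!=", ("var", "arr"), ("lit", "null"))))
    print(make_skeleton(buggy, patched))
```

In this example the skeleton keeps `i < n` and the `&& ... != null` structure suggested by the LLM. It replaces `arr` with a hole, so the second stage can decide which program variable actually belongs there.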
Abstract
Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in repairing real-world bugs. However, existing APR methods often use patches generated by LLMs without further optimization, which reduces their effectiveness because the LLMs lack program-specific knowledge. Furthermore, these APR methods have typically been evaluated under the assumption of perfect fault localization, which may not accurately reflect their real-world effectiveness. To address these limitations, this paper introduces a new APR approach called GIANTREPAIR. Our approach builds on the insight that LLM-generated patches, although not necessarily correct, offer valuable guidance for the patch generation process. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs by instantiating the skeletons through context-aware patch generation. To evaluate the performance of our approach, we conduct two large-scale experiments. The results demonstrate that GIANTREPAIR not only repairs more bugs than directly using LLM-generated patches (on average, 27.78% more on Defects4J v1.2 and 23.40% more on Defects4J v2.0), but also outperforms state-of-the-art APR methods by repairing at least 42 and 7 more bugs under perfect and automated fault localization scenarios, respectively.
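The second stage, instantiating a skeleton into concrete patches, can be sketched under similar simplifying assumptions. The hypothetical functions below take a skeleton whose holes are `("hole",)` nodes and work in three steps:

1. Derive a crude type constraint for each hole from its parent operator. This stands in for real static analysis.
2. Enumerate the in-scope variables that satisfy those constraints.
3. Rank the resulting candidates by how often each chosen variable is used nearby. This is a simple proxy for context-aware selection.

The type rules, the usage-count score, and the `top_k` cutoff are all illustrative assumptions, not the paper's actual algorithm.

```python
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple
HOLE = ("hole",)
PRIMITIVES = {"int", "long", "double", "boolean"}  # simplified Java primitives
NUMERIC = {"int", "long", "double"}


def hole_constraints(node: Node, parent: Node = None, out: list = None) -> list:
    """Return one type predicate per hole, in left-to-right order.

    A crude stand-in for static analysis: the parent operator decides
    which variable types are acceptable for each hole.
    """
    if out is None:
        out = []
    if node == HOLE:
        if parent and parent[0] in ("==", "!=") and ("lit", "null") in parent[1:]:
            out.append(lambda t: t not in PRIMITIVES)  # null checks need references
        elif parent and parent[0] in ("<", ">", "<=", ">="):
            out.append(lambda t: t in NUMERIC)  # ordering needs numbers
        else:
            out.append(lambda t: True)
        return out
    if node[0] not in ("var", "lit"):
        for child in node[1:]:
            hole_constraints(child, node, out)
    return out


def fill(node: Node, names) -> Node:
    """Replace holes left-to-right with variable names from an iterator."""
    if node == HOLE:
        return ("var", next(names))
    if node[0] in ("var", "lit"):
        return node
    return (node[0],) + tuple(fill(c, names) for c in node[1:])


def instantiate(skeleton: Node, scope: Dict[str, str],
                usage: Dict[str, int], top_k: int = 3) -> List[Node]:
    """Enumerate type-compatible fillings and rank them by local usage counts."""
    constraints = hole_constraints(skeleton)
    options = [[v for v, t in scope.items() if ok(t)] for ok in constraints]
    ranked = []
    for combo in product(*options):
        score = sum(usage.get(v, 0) for v in combo)
        ranked.append((score, combo))
    ranked.sort(key=lambda pair: -pair[0])  # stable: ties keep scope order
    return [fill(skeleton, iter(combo)) for _, combo in ranked[:top_k]]


if __name__ == "__main__":
    skeleton = ("if", ("&&", ("<", ("var", "i"), ("var", "n")),
                       ("!=", ("hole",), ("lit", "null"))))
    scope = {"i": "int", "n": "int", "arr": "int[]", "name": "String"}
    usage = {"arr": 4, "name": 1, "i": 3}  # hypothetical nearby usage counts
    for patch in instantiate(skeleton, scope, usage):
        print(patch)
```

With these inputs, the primitive variables `i` and `n` are filtered out of the null-check hole. The remaining candidates are ordered `arr` then `name`, because `arr` appears more often in the surrounding code. Enumerating only type-compatible fillings of a fixed skeleton is what keeps the search space small compared with unconstrained patch generation.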
