Revisiting Evolutionary Program Repair via Code Language Model
Yunan Wang, Tingyu Guo, Zilong Huang, Yuan Yuan
TL;DR
Existing CLM-based APR tools are restricted to single-point defects. This work addresses that limitation by introducing ARJA-CLM, which integrates a multiobjective evolutionary algorithm with Code Language Models to repair multilocation Java defects. It proposes a context-aware prompt strategy that injects callable fields and methods into the CLM input, enabling higher-quality candidate statements. Empirical results on Defects4J and APR-2024 show ARJA-CLM surpassing state-of-the-art baselines such as ARJA-e and DEAR, with notable improvements in multilocation repair and robustness to data leakage. The approach demonstrates that combining CLMs with traditional evolutionary search yields a more expressive repair search space, with practical impact for real-world software maintenance.
Abstract
Software defects are an inherent part of software development and maintenance. Automated Program Repair (APR) has been developed to fix such defects automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excel at code generation, making them suitable for APR applications. Despite this progress, a significant limitation remains: many bugs require multi-point edits, yet current CLM-based APR tools are restricted to single-point fixes, which severely narrows the scope of repairable bugs. Moreover, these tools typically consider only the immediate context of the buggy line when building prompts for the CLM, and this limited information leads to suboptimal repairs. This paper introduces a novel approach, ARJA-CLM, which integrates a multiobjective evolutionary algorithm with a CLM to fix multilocation bugs in Java projects. We also propose a context-aware prompt construction strategy, which enriches the prompt with additional information about accessible fields and methods for the CLM to use when generating candidate statements. Our experiments on the Defects4J and APR-2024 competition benchmarks demonstrate that ARJA-CLM surpasses many state-of-the-art repair systems and performs well on multi-point bugs. The results also reveal that CLMs effectively utilize the field and method information provided in context-aware prompts to produce candidate statements.
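To make the idea of a context-aware prompt concrete, the following is a minimal illustrative sketch and not the ARJA-CLM implementation. It assumes an infill-style CLM input in which the buggy statement is replaced by a mask token and the accessible fields and methods are prepended as comments. The names `RepairContext` and `build_prompt`, the `<FILL>` token, and the comment layout are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepairContext:
    """Hypothetical container for what is in scope at the buggy statement."""
    before: str  # code preceding the buggy statement
    after: str   # code following the buggy statement
    fields: List[str] = field(default_factory=list)   # accessible field declarations
    methods: List[str] = field(default_factory=list)  # accessible method signatures


def build_prompt(ctx: RepairContext, mask_token: str = "<FILL>") -> str:
    """Assemble a prompt: scope hints as comments, then the masked code."""
    hints = []
    if ctx.fields:
        hints.append("// Accessible fields: " + "; ".join(ctx.fields))
    if ctx.methods:
        hints.append("// Accessible methods: " + "; ".join(ctx.methods))
    # The mask token stands in for the statement the CLM is asked to generate.
    return "\n".join(hints + [ctx.before, mask_token, ctx.after])


if __name__ == "__main__":
    ctx = RepairContext(
        before="public int size() {",
        after="}",
        fields=["private int count"],
        methods=["boolean isEmpty()"],
    )
    print(build_prompt(ctx))
```

In a sketch like this, the prompt without the two hint lines corresponds to the direct-context baseline, so the effect of the added scope information could be compared by toggling the `fields` and `methods` lists.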
