Reinforcement Learning for Mutation Operator Selection in Automated Program Repair

Carol Hanna, Aymeric Blot, Justyna Petke

TL;DR

This work tackles the inefficiency of random mutation operator selection in heuristic-based automated program repair (APR) by introducing reinforcement learning (RL) to guide operator choice. It integrates four RL strategies (probability matching, adaptive pursuit, epsilon-greedy, and UCB) with two credit assignment strategies and two reward types into the JaRFly APR framework, evaluating on 353 real Defects4J bugs across 30,080 repair attempts. The study finds that RL-guided operator selection increases the number of test-passing variants but does not significantly raise the number of bugs patched, likely due to the coarse fitness signal and limited budget. The results offer actionable insights into operator-space design, the stationarity of APR search environments, and directions for more effective reward signals and resource allocation in future RL-augmented repair systems.

Abstract

Automated program repair techniques aim to aid software developers with the challenging task of fixing bugs. In heuristic-based program repair, a search space of program variants, created via mutations on software, is explored to find potential patches for bugs. Most commonly, every selection of a mutation operator during search is performed uniformly at random, which can generate many buggy, even uncompilable, program variants. Our goal is to reduce the generation of variants that do not compile or that break intended functionality, since such variants waste considerable resources. In this paper, we investigate the feasibility of a reinforcement learning-based approach for the selection of mutation operators in heuristic-based program repair. Our proposed approach is agnostic to programming language, granularity level, and search strategy, and can be easily integrated into existing heuristic-based repair tools. We conduct an extensive empirical evaluation of four operator selection techniques, two reward types, two credit assignment strategies, two integration methods, and three sets of mutation operators using 30,080 independent repair attempts. We evaluate our approach on 353 real-world bugs from the Defects4J benchmark. The reinforcement learning-based mutation operator selection results in a higher number of test-passing variants, but does not exhibit a noticeable improvement in the number of bugs patched in comparison with the baseline, which uses random selection. While reinforcement learning has been previously shown to be successful in improving the search of evolutionary algorithms, often used in heuristic-based program repair, it has not shown such improvements when applied to this area of research.
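
To make the idea of learned operator selection concrete, the following is a minimal sketch of one of the four named strategies, epsilon-greedy, paired with an average-reward credit assignment. It is an illustration under stated assumptions, not the paper's implementation: the class name, the operator names (`append`, `delete`, `replace`), the default `epsilon`, and the reward values are all hypothetical, and the real approach is integrated into the Java-based JaRFly framework rather than written in Python.

```python
import random


class EpsilonGreedyOperatorSelector:
    """Hypothetical sketch: choose a mutation operator with epsilon-greedy.

    With probability epsilon a uniformly random operator is explored;
    otherwise the operator with the highest estimated quality is exploited.
    """

    def __init__(self, operators, epsilon=0.2, seed=None):
        self.operators = list(operators)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Number of times each operator has received a reward.
        self.counts = {op: 0 for op in self.operators}
        # Running average reward per operator (average credit assignment).
        self.quality = {op: 0.0 for op in self.operators}

    def select(self):
        # Explore: fall back to the uniform random choice of the baseline.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        # Exploit: pick the best operator so far, breaking ties at random.
        best = max(self.quality.values())
        candidates = [op for op in self.operators if self.quality[op] == best]
        return self.rng.choice(candidates)

    def update(self, operator, reward):
        # Incremental mean: q <- q + (reward - q) / n
        self.counts[operator] += 1
        n = self.counts[operator]
        self.quality[operator] += (reward - self.quality[operator]) / n


# Usage sketch: the reward here is an assumed stand-in for a fitness-based
# signal (for example, 1.0 if the variant compiles and passes more tests).
selector = EpsilonGreedyOperatorSelector(["append", "delete", "replace"], seed=0)
for _ in range(10):
    op = selector.select()
    assumed_reward = 1.0 if op == "replace" else 0.0  # illustrative only
    selector.update(op, assumed_reward)
```

Setting `epsilon` to 1.0 in this sketch recovers uniform random selection, which corresponds to the baseline the abstract compares against; lower values spend more of the repair budget on operators that have been rewarded so far.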

Paper Structure (26 sections, 5 equations, 9 tables)