Semantic Evolution over Populations for LLM-Guided Automated Program Repair

Cuong Chi Le, Minh Le-Anh, Cuong Duc Van, Tien N. Nguyen

Abstract

Large language models (LLMs) have recently shown strong potential for automated program repair (APR), particularly through iterative refinement that generates and improves candidate patches. However, state-of-the-art LLM-based APR approaches built on iterative refinement do not fully address several challenges: maintaining useful diversity among repair hypotheses, identifying semantically related repair families, composing complementary partial fixes, exploiting structured failure information, and escaping structurally flawed search regions. In this paper, we propose EvolRepair, a population-based semantic evolution framework for iterative-refinement APR that formulates LLM-based APR as a semantic evolutionary algorithm. EvolRepair builds on the search paradigm of classic genetic algorithms for APR but replaces their syntax-based operators with semantics-aware components powered by LLMs and structured execution feedback. Candidate repairs are organized into behaviorally coherent groups, enabling the algorithm to preserve diversity, reason over repair families, and synthesize stronger candidates by recombining complementary repair insights across the population. By leveraging structured failure patterns to guide the search direction, EvolRepair can both refine promising repair strategies and shift toward alternative abstractions when necessary. Our experiments show that EvolRepair substantially improves repair effectiveness over existing LLM-based APR approaches.
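
As a rough illustration of what "behaviorally coherent groups" and "complementary partial fixes" could mean in practice, the sketch below groups candidate patches by their pass/fail vector over a test suite and then looks for pairs of groups whose passing tests together cover the whole suite. This is only one plausible reading: the abstract does not state the grouping criterion EvolRepair actually uses, and the names `signature`, `group_by_behavior`, and `complementary_pairs`, along with the toy candidates, are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Test = Tuple[tuple, object]      # (arguments, expected result)
Signature = Tuple[bool, ...]     # pass/fail outcome per test


def signature(candidate: Callable, tests: Sequence[Test]) -> Signature:
    """Pass/fail vector of one candidate; a raised exception counts as a failure."""
    outcome = []
    for args, expected in tests:
        try:
            outcome.append(candidate(*args) == expected)
        except Exception:
            outcome.append(False)
    return tuple(outcome)


def group_by_behavior(candidates: Sequence[Callable],
                      tests: Sequence[Test]) -> Dict[Signature, List[Callable]]:
    """Assumed grouping rule: candidates with identical signatures share a group."""
    groups: Dict[Signature, List[Callable]] = defaultdict(list)
    for cand in candidates:
        groups[signature(cand, tests)].append(cand)
    return dict(groups)


def complementary_pairs(groups: Dict[Signature, List[Callable]]
                        ) -> List[Tuple[Signature, Signature]]:
    """Pairs of groups whose passing tests jointly cover every test."""
    sigs = list(groups)
    return [(a, b)
            for i, a in enumerate(sigs)
            for b in sigs[i + 1:]
            if all(x or y for x, y in zip(a, b))]


# Toy example: three partial "repairs" of an absolute-value function.
tests = [((3,), 3), ((-2,), 2), ((0,), 0)]
candidates = [
    lambda x: x,          # correct for non-negative inputs only
    lambda x: -x,         # correct for non-positive inputs only
    lambda x: max(x, 0),  # same observable behavior as the first candidate
]
groups = group_by_behavior(candidates, tests)
pairs = complementary_pairs(groups)
```

Here the first and third candidates land in the same group, and that group is complementary to the second candidate's group. A recombination step would be a natural place to hand such a pair to the LLM.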

Paper Structure

This paper contains 27 sections, 28 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: A motivating APR example based on rotated-array binary search with duplicates. The easy paths follow standard binary-search reasoning, but the duplicate-ambiguity path requires switching to a different repair abstraction rather than making only local comparison-level edits.
  • Figure 2: EvolRepair Overview
  • Figure 3: Prompt template for recombination. The model synthesizes a stronger repair by integrating complementary logic from multiple candidate solutions.
  • Figure 4: Prompt template for mutation. The model refines a failed candidate using structured failure feedback.
  • Figure 5: APR progress across iterations on mini-Size-Sub-array, which asks for the shortest target-sum subarray in an infinitely repeated array. EvolRepair reaches full correctness, while baseline methods plateau at partial correctness.
  • ...and 1 more figures
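
To make the Figure 1 scenario concrete, here is a minimal sketch of the well-known approach to searching a rotated sorted array that may contain duplicates. It is not taken from the paper, and the buggy program and patch shown in Figure 1 may differ. The point it illustrates is the one the caption makes: when the left, middle, and right elements are all equal, no comparison can tell which half is sorted, so the fix is a different strategy (shrinking both ends) rather than a tweak to an existing comparison.

```python
from typing import List


def search_rotated(nums: List[int], target: int) -> bool:
    """Return True if target occurs in a rotated sorted array with duplicates."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return True
        if nums[lo] == nums[mid] == nums[hi]:
            # Duplicate ambiguity: the sorted half cannot be identified,
            # so drop one element from each end and retry.
            lo += 1
            hi -= 1
        elif nums[lo] <= nums[mid]:
            # Left half is sorted: standard binary-search reasoning applies.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Right half is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return False
```

The ambiguous branch makes the worst case linear in the array length (for example, when all elements are equal), which is the usual cost of admitting duplicates.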