Table of Contents
Fetching ...

Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization

Pedro Orvalho, Mikoláš Janota, Vasco Manquinho

TL;DR

This work tackles automated program repair for introductory programming assignments by uniting formal methods-based fault localization with zero-shot learning in LLMs within a Counterexample Guided Inductive Synthesis loop. Key to the approach is MaxSAT-based fault localization to produce bug-free program sketches, which the LLM then completes in iterative cycles guided by test-suite counterexamples. Across 1431 incorrect C programs, the hybrid prompt configurations and the portfolio LLM strategy repair more programs and produce smaller patches than state-of-the-art symbolic tools, demonstrating the value of combining formal diagnosis with data-driven synthesis. The results suggest a practical, scalable path for automated, minimally invasive feedback in programming education, with code and methodologies applicable to broader APR contexts.

Abstract

Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments in programming courses each year. Since providing feedback on IPAs requires substantial time and effort from faculty, personalized feedback often involves suggesting fixes to students' programs. Formal Methods (FM)-based semantic repair approaches, check a program's execution against a test suite or reference solution, are effective but limited. These tools excel at identifying buggy parts but can only fix programs if the correct implementation and the faulty one share the same control flow graph. Conversely, Large Language Models (LLMs) are used for APR but often make extensive instead of minimal rewrites. This leads to more invasive fixes, making it harder for students to learn from their mistakes. In summary, LLMs excel at completing strings, while FM-based fault localization excel at identifying buggy parts of a program. In this paper, we propose a novel approach that combines the strengths of both FM-based fault localization and LLMs, via zero-shot learning, to enhance APR for IPAs. Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements. This hybrid approach follows a CEGIS loop to iteratively refine the program. We ask the LLM to synthesize the missing parts, which are then checked against a test suite. If the suggested program is incorrect, a counterexample from the test suite is fed back to the LLM. Our experiments show that our counterexample guided approach, using MaxSAT-based bug-free program sketches, significantly improves the repair capabilities of all six evaluated LLMs. This method allows LLMs to repair more programs with smaller fixes, outperforming other configurations and state-of-the-art symbolic program repair tools.

Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault Localization

TL;DR

This work tackles automated program repair for introductory programming assignments by uniting formal methods-based fault localization with zero-shot learning in LLMs within a Counterexample Guided Inductive Synthesis loop. Key to the approach is MaxSAT-based fault localization to produce bug-free program sketches, which the LLM then completes in iterative cycles guided by test-suite counterexamples. Across 1431 incorrect C programs, the hybrid prompt configurations and the portfolio LLM strategy repair more programs and produce smaller patches than state-of-the-art symbolic tools, demonstrating the value of combining formal diagnosis with data-driven synthesis. The results suggest a practical, scalable path for automated, minimally invasive feedback in programming education, with code and methodologies applicable to broader APR contexts.

Abstract

Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments in programming courses each year. Since providing feedback on IPAs requires substantial time and effort from faculty, personalized feedback often involves suggesting fixes to students' programs. Formal Methods (FM)-based semantic repair approaches, check a program's execution against a test suite or reference solution, are effective but limited. These tools excel at identifying buggy parts but can only fix programs if the correct implementation and the faulty one share the same control flow graph. Conversely, Large Language Models (LLMs) are used for APR but often make extensive instead of minimal rewrites. This leads to more invasive fixes, making it harder for students to learn from their mistakes. In summary, LLMs excel at completing strings, while FM-based fault localization excel at identifying buggy parts of a program. In this paper, we propose a novel approach that combines the strengths of both FM-based fault localization and LLMs, via zero-shot learning, to enhance APR for IPAs. Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements. This hybrid approach follows a CEGIS loop to iteratively refine the program. We ask the LLM to synthesize the missing parts, which are then checked against a test suite. If the suggested program is incorrect, a counterexample from the test suite is fed back to the LLM. Our experiments show that our counterexample guided approach, using MaxSAT-based bug-free program sketches, significantly improves the repair capabilities of all six evaluated LLMs. This method allows LLMs to repair more programs with smaller fixes, outperforming other configurations and state-of-the-art symbolic program repair tools.

Paper Structure

This paper contains 24 sections, 1 equation, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Counterexample Guided Automated Repair.
  • Figure 2: Comparison of tree edit distances (TED) for Gra-ni-te's repairs when using (x-axis) versus not using (y-axis) correct implementations with configuration Sk_De-TS-CE.