BugSpotter: Automated Generation of Code Debugging Exercises

Victor-Alexandru Pădurean, Paul Denny, Adish Singla

Abstract

Debugging is an essential skill when learning to program, yet its instruction and emphasis often vary widely across introductory courses. In the era of code-generating large language models (LLMs), students' ability to reason about code and identify errors is increasingly important. However, students frequently resort to trial-and-error methods to resolve bugs without fully understanding the underlying issues. Developing the ability to identify and hypothesize the cause of bugs is crucial but can be time-consuming to teach effectively through traditional means. This paper introduces BugSpotter, a tool that leverages an LLM to generate buggy code from a problem description and verifies the synthesized bugs via a test suite. Students interact with BugSpotter by designing failing test cases, where the buggy code's output differs from the expected result as defined by the problem specification. This gives students opportunities not only to enhance their debugging skills but also to practice reading and understanding problem specifications. We deployed BugSpotter in a large classroom setting and compared the debugging exercises it generated to exercises hand-crafted by an instructor for the same problems. We found that the LLM-generated exercises produced by BugSpotter varied in difficulty and were well-matched to the problem specifications. Importantly, the LLM-generated exercises were comparable to those manually created by instructors with respect to student performance, suggesting that BugSpotter could be an effective and efficient aid for learning debugging.

Paper Structure

This paper contains 15 sections, 8 figures.

Figures (8)

  • Figure 1: Illustration of a debugging exercise from BugSpotter for Problem 1, where a student's objective is to design a failing test case. (a) shows the problem specification. (b) shows the buggy code. (c) shows a successfully designed failing test case. A failing test case is composed of the function's input, its buggy output, and the correct output, according to the problem specification. In this case, the student successfully solved the exercise (indicated by a tick) by (1) providing a test case for which the buggy code generates a different output from the correct code, (2) providing the buggy output that matches what is obtained by executing the buggy code on the input, and (3) providing the correct output that matches what is obtained by executing the reference solution code on the input.
  • Figure 2: BugSpotter's exercise generation pipeline.
  • Figure 3: Prompt for asking LLMs to generate buggy code.
  • Figure 4: Results of the expert-based quality assessment. BugSpotter leverages LLMs from OpenAI's GPT family. Evaluation was done across 3 different problems, over 3 independent runs, according to the rubric described in the setup section.
  • Figure 5: Student success rates w.r.t. expert-assessed difficulty of exercises. Aggregated per problem, success rates are 67.2% for Problem 1, 40.5% for Problem 2, and 39.2% for Problem 3.
  • ...and 3 more figures
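
The three conditions listed in the Figure 1 caption can be sketched as a small checker. This is a minimal illustration and not BugSpotter's implementation: the function `is_valid_failing_test` and the example pair `reference_max` / `buggy_max` are hypothetical names invented here, and the sketch assumes that both programs are plain Python functions that return a value without raising.

```python
from typing import Any, Callable, Tuple


def is_valid_failing_test(
    buggy_fn: Callable[..., Any],
    reference_fn: Callable[..., Any],
    test_input: Tuple[Any, ...],
    claimed_buggy_output: Any,
    claimed_correct_output: Any,
) -> bool:
    """Check a student's submission against the three conditions in Figure 1.

    Hypothetical helper: assumes both functions terminate and do not raise.
    """
    actual_buggy = buggy_fn(*test_input)
    actual_correct = reference_fn(*test_input)

    # (1) The input must expose the bug: the two programs disagree on it.
    exposes_bug = actual_buggy != actual_correct
    # (2) The student's stated buggy output matches the buggy code's output.
    buggy_matches = claimed_buggy_output == actual_buggy
    # (3) The student's stated correct output matches the reference solution.
    correct_matches = claimed_correct_output == actual_correct

    return exposes_bug and buggy_matches and correct_matches


# Hypothetical exercise (not taken from the paper): return the largest element.
def reference_max(values):
    return max(values)


def buggy_max(values):
    # Bug: the running maximum starts at 0, so all-negative lists give 0.
    best = 0
    for v in values:
        if v > best:
            best = v
    return best


if __name__ == "__main__":
    # An all-negative list exposes the bug, and both outputs are stated correctly.
    print(is_valid_failing_test(buggy_max, reference_max, ([-3, -1],), 0, -1))
    # A list with positive values does not expose the bug, so the test is rejected.
    print(is_valid_failing_test(buggy_max, reference_max, ([1, 2],), 2, 2))
```

Requiring all three conditions, rather than only the first, means that a student cannot pass by stumbling on a bug-revealing input: they must also predict what the buggy code does and what the specification demands.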