DebugTA: An LLM-Based Agent for Simplifying Debugging and Teaching in Programming Education

Lingyue Fu, Haowei Yuan, Datong Chen, Xinyi Dai, Qingyao Li, Weinan Zhang, Weiwen Liu, Yong Yu

TL;DR

DebugTA addresses the Debugging and Teaching (DT) task in programming education by decomposing complex reasoning into tool-assisted steps that use three tools (Standard Code Retrieval, Variable Substitution, and an External Compiler), aided by a memory module. It uses a student simulator (StuBot) to evaluate the quality of modification suggestions across three real-world datasets and multiple backbone LLMs, demonstrating improved accuracy and reduced computational cost while mitigating answer leakage. The approach is robust to model size: even smaller LLMs outperform baselines by leveraging decomposition and targeted tools. Overall, DebugTA offers a scalable, cost-efficient framework for AI-assisted debugging and teaching, with practical implications for online judges and classrooms.

Abstract

In programming education, the Debugging and Teaching (DT) task is a common scenario in which students receive assistance in correcting their erroneous code. The task involves multiple inputs, including the erroneous code, error messages, reference solutions, and the question description, with the goal of generating modification suggestions for the erroneous code. However, two key challenges hinder the effectiveness of existing approaches. First, the complexity and heterogeneity of the inputs inherent in DT tasks significantly increase the reasoning burden on LLMs. Second, existing approaches often fail to fully leverage the availability of standard code in DT tasks, forcing models to rely solely on complex multi-step reasoning, which limits the potential of LLMs in addressing DT tasks effectively. To address these challenges, we propose DebugTA, a novel LLM-based debugging and teaching agent with specialized tools for standard code retrieval, variable substitution to align reference code, and an external compiler for real-time code analysis. Guided by explicit pedagogical and debugging principles, DebugTA acts as an agent that decomposes a complex task into sequential LLM interactions, each using distinct tools for specific subtasks, thereby simplifying the logical reasoning at each step and reducing overall reasoning complexity. Furthermore, DebugTA uses tool calls to align the standard code with the erroneous code as closely as possible, allowing the LLM to focus on logic errors within the erroneous code and improving the accuracy of the generated suggestions. To rigorously assess the quality of modification suggestions, we introduce a student simulator-teacher interaction paradigm. Experimental results on three real-world code datasets demonstrate that DebugTA consistently improves teaching effectiveness while significantly reducing computational costs.
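To make the three tool roles named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. All function names are hypothetical. It assumes a character-level similarity score (`difflib`) for retrieval, a caller-supplied rename mapping for variable substitution (in DebugTA the alignment is driven by the LLM agent), and Python's built-in `compile` as a stand-in for the external compiler.

```python
import difflib
import re
from typing import Dict, List, Optional


def retrieve_standard_code(erroneous: str, pool: List[str]) -> str:
    """Hypothetical retrieval tool: return the reference solution whose
    text is most similar to the student's submission."""
    return max(
        pool,
        key=lambda ref: difflib.SequenceMatcher(None, erroneous, ref).ratio(),
    )


def substitute_variables(reference: str, mapping: Dict[str, str]) -> str:
    """Hypothetical substitution tool: rename identifiers in the reference
    so they match the student's names. The mapping is an assumed input."""
    if not mapping:
        return reference
    # Match whole identifiers only, and rename in a single pass so that
    # one replacement cannot be rewritten by a later one.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], reference)


def compile_check(source: str) -> Optional[str]:
    """Stand-in for the external compiler: return an error message,
    or None when the source parses cleanly."""
    try:
        compile(source, "<student>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None
```

With the reference aligned to the student's variable names, the remaining textual difference between the two programs is closer to the logic error itself, which is the motivation the abstract gives for the alignment step.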

Paper Structure

This paper contains 35 sections, 6 equations, 7 figures, 3 tables, 2 algorithms.

Figures (7)

  • Figure 1: Demonstration of the Debugging and Teaching (DT) task. DebugTA receives the erroneous code and the standard code pool, connects to an external compiler, and then outputs tailored modification suggestions for each student.
  • Figure 2: Overview of DebugTA. DebugTA is equipped with three specialized tools: Standard Code Retrieval, Variable Substitution, and Compiler, along with a memory module. DebugTA adaptively invokes these tools and manages information in memory to process erroneous code submissions, ultimately generating modification suggestions for students.
  • Figure 3: Performance of DebugTA using DeepSeek with various model sizes on the Code4Bench dataset.
  • Figure 4: The relationship between token usage (x-axis), average AC rate on ACMOJ (y-axis), and performance variance (bubble size) between backbone LLMs.
  • Figure 5: AC Rate improvement comparison between StuBot and human participants across eight programming problems. Human 1 is a Ph.D. student in Computer Science and Human 2 is a C++ beginner.
  • ...and 2 more figures