How to Teach Programming in the AI Era? Using LLMs as a Teachable Agent for Debugging

Qianou Ma; Hua Shen; Kenneth Koedinger; Tongshuang Wu

TL;DR

HypoCompass is a novel system that facilitates deliberate practice on debugging: human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code.

Abstract

Large Language Models (LLMs) now excel at generative skills and can create content at remarkable speed. However, they are imperfect and still make various mistakes. In a Computer Science education context, as these models are widely recognized as "AI pair programmers," it becomes increasingly important to train students to evaluate and debug LLM-generated code. In this work, we introduce HypoCompass, a novel system to facilitate deliberate practice on debugging, where human novices play the role of Teaching Assistants and help LLM-powered teachable agents debug code. We enable effective task delegation between students and LLMs in this learning-by-teaching environment: students focus on hypothesizing the cause of code errors, while adjacent skills like code completion are offloaded to LLM-agents. Our evaluations demonstrate that HypoCompass generates high-quality training materials (e.g., bugs and fixes), outperforming human counterparts fourfold in efficiency, and significantly improves student debugging performance by 12% from pre-test to post-test.

Paper Structure (19 sections, 6 figures, 2 tables)

Introduction
Related Works
The Design of HypoCompass
Interface and Key Components.
LLM Integration
Task Formation and Decomposition.
Over-Generate-then-Select.
Human-in-the-Loop Verification.
LLM Evaluation: Generation Efficiency and Quality
Method.
Result: Efficient and High-Quality Generation.
Learning Evaluation: Pre- / Post-Test Study
Assessment.
Method: Study Procedure and Participants.
Quantitative Result: Learning Gains.
...and 4 more sections

Figures (6)

Figure 1: In HypoCompass, given a programming problem description (A), a student user (in the role of a Teaching Assistant) needs to compile a test suite (B) and assist multiple LLM-simulated agents (e.g., Bob, Chelsea, Dave) in an Office Hour Queue (C) through a chat interface (E). Each LLM-agent acts as a novice seeking help with a buggy solution (D) and provides feedback to the user (F).
Figure 2: To enable deliberate practice, we establish a close mapping between the (A) learning objectives, (B) the cognitive debugging process model, (C) the HypoCompass interaction flow, and (D) the primary tasks students perform in HypoCompass. We offload various material generation tasks to LLMs (C$_2$).
Figure 3: Examples of inputs and outputs to the LLM material generation pipeline.
Figure 4: Over-generate and automatically select materials with pedagogical values.
Figure 5: Pre-post test question examples for the learning objectives of comprehensive hypothesis construction (Q3.1 and Q3.2) and accurate hypothesis construction (Q7).
...and 1 more figures
