Reasoning Trajectories for Socratic Debugging of Student Code: From Misconceptions to Contradictions and Updated Beliefs

Erfan Al-Hossami, Razvan Bunescu

TL;DR

The paper tackles debugging education by replacing direct bug fixes with Socratic guidance that leads students to contradict their own misconceptions. It formalizes Reasoning Trajectories (RTs) as deductive, counterexample-driven chains and pairs them with anchored Socratic turns to create complete conversations, supported by a McMining-derived dataset of 227 buggy samples across 501 problems. A large-scale LLM-as-judge evaluation shows that frontier models can generate up to 91% correct RTs and 98.7% valid conversation turns. The work provides a scalable, pedagogy-driven approach to durable belief updates in programming concepts and offers a practical toolchain for instructors to craft targeted Socratic interventions.

Abstract

In Socratic debugging, instructors guide students towards identifying and fixing a bug on their own, instead of providing the bug fix directly. Most novice programmer bugs are caused by programming misconceptions, namely false beliefs about a programming concept. In this context, Socratic debugging can be formulated as a guided Reasoning Trajectory (RT) leading to a statement about the program behavior that contradicts the bug-causing misconception. Upon reaching this statement, the ensuing cognitive dissonance leads the student to first identify and then update their false belief. In this paper, we introduce the task of reasoning trajectory generation, together with a dataset of debugging problems manually annotated with RTs. We then describe LLM-based solutions for generating RTs and Socratic conversations that are anchored on them. A large-scale LLM-as-judge evaluation shows that frontier models can generate up to 91% correct reasoning trajectories and 98.7% valid conversation turns.
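As a rough illustration of the formulation in the abstract, the sketch below builds a short chain of statements about a buggy program that ends in a statement contradicting the student's false belief. Everything in it is hypothetical and chosen only for illustration: the `sum_to` problem, the misconception that `range(n)` includes `n`, and all class and function names. None of it is taken from the paper or its dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DebuggingInput:
    """Hypothetical container for the inputs the abstract describes."""
    problem: str
    buggy_code: str
    failed_test: str
    misconception: str


def buggy_sum_to(n: int) -> int:
    # Illustrative student bug: the loop stops at n - 1 instead of n.
    total = 0
    for i in range(n):
        total += i
    return total


def reasoning_trajectory(n: int) -> List[str]:
    """Build a chain of statements ending in one that contradicts the belief."""
    values = list(range(n))   # what the loop actually iterates over
    believed_last = n         # the false belief: range(n) ends at n
    actual_last = values[-1]  # the observed behavior
    return [
        f"The test calls sum_to({n}) and expects {n * (n + 1) // 2}.",
        f"The loop iterates over range({n}).",
        f"range({n}) yields {values}.",
        f"So the last value added is {actual_last}, not {believed_last}.",
    ]


example = DebuggingInput(
    problem="Return the sum of the integers from 1 to n.",
    buggy_code="for i in range(n): total += i",
    failed_test="sum_to(3) should return 6",
    misconception="range(n) includes n",
)

if __name__ == "__main__":
    print(example.misconception)
    for step in reasoning_trajectory(3):
        print(step)
```

The final statement is the one meant to trigger the belief update: it states observed program behavior that cannot hold if the misconception were true. In a Socratic conversation anchored on such a chain, each statement would be elicited from the student through a question rather than stated outright.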

Paper Structure

This paper contains 36 sections, 1 equation, 13 figures, 2 tables, 1 algorithm.

Figures (13)

  • Figure 1: Socratic debugging example: (a) the input specifies the problem, the buggy code, the failed test case, and the student misconception that caused the bug; (b) a reasoning trajectory ending with a statement that contradicts the misconception; (c) a Socratic conversation that follows the reasoning trajectory and ends with a belief update.
  • Figure 2: Alternative reasoning trajectory for the input from Figure 1(a).
  • Figure 3: The original input specification.
  • Figure 4: The simplified input for the original in Figure 3.
  • Figure 5: The RT for the simplified input in Figure 4.
  • ...and 8 more figures