Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors
Jian Wang, Yinpei Dai, Yichi Zhang, Ziqiao Ma, Wenjie Li, Joyce Chai
TL;DR
Task tutoring with LLMs faces grounding and personalization challenges in real-world tasks. The authors introduce Trace-and-Verify (TRAVER), a workflow that combines explicit knowledge tracing with a turn-by-turn verifier to guide tutor utterances toward task completion, and DICT, an automatic evaluation protocol that uses simulated students and automated unit tests for scalable benchmarking. Empirical results on EvoCodeBench show that TRAVER improves tutoring outcomes over vanilla baselines, narrows the gap to an Oracle, and supports inference-time scaling by evaluating multiple candidate utterances per turn. The work lays a path toward scalable, task-focused tutoring beyond coding and highlights opportunities for future human-in-the-loop validation and broader applications.
Abstract
Intelligent tutoring agents powered by large language models (LLMs) have been increasingly explored to deliver personalized knowledge in areas such as language learning and science education. However, their capabilities in guiding users to solve complex real-world tasks remain underexplored. To address this limitation, we focus on coding tutoring, a challenging problem that requires tutors to proactively guide students towards completing predefined coding tasks. We propose a novel agent workflow, Trace-and-Verify (TRAVER), which combines knowledge tracing to estimate a student's knowledge state and turn-by-turn verification to ensure effective guidance toward task completion. We introduce DICT, an automatic evaluation protocol that assesses tutor agents using controlled student simulation and code generation tests. Extensive experiments reveal the challenges of coding tutoring and demonstrate that TRAVER achieves a significantly higher success rate. Although we use coding tutoring as an example in this paper, our approach can be extended beyond coding, providing valuable insights into advancing tutoring agents for human task learning.
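To make the workflow above more concrete, the following is a minimal Python sketch of one tutoring turn: a knowledge state is tracked per student, several candidate tutor utterances are scored by a verifier, and the highest-scoring one is chosen. It is an illustration under stated assumptions, not the authors' implementation. The names `KnowledgeState`, `select_utterance`, and `toy_verifier`, the belief-update rule, and the keyword-based scoring are all hypothetical stand-ins; in the paper's setting the candidates would come from an LLM tutor and the scores from a trained verifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class KnowledgeState:
    """Hypothetical knowledge-tracing state.

    Maps each knowledge component (a plain string here) to a belief in
    [0, 1] that the student already understands it.
    """
    beliefs: Dict[str, float] = field(default_factory=dict)

    def update(self, component: str, understood: bool, rate: float = 0.5) -> None:
        # Assumed update rule: move the belief part of the way toward
        # 1.0 (understood) or 0.0 (not understood).
        prior = self.beliefs.get(component, 0.0)
        target = 1.0 if understood else 0.0
        self.beliefs[component] = prior + rate * (target - prior)


# A verifier maps (candidate utterance, knowledge state) to a score.
Verifier = Callable[[str, KnowledgeState], float]


def select_utterance(candidates: List[str], state: KnowledgeState,
                     verifier: Verifier) -> str:
    """Score every candidate for this turn and return the best one."""
    if not candidates:
        raise ValueError("at least one candidate utterance is required")
    return max(candidates, key=lambda utterance: verifier(utterance, state))


def toy_verifier(utterance: str, state: KnowledgeState) -> float:
    """Toy stand-in for a trained verifier.

    Rewards utterances that mention components the student is believed
    not to understand yet.
    """
    score = 0.0
    for component, belief in state.beliefs.items():
        if component in utterance:
            score += 1.0 - belief
    return score


if __name__ == "__main__":
    state = KnowledgeState({"recursion": 0.9, "caching": 0.1})
    candidates = [
        "Let's review recursion once more.",
        "Let's talk about caching the intermediate results.",
    ]
    print(select_utterance(candidates, state, toy_verifier))
```

Sampling more candidates per turn is what the summary calls inference-time scaling: in this sketch the selection step is unchanged, and only the length of `candidates` grows.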
