Learning Code-Edit Embedding to Model Student Debugging Behavior
Hasnain Heickal, Andrew Lan
TL;DR
The paper addresses the challenge of providing personalized, timely feedback in introductory programming by modeling student debugging as a sequence of code edits. It proposes an encoder-decoder framework based on CodeT5 that learns code-edit embeddings from pairs of consecutive submissions, trained with a contrastive loss, a reconstruction loss, and a regularization term that aligns the edit and code embedding spaces. Similarity between edits is defined through test-case masks, which record whether a submission passes each test case. The learned embeddings support personalized next-step code suggestions and clustering that reveals common debugging patterns. Experiments on the CSEDM dataset show that the approach yields more learner-centric, incremental guidance than baselines such as GPT-4o, and cluster analyses expose actionable insights into student debugging behavior. The work contributes a practical, scalable method for generating feedback and outlines directions for integrating such embeddings into educational platforms and for broader language support.
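To make the training objective more concrete, the following is a minimal sketch of how test-case masks might define similar and dissimilar edit pairs, and how the three loss terms might be combined. It is not the authors' implementation. The helper names (`mask_transition`, `pairwise_contrastive`, `total_loss`), the margin-based form of the contrastive loss, the weights `lam_rec` and `lam_reg`, and the toy 2-D embeddings are all assumptions made for illustration; the reconstruction and regularization values are placeholders.

```python
import math

def mask_transition(before, after):
    """Per-test-case change in pass/fail status between two submissions.

    +1 means the edit fixed a test, -1 means it broke one, 0 means no change.
    (Hypothetical helper; masks are lists of 0/1 values, one per test case.)
    """
    return tuple(a - b for b, a in zip(before, after))

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def pairwise_contrastive(e1, e2, similar, margin=1.0):
    """Classic margin-based pairwise contrastive loss (an assumed form).

    Similar pairs are pulled together; dissimilar pairs are pushed apart
    until they are at least `margin` away from each other.
    """
    d = euclidean(e1, e2)
    return d ** 2 if similar else max(0.0, margin - d) ** 2

def total_loss(contrastive, reconstruction, regularization,
               lam_rec=1.0, lam_reg=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return contrastive + lam_rec * reconstruction + lam_reg * regularization

# Three toy edits: A and B both fix the second test case, C breaks it.
trans_a = mask_transition([1, 0, 0], [1, 1, 0])
trans_b = mask_transition([1, 0, 0], [1, 1, 0])
trans_c = mask_transition([1, 1, 0], [1, 0, 0])

# Toy 2-D edit embeddings (assumed values, not model output).
emb_a, emb_b, emb_c = [0.0, 1.0], [0.0, 0.8], [0.6, 1.0]

# Assumption: two edits count as similar when their mask transitions match.
loss_ab = pairwise_contrastive(emb_a, emb_b, similar=(trans_a == trans_b))
loss_ac = pairwise_contrastive(emb_a, emb_c, similar=(trans_a == trans_c))

# Placeholder values stand in for the reconstruction and regularization terms.
loss = total_loss(loss_ab + loss_ac, reconstruction=0.5, regularization=1.0)
```

In this sketch, edits with the same effect on the test cases are drawn together in the embedding space, while edits with different effects are separated by at least the margin.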
Abstract
Providing effective feedback for programming assignments in computer science education can be challenging: students solve problems by iteratively submitting code, executing it, and using limited feedback from the compiler or the auto-grader to debug. Analyzing student debugging behavior in this process may reveal important insights into their knowledge and inform better personalized support tools. In this work, we propose an encoder-decoder-based model that learns meaningful code-edit embeddings between consecutive student code submissions, to capture their debugging behavior. Our model leverages information on whether a student code submission passes each test case to fine-tune large language models (LLMs) to learn code editing representations. It enables personalized next-step code suggestions that maintain the student's coding style while improving test case correctness. Our model also enables us to analyze student code-editing patterns to uncover common student errors and debugging behaviors, using clustering techniques. Experimental results on a real-world student code submission dataset demonstrate that our model excels at code reconstruction and personalized code suggestion while revealing interesting patterns in student debugging behavior.
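The clustering analysis mentioned in the abstract can be illustrated with a small sketch. The abstract does not name a clustering algorithm, so the use of k-means here is an assumption, as are the function names, the deterministic initialization, and the toy 2-D vectors standing in for learned code-edit embeddings.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def kmeans(points, k, iterations=20):
    """Plain k-means with the first k points as initial centroids.

    Returns the cluster label of each point and the final centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

# Toy stand-ins for code-edit embeddings (assumed values, not model output).
edit_embeddings = [
    [0.1, 0.9], [0.9, 0.1], [0.2, 1.0], [1.0, 0.0], [0.0, 0.8],
]
labels, centroids = kmeans(edit_embeddings, k=2)
```

Each resulting cluster would then be inspected by hand, for example by reading the code pairs whose edits fall in the same group, to judge whether it corresponds to a recognizable student error or debugging strategy.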
