Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education
Vikrant Sahu, Gagan Raj Gupta, Raghav Borikar, Nitin Mane
TL;DR
This paper presents Autograder+, a framework that turns autograding from a binary, black-box assessment into a formative, context-aware learning experience. It combines two AI strategies, a fine-tuned LLM for direct feedback and a contrastively trained embedding model for semantic analytics and visualization, within a secure, end-to-end pipeline that includes prompt pooling and instructor-facing analytics via UMAP. On 600 submissions, the generated feedback shows strong semantic alignment with human feedback (average BERTScore F1 around 0.75), and embeddings trained on 1,000 annotated examples enable clustering of solutions by functionality and approach. By integrating automated feedback, semantic clustering, and interactive visualizations, Autograder+ aims to reduce instructor workload, enable targeted instruction, and scale high-quality feedback across large programming courses, with potential applicability to other domains of structured problem solving.
Abstract
The rapid growth of programming education has outpaced traditional assessment tools, leaving faculty with limited means to provide meaningful, scalable feedback. Conventional autograders, while efficient, act as black-box systems that simply return pass/fail results, offering little insight into student thinking or learning needs. Autograder+ is designed to shift autograding from a purely summative process to a formative learning experience. It introduces two key capabilities: automated feedback generation using a fine-tuned Large Language Model, and visualization of student code submissions to uncover learning patterns. The model is fine-tuned on curated student code and expert feedback to ensure pedagogically aligned, context-aware guidance. In evaluation across 600 student submissions from multiple programming tasks, the system produced feedback with strong semantic alignment to instructor comments. For visualization, contrastively learned code embeddings trained on 1,000 annotated submissions enable grouping solutions into meaningful clusters based on functionality and approach. The system also supports prompt pooling, allowing instructors to guide feedback style through selected prompt templates. By integrating AI-driven feedback, semantic clustering, and interactive visualization, Autograder+ reduces instructor workload while supporting targeted instruction and promoting stronger learning outcomes.
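The semantic-alignment figure quoted in the TL;DR is a BERTScore F1. As a rough illustration of what that metric measures, the sketch below computes greedy-matching precision, recall, and F1 between two sets of token embeddings. It is a simplified sketch under stated assumptions, not the evaluation setup used in the paper: the 2-D vectors are hypothetical toy values, whereas BERTScore proper takes contextual embeddings from a pretrained transformer and can optionally apply idf weighting and baseline rescaling. The function names `cosine` and `bertscore_f1` are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_f1(candidate, reference):
    """Greedy-matching precision, recall, and F1 over token embeddings.

    candidate: token embeddings of the generated feedback
    reference: token embeddings of the instructor feedback
    """
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy embeddings (assumption): real inputs would come from a language model.
generated = [[1.0, 0.0], [0.0, 1.0]]
instructor = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p, r, f1 = bertscore_f1(generated, instructor)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

In this toy case every generated token has an exact match in the instructor text, so precision is perfect, while the unmatched third instructor token pulls recall (and therefore F1) below 1. An average F1 around 0.75 would correspondingly indicate substantial, but not complete, overlap in meaning between generated and instructor feedback.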
