CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator Needs
Majeed Kazemitabaar, Runlong Ye, Xiaoning Wang, Austin Z. Henley, Paul Denny, Michelle Craig, Tovi Grossman
TL;DR
The paper tackles scalable, equitable programming education by introducing CodeAid, an LLM-based assistant that provides conceptual guidance, pseudo-code scaffolding, and targeted code annotations without exposing direct solutions. Through a 12-week deployment in a 700-student CS course, the authors collect multi-source data (8,000+ interactions, surveys, interviews) and perform thematic analyses to assess usage, answer quality, and user trust. Key contributions include four design considerations for educational AI tooling, a robust multi-faceted evaluation, and iterative system updates such as pseudo-code integration and line-level explanations. The work demonstrates that pedagogically guarded AI tools can augment learning, reduce dependence on direct solutions, and offer actionable insights for integrating AI assistants into curricula while preserving instructional integrity and equity.
Abstract
Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates students' incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. A thematic analysis of 8,000 usages of CodeAid was performed, further enriched by weekly surveys and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's unique benefits; D2) simplifying query formulation while promoting cognitive engagement; D3) avoiding direct responses while encouraging motivated learning; and D4) maintaining transparency and control for students to assess and steer AI responses.
