TAMIGO: Empowering Teaching Assistants using LLM-assisted viva and code assessment in an Advanced Computing Class
Anishka IIITD, Diksha Sethi, Nipun Gupta, Shikhar Sharma, Srishti Jain, Ujjwal Singhal, Dhruv Kumar
TL;DR
The paper addresses heavy TA workloads in advanced computing courses by introducing TAMIGO, an LLM-based system that generates viva questions, evaluates student responses, and provides code feedback. Built on GPT-3.5-Turbo and retrieval-augmented generation, TAMIGO evolves from a two-module v1 to a more capable v2 with a Code Evaluation module, and is deployed in two take-home distributed-systems assignments. Findings show that LLMs produce high-quality viva questions, but feedback on viva answers sometimes contains hallucinations; code feedback and code summaries are generally thorough and useful, though their alignment with the rubric needs improvement. The study demonstrates both the practical potential and the limits of integrating LLMs into TA workflows in higher education, and points future work toward better reliability, rubric adherence, and user experience in scalable education tools.
Abstract
Large Language Models (LLMs) have significantly transformed the educational landscape, offering new tools for students, instructors, and teaching assistants. This paper investigates the application of LLMs in assisting teaching assistants (TAs) with viva and code assessments in an advanced computing class on distributed systems at an Indian university. We develop TAMIGO, an LLM-based system for TAs to evaluate programming assignments. For viva assessment, the TAs generated questions using TAMIGO and circulated these questions to the students for answering. The TAs then used TAMIGO to generate feedback on student answers. For code assessment, the TAs selected specific code blocks from student code submissions and fed them to TAMIGO to generate feedback on these code blocks. The TAs used the TAMIGO-generated feedback on student answers and code blocks for further evaluation. We evaluate the quality of LLM-generated viva questions, model answers, feedback on viva answers, and feedback on student code submissions. Our results indicate that LLMs are highly effective at generating viva questions when provided with sufficient context and background information. However, the results for LLM-generated feedback on viva answers were mixed; instances of hallucination occasionally reduced the accuracy of feedback. Despite this, the feedback was consistent, constructive, comprehensive, balanced, and did not overwhelm the TAs. Similarly, for code submissions, the LLM-generated feedback was constructive, comprehensive, and balanced, though there was room for improvement in aligning the feedback with the instructor-provided rubric for code evaluation. Our findings contribute to understanding the benefits and limitations of integrating LLMs into educational settings.
