Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences
John Stamper, Ruiwei Xiao, Xinying Hou
TL;DR
The paper addresses the gap in theory-backed design for LLM-based feedback within Intelligent Tutoring Systems (ITS). It synthesizes prior ITS feedback approaches (expert-generated models, data-driven models, and recent LLM-based methods) and grounds them in learning-science frameworks. It argues that effective LLM feedback requires careful trigger strategies, richer student-context inputs, content aligned with the Knowledge-Learning-Instruction (KLI) framework, diverse delivery modalities, and rigorous evaluation beyond traditional metrics. The authors propose a practical, evidence-based design toolkit and guidelines spanning prompting, content selection, deployment, and assessment, with the aim of preserving educational integrity and learning gains in the era of generative AI. The work highlights the importance of theoretical grounding for mitigating risks such as bias and hallucination while enabling scalable, personalized feedback that aligns with established instructional principles. Its practical impact is a roadmap for researchers and practitioners developing pedagogically sound, evaluable LLM-powered ITS feedback systems.
Abstract
The field of Artificial Intelligence in Education (AIED) focuses on the intersection of technology, education, and psychology, placing a strong emphasis on supporting learners' needs with compassion and understanding. The growing prominence of Large Language Models (LLMs) has led to the development of scalable solutions in educational settings, including the generation of different types of feedback in Intelligent Tutoring Systems (ITS). However, these models are often used by directly formulating prompts to solicit specific information, without a solid theoretical foundation for prompt construction or empirical assessment of the impact on learning. This work advocates careful and caring AIED research. It reviews previous research on feedback generation in ITS, with emphasis on the theoretical frameworks that research drew on and the efficacy of the corresponding designs in empirical evaluations, and then suggests opportunities to apply these evidence-based principles to the design, experimentation, and evaluation phases of LLM-based feedback generation. The main contributions of this paper are: advocacy for more cautious, theoretically grounded methods of feedback generation in the era of generative AI; and practical suggestions for theory- and evidence-based feedback design in LLM-powered ITS.
