From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education

Iris Delikoura, Yi R. Fung, Pan Hui

TL;DR

The paper examines the risks of integrating large language models into education through a PRISMA-guided systematic review of 70 empirical studies across computer science, education, and psychology. It develops a three-domain taxonomy (operational effectiveness, personalized applications, and interactive learning tools) and introduces the LLM-Risk Adapted Learning Model to trace how technical, cognitive, and societal risks cascade through learning interactions. Key findings show pervasive risks such as superficial understanding, bias, hallucinations, memory erosion, and reduced student agency, with mitigation strategies spread across educators, developers, students, and policymakers. This work provides a foundation for human-centred, responsible integration of LLMs in education and offers methodological guidance for rigorous, comparable future research.

Abstract

Large Language Models (LLMs) are transforming education by enabling personalization, feedback, and knowledge access, while also raising concerns about risks to students and learning systems. Yet empirical evidence on these risks remains fragmented. This paper presents a systematic review of 70 empirical studies across computer science, education, and psychology. Guided by four research questions, we examine: (i) which applications of LLMs in education have been most frequently explored; (ii) how researchers have measured their impact; (iii) which risks stem from such applications; and (iv) what mitigation strategies have been proposed. We find that research on LLMs clusters around three domains: operational effectiveness, personalized applications, and interactive learning tools. Across these, model-level risks include superficial understanding, bias, limited robustness, anthropomorphism, hallucinations, privacy concerns, and knowledge constraints. When learners interact with LLMs, these risks extend to cognitive and behavioural outcomes, including reduced neural activity, over-reliance, diminished independent learning skills, and a loss of student agency. To capture this progression, we propose an LLM-Risk Adapted Learning Model that illustrates how technical risks cascade through interaction and interpretation to shape educational outcomes. As the first synthesis of empirically assessed risks, this review provides a foundation for responsible, human-centred integration of LLMs in education.

Paper Structure

This paper contains 52 sections, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Venn Diagram of LLM Risks
  • Figure 2: PRISMA flow of identification, screening, eligibility, and inclusion.
  • Figure 3: Distribution of reviewed papers across LLM application categories.
  • Figure 4: Subject-area distributions by application: (a) operational effectiveness, (b) personalized applications, (c) interactive learning tools.
  • Figure 5: Descriptive Synthesis of Reviewed Studies
  • ...and 1 more figure