Large Language Models in Teaching and Learning: Reflections on Implementing an AI Chatbot in Higher Education

Fiammetta Caccavale, Carina L. Gargalo, Julian Kager, Magdalena Skowyra, Steen Larsen, Krist V. Gernaey, Ulrich Krühne

Abstract

The landscape of education is changing rapidly, shaped by emerging pedagogical approaches, technological innovations such as artificial intelligence (AI), and evolving societal expectations, all of which demand thorough evaluation of new educational tools. Although large language models (LLMs) present substantial opportunities, especially in Higher Education, their propensity to hallucinate and their limited specialized knowledge may introduce significant risks. This study addresses these risks by examining the practical implementation of an LLM-enhanced assistant in a university-level course. We implemented a generative AI assistant grounded in retrieval-augmented generation (RAG) to replicate a previously teacher-led, time-intensive exercise. To assess the effectiveness of the LLM, we conducted three separate experiments using an iterative mixed-methods approach, including a crossover design. The resulting data address central research questions on student motivation, perceived differences between engaging with the LLM and with a human teacher, the quality of AI-generated responses, and the impact of the LLM on students' academic performance. The results offer direct insights into students' views and the pedagogical feasibility of embedding LLMs into specialized courses. Finally, we discuss the main challenges, opportunities, and future directions of LLMs in teaching and learning in Higher Education.

Paper Structure (26 sections, 6 figures, 16 tables)

Figures (6)

  • Figure 1: Responses to surveys investigating the perceptions of students who performed the exercise with a teacher or with the LLM-enhanced assistant. Results are compared between 2024 and 2025. Responses to the research questions: (a) "How happy are you with your experience?" (RQ1); (b) "After having performed the audit, what do you think of the quality of the answers?" (RQ2); (c) "Would you recommend doing the audit with your auditee (teacher or chatbot) to other students?" (RQ3); and (d) "Do you think this is the future of the audit exercise in this course?" (RQ4). RQ1, RQ2 and RQ3 are on a Likert scale from 1 (lowest) to 5 (highest). One response was excluded because it was not relevant.
  • Figure 2: New graphical interface of the AI assistant.
  • Figure 3: Responses to surveys investigating the perceptions of students who performed the exercise with a teacher or with the LLM-enhanced assistant. Results are reported for the crossover study performed with six students, divided into two groups of three students each, in the fall semester of 2025.
  • Figure 4: Stress perceived by the students when performing the audit with the teacher and with the AI assistant as auditees. Results are on a Likert scale from 1 (not stressful at all) to 5 (very stressful). Average stress is 2 with the teacher and 1.8 with the AI assistant (p-value = 0.485).
  • Figure 5: Students' perception of the speech-to-text and text-to-speech implementations in the chatbot. Results of the second graph are presented on a Likert scale from 1 (not useful at all) to 5 (very useful).
  • ...and 1 more figure