A review on the use of large language models as virtual tutors
Silvia García-Méndez, Francisco de Arriba-Pérez, María del Carmen Somoza-López
TL;DR
This paper surveys the use of large language models (LLMs) as virtual tutors in education, focusing on systems designed to generate and assess educational content with involvement from students or teachers. It adopts a two-step methodology (data gathering from Google Scholar, followed by strict screening) to identify relevant works post-2020, culminating in 29 eligible studies. The analysis shows BERT, GPT-3, T5, and GPT-3.5 as common models, with virtual assistants and question generation as the dominant tasks; the review also assesses reproducibility and human-in-the-loop involvement, noting limited code availability in many studies. Overall, LLM-based educational tools represent a rapidly growing, high-potential area with important considerations around ethics, transparency, and curriculum integration as newer models like GPT-4 enter the field.
Abstract
Transformer architectures help manage long-term dependencies in Natural Language Processing, representing one of the most recent changes in the field. These architectures are the basis of the innovative, cutting-edge Large Language Models (LLMs) that have produced a huge buzz in several fields and industrial sectors, among which education stands out. Accordingly, these generative Artificial Intelligence-based solutions have driven changes in techniques and the evolution of educational methods and content, along with network infrastructure, towards high-quality learning. Given the popularity of LLMs, this review seeks to provide a comprehensive overview of those solutions that are designed specifically to generate and evaluate educational materials and that involve students and teachers in their design or experimental plan. To the best of our knowledge, this is the first review of educational applications (e.g., student assessment) of LLMs. As expected, the most common role of these systems is as virtual tutors for automatic question generation. Moreover, the most popular models are GPT-3 and BERT. However, given the continuous launch of new generative models, new works are expected to be published shortly.
