Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling

Ville Heilala, Roberto Araya, Raija Hämäläinen

TL;DR

This study maps the landscape of multimodal and generative AI in education by applying topic modeling to a corpus of 4175 articles. Using a BERTopic-based pipeline with 768-dimensional sentence embeddings, UMAP, and HDBSCAN, the authors extract 38 interpretable topics organized into 14 thematic areas, which show a predominant focus on text-to-text LLMs such as OpenAI's ChatGPT. The results expose a gap in multimodal research: text-to-speech and other modalities remain underexplored despite their potential for personalized learning, problem solving, and creativity. The work offers a data-driven roadmap for researchers and policymakers to broaden modality coverage, develop multimodal foundational approaches for education, and empower educators to experiment with diverse AI-enabled learning technologies.

Abstract

Generative artificial intelligence (GenAI) can reshape education and learning. While large language models (LLMs) like ChatGPT dominate current educational research, multimodal capabilities, such as text-to-speech and text-to-image, are less explored. This study uses topic modeling to map the research landscape of multimodal and generative AI in education. An extensive literature search using Dimensions yielded 4175 articles. Employing a topic modeling approach, latent topics were extracted, resulting in 38 interpretable topics organized into 14 thematic areas. Findings indicate a predominant focus on text-to-text models in educational contexts, with other modalities underexplored, overlooking the broader potential of multimodal approaches. The results suggest a research gap, stressing the importance of more balanced attention across different AI modalities and educational levels. In summary, this research provides an overview of current trends in generative AI for education, underlining opportunities for future exploration of multimodal technologies to fully realize the transformative potential of artificial intelligence in education.
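
The pipeline named in the TL;DR (sentence embeddings, then UMAP, then HDBSCAN, inside BERTopic) could be assembled roughly as follows. This is a minimal sketch, not the authors' code: the embedding model name, every hyperparameter value, and the helper names `PipelineConfig`, `build_topic_model`, and `fit_topics` are assumptions chosen for illustration. The only detail taken from the summary is that the embeddings are 768-dimensional, which `all-mpnet-base-v2` happens to satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    # All values are illustrative assumptions, not the paper's selected parameters.
    embedding_model: str = "all-mpnet-base-v2"  # sentence-transformers model, 768-dim output
    n_neighbors: int = 15        # UMAP neighborhood size
    n_components: int = 5        # UMAP target dimensionality
    min_cluster_size: int = 10   # smallest group HDBSCAN treats as a cluster
    random_state: int = 42       # fixes UMAP's randomness for repeatable runs


def build_topic_model(cfg: PipelineConfig):
    """Wire embeddings -> UMAP -> HDBSCAN into a BERTopic model."""
    # Imported lazily so the configuration can be inspected without
    # installing the heavy dependencies.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    embedder = SentenceTransformer(cfg.embedding_model)
    reducer = UMAP(
        n_neighbors=cfg.n_neighbors,
        n_components=cfg.n_components,
        min_dist=0.0,
        metric="cosine",
        random_state=cfg.random_state,
    )
    clusterer = HDBSCAN(
        min_cluster_size=cfg.min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )
    return BERTopic(
        embedding_model=embedder,
        umap_model=reducer,
        hdbscan_model=clusterer,
    )


def fit_topics(docs, cfg=PipelineConfig()):
    """Fit the model on a list of title/abstract strings; returns (model, topic ids)."""
    model = build_topic_model(cfg)
    topics, _ = model.fit_transform(docs)
    return model, topics
```

Calling `fit_topics(abstracts)` on a list of strings would return the fitted model and one topic id per document, with `-1` marking documents HDBSCAN leaves unclustered. Keeping the hyperparameters in one small config object makes it straightforward to sweep many parameter sets, which is the kind of search the paper's Figure 4 appears to depict.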

Paper Structure (9 sections, 1 equation, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Topic modeling process
  • Figure 2: The number ($n \geq 2$) of titles/abstracts mentioning different keywords in the corpus
  • Figure 3: Best topic modeling solution for the number of clusters between ]1, 150[
  • Figure 4: A parallel coordinate plot depicting the hyperparameter space (n=5000) and the selected parameters. One line represents one set of parameters.
  • Figure 5: A thematic map of research on generative AI in education based on the topic modeling results
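
The keyword tally behind a chart like Figure 2 can be approximated with a simple document-frequency count. The sketch below is an assumption about how such counts might be produced, not the authors' procedure: the function name `count_keyword_mentions`, the whole-word matching rule, and the sample keywords are hypothetical. Only the threshold of at least two mentions comes from the figure caption.

```python
import re
from collections import Counter


def count_keyword_mentions(documents, keywords, min_count=2):
    """Count how many documents mention each keyword at least once.

    Matching is case-insensitive and anchored on word boundaries.
    Keywords found in fewer than `min_count` documents are dropped.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, lowered):
                counts[keyword] += 1  # each document counts once per keyword
    return {kw: n for kw, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy titles standing in for the corpus of titles/abstracts.
    sample = [
        "ChatGPT in higher education",
        "Using ChatGPT for text-to-speech feedback",
        "Text-to-speech tools for language learners",
        "Text-to-image prompts in art class",
    ]
    print(count_keyword_mentions(sample, ["ChatGPT", "text-to-speech", "text-to-image"]))
```

Counting documents rather than raw occurrences keeps a single abstract that repeats a term many times from inflating that term's total.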