GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models
Ryugo Morita, Ko Watanabe, Jinjia Zhou, Andreas Dengel, Shoya Ishimaru
TL;DR
The paper investigates cognitive augmentation in education by enriching digital textbooks with Generative AI tools. It introduces a two-phase pipeline that combines LLM-generated text summaries with image generation and a Summary Image Selector to produce visually aligned content, and evaluates it with eye-tracking and post-reading tests. Results show that AI-generated text summaries, images, and especially image-based summaries significantly improve learning outcomes, with gains of up to 7.50%; the effects are moderated by learners' preferences for text or visuals. The work demonstrates the potential of adaptive, multimodal GenAI-enabled textbooks and provides design guidance for personalized educational tools.
Abstract
Cognitive augmentation is a cornerstone in advancing education, particularly through personalized learning. However, personalizing extensive textual materials, such as narratives and academic textbooks, remains challenging because they are text-heavy, which can hinder learner engagement and understanding. Building on cognitive theories like Dual Coding Theory, which posits that combining textual and visual information enhances comprehension and memory, this study explores the potential of Generative AI (GenAI) to enrich educational materials. We used large language models (LLMs) to generate concise text summaries and image generation models (IGMs) to create visually aligned content from textual inputs. In a study with 24 participants, we found that integrating AI-generated supplementary materials significantly improved learning outcomes, increasing post-reading test scores by 7.50%. These findings underscore GenAI's potential for creating adaptive learning environments that support cognitive augmentation.
