GenAIReading: Augmenting Human Cognition with Interactive Digital Textbooks Using Large Language Models and Image Generation Models

Ryugo Morita, Ko Watanabe, Jinjia Zhou, Andreas Dengel, Shoya Ishimaru

TL;DR

The paper investigates cognitive augmentation in education by enhancing digital textbooks with Generative AI tools. It introduces a two-phase pipeline that combines LLM-generated text summaries with image generation and a Summary Image Selector to produce visually aligned content, evaluated via eye-tracking and post-reading tests. Results show that AI-generated text summaries, images, and especially image-based summaries significantly improve learning outcomes, with post-reading test score gains of up to 7.50% and effects moderated by learners' preferences for text or visuals. The work demonstrates the potential of adaptive, multimodal GenAI-enabled textbooks and provides design guidance for personalized educational tools.

Abstract

Cognitive augmentation is a cornerstone in advancing education, particularly through personalized learning. However, personalizing extensive textual materials, such as narratives and academic textbooks, remains challenging because of their heavy reliance on text, which can hinder learner engagement and understanding. Building on cognitive theories such as Dual Coding Theory, which posits that combining textual and visual information enhances comprehension and memory, this study explores the potential of Generative AI (GenAI) to enrich educational materials. We used large language models (LLMs) to generate concise text summaries and image generation models (IGMs) to create visually aligned content from textual inputs. In a study with 24 participants, we verified that integrating AI-generated supplementary materials significantly improved learning outcomes, increasing post-reading test scores by 7.50%. These findings underscore GenAI's transformative potential in creating adaptive learning environments that enhance cognitive augmentation.

Paper Structure

This paper contains 33 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: Architecture of the generation flow of the story text summary using LLMs (ChatGPT). The input consists of the generated story from the story generation phase and constraint prompts, which guide the summary generation. The constraint prompts control parameters such as word count, ensuring that the summary is concise and adheres to the specified length and content requirements for effective summarization.
  • Figure 2: Architecture of the selection flow of the Summary Image Selector. The inputs are the story and the generated images, from which five key summary images are selected. The text and images are segmented and fed into the Summary Image Selector, which computes similarity scores within each segment; the image with the highest score in each segment is chosen as that segment's summary image.
  • Figure 3: Architecture of the generation flow of the questions using LLMs (ChatGPT). The inputs are the story produced in the story generation phase, so that the questions align with the story content, and constraint prompts. The prompts define the question types, such as multiple-choice or open-ended, and the focus areas, such as numerical values or narrative comprehension, ensuring the output is formatted appropriately for further use.
  • Figure 4: User interface of the web application showing the four reading conditions: (a) "Baseline", (b) "IGenAI Image", (c) "TGenAI summary", and (d) "IGenAI Summary".
  • Figure 5: Experiment workflow. Calibration refers to eye-tracker calibration, the process of estimating the geometric characteristics of a subject's eyes. The post-reading test consists of ten questions that evaluate reading comprehension and memory retention for the provided reading conditions.
  • ...and 5 more figures
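The Summary Image Selector in Figure 2 is described only at a high level. The sketch below is one possible reading of it, not the authors' implementation, and it rests on several assumptions not stated in this summary: one generated image per story paragraph; paragraphs and images already embedded in a shared vector space by some joint text-image encoder (hypothetical, the actual model is not specified here); cosine similarity as the score; and each segment's text represented by the mean of its paragraph embeddings. All function names (`split_into_segments`, `select_summary_images`, and the helpers) are hypothetical.

```python
import math
from collections.abc import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors: Sequence[Vector]) -> list[float]:
    """Element-wise mean; assumed here to stand in for a whole segment's text."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_into_segments(items: Sequence, n_segments: int) -> list[list]:
    """Split items into n_segments contiguous, near-equal chunks, preserving order."""
    if n_segments <= 0 or n_segments > len(items):
        raise ValueError("need 1 <= n_segments <= len(items)")
    base, extra = divmod(len(items), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        size = base + (1 if i < extra else 0)  # earlier chunks absorb the remainder
        segments.append(list(items[start:start + size]))
        start += size
    return segments


def select_summary_images(
    paragraph_embs: Sequence[Vector],
    image_embs: Sequence[Vector],
    n_segments: int = 5,
) -> list[int]:
    """Return the index of the best-matching image in each story segment.

    Assumes image_embs[i] is the embedding of the image generated from paragraph i.
    """
    if len(paragraph_embs) != len(image_embs):
        raise ValueError("expected exactly one image per paragraph")
    indices = list(range(len(paragraph_embs)))
    selected = []
    for segment in split_into_segments(indices, n_segments):
        segment_text = mean_vector([paragraph_embs[i] for i in segment])
        # Highest similarity wins; max() keeps the earliest image on ties.
        best = max(segment, key=lambda i: cosine_similarity(segment_text, image_embs[i]))
        selected.append(best)
    return selected
```

With ten paragraphs and the default of five segments, each segment covers two paragraphs and contributes one summary image, matching the five key images mentioned in Figure 2. Averaging paragraph embeddings is only one simple choice; scoring each image against the concatenated segment text would be an equally plausible alternative.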