RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices

Qiaoyi Chen, Siyu Liu, Kaihui Huang, Xingbo Wang, Xiaojuan Ma, Junkai Zhu, Zhenhui Peng

TL;DR

RetAssist addresses vocabulary learning for ESL learners by pairing story-based input with generated images to reduce cognitive load and improve recall of target word usage. The authors build a sentence-level image generation workflow using Stable-Diffusion-v1-5, CLIP similarity, and cartoon style transfer, guided by principles from the Cognitive Theory of Multimedia Learning (CTML) and BDCT. They validate the approach with a within-subjects study (N=24) comparing RetAssist to a baseline without generative images, finding significant gains in learners' fluency with target words and positive user perceptions, and derive five design principles to guide future systems. The work demonstrates the feasibility and educational value of integrating generative AI into vocabulary practice and outlines broader implications for AI-assisted education.

Abstract

Reading and repeatedly retelling a short story is a common and effective approach to learning the meanings and usages of target words. However, learners often struggle with comprehending, recalling, and retelling the story contexts of these target words. Inspired by the Cognitive Theory of Multimedia Learning, we propose a computational workflow to generate relevant images paired with stories. Based on the workflow, we work with learners and teachers to iteratively design an interactive vocabulary learning system named RetAssist. It can generate sentence-level images of a story to facilitate the understanding and recall of the target words in the story retelling practices. Our within-subjects study (N=24) shows that compared to a baseline system without generative images, RetAssist significantly improves learners' fluency in expressing with target words. Participants also feel that RetAssist eases their learning workload and is more useful. We discuss insights into leveraging text-to-image generative models to support learning tasks.
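
As a rough illustration of how a CLIP-style similarity score could be used to keep the most relevant image for each sentence, the sketch below ranks candidate images by the cosine similarity between a sentence embedding and each image embedding. This is an assumption about one plausible use of the similarity score, not the authors' implementation: the function names and the tiny placeholder vectors are hypothetical, and a real pipeline would obtain the embeddings from a CLIP model.

```python
from math import sqrt
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best_image(sentence_emb: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the name of the candidate image whose embedding is closest to the sentence."""
    if not candidates:
        raise ValueError("at least one candidate image is required")
    return max(candidates, key=lambda name: cosine_similarity(sentence_emb, candidates[name]))


def pick_images_for_story(
    sentence_embs: List[Vector],
    candidates_per_sentence: List[Dict[str, Vector]],
) -> List[str]:
    """Pick one image per sentence (sentence-level selection)."""
    if len(sentence_embs) != len(candidates_per_sentence):
        raise ValueError("need one candidate set per sentence")
    return [
        pick_best_image(emb, cands)
        for emb, cands in zip(sentence_embs, candidates_per_sentence)
    ]


# Placeholder 2-D embeddings (hypothetical; real CLIP embeddings have hundreds of dimensions).
sentence_embs = [[1.0, 0.0], [0.0, 1.0]]
candidates_per_sentence = [
    {"s1_img_a": [0.0, 1.0], "s1_img_b": [0.9, 0.1]},
    {"s2_img_a": [0.2, 0.8], "s2_img_b": [1.0, 0.0]},
]
chosen = pick_images_for_story(sentence_embs, candidates_per_sentence)
print(chosen)
```

Selecting per sentence rather than once per story mirrors the sentence-level design described above; the selected images would then be passed on to whatever stylization step follows.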

Paper Structure (36 sections, 16 figures)

Figures (16)

  • Figure 1: Our design and development process of RetAssist with English teachers and ESL learners.
  • Figure 2: Our computational workflow of generating relevant images for stories.
  • Figure 3: Given sentences of an example story as input, we compare images generated by our computational workflow with those generated by two alternatives. [Ours (sentence-level, sentence-based)] A1-A4: Images generated using the preprocessed sentences as prompts. B1-B4: Cartoon stylization of A1-A4. [Alternative-2 (sentence-level, keyword-based)] C1-C4: Images generated using the keywords (bold words in the preprocessed sentences of the example story) corresponding to the preprocessed sentences as prompts. D1-D4: Cartoon stylization of C1-C4. [Alternative-1 (story-level)] E: Images generated using the entire story as a prompt. F: Cartoon stylization of E.
  • Figure 4: Means and Standard Errors of human ratings on the quality of generative images; 1/5 - strongly disagree/agree; *: $p$ < .05 using paired samples Wilcoxon signed rank tests. We compare Alternative-1 (story-level) with Ours (sentence-level) on the images’ relevance (R) to the story, visual quality (VQ), and effectiveness in aiding story comprehension (E-1) and recall (E-2).
  • Figure 5: Means and Standard Errors of human ratings on the quality of generative images; 1/5 - strongly disagree/agree; *: $p$ < .05 using paired samples Wilcoxon signed rank tests. We compare Alternative-2 (keyword-based) with Ours (sentence-based) on the images’ relevance (R) to the story, visual quality (VQ), and effectiveness in aiding story comprehension (E-1) and recall (E-2).
  • ...and 11 more figures
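
The captions for Figures 4 and 5 refer to paired samples Wilcoxon signed rank tests on 1 to 5 ratings. As a minimal sketch of what that test computes, the function below derives the positive and negative signed-rank sums for two paired rating lists, dropping zero differences and giving tied absolute differences their average rank. The function name is hypothetical, no p-value is computed here, and this is not the authors' analysis code.

```python
from typing import Sequence, Tuple


def signed_rank_sums(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, W-) for paired samples, where differences are second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")

    # Zero differences carry no sign information and are dropped.
    diffs = [y - x for x, y in zip(first, second) if y != x]

    # Rank the differences by absolute value, averaging ranks across ties.
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus
```

The test statistic is usually taken as the smaller of the two sums; a small value relative to the number of non-zero pairs suggests that one condition is rated consistently higher than the other. In practice a statistics library would be used to obtain the p-value.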