RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices

Qiaoyi Chen; Siyu Liu; Kaihui Huang; Xingbo Wang; Xiaojuan Ma; Junkai Zhu; Zhenhui Peng

TL;DR

RetAssist addresses vocabulary learning for ESL learners by pairing story-based input with generated images to reduce cognitive load and improve recall of target word usage. The authors build a sentence-level image generation workflow using Stable-Diffusion-v1-5, CLIP similarity, and cartoon style transfer, guided by the Cognitive Theory of Multimedia Learning (CTML) and BDCT principles. They validate the approach with a within-subjects study (N=24) comparing RetAssist to a baseline without generative images, finding significant gains in learners' fluency with target words and positive user perceptions, and derive five design principles to guide future systems. The work demonstrates the feasibility and educational value of integrating generative AI into vocabulary practice and outlines broader implications for AI-assisted education.
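The summary above names CLIP similarity as one stage of the workflow but does not say how the scores are used. The following is a minimal sketch under one plausible reading: a story is split into sentences, several candidate images are generated per sentence, and the candidate whose embedding is closest to the sentence embedding is kept. Everything here is an assumption for illustration: the function names `split_sentences`, `cosine_similarity`, and `pick_best_candidate` are hypothetical, the embeddings are plain lists of floats standing in for CLIP text and image features, and no model is actually called.

```python
import math
import re
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def split_sentences(story: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: the real workflow preprocesses sentences in some other way.
    """
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    return [p for p in parts if p]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_best_candidate(
    text_vec: Vector, candidate_vecs: Sequence[Vector]
) -> Tuple[int, float]:
    """Return (index, score) of the candidate image most similar to the sentence.

    text_vec stands in for a text embedding of one sentence; candidate_vecs
    stand in for image embeddings of the generated candidates (hypothetical).
    """
    if not candidate_vecs:
        raise ValueError("at least one candidate embedding is required")
    scores = [cosine_similarity(text_vec, v) for v in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In a real pipeline the vectors would come from an embedding model and the candidates from a text-to-image model; the selection step itself is independent of both, which is why it is shown in isolation here.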

Abstract

Reading and repeatedly retelling a short story is a common and effective approach to learning the meanings and usages of target words. However, learners often struggle with comprehending, recalling, and retelling the story contexts of these target words. Inspired by the Cognitive Theory of Multimedia Learning, we propose a computational workflow to generate relevant images paired with stories. Based on the workflow, we work with learners and teachers to iteratively design an interactive vocabulary learning system named RetAssist. It can generate sentence-level images of a story to facilitate the understanding and recall of the target words in story retelling practices. Our within-subjects study (N=24) shows that, compared to a baseline system without generative images, RetAssist significantly improves learners' fluency in expressing themselves with target words. Participants also feel that RetAssist eases their learning workload and is more useful. We discuss insights into leveraging text-to-image generative models to support learning tasks.

Paper Structure (36 sections, 16 figures)

This paper contains 36 sections, 16 figures.

Introduction
Related Work
Story Retelling for Vocabulary Learning
Vocabulary Learning Systems
Text-to-Image Generation Techniques
Design Process
Developing a Computational Workflow for Text-to-Image Generation
Evaluating the Feasibility of Generative Images for Story Retelling Support
Alternative approaches
Preparing target word sets and short stories
Procedure and Results
Exploring Design Principles of RetAssist
Process of exploring design principles
Design principles
RetAssist System Design and Implementation
...and 21 more sections

Figures (16)

Figure 1: Our design and development process of RetAssist with English teachers and ESL learners.
Figure 2: Our computational workflow of generating relevant images for stories.
Figure 3: Given sentences of an example story as input, we compare images generated by our computational workflow with those generated by two alternatives. [Ours (sentence-level, sentence-based)] A1-A4: Images generated using the preprocessed sentences as prompts. B1-B4: Cartoon stylization of A1-A4. [Alternative-2 (sentence-level, keyword-based)] C1-C4: Images generated using the keywords (bold words in the preprocessed sentences of the example story) corresponding to the preprocessed sentences as prompts. D1-D4: Cartoon stylization of C1-C4. [Alternative-1 (story-level)] E: Images generated using the entire story as a prompt. F: Cartoon stylization of E.
Figure 4: Means and Standard Errors of human ratings on the quality of generative images; 1/5 - strongly disagree/agree; *: $p$ < .05 using paired samples Wilcoxon signed rank tests. We compare Alternative-1 (story-level) with Ours (sentence-level) on the images’ relevance (R) to the story, visual quality (VQ), and effectiveness in aiding story comprehension (E-1) and recall (E-2).
Figure 5: Means and Standard Errors of human ratings on the quality of generative images; 1/5 - strongly disagree/agree; *: $p$ < .05 using paired samples Wilcoxon signed rank tests. We compare Alternative-2 (keyword-based) with Ours (sentence-based) on the images’ relevance (R) to the story, visual quality (VQ), and effectiveness in aiding story comprehension (E-1) and recall (E-2).
...and 11 more figures
