ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

Mengxu Pan; Alexandra Kitson; Hongyu Wan; Mirjana Prpa

ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

Mengxu Pan, Alexandra Kitson, Hongyu Wan, Mirjana Prpa

TL;DR

ELLMA-T is developed, a design probe that integrates an LLM (GPT-4) with an ECA for English language learning in social VR (VRChat), informed by the situated learning framework, and reveals the potential of ELLMA-T to generate realistic, believable, and context-specific role plays for agent-learner interaction in VR.

Abstract

Many people struggle with learning a new language, with traditional tools falling short in providing contextualized learning tailored to each learner's needs. The recent development of large language models (LLMs) and embodied conversational agents (ECAs) in social virtual reality (VR) provide new opportunities to practice language learning in a contextualized and naturalistic way that takes into account the learner's language level and needs. To explore this opportunity, we developed ELLMA-T, an ECA that leverages an LLM (GPT-4) and situated learning framework for supporting learning English language in social VR (VRChat). Drawing on qualitative interviews (N=12), we reveal the potential of ELLMA-T to generate realistic, believable and context-specific role plays for agent-learner interaction in VR, and LLM's capability to provide initial language assessment and continuous feedback to learners. We provide five design implications for the future development of LLM-based language agents in social VR.

ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

TL;DR

Abstract

Paper Structure (61 sections, 4 figures, 3 tables)

This paper contains 61 sections, 4 figures, 3 tables.

Introduction
Related Work
Supporting Situated Language Learning in VR
Embodied Conversational Agents for Language Learning
Potential of LLMs & LLM-agents for Language Learning in Social VR
System Design
Design Principles
Learning English with ELLMA-T in VRChat
Design Principle #1: Introduction and Language Level Assessment
Design Principle #2: Role-Play Topic Generation and Conversation Continuation
Design Principle #3: Communication Strategy and Back-channels
System Implementation
System Architecture
Multi-task Multi-turn Conversation System
Prompt Engineering
...and 46 more sections

Figures (4)

Figure 1: Workflow of conversation tasks performed by ELLMA-T, including greeting the user, conducting language assessments, engaging in role-play scenarios, and providing feedback.
Figure 2: ELLMA-T in different virtual worlds within VRChat: an indoor café (left) and an outdoor city (right).
Figure 3: System Architecture of the ELLMA-T. The architecture highlights the core components and data flow within the system.
Figure 4: Structure of Separate Prompts for Different Tasks. This diagram illustrates how prompts are structured and separated for various tasks within the system. 1) The system prompt establishes the agent's persona across all interactions. 2) Task-specific prompts guide the agent during the introduction, language assessment, role-play, and feedback. 3) A decision prompt helps the agent determine when to transition between tasks. 4) The prompt for providing scaffolding during role-play conversations.

ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

TL;DR

Abstract

ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR

Authors

TL;DR

Abstract

Table of Contents

Figures (4)