Practicing a Second Language Without Fear: Mixed Reality Agents for Interactive Group Conversation

Mariana Fernandez-Espinosa, Kai Zhang, Jad Bendarkawi, Ashley Ponce, Sean Chidozie Mata, Aminah Aliu, Lei Zhang, Francisco Fernandez Medina, Elena Mangione-Lora, Andres Monroy-Hernandez, Diego Gomez-Zara

TL;DR

ConversAR combines Mixed Reality with Generative AI to support situated, group-based second language practice. Grounded in second language acquisition (SLA) theory and formative work with experts, it enables avatar-led group conversations that use real-world scene recognition and dynamically generated 3D props to scaffold speaking, feedback, and engagement. In a mixed-methods study with 21 L2 learners and 6 SLA experts, ConversAR increased willingness to communicate and was perceived as a safe space for practice, while revealing challenges in timing, feedback delivery, and prop relevance. The work demonstrates how context-grounded, adaptive AI agents can augment language learning in group settings and offers design directions for scalable, context-aware MR language tools.

Abstract

Developing speaking proficiency in a second language can be cognitively demanding and emotionally taxing, often triggering fear of making mistakes or being excluded from larger groups. While current learning tools show promise for speaking practice, most focus on dyadic, scripted scenarios, limiting opportunities for dynamic group interactions. To address this gap, we present ConversAR, a Mixed Reality system that leverages Generative AI and XR to support situated and personalized group conversations. It integrates embodied AI agents, scene recognition, and generative 3D props anchored to real-world surroundings. Based on a formative study with experts in language acquisition, we developed the system and evaluated it in a user study with 21 second-language learners. Results indicate that the system enhanced learner engagement, increased willingness to communicate, and offered a safe space for speaking. We discuss the implications for integrating Generative AI and XR into the design of future language learning applications.

Paper Structure

This paper contains 77 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: ConversAR enables second language learners to engage in group conversations with embodied AI agents tailored to their proficiency level and personal interests. The system provides real-time corrective feedback and grounds conversations in the learner’s physical environment by recognizing real-world objects (e.g., plant, calendar, notebook). It also dynamically generates 3D digital props (e.g., speaker) informed by realia-based pedagogical theory. These tangible, contextual objects serve as conversational triggers that foster deeper oral expression, sustained interaction, and meaningful language use.
  • Figure 2: ConversAR interaction flow illustrating how the system assesses language proficiency and interests, initiates group conversations grounded in the physical environment, generates contextual 3D props, and delivers real-time corrective feedback.
  • Figure 3: System Overview of ConversAR. (a) Learners begin with a 1-on-1 warm-up conversation to assess language proficiency and personal interests. (b) The system detects real-world objects and generates a contextual scene description. (c) Based on the learner’s environment and language level, an adapted group conversation with AI agents unfolds, referencing physical objects and interests. (d, e) ConversAR dynamically generates 3D digital props grounded in realia pedagogical theory. These objects serve as tangible conversation anchors, fostering language development and deeper engagement.
  • Figure 4: Example of ConversAR conversation with corrective feedback and 3D object generation.
  • Figure 5: Communicative Effectiveness Index (CETI) Survey Results.
  • ...and 2 more figures