Design of Seamless Multi-modal Interaction Framework for Intelligent Virtual Agents in Wearable Mixed Reality Environment

Ghazanfar Ali, Hong-Quan Le, Junho Kim, Seoung-won Hwang, Jae-In Hwang

TL;DR

The paper addresses delivering engaging, non-repetitive content through intelligent virtual agents in wearable mixed reality for venues like museums and botanical gardens. It presents a modular framework that fuses spatial mapping, gaze-based interaction, speech, object recognition, cloud-based chatbot services, and expressive avatar animation to create seamless experiences on resource-constrained devices. Key contributions include the design and implementation of the virtual agent framework, explicit mapping of speech content to body animations and facial emotions, and a scalable anchor-map approach to extend the MR workspace. Empirical demonstration in a botanical garden scenario shows interactive response times around 2–4 seconds after user queries (5–8 seconds total including perception and network latency), highlighting the practicality and adaptability of the approach for diverse MR devices and applications. The work suggests that cloud-enabled multimodal virtual agents can significantly enhance real-world MR experiences by combining realism, responsiveness, and flexibility across applications.

Abstract

In this paper, we present the design of a multimodal interaction framework for intelligent virtual agents in wearable mixed reality environments, especially for interactive applications at museums, botanical gardens, and similar places. These places need engaging and non-repetitive digital content delivery to maximize user involvement. An intelligent virtual agent is a promising medium for both purposes. The framework is premised on wearable mixed reality provided by MR devices that support spatial mapping. We envisioned a seamless interaction framework that integrates spatial mapping, virtual character animations, speech recognition, gaze, a domain-specific chatbot, and object recognition to enhance virtual experiences and communication between users and virtual agents. By applying a modular approach and deploying computationally intensive modules on a cloud platform, we achieved a seamless virtual experience on a device with limited resources. Human-like gaze and speech interaction made the virtual agent more interactive. Automated mapping of body animations to the content of speech made it more engaging. In our tests, the virtual agents responded within 2–4 seconds after the user query. The strength of the framework is its flexibility and adaptability: it can be adapted to any wearable MR device that supports spatial mapping.

Paper Structure

This paper contains 11 sections, 13 figures, 1 table.

Figures (13)

  • Figure 1: Scenario usage of the framework.
  • Figure 2: Overall architecture.
  • Figure 3: Process to get recognized object information.
  • Figure 4: Gaze and speech interaction architecture.
  • Figure 5: Chatbot architecture.
  • ...and 8 more figures