VirtuWander: Enhancing Multi-modal Interaction for Virtual Tour Guidance through Large Language Models

Zhan Wang, Lin-Ping Yuan, Liangwei Wang, Bingchuan Jiang, Wei Zeng

TL;DR

The paper tackles the challenge of delivering flexible, personalized tour guidance in virtual museums by leveraging multi-modal interactions powered by large language models. It introduces VirtuWander, a two-stage, pack-of-bots system that converts natural language queries into context-aware guidance across voice, avatar, text, and visualization modalities. Through a formative study, a design framework, five feedback designs, three example cases, and a VR user study, the work demonstrates that LLM-enabled multi-modal feedback can enhance engagement, knowledge delivery, and personalization, while highlighting timing and cognitive-load considerations. The results indicate strong potential for real-world deployment and AR extensions, with implications for broader domains such as airports or hospitals and for evolving VR/AR tour guidance design.

Abstract

Tour guidance in virtual museums encourages multi-modal interactions that improve the user experience in terms of engagement, immersion, and spatial awareness. Achieving this goal is challenging, however, because diverse user needs are hard to interpret and personalized preferences are hard to accommodate. Informed by a formative study that characterizes guidance-seeking contexts, we establish a multi-modal interaction design framework for virtual tour guidance. We then design VirtuWander, an innovative two-stage system that uses domain-oriented large language models to map user inquiries to guidance-seeking contexts and to facilitate multi-modal interactions. We demonstrate the feasibility and versatility of VirtuWander with virtual guiding examples that cover various touring scenarios and cater to personalized preferences. We further evaluate VirtuWander through a user study in an immersive simulated museum. The results suggest that our system makes virtual tours more engaging through personalized communication and knowledgeable assistance, indicating its potential for extension to real-world scenarios.

Paper Structure (35 sections, 8 figures)

Figures (8)

  • Figure 1: Design framework for LLM-based multi-modal feedback within various guidance-seeking contexts in virtual tour experiences.
  • Figure 2: Five common multi-modal feedback combinations summarized from our design framework.
  • Figure 3: Implementation of VirtuWander: (a) multi-modal feedback combinations for common guidance-seeking contexts and (b) a simulated virtual reality museum.
  • Figure 4: VirtuWander is a voice-controlled tour guidance system with a two-stage framework: 1) context identification stage converts visitor natural language input into various guidance-seeking contexts, and 2) feedback generation stage generates multi-modal feedback combinations based on task-specific LLM responses.
  • Figure 5: Examples for (A) inputs and outputs of our two-stage pack-of-bots strategy and (B) prompt techniques of the first-level bots and (C) the second-level bots.
  • ...and 3 more figures
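
The two-stage flow described in the Figure 4 caption (context identification, then feedback generation) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the context labels (`navigation`, `artwork_info`, `general_chat`), the keyword rules that stand in for the first-level LLM bots, the per-context modality combinations, and every function name are assumptions made for this example. Only the four modality names come from the summary above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Modalities named in the summary. Everything else here is hypothetical.
MODALITIES = ("voice", "avatar", "text", "visualization")


@dataclass
class Feedback:
    context: str           # guidance-seeking context chosen in stage 1
    modalities: List[str]  # feedback channels to activate
    message: str           # content a task-specific bot would produce


def identify_context(query: str) -> str:
    """Stage 1 stand-in: keyword rules replace the LLM-based classifier."""
    q = query.lower()
    if any(k in q for k in ("where", "how do i get", "take me")):
        return "navigation"
    if any(k in q for k in ("who", "what", "tell me about")):
        return "artwork_info"
    return "general_chat"


# Stage 2 stand-ins: one task-specific "bot" per context. A real system
# would call an LLM here; these return canned feedback for illustration.
def navigation_bot(query: str) -> Feedback:
    return Feedback("navigation", ["voice", "avatar", "visualization"],
                    f"Showing a route for: {query}")


def artwork_info_bot(query: str) -> Feedback:
    return Feedback("artwork_info", ["voice", "text"],
                    f"Looking up details for: {query}")


def general_chat_bot(query: str) -> Feedback:
    return Feedback("general_chat", ["voice"], f"Replying to: {query}")


BOTS: Dict[str, Callable[[str], Feedback]] = {
    "navigation": navigation_bot,
    "artwork_info": artwork_info_bot,
    "general_chat": general_chat_bot,
}


def guide(query: str) -> Feedback:
    """Route a visitor query through both stages."""
    context = identify_context(query)  # stage 1: context identification
    return BOTS[context](query)        # stage 2: feedback generation


if __name__ == "__main__":
    print(guide("Where is the sculpture hall?"))
    print(guide("Tell me about this painting"))
```

Separating the dispatcher from the per-context bots lets each bot carry its own prompt and modality choices, which is one plausible reading of the "pack-of-bots" strategy named in the Figure 5 caption.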