Exploring Context-aware and LLM-driven Locomotion for Immersive Virtual Reality

Süleyman Özdel, Kadir Burak Buldu, Enkelejda Kasneci, Efe Bozkir

TL;DR

This work addresses the challenge of hands-free VR locomotion by introducing a context-aware, natural-language navigation system powered by large language models (LLMs). It benchmarks three approaches (controller teleportation, fixed-command voice steering, and LLM-driven navigation) using eye-tracking data and standardized usability, presence, cybersickness, and workload questionnaires in a town-like VR environment. Across 60 participants in a between-subjects design, the LLM-driven method achieves usability, presence, and cybersickness comparable to the baselines while exhibiting distinctive gaze patterns, with end-to-end latency under $1.5$ seconds. The results indicate that LLM-driven locomotion is a viable, accessible alternative that can accommodate multilingual interaction, though it introduces slightly higher cognitive load and requires robust prompt design and error handling. Overall, the study highlights the potential of context-aware, natural-language interaction to enhance accessibility and immersion in VR, and points to personalization and scalable deployment as avenues for future work.

Abstract

Locomotion plays a crucial role in shaping the user experience within virtual reality environments. In particular, hands-free locomotion offers a valuable alternative by supporting accessibility and freeing users from reliance on handheld controllers. However, traditional speech-based methods often depend on rigid command sets, limiting the naturalness and flexibility of interaction. In this study, we propose a novel locomotion technique powered by large language models (LLMs), which allows users to navigate virtual environments using natural language with contextual awareness. We evaluate three locomotion methods: controller-based teleportation, voice-based steering, and our language model-driven approach. Our evaluation combines eye-tracking data analysis, including exploratory explainable machine learning analysis with SHAP, and standardized questionnaires (SUS, IPQ, CSQ-VR, NASA-TLX) to examine user experience through both objective gaze-based measures and subjective self-reports of usability, presence, cybersickness, and cognitive load. Our findings show no statistically significant differences in usability, presence, or cybersickness between LLM-driven locomotion and established methods such as teleportation, suggesting its potential as a viable, natural language-based, hands-free alternative. In addition, eye-tracking analysis revealed patterns suggesting a tendency toward increased user attention and engagement in the LLM-driven condition. Complementing these findings, exploratory SHAP analysis revealed that fixation, saccade, and pupil-related features vary across techniques, indicating distinct patterns of visual attention and cognitive processing. Overall, we conclude that our method can facilitate hands-free locomotion in virtual spaces, particularly in support of accessibility.

Paper Structure

This paper contains 42 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Top-down view of the virtual environment, with the blue ‘X’ marking the start and pink holograms indicating target locations. The pink hologram on the left corresponds to the first target, whereas the one on the right is the final target location.
  • Figure 2: Results for task completion times.
  • Figure 3: Results for fixation rates.
  • Figure 4: Results for mean fixation duration.
  • Figure 5: Results for mean saccade duration.
  • ...and 6 more figures