
MRPoS: Mixed Reality-Based Robot Navigation Interface Using Spatial Pointing and Speech with Large Language Model

Eduardo Iglesius, Masato Kobayashi, Yuki Uranishi

Abstract

Recent advancements have made robot navigation more intuitive by moving from traditional 2D displays to spatially aware Mixed Reality (MR) systems. However, current MR interfaces often rely on manual "air tap" gestures for goal placement, which can be repetitive and physically demanding, especially for beginners. This paper proposes the Mixed Reality-Based Robot Navigation Interface using Spatial Pointing and Speech (MRPoS). This framework replaces complex hand gestures with a natural, multimodal interface that combines spatial pointing with Large Language Model (LLM)-based speech interaction. By combining both sources of information, the system translates verbal intent into navigation goals that are visualized with MR technology. Experiments comparing MRPoS against conventional gesture-based systems show that our approach significantly reduces task completion time and workload, providing a more accessible and efficient interface. For additional material, please see: https://mertcookimg.github.io/mrpos
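The abstract describes combining a spatially pointed location with an LLM's interpretation of the user's speech. The following is a minimal, hypothetical sketch of the speech side only. It assumes the LLM is prompted to reply with JSON of the form {"action": ..., "target_object": ...} and validates that reply against the Add and Edit functions and a list of known object names. The JSON schema, `BeaconIntent`, and `parse_llm_reply` are illustrative assumptions, not the MRPoS implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional, Set

# Assumed action vocabulary, mirroring the Add and Edit functions in the figures.
ALLOWED_ACTIONS = {"add", "edit"}


@dataclass
class BeaconIntent:
    action: str                   # "add" or "edit"
    target_object: Optional[str]  # object the MR-beacon should face, if named


def parse_llm_reply(reply: str, known_objects: Set[str]) -> BeaconIntent:
    """Parse an LLM reply in a hypothetical JSON schema into a validated intent."""
    data = json.loads(reply)
    action = str(data.get("action", "")).strip().lower()
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    target = data.get("target_object")
    if target is not None:
        # Normalize case and whitespace before matching against known objects.
        target = str(target).strip().lower()
        if target not in known_objects:
            raise ValueError(f"unknown object: {target!r}")
    return BeaconIntent(action=action, target_object=target)


if __name__ == "__main__":
    objects = {"water bottle", "coffee machine", "flower pot"}
    print(parse_llm_reply('{"action": "add", "target_object": "Water Bottle"}', objects))
```

Rejecting unknown actions and object names keeps a malformed or hallucinated LLM reply from silently producing a navigation goal; the pointed position would then be attached to the validated intent.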

Paper Structure (30 sections, 14 figures, 5 tables, 3 algorithms)

Figures (14)

  • Figure 1: MRPoS Workflow. The workflow is divided into two phases: MR-beacon Generation and Robot Navigation. (1-a, 1-b) show MR-beacon generation using the spatial pointer and voice. (1-c) shows the final state after generation. (2-a, 2-b) show the robot navigating toward the created MR-beacon. (2-c) shows the final state.
  • Figure 2: System Design Diagram. Our system consists of four main modules: the HoloLens 2, serving as the user interface to acquire, analyze, and visualize MR-beacons; ROS 2, acting as a bridge between the HoloLens 2 and the robot; a Model Server to process user voice data; and the robot itself. Components highlighted in green represent our novel contributions.
  • Figure 3: Hand Menu. (a) The primary layout of the hand menu. (b) The Beacon submenu, accessed via the main menu.
  • Figure 4: Add Function (Multiple Objects). (a) Initial empty environment. (b) User positioning the first MR-beacon candidate. (c) User specifying the orientation toward the water bottle by saying its name. (d, e) User repeating the same process for the coffee machine and the flower pot simultaneously. (f) Final state with three generated MR-beacons.
  • Figure 5: Edit Function. (a) Initial state with an MR-beacon. (b) User selecting the MR-beacon via spatial pointing. (c) User relocating it and specifying a new orientation (flower pot) via speech. (d) Final state of the modified MR-beacon.
  • ...and 9 more figures
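Figures 4 and 5 describe placing or relocating an MR-beacon by pointing, then naming the object it should face. A minimal sketch of how such a heading could be turned into a planar navigation goal follows, assuming both positions are already expressed in the robot's 2D map frame (x, y on the floor plane, z up). The conversion from HoloLens 2 coordinates is not shown, and `GoalPose2D` and `face_object` are hypothetical names, not part of MRPoS.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GoalPose2D:
    x: float
    y: float
    yaw: float  # heading in radians, counter-clockwise from the +x axis
    qz: float   # z component of a yaw-only unit quaternion
    qw: float   # w component of a yaw-only unit quaternion


def face_object(beacon_xy: Tuple[float, float],
                object_xy: Tuple[float, float]) -> GoalPose2D:
    """Goal at the pointed beacon position, oriented toward the named object."""
    bx, by = beacon_xy
    ox, oy = object_xy
    dx, dy = ox - bx, oy - by
    if dx == 0.0 and dy == 0.0:
        raise ValueError("beacon and object coincide; heading is undefined")
    yaw = math.atan2(dy, dx)
    # A rotation of `yaw` about z is the quaternion (0, 0, sin(yaw/2), cos(yaw/2)).
    return GoalPose2D(bx, by, yaw, math.sin(yaw / 2.0), math.cos(yaw / 2.0))


if __name__ == "__main__":
    # Beacon at (1, 1), object straight ahead along +y at (1, 3).
    print(face_object((1.0, 1.0), (1.0, 3.0)))
```

In the usage example the yaw is about π/2 (roughly 1.571 rad) and `qz`, `qw` are both about 0.707. A yaw-only quaternion is used because a ground robot's navigation goal needs only a position on the floor and a heading.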