LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation

Yuchong Zhang, Bastian Orthmann, Michael C. Welle, Jonne Van Haastregt, Danica Kragic

TL;DR

The paper proposes a controller-free, LLM-driven voice-commanded AR puppeteering system for real-time robot teleoperation, implemented on the Meta Quest 3 and mirroring virtual robot motions to a physical Franka arm. It integrates RealtimeSTT for speech-to-text, locally hosted Llama 3.2 1B-Instruct Q6 models for reasoning, and a UDP OSC pipeline to execute validated commands, enabling hands-free control via voice within AR. By removing physical controllers and leveraging AR visualization, the approach aims to improve accessibility, safety, and immersion in HRI, building on prior controller-based AR puppeteering with a more natural interface. A preliminary user demonstration validates core functionality, and future work will focus on robust voice recognition, multimodal gestures, dynamic trajectory planning, and comprehensive user studies to assess usability and performance.
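
The last stage of the pipeline described above (LLM output, validation, OSC message over UDP) can be illustrated with a minimal sketch. This is not the authors' implementation: the command vocabulary, the `/robot/...` address scheme, the step size, and the host and port are all hypothetical placeholders chosen for illustration. Only the byte layout follows the OSC 1.0 message format (null-terminated strings padded to 4 bytes, big-endian float32 arguments).

```python
import socket
import struct

# Hypothetical command vocabulary; the paper does not list its actual command set.
ALLOWED_COMMANDS = {"move_left", "move_right", "move_up", "move_down", "stop"}


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def build_osc_message(address: str, *values: float) -> bytes:
    """Build an OSC 1.0 message whose arguments are all float32."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)
    return _osc_string(address) + _osc_string(type_tags) + payload


def validate_command(llm_output: str):
    """Return the normalized command if it is in the allowed set, else None."""
    candidate = llm_output.strip().lower()
    return candidate if candidate in ALLOWED_COMMANDS else None


def command_to_packet(llm_output: str, step: float = 0.05):
    """Map raw LLM output to an OSC packet, or None if validation fails."""
    command = validate_command(llm_output)
    if command is None:
        return None
    # Hypothetical address scheme: one OSC address per command.
    return build_osc_message("/robot/" + command, step)


def send_packet(packet: bytes, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send one packet over UDP; host and port are placeholder values."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Validating against a fixed vocabulary before anything is sent means that free-form or malformed model output is dropped rather than forwarded to the robot, which is one plausible reading of "validated commands" in the summary.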

Abstract

The integration of robotics and augmented reality (AR) presents transformative opportunities for advancing human-robot interaction (HRI) by improving usability, intuitiveness, and accessibility. This work introduces a controller-free, LLM-driven voice-commanded AR puppeteering system, enabling users to teleoperate a robot by manipulating its virtual counterpart in real time. By leveraging natural language processing (NLP) and AR technologies, our system -- prototyped using Meta Quest 3 -- eliminates the need for physical controllers, enhancing ease of use while minimizing potential safety risks associated with direct robot operation. A preliminary user demonstration successfully validated the system's functionality, demonstrating its potential for safer, more intuitive, and immersive robotic control.

Paper Structure

This paper contains 18 sections and 3 figures.

Figures (3)

  • Figure 1: An overview of our proposed LLM-driven, controller-free, voice-commanded AR robotic puppeteering system. The system operates within an AR environment, utilizing the Meta Quest 3 HMD. Users interact with the virtual robot, which is modeled identically after the real Franka robot arm, using only hand gestures. Voice commands are integrated to enable intuitive control of the virtual robot, which in turn governs the real robot.
  • Figure 2: An overview of the controller-based baseline AR robotic puppeteering system proposed by van2024puppeteer.
  • Figure 3: A schematic overview of the proposed system with a user in action. The virtual Franka robot arm is rendered within the AR view (middle sub-figure). Users interact with the virtual robot through controller-free mid-air hand interaction and through voice commands captured by a ZOOM H2N microphone (left sub-figures), enabling precise manipulation of its movements (right sub-figures). Simultaneously, the physical robot mirrors the virtual robot’s trajectory in real time, completing the ‘puppeteer’ process.