LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation
Yuchong Zhang, Bastian Orthmann, Michael C. Welle, Jonne Van Haastregt, Danica Kragic
TL;DR
The paper proposes a controller-free, LLM-driven, voice-commanded AR puppeteering system for real-time robot teleoperation, implemented on the Meta Quest 3 and mirroring the motions of a virtual robot onto a physical Franka arm. It integrates RealtimeSTT for speech-to-text, locally hosted Llama 3.2 1B-Instruct Q6 models for reasoning, and an Open Sound Control (OSC) pipeline over UDP to execute validated commands, enabling voice-only control within AR. By removing physical controllers and leveraging AR visualization, the approach aims to improve accessibility, safety, and immersion in HRI, building on prior controller-based AR puppeteering with a more natural interface. A preliminary user demonstration validates core functionality; future work will focus on robust voice recognition, multimodal gestures, dynamic trajectory planning, and comprehensive user studies to assess usability and performance.
Abstract
The integration of robotics and augmented reality (AR) presents transformative opportunities for advancing human-robot interaction (HRI) by improving usability, intuitiveness, and accessibility. This work introduces a controller-free, LLM-driven voice-commanded AR puppeteering system, enabling users to teleoperate a robot by manipulating its virtual counterpart in real time. By leveraging natural language processing (NLP) and AR technologies, our system -- prototyped using Meta Quest 3 -- eliminates the need for physical controllers, enhancing ease of use while minimizing potential safety risks associated with direct robot operation. A preliminary user demonstration successfully validated the system's functionality, indicating its potential for safer, more intuitive, and immersive robotic control.
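
To make the last stage of the pipeline concrete (LLM output, then validation, then an OSC message over UDP), the following is a minimal sketch and not the authors' implementation. It assumes the LLM is prompted to reply with a small JSON object; the command vocabulary, the JSON fields, the `/robot/...` address scheme, and the host and port are all hypothetical, since the paper summary does not specify them. Only the OSC 1.0 byte layout (null-terminated strings padded to 4 bytes, big-endian arguments) is standard.

```python
import json
import socket
import struct

# Hypothetical command vocabulary; the paper summary does not list its command set.
ALLOWED_COMMANDS = {"move", "rotate", "grip", "stop"}


def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes (OSC 1.0 string rule)."""
    padded_len = (len(data) // 4 + 1) * 4
    return data.ljust(padded_len, b"\x00")


def encode_osc(address: str, *args) -> bytes:
    """Encode one OSC message with int32, float32, or string arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian float32
        elif isinstance(arg, str):
            tags += "s"
            payload += osc_pad(arg.encode("utf-8"))
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_pad(address.encode("ascii")) + osc_pad(tags.encode("ascii")) + payload


def validate_command(llm_output: str):
    """Return (command, value) if the LLM reply is well-formed JSON naming
    an allowed command with a numeric value, otherwise None."""
    try:
        parsed = json.loads(llm_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    command = parsed.get("command")
    value = parsed.get("value", 0.0)
    if not isinstance(command, str) or command not in ALLOWED_COMMANDS:
        return None
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return command, float(value)


def send_command(llm_output: str, host: str = "127.0.0.1", port: int = 9000) -> bool:
    """Validate the LLM reply and, if it passes, send it as one UDP datagram.
    The default host and port are placeholders, not values from the paper."""
    validated = validate_command(llm_output)
    if validated is None:
        return False  # rejected replies never reach the robot side
    command, value = validated
    packet = encode_osc(f"/robot/{command}", value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
    return True
```

Under these assumptions, calling `send_command('{"command": "move", "value": 0.5}')` would emit a single 20-byte datagram addressed to `/robot/move`, while free-form or out-of-vocabulary replies are dropped before anything is sent. Rejecting by default is one plausible way to keep an unreliable small model from issuing arbitrary motions.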
