ARMADA: Augmented Reality for Robot Manipulation and Robot-Free Data Acquisition

Nataliya Nechyporenko, Ryan Hoque, Christopher Webb, Mouli Sivapurapu, Jian Zhang

TL;DR

This work addresses the data bottleneck in imitation learning by enabling robot-compatible demonstrations without a physical robot. It introduces ARMADA, an Apple Vision Pro–based AR system that overlays a real-time robot digital twin on the user's hands and provides live feedback. User hand poses are retargeted to the robot through a Drake-based inverse kinematics (IK) solver, with data streamed at $30\,\mathrm{Hz}$ and the resulting trajectories replayed on hardware. In a study with 15 participants across 3 tasks and 3 feedback conditions, live AR feedback raised average replayability from $1.3\%$ to $71.1\%$, while post-feedback still lagged behind live visualization. This suggests that scalable data collection for imitation learning is feasible without robot teleoperation. Future work includes learning control policies from such data and extending IK retargeting to more complex tasks.

Abstract

Teleoperation for robot imitation learning is bottlenecked by hardware availability. Can high-quality robot data be collected without a physical robot? We present a system for augmenting Apple Vision Pro with real-time virtual robot feedback. By providing users with an intuitive understanding of how their actions translate to robot motions, we enable the collection of natural barehanded human data that is compatible with the limitations of physical robot hardware. We conducted a user study with 15 participants demonstrating 3 different tasks each under 3 different feedback conditions and directly replayed the collected trajectories on physical robot hardware. Results suggest that live robot feedback dramatically improves the quality of the collected data, pointing to a new avenue for scalable human data collection without access to robot hardware. Videos and more are available at https://nataliya.dev/armada.

Paper Structure

This paper contains 17 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview. (A) Human demonstrators wearing Apple Vision Pro can collect data directly with their hands. (B) Egocentric view within Vision Pro shows real-time robot execution overlaid on the user's hands with augmented reality. (C) High-quality demonstrations collected with this system can be directly replayed on physical robot hardware.
  • Figure 2: Overview of the system architecture described in Section \ref{ssec:arch}. Human skeletal data is sent over websockets to an external compute device, which runs a live robot simulation that follows the pose targets given by the human data. The robot proprioceptive data is then sent back to Vision Pro for AR visualization. The full loop runs at 30 Hz.
  • Figure 3: Constraints. (A) The virtual robot gradually turns yellow as it approaches a singular configuration. (B) The background turns red when the robot moves beyond the Cartesian space boundaries. (C) The user is alerted with a virtual text overlay when their hand motion exceeds the robot's velocity limits.
  • Figure 4: Tasks. The three tasks from Section \ref{ssec:tasks}. Left: egocentric view of initial states during data collection with Feedback. Middle: egocentric view of final states during data collection with Feedback. Right: mid-task robot execution during trajectory replay.
  • Figure 5: User Interface. Left: main menu with toggle options. Right, Top: hand placement in spheres to initialize demonstration. Right, Bottom: hand placement to release spheres and end demonstration.
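
The Figure 3 caption lists three feedback triggers: proximity to a singular configuration, leaving the Cartesian space boundaries, and exceeding velocity limits. The following is a minimal sketch of how two of these checks (boundaries and speed) might be evaluated per sample at the 30 Hz loop rate. It is not the authors' implementation: the `Limits` class, the `check_sample` function, and all numeric limits are hypothetical and chosen only for illustration, and the checks here run on Cartesian position targets rather than on the simulated robot state. The singularity check is omitted because it needs the robot Jacobian.

```python
import math
from dataclasses import dataclass

RATE_HZ = 30.0  # loop rate stated in the Figure 2 caption


@dataclass
class Limits:
    # Hypothetical workspace box (metres) and speed limit (m/s);
    # these values are assumptions, not taken from the paper.
    lower: tuple = (0.2, -0.5, 0.0)
    upper: tuple = (0.8, 0.5, 0.7)
    max_speed: float = 1.0


def check_sample(prev, curr, limits, rate_hz=RATE_HZ):
    """Flag one (x, y, z) position sample against the limits.

    prev and curr are consecutive positions, one loop tick apart.
    """
    # Outside the box if any coordinate leaves its [lower, upper] range.
    out_of_bounds = any(
        c < lo or c > hi
        for c, lo, hi in zip(curr, limits.lower, limits.upper)
    )
    # Finite-difference speed: distance moved in one tick times the rate.
    speed = math.dist(prev, curr) * rate_hz
    return {"out_of_bounds": out_of_bounds, "too_fast": speed > limits.max_speed}


if __name__ == "__main__":
    limits = Limits()
    # Slow motion inside the box: 1 cm per tick.
    print(check_sample((0.5, 0.0, 0.3), (0.5, 0.0, 0.31), limits))
    # Large jump that also leaves the box along x.
    print(check_sample((0.5, 0.0, 0.3), (0.9, 0.0, 0.3), limits))
```

In a system like the one described, flags of this kind would drive the visual cues in the headset (for example, the red background or the text overlay) rather than being printed.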