EgoZero: Robot Learning from Smart Glasses

Vincent Liu; Ademi Adeniji; Haotian Zhan; Siddhant Haldar; Raunaq Bhirangi; Pieter Abbeel; Lerrel Pinto

TL;DR

EgoZero tackles the data bottleneck in real-world robotics by learning zero-shot manipulation policies from in-the-wild egocentric human demonstrations captured with Project Aria glasses, without any robot data. It unifies the human and robot domains through egocentric 3D point representations and trains a closed-loop Transformer policy via behavior cloning in this shared space, relying on triangulated object points and hand-pose cues. The approach achieves 70% zero-shot success across seven tasks on a Franka Panda, using only 20 minutes of human data per task, and generalizes to new viewpoints, object poses, and object instances. This work suggests that scalable, diverse human data can serve as a practical foundation for real-world robot learning, paving the way for more human-centric and data-efficient robotics research.
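
The TL;DR mentions triangulated object points but does not give the procedure. The following is a minimal sketch of standard linear (DLT) triangulation, assuming calibrated 3x4 projection matrices are already known for two or more frames in which the same object point is observed (for example, from the glasses' own localization). The helper names `triangulate_point` and `project` are hypothetical, and NumPy is assumed; this is not presented as EgoZero's actual implementation.

```python
import numpy as np

def triangulate_point(projections, pixels):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 projection matrices P = K [R | t] (assumed known)
    pixels: list of (u, v) observations of the same point, one per view
    Returns the point in the world frame as a length-3 array.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear constraints on the homogeneous point X:
        #   u * (P[2] @ X) - P[0] @ X = 0  and  v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution: right singular vector of the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    """Project a 3D point with a 3x4 matrix and return pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical views: identity intrinsics, second camera shifted 1 m along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 2.0])
X_est = triangulate_point([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

With the noise-free projections used here, the estimate recovers `X_true`. With real keypoint detections, the SVD returns an algebraic least-squares fit, so each additional view adds two more constraints on the same point.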

Abstract

Despite recent progress in general-purpose robotics, robot policies still lag far behind basic human capabilities in the real world. Humans interact constantly with the physical world, yet this rich data resource remains largely untapped in robot learning. We propose EgoZero, a minimal system that learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses, using zero robot data. EgoZero enables: (1) extraction of complete, robot-executable actions from in-the-wild, egocentric human demonstrations, (2) compression of human visual observations into morphology-agnostic state representations, and (3) closed-loop policy learning that generalizes morphologically, spatially, and semantically. We deploy EgoZero policies on a Franka Panda robot with a gripper and demonstrate zero-shot transfer with a 70% success rate across 7 manipulation tasks, using only 20 minutes of data collection per task. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning, paving the way toward a future of abundant, diverse, and naturalistic training data for robots. Code and videos are available at https://egozero-robot.github.io.
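
Items (1) and (2) can be illustrated with a small sketch. It assumes, hypothetically, that the robot end-effector target is the midpoint between the human's thumb and index fingertips, that the grasp state is a threshold on the distance between them, and that the state is simply the object points plus that end-effector point in one shared 3D frame. The names `hand_to_action` and `build_state` and the 3 cm threshold are illustrative choices, not EgoZero's published definitions.

```python
import numpy as np

def hand_to_action(thumb_tip, index_tip, close_threshold=0.03):
    """Hypothetical mapping from human fingertips to a gripper action.

    thumb_tip, index_tip: 3D fingertip positions (meters) in a shared frame.
    close_threshold: illustrative pinch distance (meters), an assumption.
    Returns (position, gripper_closed).
    """
    thumb_tip = np.asarray(thumb_tip, dtype=float)
    index_tip = np.asarray(index_tip, dtype=float)
    # End-effector target: midpoint between the two fingertips
    position = (thumb_tip + index_tip) / 2.0
    # Grasp flag: fingertips closer than the threshold count as "closed"
    gripper_closed = bool(np.linalg.norm(thumb_tip - index_tip) < close_threshold)
    return position, gripper_closed

def build_state(object_points, ee_position):
    """Flatten object points and the end-effector point into one vector.

    The vector contains only 3D points, so it can be computed from human
    video (fingertips) or on the robot (gripper) in the same way.
    """
    objects = np.asarray(object_points, dtype=float).reshape(-1, 3)
    ee = np.asarray(ee_position, dtype=float).reshape(1, 3)
    return np.vstack([objects, ee]).reshape(-1)

# Example with made-up coordinates: a pinch 2 cm wide near two object points.
position, closed = hand_to_action([0.40, 0.00, 0.10], [0.42, 0.00, 0.10])
state = build_state([[0.50, 0.10, 0.02], [0.55, -0.05, 0.02]], position)
```

Because the same point-based state can be produced from either embodiment, a policy trained on human-derived states could, under these assumptions, consume robot-derived states at deployment without retraining.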
