EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

Ryan Punamiya, Simar Kareer, Zeyi Liu, Josh Citron, Ri-Zhao Qiu, Xiongyi Cai, Alexey Gavryushin, Jiaqi Chen, Davide Liconti, Lawrence Y. Zhu, Patcharapong Aphiwetsa, Baoyu Li, Aniketh Cheluva, Pranav Kuppili, Yangcen Liu, Dhruv Patel, Aidan Gao, Hye-Young Chung, Ryan Co, Renee Zbizika, Jeff Liu, Xiaomeng Xu, Haoyu Xiong, Geng Chen, Sebastiano Oliani, Chenyu Yang, Xi Wang, James Fort, Richard Newcombe, Josh Gao, Jason Chong, Garrett Matsuda, Aseem Doriwala, Marc Pollefeys, Robert Katzschmann, Xiaolong Wang, Shuran Song, Judy Hoffman, Danfei Xu

Abstract

Robot learning increasingly depends on large and diverse data, yet robot data collection remains expensive and difficult to scale. Egocentric human data offer a promising alternative by capturing rich manipulation behavior across everyday environments. However, existing human datasets are often limited in scope, difficult to extend, and fragmented across institutions. We introduce EgoVerse, a collaborative platform for human data-driven robot learning that unifies data collection, processing, and access under a shared framework, enabling contributions from individual researchers, academic labs, and industry partners. The current release includes 1,362 hours (80k episodes) of human demonstrations spanning 1,965 tasks, 240 scenes, and 2,087 unique demonstrators, with standardized formats, manipulation-relevant annotations, and tooling for downstream learning. Beyond the dataset, we conduct a large-scale study of human-to-robot transfer with experiments replicated across multiple labs, tasks, and robot embodiments under shared protocols. We find that policy performance generally improves with increased human data, but that effective scaling depends on alignment between human data and robot learning objectives. Together, the dataset, platform, and study establish a foundation for reproducible progress in human data-driven robot learning. Videos and additional information can be found at https://egoverse.ai/

Paper Structure

This paper contains 60 sections, 7 equations, 18 figures, 11 tables.

Figures (18)

  • Figure 1: Overview. EgoVerse is a collaborative framework for scalable human data–driven robot learning. Capture: Egocentric demonstrations are collected worldwide using academic, industry, and community-accessible hardware systems, continuously aggregated by a centrally-hosted data management system. Dataset: All data are unified into a shared dataset with egocentric video, 3D hand poses, camera motion, and task descriptions across diverse tasks and scenes. Evaluation: This work presents a large-scale evaluation study on human-to-robot transfer with shared protocols across multiple labs and robot embodiments.
  • Figure 2: Human Data Capture Setup. (Left) EgoVerse is captured through a variety of hardware systems, including Project Aria glasses (academic labs), a phone-based capture system (accessible by everyone), and custom setups by industry partners. (Right) Regardless of sources, human data is processed into a unified format that contains at minimum egocentric videos, hand keypoints, and camera poses.
  • Figure 3: EgoDB. Human and robot data from multiple labs and partners are ingested into a cloud-based processing pipeline, unified in a common storage format, and made accessible through a web-based viewer. Users can sync filtered subsets of the dataset to local machines for downstream policy training.
  • Figure 4: Dataset Composition and Diversity. Left: EgoVerse-A and EgoVerse-I include six shared flagship manipulation tasks collected across diverse scenes and demonstrators. Right: EgoVerse-I contains over 1,500 open-ended tasks spanning everyday activity categories, with representative verb frequency distributions illustrating the diversity of manipulation actions.
  • Figure 5: UMAP of DINOv3 embeddings.
  • ...and 13 more figures