EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell
TL;DR
EgoPet introduces a large-scale egocentric animal video dataset (~84 hours) with egomotion and interaction data across diverse species. It defines three benchmarks: VIP for visual interactions, LP for forward trajectory prediction, and VPP for vision-to-proprioception transfer to legged locomotion. It demonstrates that pretraining on EgoPet yields strong downstream performance, especially on robotics-oriented tasks. The work shows EgoPet's potential to narrow the gap between the perception and action abilities of animals and those of current AI systems, while revealing that interaction prediction remains challenging. The dataset provides a foundation for self-supervised learning and robotics research, with future directions including multi-sensory integration (e.g., audio) to better model animal behavior.
Abstract
Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce EgoPet, a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction. Current video datasets contain egomotion examples and interaction examples separately, but rarely both at the same time. In addition, EgoPet offers a radically distinct perspective from existing egocentric datasets of humans or vehicles. We define two in-domain benchmark tasks that capture animal behavior, and a third benchmark that assesses the utility of EgoPet as a pretraining resource for robotic quadruped locomotion, showing that models trained on EgoPet outperform those trained on prior datasets.
