Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking
Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan
TL;DR
HOT3D is a large-scale, multi-view egocentric dataset for 3D hand–object tracking, recorded with real head-mounted devices and annotated using precise marker-based motion capture. It provides 3.7M images, 6DoF ground-truth poses for hands, objects, and cameras, plus additional modalities such as eye gaze, SLAM point clouds, and high-fidelity PBR object models. The dataset supports both model-based and model-free tracking, includes onboarding sequences and curated clips, and underpins public ECCV 2024 challenges intended to standardize benchmarking and foster reproducibility. By capturing diverse, realistic hand–object interactions in AR/VR contexts, HOT3D aims to accelerate advances in perception, reconstruction, and human–machine collaboration.
Abstract
We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze and scene point clouds, and comprehensive ground-truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in kitchen, office, and living room environments. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of lightweight AR/AI glasses, and Quest 3, a production VR headset that has sold millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats, and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. We aim to accelerate research on egocentric hand-object interaction by making the HOT3D dataset publicly available and by co-organizing public challenges on the dataset at ECCV 2024. The dataset can be downloaded from the project website: https://facebookresearch.github.io/hot3d/.
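To illustrate how ground-truth 3D poses of objects and cameras can be combined, the sketch below expresses a point on an object in a camera's coordinate frame. It is a minimal example under stated assumptions, not the dataset's file format or toolkit API: poses are assumed to be 4x4 rigid transforms, the names `T_world_object` and `T_world_camera` are hypothetical, and the numeric values are invented for illustration.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Pack a 3x3 rotation R and a translation t into a 4x4 rigid transform."""
    return [list(R[0]) + [t[0]],
            list(R[1]) + [t[1]],
            list(R[2]) + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_pose(T):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t_inv)

def transform_point(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(T[i][k] * p[k] for k in range(3)) + T[i][3] for i in range(3)]

# Hypothetical poses (illustrative values, not taken from the dataset):
# the object is rotated 90 degrees about z and sits 1 m along world x;
# the camera is axis-aligned with the world and sits 1 m along world z.
T_world_object = make_pose(rot_z(math.pi / 2), [1.0, 0.0, 0.0])
T_world_camera = make_pose(rot_z(0.0), [0.0, 0.0, 1.0])

# Object pose expressed in the camera frame.
T_camera_object = matmul(invert_pose(T_world_camera), T_world_object)

# A point 10 cm along the object's own x-axis, mapped into the camera frame.
p_object = [0.1, 0.0, 0.0]
p_camera = transform_point(T_camera_object, p_object)
```

Given per-frame poses in a shared world frame, the same composition can relate any pair of tracked entities, for example a hand to an object or one camera to another.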
