HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang
TL;DR
HOI-M3 addresses the scarcity of datasets for multi-human multi-object interactions by providing a large-scale, multi-view 3D motion capture dataset collected with dense RGB cameras and object-mounted IMUs. It introduces a robust capture and annotation pipeline, along with two data-driven downstream tasks: monocular capture of multiple HOI and unstructured generation of multiple HOI, each with strong baselines. The dataset comprises 181 million frames across 199 sequences, 42 viewpoints, 90 objects, and 31 human subjects, enabling rich HOI perception and generation research. By releasing data, code, and models, HOI-M3 aims to catalyze advances in understanding social interactions with surrounding objects for applications in embodied AI, robotics, and VR/AR.
Abstract
Humans naturally interact with both other people and multiple surrounding objects, engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and objects, due to fundamental data scarcity. In this paper, we introduce HOI-M3, a novel large-scale dataset for modeling the interactions of Multiple huMans and Multiple objects. Notably, it provides accurate 3D tracking for both humans and objects from dense RGB and object-mounted IMU inputs, covering 199 sequences and 181M frames of diverse humans and objects under rich activities. With the unique HOI-M3 dataset, we introduce two novel data-driven tasks with companion strong baselines: monocular capture and unstructured generation of multiple human-object interactions. Extensive experiments demonstrate that our dataset is challenging and worthy of further research on multiple human-object interactions and behavior analysis. Our HOI-M3 dataset, corresponding code, and pre-trained models will be disseminated to the community for future research.
