MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao
TL;DR
MMVP introduces a vision-pressure multimodal MoCap dataset that pairs RGBD video with dense plantar pressure, enabling accurate, dense foot-contact annotations during large-range, fast motions. It contributes two methods: an RGBD-P SMPL fitting approach that uses both depth and pressure signals to constrain pose, shape, and ground contact, and a monocular baseline, VP-MoCap, that predicts foot pressure and refines pose and translation using ground depth and contact cues. In experiments on GT fitting, contact estimation, and pose-translation optimization, the methods outperform vision-only baselines and prior multimodal methods, with more stable global translation and less foot sliding. By providing synchronized vision and pressure signals with precise contact annotations, the dataset and baselines aim to advance MoCap research in AR/VR, biomechanics, and related domains.
Abstract
Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact either by visual matching with thresholding or by incorporating pressure signals. However, these approaches either suffer from low accuracy or are designed only for small-range, slow motion. There is still no vision-pressure multimodal dataset that combines large-range, fast human motion with accurate and dense foot-contact annotation. To fill this gap, we propose a Multimodal MoCap Dataset with Vision and Pressure sensors, named MMVP. MMVP provides accurate and dense plantar pressure signals synchronized with RGBD observations, which are especially useful for plausible shape estimation, robust pose fitting without foot drifting, and accurate global translation tracking. To validate the dataset, we propose an RGBD-P SMPL fitting method and a monocular-video-based baseline framework, VP-MoCap, for human motion capture. Experiments demonstrate that our RGBD-P SMPL fitting results significantly outperform pure visual motion capture. Moreover, VP-MoCap outperforms SOTA methods in foot-contact and global translation estimation accuracy. We believe the configuration of the dataset and the baseline frameworks will stimulate research in this direction and also provide a good reference for MoCap applications in various domains. Project page: https://metaverse-ai-lab-thu.github.io/MMVP-Dataset/.
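To make concrete how a dense pressure signal can yield contact labels without visual distance thresholds, here is a minimal sketch. It assumes each pressure frame is a 2D grid of non-negative sensor readings. The function name `contact_from_pressure` and the default `threshold` noise floor are hypothetical, and this is not MMVP's released data format or the authors' annotation pipeline. The sketch thresholds the grid into a binary contact mask and computes the pressure-weighted center of pressure over the contacting cells.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def contact_from_pressure(
    pressure: Grid, threshold: float = 5.0
) -> Tuple[List[List[bool]], Optional[Tuple[float, float]]]:
    """Threshold a plantar pressure grid into a contact mask and compute
    the pressure-weighted center of pressure as (row, col).

    `threshold` is a hypothetical noise floor; a real insole or mat sensor
    would need a calibrated value.
    """
    # A cell counts as "in contact" when its reading exceeds the noise floor.
    mask = [[value > threshold for value in row] for row in pressure]

    total = 0.0
    row_moment = 0.0
    col_moment = 0.0
    for r, row in enumerate(pressure):
        for c, value in enumerate(row):
            if mask[r][c]:
                total += value
                row_moment += r * value
                col_moment += c * value

    if total == 0.0:
        # No cell above the threshold: treat the foot as airborne.
        return mask, None
    return mask, (row_moment / total, col_moment / total)


if __name__ == "__main__":
    frame = [[0, 0, 0], [0, 10, 30], [0, 2, 0]]
    mask, cop = contact_from_pressure(frame)
    print(mask)  # the reading of 2 falls below the noise floor
    print(cop)   # (1.0, 1.75)
```

Because contact comes straight from a physical measurement, this kind of labeling does not depend on how accurately a fitted foot mesh sits relative to the ground, which is the weakness of threshold-based visual matching noted above.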
