BEDLAM2.0: Synthetic Humans and Cameras in Motion
Joachim Tesch, Giorgio Becherini, Prerana Achar, Anastasios Yiannakidis, Muhammed Kocabas, Priyanka Patel, Michael J. Black
TL;DR
BEDLAM2.0 provides a substantially richer synthetic dataset for 3D human motion in world coordinates, expanding the diversity of camera intrinsics, camera motions, body shapes, hair, shoes, clothing, scenes, and lighting. It combines synthetic and captured camera motions with a large, diverse pool of SMPL-X bodies, hair grooms, and simulated clothing, all rendered in realistic environments. This enables end-to-end training of world-space pose estimators and improves state-of-the-art performance without real data. The work demonstrates significant gains in shape accuracy and world-coordinate pose estimation, and releases extensive assets, tools, and benchmarks to foster further research. The authors acknowledge limitations regarding object interactions and facial motion. Even so, BEDLAM2.0 establishes a practical, scalable platform for advancing 3D human understanding in dynamic, realistic scenarios with moving cameras.
Abstract
Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human motion to be estimated in world coordinates. This is particularly challenging when both the human and the camera are moving. Progress on this topic has been limited by the lack of rich video data with ground-truth human and camera movement. We address this with BEDLAM2.0, a new dataset that goes beyond the popular BEDLAM dataset in important ways. In addition to introducing more diverse and realistic cameras and camera motions, BEDLAM2.0 increases the diversity and realism of body shapes, motions, clothing, hair, and 3D environments. It also adds shoes, which were missing in BEDLAM. BEDLAM has become a key resource for training 3D human pose and motion regressors, and we show that BEDLAM2.0 is significantly better, particularly for training methods that estimate humans in world coordinates. We compare state-of-the-art methods trained on BEDLAM and on BEDLAM2.0, and find that BEDLAM2.0 significantly improves accuracy. For research purposes, we provide the rendered videos, ground-truth body parameters, and camera motions. We also provide the 3D assets to which we have rights and links to those from third parties.
