BEDLAM2.0: Synthetic Humans and Cameras in Motion

Joachim Tesch, Giorgio Becherini, Prerana Achar, Anastasios Yiannakidis, Muhammed Kocabas, Priyanka Patel, Michael J. Black

TL;DR

BEDLAM2.0 provides a substantially richer synthetic dataset for 3D human motion in world coordinates, expanding camera intrinsics, camera motions, body shapes, hair, shoes, clothing, scenes, and lighting. By combining synthetic and captured camera motions with a large, diverse pool of SMPL-X bodies, hair grooms, and clothing simulated in realistic environments, it enables end-to-end training of world-space pose estimators and improves state-of-the-art performance without real data. The work demonstrates significant gains in shape accuracy and world-coordinate pose estimation, and releases extensive assets, tools, and benchmarks to foster further research. While acknowledging limitations such as object interactions and facial motion, BEDLAM2.0 establishes a practical, scalable platform for advancing 3D human understanding in dynamic, real-world-like scenarios with moving cameras.

Abstract

Inferring 3D human motion from video remains a challenging problem with many applications. While traditional methods estimate the human in image coordinates, many applications require human motion to be estimated in world coordinates. This is particularly challenging when there is both human and camera motion. Progress on this topic has been limited by the lack of rich video data with ground truth human and camera movement. We address this with BEDLAM2.0, a new dataset that goes beyond the popular BEDLAM dataset in important ways. In addition to introducing more diverse and realistic cameras and camera motions, BEDLAM2.0 increases diversity and realism of body shape, motions, clothing, hair, and 3D environments. Additionally, it adds shoes, which were missing in BEDLAM. BEDLAM has become a key resource for training 3D human pose and motion regressors today, and we show that BEDLAM2.0 is significantly better, particularly for training methods that estimate humans in world coordinates. We compare state-of-the-art methods trained on BEDLAM and BEDLAM2.0, and find that BEDLAM2.0 significantly improves accuracy over BEDLAM. For research purposes, we provide the rendered videos, ground truth body parameters, and camera motions. We also provide the 3D assets to which we have rights and links to those from third parties.

Paper Structure

This paper contains 46 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Camera intrinsics. (left) Log frequency of focal lengths. Red: BEDLAM; Blue: BEDLAM2.0. (right) Histogram of the horizontal field of view (HFOV).
  • Figure 2: Camera motion statistics. (top) Log histogram of camera height above the ground (50-1200cm). (bottom left) Video duration (seconds). (bottom middle) Histogram of pitch. (bottom right) Log histogram of roll. (right) Distribution of different types of camera motion (see text).
  • Figure 3: Sample camera motions used in dataset. Left: Synthetic, Right: Captured.
  • Figure 4: Body shapes. Left: Histogram of BMIs for the body shapes in BEDLAM (red) and BEDLAM2.0 (blue). Right: Resampled histogram used to generate more diverse bodies in BEDLAM2.0.
  • Figure 5: Strand-based hair. These examples illustrate the realism and diversity of the hair, both of which are much better than in BEDLAM.
  • ...and 11 more figures