CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras
James Tang, Shashwat Suri, Daniel Ajisafe, Bastian Wandt, Helge Rhodin
TL;DR
This work tackles the problem of reconstructing accurate 3D human motion from sparse, unsynchronized camera views by automatically calibrating intrinsics, extrinsics, and temporal offsets. It introduces a cascaded framework that partitions the high-dimensional calibration problem into sequential stages, solving for $N(4+6+1)$ parameters (intrinsics, extrinsics, and one temporal offset for each of $N$ cameras) and refining the result with ICP and bundle adjustment. Key contributions include the cascade decomposition with tailored objective functions, an end-to-end pipeline that takes 2D keypoints as its sole input, and open-source code with hyperparameters for reproducibility. The approach enables practical multi-view motion capture with consumer-grade cameras, offering an automated alternative to marker-based calibration and hardware synchronization in diverse settings.
Abstract
It is now possible to estimate 3D human pose from monocular images with off-the-shelf 3D pose estimators. However, many practical applications require fine-grained absolute pose information, for which multi-view cues and camera calibration are necessary. Such multi-view recordings are laborious because they require manual calibration, and expensive when they rely on dedicated hardware. Our goal is full automation, which includes temporal synchronization as well as intrinsic and extrinsic camera calibration. We achieve this by using the persons in the scene as calibration objects. Existing methods address only synchronization or only calibration, assume one of the two as input, or have other significant limitations. A common limitation is that they consider only a single person, which eases correspondence finding. We overcome these restrictions by partitioning the high-dimensional time and calibration space into a cascade of subspaces and introducing tailored algorithms to optimize each efficiently and robustly. The outcome is an easy-to-use, flexible, and robust motion capture toolbox that we release to enable scientific applications, and which we demonstrate on diverse multi-view benchmarks. Project website: https://github.com/jamestang1998/CasCalib.
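To make the temporal synchronization step concrete, the following is a minimal sketch of recovering a frame offset between two unsynchronized cameras from a person-derived 1D signal. It is not the CasCalib algorithm: the function name `estimate_offset`, the brute-force search over integer shifts, and the synthetic signal are all assumptions chosen for illustration. It also assumes the signal is directly comparable across views, which real 2D keypoints from different viewpoints are not without further normalization.

```python
import math

def estimate_offset(sig_a, sig_b, max_shift):
    """Return the integer shift s (in frames) that minimizes the mean squared
    difference between sig_a[t] and sig_b[t + s] over their overlap.

    Hypothetical helper for illustration; not part of the CasCalib code base.
    """
    best_shift, best_cost = 0, math.inf
    for s in range(-max_shift, max_shift + 1):
        # Pair sample t of sig_a with sample t + s of sig_b where both exist.
        pairs = [(sig_a[t], sig_b[t + s])
                 for t in range(len(sig_a))
                 if 0 <= t + s < len(sig_b)]
        if len(pairs) < 2:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

# Synthetic, non-periodic motion signal observed by two cameras.
# Camera B started recording 7 frames before camera A (assumed setup).
motion = [math.sin(0.05 * t) + 0.5 * math.sin(0.21 * t) for t in range(300)]
cam_a = motion[7:207]  # frame t of A shows motion[t + 7]
cam_b = motion[0:200]  # frame t of B shows motion[t]

offset = estimate_offset(cam_a, cam_b, max_shift=30)
print(offset)  # 7: frame t of camera A matches frame t + 7 of camera B
```

An exhaustive search over one scalar per camera pair is cheap, which hints at why treating time as its own low-dimensional stage can be attractive compared with optimizing time and camera parameters jointly.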
