Humans as Checkerboards: Calibrating Camera Motion Scale for World-Coordinate Human Mesh Recovery
Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Tze Ho Elden Tse, Angela Yao
TL;DR
This work tackles the unknown scale factor of monocular SLAM, a persistent obstacle to world-coordinate human mesh recovery (HMR). It introduces HAC, an optimization-free framework that uses the absolute depth of human joints predicted by an HMR model as calibration references to directly recover global camera and human motion in world coordinates. HAC links the HMR-predicted depth of human-ground contacts to SLAM’s relative scene depth and uses a ground-plane fallback for out-of-view cases. It achieves state-of-the-art global motion accuracy while running roughly 100x faster than optimization-based methods. The approach performs robustly across diverse datasets and SLAM/HMR backbones, enabling scalable, near-real-time global human motion estimation on challenging videos.
Abstract
Accurate camera motion estimation is essential for recovering global human motion in world coordinates from RGB video inputs. SLAM is widely used to estimate the camera trajectory and scene point cloud, but monocular SLAM does so only up to an unknown scale factor. Previous work estimates the scale factor through optimization, which is unreliable and time-consuming. This paper presents an optimization-free scale calibration framework, Human as Checkerboard (HAC). HAC uses the human body predicted by a human mesh recovery (HMR) model as a calibration reference. Specifically, it uses the absolute depth of human-scene contact joints as references to calibrate the corresponding relative scene depth from SLAM. HAC thus draws on the geometric priors encoded in human mesh recovery models to estimate the SLAM scale, and achieves precise global human motion estimation. Simple yet powerful, our method sets a new state of the art for global human mesh estimation, reducing motion errors by 50% over prior local-to-global methods while using 100x less inference time than optimization-based methods. Project page: https://martayang.github.io/HAC.
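The calibration idea in the abstract can be illustrated with a minimal sketch. It assumes we already have paired depths at human-scene contact points: metric depths from an HMR model and unitless depths from monocular SLAM. The function names, the median-of-ratios aggregation, and the sample numbers are illustrative assumptions, not the authors' implementation, and the ground-plane fallback is not covered.

```python
from statistics import median


def estimate_slam_scale(hmr_depths_m, slam_depths, min_depth=1e-6):
    """Estimate the metric scale of a monocular SLAM reconstruction.

    hmr_depths_m: absolute depths (meters) of contact joints from HMR.
    slam_depths:  relative depths of the same scene points from SLAM.
    Assumption: a median of per-pair ratios is used for robustness
    to outliers; the paper may aggregate differently.
    """
    ratios = [
        d_metric / d_rel
        for d_metric, d_rel in zip(hmr_depths_m, slam_depths)
        if d_metric > min_depth and d_rel > min_depth
    ]
    if not ratios:
        raise ValueError("no valid contact-depth pairs")
    return median(ratios)


def rescale_translations(translations, scale):
    """Convert up-to-scale SLAM camera translations to metric units."""
    return [tuple(scale * c for c in t) for t in translations]


# Hypothetical paired depths; the last pair is a deliberate outlier.
hmr = [3.0, 3.1, 2.9, 6.0]
slam = [1.5, 1.55, 1.45, 1.5]

scale = estimate_slam_scale(hmr, slam)
metric_path = rescale_translations([(0.0, 0.0, 1.0), (0.5, 0.0, 1.0)], scale)
```

Because the scale is computed in closed form from depth ratios, no iterative optimization is involved, which is the property the abstract credits for the speedup.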
