Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras
Charles Javerliat, Pierre Raimbaud, Guillaume Lavoué
TL;DR
Kineo addresses calibration-free, markerless multi-view motion capture from unsynchronized consumer RGB cameras. It combines a 2D-keypoint-driven, SfM-style pipeline with a graph-based, distortion-aware calibration strategy that estimates intrinsics, extrinsics, and metric scale without manual setup, while producing 3D keypoints and dense scene maps at real-world scale. The approach introduces confidence-driven keypoint sampling, minimum-spanning-tree calibration, a novel 3D confidence score, and two scale-recovery strategies (SMPL-based and metric-depth-based). It achieves state-of-the-art performance among calibration-free methods on EgoHumans and Human3.6M, with substantial reductions in camera translation and rotation errors and in W-MPJPE. Built from modular, detector-agnostic components with efficient computation, Kineo is practical for long sequences and, in some configurations, processes footage faster than real time on commodity hardware; the code is released as open source. Overall, the framework closes much of the gap between calibration-free and calibrated motion capture and supports deployment on both human and non-human subjects.
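To make the minimum spanning tree calibration idea more concrete, the sketch below runs Kruskal's algorithm over a graph whose nodes are cameras and whose edges are camera pairs. It is only an illustration under stated assumptions: the per-pair `cost` (for example, a pairwise calibration residual) and the function name `minimum_spanning_tree` are hypothetical, and the summary does not say which edge weights or which MST algorithm Kineo actually uses.

```python
from typing import List, Tuple

# (camera_a, camera_b, cost): cost is a hypothetical pairwise score,
# lower meaning a more trustworthy relative pose between the two cameras.
Edge = Tuple[int, int, float]


def minimum_spanning_tree(num_cameras: int, edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with a union-find over camera indices."""
    parent = list(range(num_cameras))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree: List[Edge] = []
    # Visit camera pairs from lowest to highest cost.
    for a, b, cost in sorted(edges, key=lambda e: e[2]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, cost))
    return tree


if __name__ == "__main__":
    pairs = [(0, 1, 0.2), (1, 2, 0.5), (0, 2, 0.9), (2, 3, 0.1)]
    for a, b, cost in minimum_spanning_tree(4, pairs):
        print(f"chain camera {a} <-> camera {b} (cost {cost})")
```

With a tree like this, every camera can be placed in a common frame by chaining relative poses along the selected pairs, so the least reliable pair (here the 0-2 edge) is never needed.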
Abstract
Markerless multi-view motion capture is often constrained by the need for precise camera calibration, limiting accessibility for non-experts and in-the-wild captures. Existing calibration-free approaches mitigate this requirement but suffer from high computational cost and reduced reconstruction accuracy. We present Kineo, a fully automatic, calibration-free pipeline for markerless motion capture from videos captured by unsynchronized, uncalibrated, consumer-grade RGB cameras. Kineo leverages 2D keypoints from off-the-shelf detectors to simultaneously calibrate cameras, including Brown-Conrady distortion coefficients, and reconstruct 3D keypoints and dense scene point maps at metric scale. A confidence-driven spatio-temporal keypoint sampling strategy, combined with graph-based global optimization, ensures robust calibration at a fixed computational cost independent of sequence length. We further introduce a pairwise reprojection consensus score to quantify 3D reconstruction reliability for downstream tasks. Evaluations on EgoHumans and Human3.6M demonstrate substantial improvements over prior calibration-free methods. Compared to previous state-of-the-art approaches, Kineo reduces camera translation error by approximately 83-85%, camera angular error by 86-92%, and world mean-per-joint error (W-MPJPE) by 83-91%. Kineo is also efficient in real-world scenarios, processing multi-view sequences faster than their duration in specific configurations (e.g., 36 min to process 1 h 20 min of footage). The full pipeline and evaluation code are openly released to promote reproducibility and practical adoption at https://liris-xr.github.io/kineo/.
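For readers unfamiliar with the Brown-Conrady distortion coefficients mentioned above, the following sketch applies the model to a point in normalized image coordinates. It assumes the common five-coefficient form (radial k1, k2, k3 and tangential p1, p2) in the convention popularized by OpenCV; the abstract does not state how many coefficients Kineo estimates, and the function name is illustrative only.

```python
from typing import Tuple


def brown_conrady_distort(
    x: float,
    y: float,
    k1: float = 0.0,
    k2: float = 0.0,
    k3: float = 0.0,
    p1: float = 0.0,
    p2: float = 0.0,
) -> Tuple[float, float]:
    """Map an undistorted normalized point (x, y) to its distorted position.

    Assumed convention: radial terms k1..k3, tangential terms p1, p2.
    """
    r2 = x * x + y * y  # squared distance from the principal point
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


if __name__ == "__main__":
    # Mild barrel distortion pulls points toward the image centre.
    print(brown_conrady_distort(0.5, 0.25, k1=-0.1))
```

With all coefficients at zero the function is the identity, which corresponds to an ideal pinhole camera; a calibration-free pipeline has to estimate these coefficients alongside the focal length and camera pose rather than assume them.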
