FreeCap: Hybrid Calibration-Free Motion Capture in Open Environments

Aoru Xue, Yiming Ren, Zining Song, Mao Ye, Xinge Zhu, Yuexin Ma

TL;DR

FreeCap tackles open-environment multi-person motion capture without sensor calibration by fusing a single LiDAR with expandable moving cameras. It introduces Pose-aware Cross-sensor Matching to establish robust cross-sensor alignment and a coarse-to-fine Sensor-expandable Pose Optimizer that fuses multi-modal data and refines 3D key points, followed by an SMPL-based solver that recovers full body meshes in a unified world coordinate system. The approach achieves state-of-the-art performance on large-scale datasets (Human-M3, FreeMotion) and remains robust under novel camera viewpoints, which supports its use for flexible, scalable mocap in diverse settings. By combining 2D/3D key point fusion, temporal context, and cross-modal interaction, FreeCap provides an expandable, calibration-free mocap solution with potential applications in sports analytics, animation, and AR/VR.

Abstract

We propose FreeCap, a novel hybrid calibration-free method that accurately captures global multi-person motions in open environments. Our system combines a single LiDAR with expandable moving cameras, allowing for flexible and precise motion estimation in a unified world coordinate system. In particular, we introduce a local-to-global pose-aware cross-sensor human-matching module that predicts the alignment among the sensors, even in the absence of calibration. Additionally, our coarse-to-fine sensor-expandable pose optimizer further optimizes the 3D human key points and the alignments, and it can incorporate additional cameras to enhance accuracy. Extensive experiments on the Human-M3 and FreeMotion datasets demonstrate that our method significantly outperforms state-of-the-art single-modal methods, offering an expandable and efficient solution for multi-person motion capture across various applications.
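To make the idea of cross-sensor human matching concrete, the following is a minimal sketch of only the final pairing step: given a score for every (LiDAR person, camera person) pair, pick the one-to-one pairing with the highest total score. It is not the paper's module, which is learned and pose-aware. The function name `match_people`, the `affinity` matrix, and its values are hypothetical, and the brute-force search stands in for whatever assignment procedure the authors actually use.

```python
from itertools import permutations


def match_people(affinity):
    """Return the one-to-one LiDAR-to-camera pairing with the highest total affinity.

    affinity[i][j] is a hypothetical similarity score between LiDAR person i
    and camera person j (assumed to come from some upstream matching model).
    Brute force over all pairings, so only suitable for a handful of people.
    """
    n_lidar = len(affinity)
    n_cam = len(affinity[0])
    if n_lidar > n_cam:
        # Simplifying assumption of this sketch, not of the paper.
        raise ValueError("sketch assumes every LiDAR person appears in the camera view")

    best_pairs, best_score = [], float("-inf")
    # Each permutation assigns a distinct camera person to every LiDAR person.
    for cams in permutations(range(n_cam), n_lidar):
        score = sum(affinity[i][j] for i, j in enumerate(cams))
        if score > best_score:
            best_pairs, best_score = list(enumerate(cams)), score
    return best_pairs, best_score


# Made-up scores: 2 people seen by the LiDAR, 3 people seen by one camera.
affinity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.1, 0.8],
]
pairs, score = match_people(affinity)
print(pairs)  # [(0, 0), (1, 2)]
```

With more than a few people, a polynomial-time assignment solver such as the Hungarian algorithm would be the usual replacement for the exhaustive search.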

Paper Structure

This paper contains 20 sections, 10 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Visualization of FreeCap in a scenario captured in real time. Our setup includes a single LiDAR and four cameras. Camera-1 follows the running person, camera-2 moves around two people playing soccer, camera-3 focuses on the main person playing frisbee, and camera-4 captures three people. We zoom in on some cases on the right.
  • Figure 2: The pipeline of FreeCap. It consists of three main modules: pose-aware cross-sensor matching (PCM), which estimates the optimal pairs and the alignment matrix; the sensor-expandable pose optimizer (SPO), which predicts the 3D human joints; and the SMPL solver, which regresses the SMPL parameters. We also show the details of PCM and SPO.
  • Figure 3: Qualitative comparisons. We show the global human mesh together with the point cloud; the more closely the point cloud matches the result, the more accurate the estimation.
  • Figure 4: Visualization of our matching results on Human-M3. The camera and LiDAR views differ; the camera location is marked with a camera icon. We zoom in on some cases on the right.