RoMo: A Robust Solver for Full-body Unlabeled Optical Motion Capture
Xiaoyu Pan, Bowen Zheng, Xinwei Jiang, Zijiao Zeng, Qilong Kou, He Wang, Xiaogang Jin
TL;DR
RoMo addresses mislabeling and positional noise in full-body unlabeled optical MoCap by introducing a divide-and-conquer labeling pipeline that leverages temporal tracklets via a K-partite graph and a hybrid inverse kinematics solver that uses joint positions as intermediate representations. The approach decouples alignment, segmentation, and part-specific labeling, while tracklets exploit temporal continuity and deep marker features to improve labeling accuracy; motion solving is performed through forward kinematics guided by global joint positions, reducing error accumulation along the kinematic chain. Empirical results across multiple datasets show RoMo achieving state-of-the-art labeling and solving performance, with high body labeling accuracy, strong hand labeling, and substantial reductions in joint-position errors, all while remaining applicable where commercial systems may struggle. The work provides a cost-free, flexible framework that can handle diverse marker layouts and occlusions, and its code and data are publicly available for broader adoption and benchmarking.
Abstract
Optical motion capture (MoCap) is the "gold standard" for accurately capturing full-body motions. To make use of raw MoCap point data, the system labels the points with corresponding body part locations and solves the full-body motions. However, MoCap data often contains mislabeling, occlusion and positional errors, requiring extensive manual correction. To alleviate this burden, we introduce RoMo, a learning-based framework for robustly labeling and solving raw optical motion capture data. In the labeling stage, RoMo employs a divide-and-conquer strategy to break down the complex full-body labeling challenge into manageable subtasks: alignment, full-body segmentation and part-specific labeling. To utilize the temporal continuity of markers, RoMo generates marker tracklets using a K-partite graph-based clustering algorithm, where markers serve as nodes, and edges are formed based on positional and feature similarities. For motion solving, to prevent error accumulation along the kinematic chain, we introduce a hybrid inverse kinematic solver that utilizes joint positions as intermediate representations and adjusts the template skeleton to match estimated joint positions. We demonstrate that RoMo achieves high labeling and solving accuracy across multiple metrics and various datasets. Extensive comparisons show that our method outperforms state-of-the-art research methods. On a real dataset, RoMo improves the F1 score of hand labeling from 0.94 to 0.98, and reduces joint position error of body motion solving by 25%. Furthermore, RoMo can be applied in scenarios where commercial systems are inadequate. The code and data for RoMo are available at https://github.com/non-void/RoMo.
