RoMo: A Robust Solver for Full-body Unlabeled Optical Motion Capture

Xiaoyu Pan; Bowen Zheng; Xinwei Jiang; Zijiao Zeng; Qilong Kou; He Wang; Xiaogang Jin

TL;DR

RoMo addresses mislabeling and positional noise in full-body unlabeled optical MoCap by introducing a divide-and-conquer labeling pipeline that leverages temporal tracklets via a K-partite graph and a hybrid inverse kinematics solver that uses joint positions as intermediate representations. The approach decouples alignment, segmentation, and part-specific labeling, while tracklets exploit temporal continuity and deep marker features to improve labeling accuracy; motion solving is performed through forward kinematics guided by global joint positions, reducing error accumulation along the kinematic chain. Empirical results across multiple datasets show RoMo achieving state-of-the-art labeling and solving performance, with high body labeling accuracy, strong hand labeling, and substantial reductions in joint-position errors, all while remaining applicable where commercial systems may struggle. The work provides a cost-free, flexible framework that can handle diverse marker layouts and occlusions, and its code and data are publicly available for broader adoption and benchmarking.

Abstract

Optical motion capture (MoCap) is the "gold standard" for accurately capturing full-body motions. To make use of raw MoCap point data, the system labels the points with corresponding body part locations and solves the full-body motions. However, MoCap data often contains mislabeling, occlusion and positional errors, requiring extensive manual correction. To alleviate this burden, we introduce RoMo, a learning-based framework for robustly labeling and solving raw optical motion capture data. In the labeling stage, RoMo employs a divide-and-conquer strategy to break down the complex full-body labeling challenge into manageable subtasks: alignment, full-body segmentation and part-specific labeling. To utilize the temporal continuity of markers, RoMo generates marker tracklets using a K-partite graph-based clustering algorithm, where markers serve as nodes, and edges are formed based on positional and feature similarities. For motion solving, to prevent error accumulation along the kinematic chain, we introduce a hybrid inverse kinematic solver that utilizes joint positions as intermediate representations and adjusts the template skeleton to match estimated joint positions. We demonstrate that RoMo achieves high labeling and solving accuracy across multiple metrics and various datasets. Extensive comparisons show that our method outperforms state-of-the-art research methods. On a real dataset, RoMo improves the F1 score of hand labeling from 0.94 to 0.98, and reduces joint position error of body motion solving by 25%. Furthermore, RoMo can be applied in scenarios where commercial systems are inadequate. The code and data for RoMo are available at https://github.com/non-void/RoMo.
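The tracklet idea described in the abstract can be illustrated with a small sketch. It is not RoMo's clustering algorithm: it assumes edges only between consecutive frames, a greedy lowest-cost matching, and a cost that simply adds position distance to a weighted feature distance. The names `link_cost`, `build_tracklets`, `max_cost` and `w_feat` are hypothetical and chosen for this example only. What it does preserve is the K-partite constraint that a tracklet holds at most one marker per frame.

```python
import math


def link_cost(a, b, w_feat=1.0):
    """Edge cost between two markers: position distance plus weighted feature distance."""
    return math.dist(a["pos"], b["pos"]) + w_feat * math.dist(a["feat"], b["feat"])


def build_tracklets(frames, max_cost=0.1, w_feat=1.0):
    """Greedily link markers across consecutive frames (an assumed simplification).

    frames: list of frames; each frame is a list of {"pos": (...), "feat": (...)}.
    Returns tracklets as lists of (frame_index, marker_index) pairs, with at most
    one marker per frame in each tracklet.
    """
    tracklets = []
    open_tracks = {}  # marker index in the previous frame -> its tracklet
    for t, markers in enumerate(frames):
        # Collect admissible edges between the previous frame and this one.
        candidates = []
        if t > 0:
            for i, prev in enumerate(frames[t - 1]):
                for j, cur in enumerate(markers):
                    cost = link_cost(prev, cur, w_feat)
                    if cost <= max_cost:
                        candidates.append((cost, i, j))
        candidates.sort()

        # Accept the cheapest edges first; each marker joins at most one edge.
        used_prev, used_cur, next_open = set(), set(), {}
        for _, i, j in candidates:
            if i in used_prev or j in used_cur:
                continue
            used_prev.add(i)
            used_cur.add(j)
            open_tracks[i].append((t, j))
            next_open[j] = open_tracks[i]

        # Unmatched markers start new tracklets.
        for j in range(len(markers)):
            if j not in used_cur:
                track = [(t, j)]
                tracklets.append(track)
                next_open[j] = track
        open_tracks = next_open
    return tracklets


def marker(pos, feat):
    return {"pos": pos, "feat": feat}


# Three frames; marker order is shuffled between frames, and one outlier appears.
frames = [
    [marker((0.0, 0.0, 0.0), (1.0, 0.0)), marker((1.0, 0.0, 0.0), (0.0, 1.0))],
    [marker((1.01, 0.0, 0.0), (0.0, 1.0)), marker((0.01, 0.0, 0.0), (1.0, 0.0))],
    [marker((0.02, 0.0, 0.0), (1.0, 0.0)), marker((5.0, 5.0, 5.0), (0.0, 1.0))],
]
tracklets = build_tracklets(frames)
```

Once tracklets exist, a single label can be assigned to every marker in a tracklet, so one confident per-frame prediction can correct ambiguous ones elsewhere in the same tracklet.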

Paper Structure (10 sections, 5 equations, 6 figures, 3 tables)

This paper contains 10 sections, 5 equations, 6 figures, 3 tables.

Introduction
Related Work
Method
Point Cloud Pre-processing
Tracklet Generation and Labeling
Hybrid Inverse Kinematics-based Solving
Experiments
Point Cloud Labeling
Motion Solving
Conclusion, Limitations and Future Work

Figures (6)

Figure 1: RoMo's pipeline consists of three modules. Top: In the pre-processing stage, RoMo accepts sparse, unordered 3D MoCap point clouds with varying numbers of points. It then aligns the point cloud to eliminate global transformations and segments it into body and hand point clouds. Subsequently, it employs a network of alternating global self-attention and local aggregation layers to extract marker features. Bottom Left: In the tracklet construction stage, RoMo solves a K-partite graph-based clustering problem to create tracklets and assigns the same label to all markers within a tracklet. Bottom Right: RoMo uses a hybrid inverse kinematics-based method to solve the motion, iteratively adjusting the joints of the template skeleton along the kinematic tree to match the estimated joint positions.
Figure 2: a) An illustration of rotation decomposition. b) RoMo utilizes global joint positions and avoids error accumulation along the kinematic chain.
Figure 3: A qualitative comparison with other methods on hand labeling, where each column represents a timestamp in a MoCap sequence. In cases of outliers, we either randomly place them in positions corresponding to occluded markers or discard them if there isn't enough space.
Figure 4: Qualitative comparisons of solved motions. To compare with the ground truth, we render the skeletons and overlay the ground truth’s skeleton onto those generated by solving methods. Positions with significant differences are indicated with red boxes.
Figure 5: Additional qualitative comparison of labeling.
...and 1 more figure
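The position-guided solving described in the captions of Figures 1 and 2 can be sketched on a planar chain. This is a simplified illustration, not the paper's solver: it works in 2D with one angle per joint, ignores the rotation decomposition of Figure 2a, and the names `fk_chain`, `fit_chain_to_positions` and `wrap_angle` are hypothetical. Each bone keeps its template length and is aimed from the already-placed parent joint toward the estimated global position of its child, so every joint is corrected against a global target rather than inheriting rotation error from its ancestors.

```python
import math


def wrap_angle(angle):
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def fk_chain(root, bone_lengths, local_angles):
    """Forward kinematics: each local angle is relative to the parent bone."""
    x, y = root
    heading = 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, local_angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


def fit_chain_to_positions(root, bone_lengths, target_positions):
    """Walk root to leaf, aiming each template bone at the estimated child position.

    target_positions[0] is the root target and is not used for aiming;
    the bone lengths come from the template skeleton, not from the targets.
    """
    px, py = root
    heading = 0.0
    local_angles = []
    for length, (tx, ty) in zip(bone_lengths, target_positions[1:]):
        # Global direction from the placed parent to the estimated child joint.
        global_angle = math.atan2(ty - py, tx - px)
        local_angles.append(wrap_angle(global_angle - heading))
        heading = global_angle
        # Place the child with the template bone length before moving on.
        px += length * math.cos(heading)
        py += length * math.sin(heading)
    return local_angles


root = (0.0, 0.0)
bone_lengths = [1.0, 0.8, 0.5]
true_angles = [0.3, -0.2, 0.5]

# Treat the FK result as the "estimated joint positions", then recover the angles.
targets = fk_chain(root, bone_lengths, true_angles)
fitted_angles = fit_chain_to_positions(root, bone_lengths, targets)
solved = fk_chain(root, bone_lengths, fitted_angles)
```

When the targets are consistent with the template bone lengths, the fitted chain reproduces them exactly; when they are not, each bone still points toward its own target, which limits how far a single bad estimate can propagate down the chain.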
