SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture
Xuling Zhang, Ziru Zhang, Yuyang Wang, Lik-hang Lee, Pan Hui
TL;DR
This work addresses the ultra-low-latency requirement of motion capture in Metaverse applications by introducing SIDQL, a framework that combines Deep Q-Learning for keyframe selection with a velocity-informed, bone-length-preserving reconstruction in a spherical coordinate system. The core idea is to choose the keyframe indices $K$ that minimize the mean reconstruction error $Q(S,K)$, which enables substantial data reduction while keeping the reconstruction error below $0.09$ with five keyframes. The method converts pose data into spherical coordinates, reconstructs non-root joints via spherical interpolation and root joints via polynomial interpolation, and learns a label-free keyframe policy that generalizes over mixed-category motions. Experiments on the CMU MoCap dataset show significant latency reductions and competitive accuracy: SIDQL is a fast alternative to greedy approaches and achieves similar reconstruction quality.
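To illustrate why interpolating in spherical coordinates preserves bone length, the following is a minimal sketch, not the authors' implementation. It assumes a bone is represented as a Cartesian offset vector from parent to child joint, and it interpolates the two angles linearly between keyframes; the velocity information used by SIDQL is omitted. The function names `to_spherical`, `to_cartesian`, and `interpolate_bone` are hypothetical.

```python
import numpy as np


def to_spherical(v):
    """Convert a Cartesian bone vector to (r, theta, phi).

    theta is the polar angle measured from +z, phi the azimuth in the x-y plane.
    Assumes v is non-zero.
    """
    x, y, z = v
    r = np.sqrt(x * x + y * y + z * z)
    theta = np.arccos(z / r)
    phi = np.arctan2(y, x)
    return r, theta, phi


def to_cartesian(r, theta, phi):
    """Convert (r, theta, phi) back to a Cartesian vector."""
    return np.array([
        r * np.sin(theta) * np.cos(phi),
        r * np.sin(theta) * np.sin(phi),
        r * np.cos(theta),
    ])


def interpolate_bone(v0, v1, t):
    """Interpolate a bone direction between two keyframes at t in [0, 1].

    The radius (bone length) is held fixed at its value in the first keyframe;
    only the angles change. This is a simplifying assumption for illustration.
    """
    r0, th0, ph0 = to_spherical(v0)
    _, th1, ph1 = to_spherical(v1)
    # Take the shorter way around for the azimuth.
    dphi = (ph1 - ph0 + np.pi) % (2 * np.pi) - np.pi
    theta = th0 + t * (th1 - th0)
    phi = ph0 + t * dphi
    return to_cartesian(r0, theta, phi)


# A unit-length bone rotating from the x axis to the y axis.
v0 = np.array([1.0, 0.0, 0.0])
v1 = np.array([0.0, 1.0, 0.0])

mid_spherical = interpolate_bone(v0, v1, 0.5)
mid_linear = 0.5 * (v0 + v1)  # naive Cartesian interpolation

len_spherical = np.linalg.norm(mid_spherical)
len_linear = np.linalg.norm(mid_linear)
```

In this example the spherically interpolated bone keeps its unit length, whereas the naive Cartesian midpoint is shorter than either keyframe, which is the kind of distortion a bone-length-preserving reconstruction avoids.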
Abstract
The Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm that is changing people's lifestyles. Motion capture has become a reliable approach to seamlessly synchronizing the movements of avatars and human beings, and it plays an important role in diverse Metaverse applications. However, as data volumes continue to grow, current communication systems struggle to meet the ultra-low-latency demands of these applications. Keyframe extraction and motion reconstruction techniques are therefore a feasible and promising solution. Existing keyframe selection methods nonetheless have shortcomings, e.g., they rely on recognizing motion types or on manually selected keyframes. In this work, we design a new motion reconstruction algorithm in a spherical coordinate system that uses both location and velocity information. We then formulate keyframe extraction as an optimization problem that minimizes the reconstruction error. Using Deep Q-Learning (DQL), we propose the Spherical Interpolation based Deep Q-Learning (SIDQL) framework to generate suitable keyframes for reconstructing motion sequences. We use the CMU database to train and evaluate the framework. Compared to various baselines, our scheme significantly reduces the data volume and transmission latency while maintaining a reconstruction error of less than 0.09 when extracting five keyframes.
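The optimization problem mentioned above can be written down in one plausible form, using the notation $Q(S,K)$ from the TL;DR. The abstract does not state the exact error measure or constraints, so everything beyond $Q$, $S$, and $K$ is an assumption for illustration: $N$ is the number of frames in the sequence $S = (s_1, \dots, s_N)$, $m$ is the keyframe budget (for example $m = 5$), $\hat{s}_n(K)$ is the frame reconstructed from the keyframes indexed by $K$, and the per-frame error is taken to be a Euclidean norm.

$$
K^{*} = \arg\min_{K \subseteq \{1,\dots,N\},\; |K| = m} Q(S, K),
\qquad
Q(S, K) = \frac{1}{N} \sum_{n=1}^{N} \left\| s_n - \hat{s}_n(K) \right\|_2
$$

Under this reading, a greedy baseline would add one index to $K$ at a time, re-evaluating $Q$ after each addition, while a learned policy proposes the indices directly.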
