SIDQL: An Efficient Keyframe Extraction and Motion Reconstruction Framework in Motion Capture

Xuling Zhang, Ziru Zhang, Yuyang Wang, Lik-hang Lee, Pan Hui

TL;DR

This work addresses the ultra-low latency requirement of motion capture in Metaverse applications by introducing SIDQL, a framework that uses Deep Q-Learning to select keyframes and a velocity-informed, bone-length-preserving reconstruction in a spherical coordinate system. The core idea is to minimize the mean reconstruction error $Q(S,K)$ over the choice of keyframe indices $K$, which enables substantial data reduction while keeping the reconstruction error below $0.09$ with five keyframes. The method converts pose data into spherical coordinates, reconstructs non-root joints via spherical interpolation and the root joint via polynomial interpolation, and learns a label-free keyframe policy that generalizes across mixed-category motions. Experiments on the CMU MoCap dataset show significant latency reductions and competitive accuracy, with SIDQL offering a fast alternative to greedy approaches while achieving similar reconstruction quality.
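
The spherical conversion and the two interpolation steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each bone is treated as a child-minus-parent offset, its direction is interpolated between two keyframes with standard spherical linear interpolation (slerp) while its radius is held at the bone length, and the root trajectory is fitted with one interpolating polynomial per axis. The velocity information SIDQL also uses is omitted, and the function names (`to_spherical`, `to_cartesian`, `interpolate_bone`, `interpolate_root`) are hypothetical.

```python
import numpy as np

# Hypothetical helpers; a simplified sketch, not SIDQL's velocity-informed method.

def to_spherical(offset):
    """Cartesian bone offset (child - parent) -> (r, theta, phi).

    theta is the polar angle from +z, phi the azimuth in the x-y plane.
    Assumes a non-zero bone length.
    """
    x, y, z = offset
    r = float(np.linalg.norm(offset))
    theta = float(np.arccos(np.clip(z / r, -1.0, 1.0)))
    phi = float(np.arctan2(y, x))
    return r, theta, phi

def to_cartesian(r, theta, phi):
    """Inverse of to_spherical."""
    return r * np.array([np.sin(theta) * np.cos(phi),
                         np.sin(theta) * np.sin(phi),
                         np.cos(theta)])

def interpolate_bone(offset0, offset1, t, bone_length):
    """Slerp the bone direction between two keyframes at fraction t in [0, 1].

    Only the angles change; the radius is fixed to bone_length, so the
    reconstructed bone never stretches. Antipodal directions (omega ~ pi)
    are ill-defined and not handled in this sketch.
    """
    _, th0, ph0 = to_spherical(offset0)
    _, th1, ph1 = to_spherical(offset1)
    u0 = to_cartesian(1.0, th0, ph0)  # unit direction at keyframe 0
    u1 = to_cartesian(1.0, th1, ph1)  # unit direction at keyframe 1
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if omega < 1e-8:  # (nearly) identical directions
        u = u0
    else:
        u = (np.sin((1.0 - t) * omega) * u0 + np.sin(t * omega) * u1) / np.sin(omega)
    return bone_length * u

def interpolate_root(key_times, key_roots, query_times):
    """Fit one interpolating polynomial per axis through the root keyframes."""
    key_times = np.asarray(key_times, dtype=float)
    key_roots = np.asarray(key_roots, dtype=float)
    deg = len(key_times) - 1  # degree n-1 passes exactly through n keyframes
    out = np.empty((len(query_times), 3))
    for axis in range(3):
        coeffs = np.polyfit(key_times, key_roots[:, axis], deg)
        out[:, axis] = np.polyval(coeffs, query_times)
    return out

# Usage: one bone rotating by 90 degrees, and a root moving on a quadratic path.
a = np.array([0.0, 0.0, 0.3])  # bone offset at keyframe 0
b = np.array([0.3, 0.0, 0.0])  # same bone at keyframe 1
mid = interpolate_bone(a, b, 0.5, bone_length=0.3)

key_times = [0, 10, 20]
key_roots = [[0.0, 0.0, 1.0], [10.0, 1.0, 1.0], [20.0, 4.0, 1.0]]
roots = interpolate_root(key_times, key_roots, [5.0, 15.0])
```

Because only the angles vary, every reconstructed bone keeps its length by construction. A full-degree polynomial is reasonable for a handful of root keyframes (e.g., five) but can oscillate when many keyframes are used.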

Abstract

The Metaverse, which integrates the virtual and physical worlds, has emerged as an innovative paradigm that is changing people's lifestyles. Motion capture has become a reliable approach for seamlessly synchronizing the movements of avatars and human beings, and it plays an important role in diverse Metaverse applications. However, because data volumes keep growing, current communication systems struggle to meet the ultra-low latency these applications demand. In addition, current keyframe selection methods have shortcomings, e.g., they rely on recognizing motion types or on manually selected keyframes. Keyframe extraction combined with motion reconstruction is therefore a feasible and promising solution. In this work, we design a new motion reconstruction algorithm in a spherical coordinate system that uses both position and velocity information. We then formalize keyframe extraction as an optimization problem that minimizes the reconstruction error. Building on Deep Q-Learning (DQL), we propose the Spherical Interpolation based Deep Q-Learning (SIDQL) framework to select suitable keyframes for reconstructing motion sequences. We train and evaluate the framework on the CMU motion capture database. Compared with various baselines, our scheme significantly reduces data volume and transmission latency while keeping the reconstruction error below 0.09 when five keyframes are extracted.
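
The keyframe extraction problem can be made concrete with a small Python sketch of the objective $Q(S,K)$ and a greedy search over keyframe indices, the kind of greedy approach SIDQL is positioned as a faster alternative to. All of the following are hypothetical assumptions for illustration: a sequence $S$ is a `(T, D)` array of flattened joint coordinates, reconstruction is plain linear interpolation standing in for the spherical method, and $Q$ is the mean per-frame Euclidean error. SIDQL replaces the greedy loop with a learned DQL policy, which is not reproduced here.

```python
import numpy as np

# Hypothetical names; linear interpolation stands in for SIDQL's spherical reconstruction.

def reconstruct(seq, keys):
    """Rebuild every frame by linear interpolation between the keyframes."""
    seq = np.asarray(seq, dtype=float)
    frames = np.arange(len(seq))
    keys = sorted(keys)  # np.interp needs increasing sample points
    out = np.empty_like(seq)
    for d in range(seq.shape[1]):
        out[:, d] = np.interp(frames, keys, seq[keys, d])
    return out

def reconstruction_error(seq, keys):
    """Q(S, K): mean per-frame Euclidean error between S and its reconstruction."""
    seq = np.asarray(seq, dtype=float)
    return float(np.mean(np.linalg.norm(seq - reconstruct(seq, keys), axis=1)))

def greedy_keyframes(seq, k):
    """Greedy search: start from both endpoints, then repeatedly add the
    frame whose inclusion lowers Q the most. Assumes at least 2 frames."""
    n = len(seq)
    k = min(k, n)
    keys = [0, n - 1]
    while len(keys) < k:
        candidates = [f for f in range(n) if f not in keys]
        best = min(candidates, key=lambda f: reconstruction_error(seq, keys + [f]))
        keys.append(best)
    return sorted(keys)

# Usage: a 1-D "motion" that rises and falls; its corner is the natural keyframe.
seq = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float).reshape(-1, 1)
keys = greedy_keyframes(seq, 3)
```

Each greedy step re-evaluates $Q$ for every remaining frame, so selecting $k$ keyframes from $T$ frames costs on the order of $kT$ reconstructions; avoiding this search at inference time is what makes a learned keyframe policy attractive.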

Paper Structure (21 sections, 27 equations, 7 figures, 3 tables, 1 algorithm)

Figures (7)

  • Figure 1: The reconstruction procedure.
  • Figure 2: The SIDQL procedure.
  • Figure 3: Simulation results for different learning rates. Left: loss. Right: mean angle error.
  • Figure 4: Mean angle error for different training intervals.
  • Figure 5: Simulation results for different memory sizes. Left: loss. Right: mean angle error.
  • ...and 2 more figures