
A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions

Youliang Zhang, Ronghui Li, Yachao Zhang, Liang Pan, Jingbo Wang, Yebin Liu, Xiu Li

TL;DR

This work introduces two modules. A mask-based motion correction module (MCM) uses motion context and video masks to repair flawed motions, producing imitation-friendly motions. A physics-based motion transfer module (PTM) uses a pretrain-and-adapt approach to motion imitation, improving physical plausibility while handling in-the-wild and challenging motions.

Abstract

Extracting physically plausible 3D human motion from videos is a critical task. Although existing simulation-based motion imitation methods can enhance the physical quality of daily motions estimated from monocular video capture, extending this capability to high-difficulty motions remains an open challenge. This can be attributed to flawed motion clips in video-based motion capture results and to the inherent complexity of modeling high-difficulty motions. Motivated by the strength of segmentation in localizing the human body, we introduce a mask-based motion correction module (MCM) that leverages motion context and video masks to repair flawed motions, producing imitation-friendly motions. We also propose a physics-based motion transfer module (PTM), which employs a pretrain-and-adapt approach to motion imitation, improving physical plausibility while handling in-the-wild and challenging motions. Our approach is designed as a plug-and-play module that physically refines video motion capture results, including high-difficulty in-the-wild motions. Finally, to validate our approach, we collected a challenging in-the-wild test set to establish a benchmark, and our method demonstrates its effectiveness on both the new benchmark and existing public datasets. Project page: https://physicalmotionrestoration.github.io

Paper Structure

This paper contains 17 sections, 6 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Illustration of the motivation and the two main challenges. (a) Our method effectively enhances the physical plausibility of video-captured motions, even for high-difficulty motions such as backflips. (b) Challenging movements in the original video lead current video motion capture algorithms to estimate flawed motions, and the current motion imitation model fails to restore such overly degraded motions. (c) Even when video motion capture provides reasonable reference motions, existing motion imitation techniques still fail to track complex motions.
  • Figure 2: Illustration of our proposed method. If no mismatch is detected between the human mask and the noisy motion, the correction process is skipped and our PTM takes the noisy motion directly as input. When imitation fails on a challenging motion, our PTM adapts the policy to the current motion and updates the network parameters until it succeeds or reaches a step threshold.
  • Figure 3: Qualitative comparison with state-of-the-art method.
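
The control flow described in the Figure 2 caption can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function name `restore_motion`, the callables `detect_mismatch`, `correct_motion`, `imitate`, and `adapt_policy`, and the default value of `max_adapt_steps` are all hypothetical placeholders for components the paper does not specify here.

```python
from typing import Any, Callable, Tuple


def restore_motion(
    noisy_motion: Any,
    human_mask: Any,
    detect_mismatch: Callable[[Any, Any], bool],   # hypothetical mask/motion check
    correct_motion: Callable[[Any, Any], Any],     # hypothetical stand-in for the MCM
    imitate: Callable[[Any], Tuple[Any, bool]],    # hypothetical PTM rollout: (motion, success)
    adapt_policy: Callable[[Any], None],           # hypothetical per-motion policy update
    max_adapt_steps: int = 3,                      # assumed threshold, not from the paper
) -> Tuple[Any, bool, int]:
    """Return (restored motion, success flag, number of adaptation steps used)."""
    # Correction stage: run only when the mask and the noisy motion disagree.
    reference = noisy_motion
    if detect_mismatch(human_mask, noisy_motion):
        reference = correct_motion(noisy_motion, human_mask)

    # Imitation stage: try the pretrained policy first.
    result, success = imitate(reference)

    # Adaptation stage: update the policy on this motion until it succeeds
    # or the step threshold is reached.
    steps = 0
    while not success and steps < max_adapt_steps:
        adapt_policy(reference)
        steps += 1
        result, success = imitate(reference)

    return result, success, steps
```

Passing the stages in as callables keeps the sketch independent of any particular motion capture model, simulator, or training procedure.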