FrankMocap: Fast Monocular 3D Hand and Body Motion Capture by Regression and Integration

Yu Rong; Takaaki Shiratori; Hanbyul Joo

TL;DR

FrankMocap addresses simultaneous 3D hand and body motion capture from monocular input by splitting the problem into separate hand and body regression modules that each output SMPL-X parameters, followed by an integration step that merges them into a single whole-body representation. With the fast copy-and-paste integration, the system runs in near real time (~9.5 fps); an optional optimization-based refinement uses 2D keypoints and exemplar priors to improve accuracy. It demonstrates state-of-the-art hand pose accuracy on public benchmarks and strong whole-body results in diverse in-the-wild scenes, including live demos. Ablation studies validate the benefits of multi-dataset training and motion blur augmentation for in-the-wild generalization.

Abstract

Although the essential nuance of human motion is often conveyed as a combination of body movements and hand gestures, existing monocular motion capture approaches mostly focus either on body motion capture alone, ignoring the hands, or on hand motion capture alone, without considering body motion. In this paper, we present FrankMocap, a motion capture system that estimates both 3D hand and body motion from in-the-wild monocular inputs with faster speed (9.5 fps) and better accuracy than previous work. Our method runs in near real time and captures 3D body and hand motion simultaneously, even from challenging in-the-wild monocular videos. To construct FrankMocap, we build a state-of-the-art monocular 3D "hand" motion capture method by taking the hand part of the whole-body parametric model (SMPL-X). Our 3D hand motion capture output can be efficiently integrated with the monocular body motion capture output, producing whole-body motion results in a unified parametric model structure. We demonstrate the state-of-the-art performance of our hand motion capture system on public benchmarks, and show the high quality of our whole-body motion capture results in various challenging real-world scenes, including a live demo scenario.
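To make the integration idea concrete, here is a minimal sketch of how separately regressed hand and body outputs might be combined into one SMPL-X-style parameter set. It is an illustration under stated assumptions, not the authors' implementation: the function name `integrate_copy_paste` and the dictionary keys are hypothetical, poses are assumed to be per-joint axis-angle arrays (21 body joints and 15 finger joints per hand, following the SMPL-X layout), and wrist handling is left aside.

```python
import numpy as np

NUM_BODY_JOINTS = 21  # SMPL-X body joints (root excluded), axis-angle
NUM_HAND_JOINTS = 15  # SMPL-X finger joints per hand, axis-angle


def integrate_copy_paste(body_pose, left_hand_pose, right_hand_pose):
    """Combine body and hand regression outputs into one parameter set.

    Hypothetical helper: the finger poses are copied unchanged next to the
    body pose, which is the simplest reading of "copy-and-paste" integration.
    Wrist alignment between the modules is not handled here.
    """
    body_pose = np.asarray(body_pose, dtype=np.float64)
    left_hand_pose = np.asarray(left_hand_pose, dtype=np.float64)
    right_hand_pose = np.asarray(right_hand_pose, dtype=np.float64)

    # Validate the assumed (num_joints, 3) axis-angle layout.
    if body_pose.shape != (NUM_BODY_JOINTS, 3):
        raise ValueError(f"body_pose must be {(NUM_BODY_JOINTS, 3)}")
    for name, pose in (("left", left_hand_pose), ("right", right_hand_pose)):
        if pose.shape != (NUM_HAND_JOINTS, 3):
            raise ValueError(f"{name}_hand_pose must be {(NUM_HAND_JOINTS, 3)}")

    # Copy so later edits to the inputs do not alter the merged result.
    return {
        "body_pose": body_pose.copy(),
        "left_hand_pose": left_hand_pose.copy(),
        "right_hand_pose": right_hand_pose.copy(),
    }


# Usage with placeholder values standing in for network outputs.
whole_body = integrate_copy_paste(
    np.zeros((NUM_BODY_JOINTS, 3)),
    np.full((NUM_HAND_JOINTS, 3), 0.1),
    np.full((NUM_HAND_JOINTS, 3), -0.1),
)
```

Because this step is only array copying, it adds negligible cost on top of the regression modules, which is consistent with the near real-time speed reported above; the optional optimization-based refinement would trade some of that speed for accuracy.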
