PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, Yebin Liu

TL;DR

This work targets the problem of recovering expressive full-body meshes from monocular images with accurate mesh-image alignment. It introduces PyMAF, a regression-based loop that leverages a feature pyramid to extract mesh-aligned evidence and propagates it back to refine parameters, aided by auxiliary dense supervision and spatial alignment attention. It extends to PyMAF-X for full-body recovery by employing three part-specific regressors and an adaptive elbow-twist compensation to produce natural wrist poses while preserving part alignment. The approach achieves state-of-the-art or competitive results across body, hand, face, and full-body benchmarks, and demonstrates robust performance on indoor and in-the-wild data, with an efficient runtime suitable for practical use.

Abstract

We present PyMAF-X, a regression-based approach to recovering parametric full-body models from monocular images. This task is very challenging because even minor parametric deviations may lead to noticeable misalignment between the estimated mesh and the input image. Moreover, when integrating part-specific estimations into the full-body model, existing solutions tend to either degrade the alignment or produce unnatural wrist poses. To address these issues, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop in our regression network for well-aligned human mesh recovery and extend it as PyMAF-X for the recovery of expressive full-body models. The core idea of PyMAF is to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status. Specifically, given the currently predicted parameters, mesh-aligned evidence is extracted from finer-resolution features accordingly and fed back for parameter rectification. To enhance the alignment perception, auxiliary dense supervision is employed to provide mesh-image correspondence guidance, while spatial alignment attention is introduced to make the network aware of the global context. When extending PyMAF for full-body mesh recovery, an adaptive integration strategy is proposed in PyMAF-X to produce natural wrist poses while maintaining the well-aligned performance of the part-specific estimations. The efficacy of our approach is validated on several benchmark datasets for body, hand, face, and full-body mesh recovery, where PyMAF and PyMAF-X effectively improve the mesh-image alignment and achieve new state-of-the-art results. The project page with code and video results can be found at https://zhanghongwen.cn/pymaf-x.
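
The feedback loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`bilinear_sample`, `mesh_alignment_feedback`), the toy `mesh_fn`, the linear regressors, the residual-style update, and the pyramid resolutions are all assumptions chosen for illustration. The sketch only shows the general pattern: project the current mesh estimate into the image, sample features at those locations from a progressively finer feature map, and feed the sampled evidence back to correct the parameters.

```python
import numpy as np

def bilinear_sample(feat, xy):
    """Bilinearly sample a (C, H, W) feature map at pixel coords xy of shape (N, 2)."""
    _, h, w = feat.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T  # (N, C)

def mesh_alignment_feedback(pyramid, regressors, theta, mesh_fn):
    """Refine theta once per pyramid level, from coarse to fine.

    mesh_fn (hypothetical) maps parameters to projected 2D vertices in [-1, 1].
    Each regressor (hypothetical) returns a parameter residual; the additive
    update is an assumption of this sketch.
    """
    for feat, regress in zip(pyramid, regressors):
        verts_2d = mesh_fn(theta)                        # (N, 2), normalized coords
        _, h, w = feat.shape
        xy = (verts_2d + 1.0) * 0.5 * np.array([w - 1, h - 1])
        evidence = bilinear_sample(feat, xy).ravel()     # mesh-aligned evidence
        theta = theta + regress(evidence, theta)         # parameter rectification
    return theta

# Toy setup: the "mesh" is 8 points and theta is a 2D translation (illustrative only).
rng = np.random.default_rng(0)
template = rng.uniform(-0.5, 0.5, size=(8, 2))
mesh_fn = lambda theta: template + theta
pyramid = [rng.normal(size=(4, s, s)) for s in (14, 28, 56)]  # coarse to fine

def make_regressor(in_dim, out_dim):
    # Random linear map standing in for a learned regressor.
    weights = rng.normal(scale=1e-3, size=(out_dim, in_dim + out_dim))
    return lambda evidence, theta: weights @ np.concatenate([evidence, theta])

regressors = [make_regressor(8 * 4, 2) for _ in pyramid]
theta = mesh_alignment_feedback(pyramid, regressors, np.zeros(2), mesh_fn)
```

In this toy version the sampling locations depend on the current estimate, so each iteration looks at image evidence where the mesh currently lands, which is the property that distinguishes this kind of feedback from regressing repeatedly on a single global feature vector.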

Paper Structure (43 sections, 10 equations, 18 figures, 11 tables)

Figures (18)

  • Figure 1: Top: PyMAF improves the mesh-image alignment of the estimated mesh. Bottom: PyMAF-X produces well-aligned full-body meshes with natural wrist poses.
  • Figure 2: (a) The commonly-used iterative error feedback. (b) The proposed mesh alignment feedback. (c) Mesh-aligned evidence extracted from a feature pyramid.
  • Figure 3: Illustration of the proposed Pyramidal Mesh Alignment Feedback (PyMAF) for human mesh recovery. PyMAF leverages a feature pyramid and enables an alignment feedback loop in our network. Given a coarse-aligned model prediction, mesh-aligned evidence is extracted from finer-resolution features accordingly and fed back to a regressor for parameter rectification.
  • Figure 4: Visualization of the spatial feature maps and predicted dense correspondences. Top: Input images. Second / Third Row: Spatial feature maps learned without/with Auxiliary Supervision (AS). Bottom: Predicted dense correspondence maps under auxiliary supervision.
  • Figure 5: The overall pipeline of PyMAF-X for full-body mesh recovery. PyMAF-X consists of three part-specific PyMAFs for part mesh prediction and integrates them together via the proposed adaptive integration strategy.
  • ...and 13 more figures