SHARE: Single-view Human Adversarial REconstruction

Shreelekha Revankar, Shijia Liao, Yu Shen, Junbang Liang, Huaishu Peng, Ming Lin

TL;DR

SHARE addresses the vulnerability of single-view 3D human pose and shape (HPS) reconstruction to camera pose variations with a general adversarial data augmentation framework. It builds a loss landscape over camera poses, $L_{3d}=f(\theta,\phi)$, uses RoME (Regions of Maximal Error) sampling to concentrate training data on high-error regions, and iteratively fine-tunes existing HPS models (e.g., HMR, SPIN, PARE, CLIFF, ExPose) without architectural changes. Across benchmarks, and also on hand reconstruction, SHARE reduces reconstruction error while preserving baseline performance, and a user study confirms perceptual improvements. The work offers public data-generation tools, points to practical relevance for diverse real-world applications, and acknowledges limitations related to non-pose factors such as body size and skin tone.

Abstract

The accuracy of 3D Human Pose and Shape reconstruction (HPS) from an image is progressively improving. Yet no known method is robust across all image distortions. To address issues caused by variations in camera pose, we introduce SHARE, a novel fine-tuning method that uses adversarial data augmentation to enhance the robustness of existing HPS techniques. We perform a comprehensive analysis of the impact of camera poses on HPS reconstruction outcomes. We first generate large-scale image datasets captured systematically from diverse camera perspectives. We then establish a mapping between camera poses and reconstruction errors as a continuous function that characterizes the relationship between camera poses and HPS quality. Leveraging this representation, we introduce RoME (Regions of Maximal Error), a novel sampling technique for our adversarial fine-tuning method. The SHARE framework generalizes across various single-view HPS methods, and we demonstrate its performance on HMR, SPIN, PARE, CLIFF and ExPose. Our results show a reduction in mean joint errors across single-view HPS techniques for images captured from multiple camera positions, without compromising baseline performance. In many challenging cases, our method surpasses the performance of existing models, highlighting its practical significance for diverse real-world applications.
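To make the idea of a continuous pose-to-error function and error-focused sampling concrete, here is a minimal Python sketch. It is not the authors' implementation: the least-squares Fourier regressor standing in for $f(\theta,\phi)$, the 5-step pose grid, the assumption that angles are in degrees, the top-fraction selection rule, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def pose_features(theta_deg, phi_deg):
    """Hypothetical feature map: periodic in phi, low-order polynomial in theta."""
    t = np.asarray(theta_deg, dtype=float) / 60.0      # assumes theta in [-60, 60] degrees
    p = np.deg2rad(np.asarray(phi_deg, dtype=float))   # assumes phi in [0, 360) degrees
    return np.stack([np.ones_like(t), t, t ** 2,
                     np.cos(p), np.sin(p), np.cos(2 * p), np.sin(2 * p),
                     t * np.cos(p), t * np.sin(p)], axis=-1)

def fit_landscape(theta_deg, phi_deg, errors):
    """Fit a smooth stand-in for L_3d = f(theta, phi) by linear least squares."""
    X = pose_features(theta_deg, phi_deg)
    w, *_ = np.linalg.lstsq(X, np.asarray(errors, dtype=float), rcond=None)
    return w

def predict_landscape(w, theta_deg, phi_deg):
    """Evaluate the fitted landscape at arbitrary camera poses."""
    return pose_features(theta_deg, phi_deg) @ w

def sample_high_error_poses(w, n_samples, top_fraction=0.1, rng=None):
    """Draw camera poses uniformly from the highest-error part of a pose grid.

    This is an illustrative rule in the spirit of 'regions of maximal error',
    not the RoME procedure itself.
    """
    rng = np.random.default_rng(rng)
    thetas = np.arange(-60, 61, 5)
    phis = np.arange(0, 360, 5)
    T, P = np.meshgrid(thetas, phis, indexing="ij")
    grid = np.stack([T.ravel(), P.ravel()], axis=-1)
    pred = predict_landscape(w, grid[:, 0], grid[:, 1])
    k = max(1, int(top_fraction * len(grid)))
    top = np.argsort(pred)[-k:]                 # indices of the k largest predicted errors
    idx = rng.choice(top, size=n_samples, replace=True)
    return grid[idx], pred[idx]

if __name__ == "__main__":
    # Synthetic error surface (made up for the demo), worst at phi = 180.
    rng = np.random.default_rng(0)
    theta = rng.uniform(-60, 60, 500)
    phi = rng.uniform(0, 360, 500)
    err = 50.0 - 20.0 * np.cos(np.deg2rad(phi)) + 10.0 * (theta / 60.0) ** 2
    w = fit_landscape(theta, phi, err)
    poses, pred = sample_high_error_poses(w, n_samples=5, rng=1)
    print(poses, pred)
```

In a fine-tuning loop, the sampled poses would decide which camera views to render or select for the next round of training data; how SHARE actually does this is described in the paper, not here.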

Paper Structure (22 sections, 3 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: The SHARE framework adversarially augments and modifies synthetic training data for a single-view HPS model. It is initialized by generating training data from all camera poses. Each iteration of SHARE operates in four phases: (1) augmenting the model's training data and training the model on it, (2) assessing performance per camera pose, (3) sampling the most adversarial camera poses, and (4) down-selecting a new training dataset for augmentation using the RoME-sampled poses.
  • Figure 2: Examples of images generated using our data generator from various camera poses.
  • Figure 3: Sensitivity analysis of PARE [kocabas2021pare] with respect to camera pose, measured in PA-MPJPE. The x-axis iterates through all camera poses $(\theta,\phi)$, where $\phi$ is the azimuthal angle around the body, ranging over (0, 360), and $\theta$ is the vertical viewing angle, ranging over (-60, 60) for each $\phi$. The y-axis shows the PA-MPJPE averaged over a diverse dataset covering a wide range of bodies, body poses, and environments. The plot shows the average error for each camera pose and reveals an oscillatory bias, with performance varying across regions around the human body. Additional plots for single-person datasets and comparisons with other HPS techniques are available in the appendix.
  • Figure 4: Loss landscape for PARE [kocabas2021pare]. The x-axis represents the scaled $\theta$ values, the y-axis represents the scaled $\phi$ values, and the z-axis shows the predicted PA-MPJPE for a given camera pose. The loss landscapes for all baselines are included in the appendix.
  • Figure 5: Qualitative results on internet and MPI-INF-3DHP images using baselines [kocabas2021pare, hmrKanazawa17, li2022cliff, kolotouros2019spin] before (center, in red) and after fine-tuning with SHARE (right, in green). Additional qualitative results on MPI-INF-3DHP [mono-3dhp2017] for individual baselines can be found in the appendix.
  • ...and 6 more figures
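
Figures 3 and 4 report error as PA-MPJPE, the mean per-joint position error after Procrustes alignment. For readers unfamiliar with the metric, the following is a minimal NumPy sketch of its standard definition (similarity alignment via SVD, then mean Euclidean joint distance). The function name `pa_mpjpe` and the single-pose `(J, 3)` input layout are assumptions made for illustration; the paper's evaluation code may differ in details such as joint sets and units.

```python
import numpy as np

def pa_mpjpe(pred, gt):
    """Procrustes-aligned mean per-joint position error for one pose.

    pred, gt: arrays of shape (J, 3) holding predicted and ground-truth joints.
    The prediction is aligned to the ground truth with the best similarity
    transform (scale, rotation, translation) before the error is measured.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)

    # Remove translation by centering both joint sets.
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p0, g0 = pred - mu_p, gt - mu_g

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(p0.T @ g0)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Optimal isotropic scale for the rotated prediction.
    scale = (S * np.diag(D)).sum() / (p0 ** 2).sum()

    aligned = scale * p0 @ R.T + mu_g
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because the alignment removes global rotation, scale, and translation, a prediction that differs from the ground truth only by such a transform scores approximately zero; the metric therefore isolates errors in articulated pose rather than in global placement.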