SHARE: Single-view Human Adversarial REconstruction
Shreelekha Revankar, Shijia Liao, Yu Shen, Junbang Liang, Huaishu Peng, Ming Lin
TL;DR
SHARE tackles the vulnerability of single-view 3D human pose and shape (HPS) reconstruction to camera pose variations by introducing a general adversarial data augmentation framework. It builds a pose-conditioned loss landscape $L_{3d}=f(\theta,\phi)$, uses RoME (Regions of Maximal Error) sampling to focus on high-error regions, and iteratively fine-tunes existing HPS models (e.g., HMR, SPIN, PARE, CLIFF, ExPose) without architectural changes. Across benchmarks, and even on hand reconstruction, SHARE yields substantial accuracy gains while preserving baseline performance, and a user study confirms perceptual improvements. The work offers public data-generation tools and demonstrates practical impact for diverse real-world applications, while acknowledging limitations related to non-pose factors such as body size and skin tone.
Abstract
The accuracy of 3D Human Pose and Shape reconstruction (HPS) from an image is steadily improving. Yet no known method is robust across all image distortions. To address issues caused by variations in camera pose, we introduce SHARE, a novel fine-tuning method that uses adversarial data augmentation to enhance the robustness of existing HPS techniques. We perform a comprehensive analysis of the impact of camera poses on HPS reconstruction outcomes. We first generate large-scale image datasets captured systematically from diverse camera perspectives. We then establish a mapping from camera poses to reconstruction errors, expressed as a continuous function that characterizes how camera pose affects HPS quality. Leveraging this representation, we introduce RoME (Regions of Maximal Error), a novel sampling technique for our adversarial fine-tuning method. The SHARE framework generalizes across various single-view HPS methods, and we demonstrate its performance on HMR, SPIN, PARE, CLIFF, and ExPose. Our results show a reduction in mean joint errors across single-view HPS techniques for images captured from multiple camera positions, without compromising baseline performance. In many challenging cases, our method surpasses the performance of existing models, highlighting its practical significance for diverse real-world applications.
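To make the idea of sampling from regions of maximal error concrete, the following is a minimal sketch and not the authors' implementation. It assumes the pose-to-error mapping has already been evaluated on a discrete grid of two camera angles, and that sampling keeps the highest-error poses and draws from them in proportion to their error. The names `toy_error_grid`, `sample_high_error_poses`, and `top_fraction`, the toy error values, and the choice of stratification are all hypothetical.

```python
import math
import random


def toy_error_grid():
    """Build a synthetic pose -> error table (hypothetical values, not measured data).

    Keys are (theta_deg, phi_deg) camera angles; values stand in for a mean
    3D joint error measured for a model at that camera pose.
    """
    grid = {}
    for theta in range(0, 360, 30):
        for phi in range(-60, 61, 30):
            tilt_term = 40.0 * abs(math.sin(math.radians(phi)))
            turn_term = 20.0 * (1.0 - math.cos(math.radians(theta))) / 2.0
            grid[(theta, phi)] = 50.0 + tilt_term + turn_term
    return grid


def sample_high_error_poses(error_grid, n_samples, top_fraction=0.2, seed=0):
    """Draw camera poses from the highest-error part of the grid.

    Assumes all errors are positive. Poses are sampled with replacement,
    with probability proportional to their error.
    """
    rng = random.Random(seed)
    # Rank poses from highest to lowest error.
    ranked = sorted(error_grid.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only the top fraction of poses (at least one).
    k = max(1, int(len(ranked) * top_fraction))
    top = ranked[:k]
    poses = [pose for pose, _ in top]
    weights = [err for _, err in top]
    return rng.choices(poses, weights=weights, k=n_samples)


if __name__ == "__main__":
    grid = toy_error_grid()
    for theta, phi in sample_high_error_poses(grid, n_samples=5):
        print(f"render training image at theta={theta}, phi={phi}")
```

In a fine-tuning loop of the kind the abstract describes, the sampled poses would be used to render new training images, the model would be fine-tuned on them, and the error grid would be re-evaluated before the next round; those steps are omitted here.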
