Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data

Miroslav Purkrabek; Jiri Matas

TL;DR

This work tackles the scarcity of extreme-view 2D human pose data by introducing RePoGen, an SMPL-X-based synthetic data generator that produces novel poses and unseen views to augment COCO. By sampling from a bounded pose space and applying textures and random backgrounds, RePoGen yields diverse training data that improves top- and bottom-view pose estimation without sacrificing orbital-view accuracy. The authors present RePo, a new dataset of real extreme-view images, together with the RePoGen dataset variants. They demonstrate gains over baselines and AMASS-based synthesis, and show that strong rotation augmentation is crucial for extreme-view robustness. They also analyze pose spaces and find that anatomical plausibility is not strictly required for effective learning.
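As a rough illustration of what sampling from a bounded pose space can look like, the sketch below draws each joint angle from a pair of Gaussians (the distribution named in the Figure 4 caption) and clips the result to per-joint limits. It is not the authors' implementation: the function name `sample_bounded_pose`, the placement of the two Gaussian centers, the scaling of `variance`, and the example bounds are all assumptions made for illustration.

```python
import random


def sample_bounded_pose(bounds, variance=0.25, rng=None):
    """Sample one angle per joint axis from a hypothetical pair of Gaussians.

    bounds   -- list of (lower, upper) angle limits in radians (assumed values)
    variance -- relative spread; scaled by half the interval width (assumption)
    rng      -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    pose = []
    for lower, upper in bounds:
        mid = (lower + upper) / 2.0
        half = (upper - lower) / 2.0
        # Assumption: the two Gaussians sit halfway between the midpoint
        # and each limit; one of them is picked with equal probability.
        center = mid + rng.choice((-1.0, 1.0)) * 0.5 * half
        angle = rng.gauss(center, (variance ** 0.5) * half)
        # Clip so the sample never leaves the bounded pose space.
        pose.append(min(max(angle, lower), upper))
    return pose


# Example with made-up limits for three rotation axes of one joint.
example_bounds = [(-1.0, 1.0), (0.0, 2.0), (-0.5, 0.5)]
print(sample_bounded_pose(example_bounds, rng=random.Random(0)))
```

Raising `variance` in this sketch spreads samples toward the limits, which loosely mirrors the trade-off in the Figure 5 caption between poses that are not diverse enough and poses that are too unrealistic.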

Abstract

Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome this limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches to top- and bottom-view pose estimation without harming performance on common views. An ablation study shows that anatomical plausibility, a property prior research focused on, is not a prerequisite for effective performance. The introduced dataset and the corresponding code are available at https://mirapurkrabek.github.io/RePoGen-paper/.

Paper Structure (21 sections, 13 figures, 6 tables)

Introduction
Related Work
Method
Pose Generation
Texture
Random Background
Ground Truth Extraction
Experiments
Implementation Details
Datasets
Comparison with baseline
Ablation Study
Conclusions
Pose spaces analysis
Definitions
...and 6 more sections

Figures (13)

Figure 1: Pose estimation by a model trained on COCO (left) and by our method (right). The COCO-trained model swaps the left and right sides, interpreting the right hand as the left leg and the right leg as the left hand (color codes the corresponding label).
Figure 2: Examples from the RePo test set. ViTPose-s estimates when trained on COCO (left) and on RePoGen data (right). Colors as in Figure 1: right hand, right leg, left hand, and left leg.
Figure 3: RePoGen synthetic data generation pipeline. All steps are detailed in the Method section. The ground truth outputs of the method are (A) 2D and 3D keypoints, (B) the depth map, (C) the mask, and (D) an RGB image.
Figure 4: The set of joint rotation distributions tested for pose generation. The pair of Gaussians is used in the final pipeline. The distribution shown is for the external and internal rotation of the left shoulder.
Figure 5: AP on the Bottom dataset of RePo when training with the pair-of-Gaussians distribution at different values of pose variance. Low pose variance produces poses that are not diverse enough, while high pose variance produces poses that are too unrealistic.
...and 8 more figures

Theorems & Definitions (5)

Definition 3.1: Space $P^{bounded}$
Definition 6.1: $P^{all}$
Definition 6.2: $P^{bounded}$
Definition 6.3: $P^{anatomical}$
Definition 6.4: $P^{AMASS}$
