Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data

Miroslav Purkrabek, Jiri Matas

TL;DR

This work tackles the scarcity of extreme-view 2D human pose data by introducing RePoGen, an SMPL-X-based synthetic data generator that can produce novel poses and unseen views to augment COCO. By sampling from a bounded pose space and applying textures and random backgrounds, RePoGen yields diverse training data that improves top- and bottom-view pose estimation without sacrificing accuracy on common views. The authors present a new RePo dataset of real extreme-view images and the RePoGen dataset variants, demonstrate strong gains over baselines and AMASS-based synthesis, and show that strong rotation augmentation is crucial for extreme-view robustness. They also analyze pose spaces and emphasize that anatomical plausibility is not strictly required for effective learning, underscoring the practical value of synthetic data for rare camera views.

Abstract

Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome this limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches to top- and bottom-view pose estimation without harming performance on common views. An ablation study shows that anatomical plausibility, a property prior research focused on, is not a prerequisite for effective performance. The introduced dataset and the corresponding code are available at https://mirapurkrabek.github.io/RePoGen-paper/

Paper Structure (21 sections, 13 figures, 6 tables)

This paper contains 21 sections, 13 figures, 6 tables.

Figures (13)

  • Figure 1: Pose estimation trained on COCO (left) and by our method (right). The COCO trained model swaps the left and right sides and interprets the right hand as the left leg and the right leg as the left hand (color codes the corresponding label).
  • Figure 2: Examples from the RePo test set. ViTPose-s estimates when trained on COCO (left) and on RePoGen data (right). Colors as in Figure 1: right hand, right leg, left hand, and left leg.
  • Figure 3: RePoGen synthetic data generation pipeline. All steps are detailed in the method section. The ground truth outputs of the method are (A) 2D and 3D keypoints, (B) the depth map, (C) the mask, and (D) an RGB image.
  • Figure 4: Joint rotation distributions tested for pose generation. The final pipeline uses the pair of Gaussians. The distribution shown is for external and internal rotation of the left shoulder.
  • Figure 5: AP on the Bottom dataset of RePo when training with the pair-of-Gaussians distribution at different values of pose variance. Low pose variance yields poses that are not diverse enough, while high pose variance yields poses that are too unrealistic.
  • ...and 8 more figures
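
The captions of Figures 4 and 5 describe drawing each joint rotation from a pair of Gaussians inside a bounded range, with a variance setting that trades pose diversity against realism. The Python sketch below illustrates that idea only; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names `sample_joint_angle` and `sample_pose`, the placement of the two modes at one quarter and three quarters of the joint range, the `spread` parameter (standard deviation as a fraction of the range), the rejection-then-clamp handling of the bounds, and the joint limits in the usage example.

```python
import random


def sample_joint_angle(lower, upper, spread, rng, max_tries=100):
    """Draw one joint angle (degrees) from a two-Gaussian mixture kept in [lower, upper].

    Assumptions (not from the paper): the two modes sit at 1/4 and 3/4 of the
    range, and `spread` is the standard deviation as a fraction of the range.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if spread <= 0:
        raise ValueError("spread must be positive")
    span = upper - lower
    modes = (lower + 0.25 * span, lower + 0.75 * span)
    sigma = spread * span
    angle = 0.5 * (lower + upper)  # safe default if max_tries < 1
    for _ in range(max_tries):
        # Pick one of the two modes with equal probability, then perturb it.
        angle = rng.gauss(rng.choice(modes), sigma)
        if lower <= angle <= upper:
            return angle
    # Fallback: clamp the last draw so the result always respects the bounds.
    return min(max(angle, lower), upper)


def sample_pose(joint_limits, spread, seed=None):
    """Sample one angle per joint; `joint_limits` maps name -> (lower, upper)."""
    rng = random.Random(seed)
    return {
        name: sample_joint_angle(lo, hi, spread, rng)
        for name, (lo, hi) in joint_limits.items()
    }


if __name__ == "__main__":
    # Hypothetical limits in degrees, chosen only for this example.
    limits = {
        "left_shoulder_rotation": (-90.0, 90.0),
        "left_elbow_flexion": (0.0, 150.0),
    }
    for spread in (0.02, 0.1, 0.5):
        print(spread, sample_pose(limits, spread, seed=0))
```

In this sketch a small `spread` keeps samples close to the two modes (little diversity), while a large `spread` pushes many draws toward or past the limits, which mirrors the trade-off described for Figure 5 without reproducing the paper's actual parameter values.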

Theorems & Definitions (5)

  • Definition 3.1: Space $P^{bounded}$
  • Definition 6.1: $P^{all}$
  • Definition 6.2: $P^{bounded}$
  • Definition 6.3: $P^{anatomical}$
  • Definition 6.4: $P^{AMASS}$