Data Augmentation in Human-Centric Vision

Wentao Jiang, Yige Zhang, Shaozhong Zheng, Si Liu, Shuicheng Yan

TL;DR

The paper addresses data scarcity and overfitting in human-centric vision tasks such as person ReID, human parsing, pose estimation, and pedestrian detection. It presents a taxonomy separating data perturbation and data generation, detailing subtypes and mapping them to each task. A comprehensive literature review, task-specific insights, and future directions—particularly the potential of Latent Diffusion Models for realistic augmentation—are its core contributions. This framework guides the development of more robust, accurate, and efficient human-centric vision systems by systematically expanding training data and improving generalization.

Abstract

This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks, a first of its kind in the field. It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection, addressing the significant challenges posed by overfitting and limited training data in these domains. Our work categorizes data augmentation methods into two main types: data generation and data perturbation. Data generation covers techniques like graphic engine-based generation, generative model-based generation, and data recombination, while data perturbation is divided into image-level and human-level perturbations. Each method is tailored to the unique requirements of human-centric tasks, with some applicable across multiple areas. Our contributions include an extensive literature review, providing deep insights into the influence of these augmentation techniques in human-centric vision and highlighting the nuances of each method. We also discuss open issues and future directions, such as the integration of advanced generative models like Latent Diffusion Models, for creating more realistic and diverse training data. This survey not only encapsulates the current state of data augmentation in human-centric vision but also charts a course for future research, aiming to develop more robust, accurate, and efficient human-centric vision systems.

Paper Structure (25 sections, 21 figures, 7 tables)

Figures (21)

  • Figure 1: Examples of global perturbation. The figure contains representative works of Style Transfer zhong2018camstyle; Scaling, Rotating and Occluding peng2018jointly; and Noise Injection wang2021human.
  • Figure 2: Examples of region-level perturbation. The figure contains representative works of Random Erasing zhong2020random, CutOut and Stylized Cygert2020TowardRP and Random Grayscale Patch Replacement Gong2021APR.
  • Figure 3: Examples of human-level occlusion generation. The figure contains representative works of Keypoint masking ke2018multi, Copy-Paste bin2020adversarial and Nearby-person occlusion chen2021nearby.
  • Figure 4: Examples of human body perturbation. The figure contains representative works of Deformable Shape Augmentation chen2021shape, 2D Pose Transformation jiang2022posetrans and 3D Pose Transformation guan2023posegu.
  • Figure 5: Examples of graphic engine-based generation. The figure contains representative works of MixedPeds Cheung2017MixedPedsPD, Synthetic Human on Real Background chen2016synthesizing and Synthetic Humans varol2017learning.
  • ...and 16 more figures
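
To make the region-level perturbation of Figure 2 concrete, the following is a minimal NumPy sketch of a Random Erasing style augmentation. It is an illustration under stated assumptions, not the survey's or the cited authors' implementation: the function name `random_erase`, the parameter names, and the default area and aspect-ratio ranges are choices made for this example.

```python
import numpy as np


def random_erase(image, rng, area_range=(0.02, 0.4), aspect_range=(0.3, 3.3), max_tries=10):
    """Return a copy of `image` with one random rectangle overwritten by noise.

    `image` is an H x W or H x W x C uint8 array. All names and default ranges
    here are assumptions for illustration, not values taken from the survey.
    """
    h, w = image.shape[:2]
    out = image.copy()
    for _ in range(max_tries):
        # Sample the rectangle's area (as a fraction of the image) and its height/width ratio.
        target_area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        erase_h = int(round(np.sqrt(target_area * aspect)))
        erase_w = int(round(np.sqrt(target_area / aspect)))
        # Retry if the sampled rectangle does not fit strictly inside the image.
        if 0 < erase_h < h and 0 < erase_w < w:
            top = int(rng.integers(0, h - erase_h + 1))
            left = int(rng.integers(0, w - erase_w + 1))
            noise_shape = (erase_h, erase_w) + image.shape[2:]
            out[top:top + erase_h, left:left + erase_w] = rng.integers(
                0, 256, size=noise_shape, dtype=np.uint8
            )
            return out
    # No valid rectangle was found within max_tries: return the unmodified copy.
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    person_crop = np.zeros((128, 64, 3), dtype=np.uint8)  # stand-in for a pedestrian crop
    augmented = random_erase(person_crop, rng)
    print(augmented.shape, int((augmented != person_crop).any(axis=-1).sum()), "pixels changed")
```

The input array is left untouched and a perturbed copy is returned, so the same sample can be augmented differently on each training epoch. Filling the rectangle with a constant value instead of noise would give a CutOut-like variant.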