Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Inès Hyeonsu Kim, JoungBin Lee, Woojeong Jin, Soowon Son, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim

TL;DR

Pose-dIVE is a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.

Abstract

Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. We propose Pose-dIVE, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment the training dataset to enable existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate training data with diverse human poses and camera viewpoints. Experimental results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.
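To make the idea of "incorporating sparse and underrepresented viewpoint examples" more concrete, here is a minimal sketch of one way such examples could be selected: bin the camera azimuths observed in a training set, then draw new target azimuths with probability inversely related to how populated each bin is. This is an illustrative assumption, not the authors' implementation. The function names, the use of a single azimuth angle, the bin count, and the smoothing constant are all hypothetical, and the sketch does not model the SMPL rendering or the diffusion model described in the abstract.

```python
import random
from collections import Counter


def azimuth_bin(angle_deg: float, num_bins: int) -> int:
    """Map an azimuth angle in degrees to a histogram bin index."""
    width = 360.0 / num_bins
    # The final modulo guards against a float result landing exactly on 360.
    return int((angle_deg % 360.0) // width) % num_bins


def inverse_frequency_weights(angles_deg, num_bins=12, smoothing=1.0):
    """Return normalized per-bin weights that favor sparsely populated bins.

    `smoothing` is a hypothetical constant that keeps empty bins finite.
    """
    counts = Counter(azimuth_bin(a, num_bins) for a in angles_deg)
    raw = [1.0 / (counts.get(b, 0) + smoothing) for b in range(num_bins)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_augmentation_viewpoints(angles_deg, num_samples, num_bins=12, seed=0):
    """Draw new azimuth angles, concentrated on underrepresented bins."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(angles_deg, num_bins)
    width = 360.0 / num_bins
    # Pick a bin by weight, then a uniform angle inside that bin.
    bins = rng.choices(range(num_bins), weights=weights, k=num_samples)
    return [(b + rng.random()) * width for b in bins]


if __name__ == "__main__":
    # Toy training set: almost every image shares one viewpoint.
    train_angles = [10.0] * 100 + [200.0] * 2
    new_angles = sample_augmentation_viewpoints(train_angles, num_samples=5)
    print([round(a, 1) for a in new_angles])
```

In a full pipeline, each sampled viewpoint (together with a sampled body pose) would presumably be rendered and used as the conditioning signal for image generation; that step is outside the scope of this sketch.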

Paper Structure

This paper contains 35 sections, 1 equation, 10 figures, and 6 tables.

Figures (10)

  • Figure 1: Pose-dIVE diversifies the viewpoint and human pose of the Re-ID dataset to help generalize and improve the performance of arbitrary Re-ID models.
  • Figure 2: Visualization of the effect of viewpoint and human pose augmentation. We compare visualizations of camera viewpoint and human pose distributions for the Market-1501 dataset [Market1501]. The left figures (i) display the camera viewpoint distribution derived from SMPL, while the right figures (ii) illustrate the pose distribution. In (i), from left to right, we show the viewpoint distributions of the training dataset, the augmented dataset, and the combination of both. Similarly, in (ii), from left to right, we present t-SNE [van2008visualizing] visualizations of the human pose distributions, showing poses from the training dataset, followed by augmented poses sourced from outside the dataset. These visualizations demonstrate that our pose augmentation successfully diversifies both viewpoint and human pose distributions.
  • Figure 3: Pose-dIVE framework. Upon observing the highly biased viewpoint and human pose distributions in the training dataset, we augment the dataset by manipulating SMPL body shapes and feeding the rendered shapes into a generative model to fill in sparsely distributed poses and viewpoints. With this augmented dataset, we can train a Re-ID model that is robust to viewpoint and human pose biases.
  • Figure 4: Qualitative comparison. We compare our generated output with DPTN [zhang2022exploring], showing that Pose-dIVE can generate more realistic images while better preserving identity and accurately following the target pose.
  • Figure 5: Qualitative results. Example images from the augmented MSMT17 and Market-1501 datasets demonstrate how the generated images preserve original identities while maintaining realism and consistency with the Re-ID dataset.
  • ...and 5 more figures