Pose-dIVE: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification

Inès Hyeonsu Kim; JoungBin Lee; Woojeong Jin; Soowon Son; Kyusun Cho; Junyoung Seo; Min-Seop Kwak; Seokju Cho; JeongYeol Baek; Byeongwon Lee; Seungryong Kim

TL;DR

Pose-dIVE is a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.

Abstract

Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. We propose Pose-dIVE, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment the training dataset to enable existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate training data with diverse human poses and camera viewpoints. Experimental results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches.
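The abstract does not say how under-represented viewpoints are chosen, so the following is only a minimal sketch of one plausible approach, not the authors' procedure: inverse-frequency sampling over camera azimuth bins. The function name `sample_sparse_viewpoints`, the 12-bin discretisation, and the +1 smoothing are all assumptions made for illustration. In a full pipeline, the sampled angles would presumably drive the SMPL rendering that conditions the diffusion model.

```python
import random
from collections import Counter


def sample_sparse_viewpoints(observed_deg, n_samples, n_bins=12, seed=0):
    """Draw azimuth angles (degrees), favouring under-represented bins.

    Hypothetical helper: the binning and weighting scheme are illustrative
    assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    bin_width = 360.0 / n_bins

    # Histogram of the observed training viewpoints.
    counts = Counter(
        min(int((a % 360.0) // bin_width), n_bins - 1) for a in observed_deg
    )

    # Inverse-frequency weights; +1 smoothing gives empty bins the largest weight.
    weights = [1.0 / (counts.get(b, 0) + 1) for b in range(n_bins)]

    # Pick bins by weight, then jitter uniformly inside each chosen bin.
    bins = rng.choices(range(n_bins), weights=weights, k=n_samples)
    return [(b + rng.random()) * bin_width for b in bins]


# Usage: a training set dominated by near-frontal views.
observed = [5.0] * 100 + [95.0] * 2
new_angles = sample_sparse_viewpoints(observed, n_samples=1000)
```

With this weighting, angles from the crowded frontal bin are drawn rarely, while empty bins receive most of the samples. The same idea could be applied to clusters of pose parameters instead of azimuth bins.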

Paper Structure (35 sections, 1 equation, 10 figures, 6 tables)

Introduction
Related Work
Person re-identification.
Data augmentation in re-identification.
Pose-conditioned diffusion models.
Method
Motivation and Overview
Human Pose and Camera Viewpoint Condition with SMPL
Pose Diversified Augmentation
Pose Diversified Generation with Stable Diffusion
Experiments
Implementation Details
Step 1: Training of generative model.
Step 2: Augmentation with generative model.
Step 3: Training baseline Re-ID models.
...and 20 more sections

Figures (10)

Figure 1: Pose-dIVE diversifies the viewpoint and human pose of the Re-ID dataset to help generalize and improve the performance of arbitrary Re-ID models.
Figure 2: Visualization of the effect of viewpoint and human pose augmentation. We visualize the camera viewpoint and human pose distributions for the Market-1501 dataset [Market1501]. The left panels (i) display the camera viewpoint distribution derived from SMPL, while the right panels (ii) illustrate the pose distribution. In (i), from left to right, we show the viewpoint distributions of the training dataset, the augmented dataset, and the combination of both. Similarly, in (ii), from left to right, we present t-SNE [van2008visualizing] visualizations of the human pose distributions, showing poses from the training dataset, followed by augmented poses sourced from outside the dataset. These visualizations demonstrate that our pose augmentation successfully diversifies both viewpoint and human pose distributions.
Figure 3: Pose-dIVE framework. Having observed highly biased viewpoint and human pose distributions in the training dataset, we augment the dataset by manipulating SMPL body shapes and feeding the rendered shapes into a generative model to fill in sparsely distributed poses and viewpoints. With this augmented dataset, we can train a Re-ID model that is robust to viewpoint and human pose biases.
Figure 4: Qualitative comparison. We compare our generated output with DPTN [zhang2022exploring], showing that Pose-dIVE can generate more realistic images while better preserving identity and accurately following the target pose.
Figure 5: Qualitative results. Example images from the augmented MSMT17 and Market-1501 datasets demonstrate how the generated images preserve original identities while maintaining realism and consistency with the Re-ID dataset.
...and 5 more figures
