Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

TL;DR

The paper addresses data scarcity and the domain gap in person Re-ID by introducing Diffusion-ReID, a two-stage paradigm that synthesizes ID-consistent, attribute-diverse images with diffusion models and then filters them. It introduces Language Prompts Enhancement (LPE) and IIR to maintain identity fidelity, Diversity Injection (DI) to enhance attribute diversity, and a Fine-Grain-Specific Prior Preservation Loss to guide generation. The Diff-Person dataset (over 777K images of 5,183 identities) is built from existing labeled data and used to pre-train backbones that consistently outperform ImageNet-1K initialization across six Re-ID settings, including few-shot, unsupervised, and domain adaptation tasks. The work reports better initialization and faster convergence, argues for practical value in real-world Re-ID systems, and notes that the approach could be extended to additional datasets and data sources.

Abstract

Existing person re-identification (Re-ID) methods primarily use the ImageNet-1K dataset for model initialization, which leads to sub-optimal results because of the large domain gap. A key challenge is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet (e.g., LUPerson), but learning from such unlabeled, uncontrollable, and noisy data is difficult. In this paper, we present a novel paradigm, Diffusion-ReID, that efficiently augments and generates diverse images of known identities without any data collection or annotation cost. Technically, the paradigm unfolds in two stages: generation and filtering. In the generation stage, we propose Language Prompts Enhancement (LPE) to ensure ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. To improve the quality of the generated data, we then apply a Re-ID confidence threshold filter to remove low-quality images. Using this paradigm, we create a new large-scale person Re-ID dataset, Diff-Person, which consists of over 777K images from 5,183 identities. We then build a stronger person Re-ID backbone pre-trained on Diff-Person. Extensive experiments on four person Re-ID benchmarks in six widely used settings show that our approach significantly outperforms other pre-training and self-supervised competitors.
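The generate-then-filter flow can be illustrated with a minimal Python sketch. It rests on stated assumptions and is not the authors' implementation. `generate` stands in for the diffusion-based generator (the part where LPE and DI would act). `classify` stands in for a Re-ID classifier over the known identities that returns one logit per identity. The filter keeps a generated image only if its softmax confidence for the source identity reaches a threshold. All function names and the default threshold of 0.5 are hypothetical. The paper compares several candidate filtering approaches, so its exact criterion may differ.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Tuple[str, int]  # (generated image id/path, source identity index)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one logit per identity."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_filter(
    candidates: Sequence[Candidate],
    classify: Callable[[str], Sequence[float]],  # hypothetical Re-ID classifier
    threshold: float = 0.5,  # hypothetical value; the paper's setting may differ
) -> List[Candidate]:
    """Keep a generated image only if the classifier's confidence for the
    identity it was generated from reaches the threshold."""
    kept = []
    for image, identity in candidates:
        if softmax(classify(image))[identity] >= threshold:
            kept.append((image, identity))
    return kept


def generate_then_filter(
    identities: Dict[int, List[str]],  # identity index -> real reference images
    generate: Callable[[List[str], int], List[str]],  # hypothetical generator
    classify: Callable[[str], Sequence[float]],
    n_per_id: int = 4,
    threshold: float = 0.5,
) -> List[Candidate]:
    """Stage 1: synthesize candidates per known identity. Stage 2: filter them."""
    candidates = [
        (img, identity)
        for identity, refs in identities.items()
        for img in generate(refs, n_per_id)
    ]
    return confidence_filter(candidates, classify, threshold)


# Toy stand-ins: names encode the source identity; no real model is involved.
def toy_generate(refs: List[str], n: int) -> List[str]:
    return [f"{refs[0]}_gen{i}" for i in range(n)]


def toy_classify(image: str) -> List[float]:
    # Confident about identity 0 for images derived from "p0", unsure otherwise.
    return [2.0, 0.0] if image.startswith("p0") else [0.0, 0.0]


kept = generate_then_filter(
    {0: ["p0"], 1: ["p1"]}, toy_generate, toy_classify, n_per_id=2, threshold=0.6
)
print(kept)  # [('p0_gen0', 0), ('p0_gen1', 0)]
```

In the toy run, images generated for identity 1 receive only 0.5 confidence and are dropped at a threshold of 0.6. This mirrors the idea of discarding samples whose identity a Re-ID model cannot confirm.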

Paper Structure (14 sections, 1 equation, 8 figures, 10 tables)

Figures (8)

  • Figure 1: Example images generated by our proposed paradigm Diffusion-ReID.
  • Figure 2: The overview of our paradigm Diffusion-ReID.
  • Figure 3: Comparison of generated results between manual text prompt input and captioning models.
  • Figure 4: Illustration of three proposed candidate filtering approaches, as well as the output visualization.
  • Figure 5: Identity distribution of Diff-Person and LUPerson-NL. A point (X, Y) on a curve indicates that Y% of identities each have fewer than X images.
  • Figures 6 to 8: captions not listed in this summary.
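The curve in Figure 5 follows directly from per-identity image counts. A minimal Python sketch reproduces the caption's definition, assuming the dataset is given as one identity label per image. That input format is hypothetical, and the sketch is not the authors' plotting code.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def identity_distribution_curve(
    labels: Iterable[Hashable], xs: Iterable[int]
) -> List[Tuple[int, float]]:
    """Return (X, Y) points where Y is the percentage of identities that
    each have fewer than X images. `labels` holds one identity per image."""
    counts = Counter(labels)  # identity -> number of images
    if not counts:
        raise ValueError("no images given")
    n_ids = len(counts)
    points = []
    for x in xs:
        below = sum(1 for c in counts.values() if c < x)
        points.append((x, 100.0 * below / n_ids))
    return points


# Toy data: four identities with 3, 10, 50, and 200 images.
labels = ["a"] * 3 + ["b"] * 10 + ["c"] * 50 + ["d"] * 200
print(identity_distribution_curve(labels, [5, 20, 100, 1000]))
# [(5, 25.0), (20, 50.0), (100, 75.0), (1000, 100.0)]
```

Under this definition the curve is non-decreasing in X. A curve that sits further to the right belongs to a dataset whose identities tend to have more images each.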