Diffusion-based Synthetic Data Generation for Visible-Infrared Person Re-Identification

Wenbo Dai, Lijing Lu, Zhihang Li

TL;DR

This work tackles VI-ReID data scarcity and privacy constraints by introducing DiVE, a diffusion-based framework that automatically generates large-scale RGB-IR paired data with identity preservation. Key to DiVE is a unified mapping function that decouples identity from modality, enabling identity-consistent infrared synthesis via text-driven diffusion with textual inversion for identities and DreamBooth-like modality tokens (via LoRA fine-tuning). The method expands external RGB datasets to create rich RGB-IR pairs, improving VI-ReID performance across multiple models and datasets, with notable gains such as an mAP increase of up to 9% on LLCM and improved Rank-1/mAP on SYSU-MM01. The results demonstrate that diffusion-based synthetic data can closely approximate real IR distributions, reduce labeling costs, and offer a scalable path for cross-modal person re-identification.

Abstract

The performance of models is closely linked to the abundance of training data. In Visible-Infrared person Re-IDentification (VI-ReID) tasks, collecting and annotating large-scale images of each individual under various cameras and modalities is tedious, time-consuming, and costly, and must comply with data protection laws, which makes it very hard to meet dataset requirements. Current research investigates the generation of synthetic data as an efficient and privacy-ensuring alternative to collecting real data in the field. However, a data synthesis technique tailored specifically for VI-ReID models has yet to be explored. In this paper, we present a novel data generation framework, dubbed Diffusion-based VI-ReID data Expansion (DiVE), that automatically obtains massive identity-preserving RGB-IR paired images by decoupling identity and modality, in order to improve the performance of VI-ReID models. Specifically, the identity representation is acquired from a set of samples sharing the same ID, whereas the modality of images is learned by fine-tuning Stable Diffusion (SD) on modality-specific data. DiVE extends text-driven image synthesis to identity-preserving RGB-IR multimodal image synthesis. This approach significantly reduces data collection and annotation costs by directly incorporating synthetic data into ReID model training. Experiments demonstrate that VI-ReID models trained on synthetic data produced by DiVE consistently exhibit notable improvements. In particular, the state-of-the-art method, CAJ, trained with synthetic images, achieves an improvement of about $9\%$ in mAP over the baseline on the LLCM dataset. Code: https://github.com/BorgDiven/DiVE
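
To illustrate the decoupling of identity and modality described above, here is a minimal sketch of how text prompts for paired generation might be composed. It is not taken from the DiVE code base: the placeholder tokens (`<rgb>`, `<ir>`, `<id-0007>`), the prompt template, and all function names are hypothetical, chosen only to mirror the common textual-inversion and DreamBooth convention of binding a learned concept to a special token. The point it shows is that the identity token stays fixed while only the modality token changes, so each identity yields one RGB prompt and one IR prompt.

```python
from dataclasses import dataclass

# Hypothetical modality tokens; the source does not name the actual tokens.
MODALITY_TOKENS = {"rgb": "<rgb>", "ir": "<ir>"}


@dataclass(frozen=True)
class PairedPrompt:
    """Prompts for one identity in both modalities."""
    identity_id: int
    rgb: str
    ir: str


def identity_token(identity_id: int) -> str:
    # Hypothetical naming scheme for a per-identity learned token.
    return f"<id-{identity_id:04d}>"


def build_prompt(identity_id: int, modality: str) -> str:
    """Combine one identity token with one modality token (assumed template)."""
    if modality not in MODALITY_TOKENS:
        raise ValueError(f"unknown modality: {modality!r}")
    return f"a {MODALITY_TOKENS[modality]} photo of {identity_token(identity_id)} person"


def build_paired_prompts(identity_ids):
    """Return an RGB prompt and an IR prompt per identity; only the modality token differs."""
    return [
        PairedPrompt(i, build_prompt(i, "rgb"), build_prompt(i, "ir"))
        for i in identity_ids
    ]


if __name__ == "__main__":
    for pair in build_paired_prompts([7, 12]):
        print(pair.rgb)
        print(pair.ir)
```

In a full pipeline, each prompt would be passed to a fine-tuned text-to-image diffusion model; that step is omitted here because the source does not specify the interface.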

Paper Structure

This paper contains 36 sections, 8 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Performance comparison of different VI-ReID methods on the SYSU-MM01 dataset before and after using our proposed data expansion approach.
  • Figure 2: The left image comes from the RGB-IR ReID dataset, while the right images come from the synthetic dataset generated by our method, DiVE. Our method not only synthesizes images in the IR domain but also maintains identity information: details such as backpacks, clothes, and hairstyles remain consistent with the RGB images. The generated images also exhibit different poses and scenes, enriching the diversity of the dataset.
  • Figure 3: Illustration of DiVE. (a): Training DiVE involves unpaired RGB-IR data. DiVE disentangles identity and modality representations to enrich the identity diversity of the generated images. (b): After training the generator, we use it to translate a large number of RGB images into IR images with identity preserved. These synthetic samples can be used to train arbitrary VI-ReID approaches.
  • Figure 4: Visual comparison of synthetic infrared images. Column 1: Real IR images from SYSU-MM01 dataset and RGB images from Market-1501 dataset. Columns 2-6: Synthetic IR images generated by CycleGAN, AlignGAN, VI-Diff, CycleGAN-Turbo, and the proposed DiVE model, respectively.
  • Figure 5: Performance under different numbers of augmented IDs. Different colors represent different identities.
  • ...and 5 more figures