
Unleashing the Potential of Pre-Trained Diffusion Models for Generalizable Person Re-Identification

Jiachen Li, Xiaojin Gong

TL;DR

The paper tackles domain-generalizable person re-identification (DG Re-ID) by addressing shortcut learning in discriminative setups. It introduces DCAC, a diffusion model-assisted representation learning framework that couples a CLIP-based Re-ID backbone with a pre-trained diffusion model via a correlation-aware conditioning scheme using ID-wise prompts and dark knowledge from classification logits. The diffusion model is adapted with LoRA adapters to balance preserving pre-trained knowledge and enabling downstream adaptation, with gradients flowing back to the Re-ID model to improve generalization. Across single-source and multi-source DG Re-ID benchmarks, DCAC achieves state-of-the-art or competitive results and is supported by extensive ablations validating the conditioning strategy, diffusion assistance, and efficiency advantages.
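The TL;DR says the diffusion model is adapted with LoRA so that pre-trained knowledge is preserved while downstream adaptation remains possible. The pure-Python sketch below shows the generic LoRA formulation, assuming the usual recipe: a frozen weight W plus a trainable low-rank product B A scaled by alpha / r, with B initialised to zero. The class name `LoRALinear`, the rank, alpha, and the initialisation are illustrative assumptions. The summary does not say which Stable Diffusion layers DCAC adapts or with what settings.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A.

    Generic LoRA sketch; rank, alpha and initialisation are illustrative
    assumptions, not DCAC's settings.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # pre-trained W (d_out x d_in), kept frozen
        self.scale = alpha / rank
        # A (rank x d_in): small random init; B (d_out x rank): zeros,
        # so the update B @ A is exactly zero before any training step.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        frozen = matvec(self.weight, x)               # pre-trained path
        update = matvec(self.B, matvec(self.A, x))    # low-rank trainable path
        return [f + self.scale * u for f, u in zip(frozen, update)]

    def num_trainable(self):
        """Only A and B would be optimised: rank * (d_in + d_out) values."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
layer = LoRALinear(W, rank=1)
print(layer.forward([1.0, 2.0, 3.0]))   # [7.0, -1.5], identical to W @ x while B is zero
print(layer.num_trainable())            # 5
```

Because B starts at zero, fine-tuning begins from exactly the pre-trained behaviour. The number of trainable values grows as r * (d_in + d_out) rather than d_in * d_out, which keeps adaptation cheap for a large diffusion backbone.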

Abstract

Domain-generalizable re-identification (DG Re-ID) aims to train a model on one or more source domains and evaluate its performance on unseen target domains, a task that has attracted growing attention due to its practical relevance. While numerous methods have been proposed, most rely on discriminative or contrastive learning frameworks to learn generalizable feature representations. However, these approaches often fail to mitigate shortcut learning, leading to suboptimal performance. In this work, we propose a novel method called diffusion model-assisted representation learning with a correlation-aware conditioning scheme (DCAC) to enhance DG Re-ID. Our method integrates a discriminative and contrastive Re-ID model with a pre-trained diffusion model through a correlation-aware conditioning scheme. By incorporating ID classification probabilities generated from the Re-ID model with a set of learnable ID-wise prompts, the conditioning scheme injects dark knowledge that captures ID correlations to guide the diffusion process. Simultaneously, feedback from the diffusion model is back-propagated through the conditioning scheme to the Re-ID model, effectively improving the generalization capability of Re-ID features. Extensive experiments on both single-source and multi-source DG Re-ID tasks demonstrate that our method achieves state-of-the-art performance. Comprehensive ablation studies further validate the effectiveness of the proposed approach, providing insights into its robustness. Code will be available at https://github.com/RikoLi/DCAC.
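The abstract describes a conditioning scheme that combines the Re-ID model's ID classification probabilities with learnable ID-wise prompts and lets feedback from the diffusion model reach the Re-ID model. One plausible reading is a probability-weighted mixture of prompt embeddings, c = sum_k p_k * e_k with p = softmax(z / tau). The pure-Python sketch below implements that reading. The mixture form, the temperature tau, the toy prompt vectors, and the function names are assumptions for illustration, not the paper's confirmed formulation. The second function shows the gradient path: an upstream gradient dL/dc, for example from a diffusion loss, reaches the logits z through the softmax Jacobian.

```python
import math


def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over ID logits (numerically stabilised)."""
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def correlation_aware_condition(logits, prompts, tau=1.0):
    """Mix ID-wise prompt embeddings by predicted ID probabilities.

    Hypothetical form: c = sum_k p_k * prompts[k], with p = softmax(logits / tau).
    """
    probs = softmax(logits, tau)
    dim = len(prompts[0])
    cond = [sum(p * e[d] for p, e in zip(probs, prompts)) for d in range(dim)]
    return cond, probs


def grad_wrt_logits(upstream, logits, prompts, tau=1.0):
    """Back-propagate dL/dc (e.g. from a diffusion loss) to the Re-ID logits.

    dL/dp_k = <upstream, prompts[k]>
    dL/dz_j = p_j * (dL/dp_j - sum_k p_k * dL/dp_k) / tau   (softmax Jacobian)
    """
    probs = softmax(logits, tau)
    dl_dp = [sum(g * e for g, e in zip(upstream, prompt)) for prompt in prompts]
    expected = sum(p * g for p, g in zip(probs, dl_dp))
    return [p * (g - expected) / tau for p, g in zip(probs, dl_dp)]


# Toy example: IDs 0 and 1 have similar prompts, ID 2 is dissimilar.
prompts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
logits = [4.0, 3.0, -2.0]
cond, probs = correlation_aware_condition(logits, prompts, tau=2.0)
print("probs:", [round(p, 3) for p in probs])
print("condition:", [round(c, 3) for c in cond])
print("dL/dz:", grad_wrt_logits([0.3, -0.7], logits, prompts, tau=2.0))
```

Because the look-alike ID 1 keeps a sizeable share of the probability mass, the condition sits between the prompts of IDs 0 and 1 instead of snapping to a single one-hot prompt as a hard-label condition would. This is the kind of ID correlation the abstract calls dark knowledge.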

Paper Structure

This paper contains 27 sections, 13 equations, 3 figures, 10 tables.

Figures (3)

  • Figure 1: Illustration of different diffusion-based representation learning designs: (a) a separate denoising decoder and a classifier with a shared encoder, (b) intertwined feature extraction and feature denoising, (c) a diffusion model with a separate image encoder for instance-wise conditioning, and (d) a diffusion model and a classification model bridged by a correlation-aware ID-wise conditioning scheme. In addition, (e) illustrates that the dark knowledge embedded in classifier logits captures ID relationships, including nuanced similarities and differences beyond the hard ID labels, which helps generate better conditions to guide the diffusion model.
  • Figure 2: An overview of the proposed framework. It consists of a baseline Re-ID model, a pre-trained diffusion model, and a correlation-aware conditioning scheme based on learnable ID-wise prompts. The Re-ID model is built upon the pre-trained CLIP image encoder [CLIP] and a BN Neck [BoT], and is optimized by an ID loss and a prototypical contrastive loss. The diffusion model is built on a pre-trained Stable Diffusion model [StableDiffusion], with LoRA [LoRA] for efficient adaptation. The informative classification probabilities predicted by the Re-ID model are employed to produce a correlation-aware condition that guides the diffusion model to unleash its generalization-specific knowledge, with gradients back-propagated to the Re-ID model for enhanced generalizable feature learning.
  • Figure 3: GradCAM [GradCAM] visualization of several visually similar IDs selected from the Market1501 [Market1501] dataset. In groups (a) to (c), the activation maps are computed with the images of ID #27, #76, and #649, respectively, under ID #28, reflecting the Re-ID model's ability to capture correlations among IDs. From left to right, each group contains the original image and the activation maps of the baseline model, the instance-wise-condition-guided model, and our correlation-aware-condition-guided model.
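Figure 2 states that the Re-ID model is optimized by an ID loss and a prototypical contrastive loss. This summary does not give the exact formulations, so the pure-Python sketch below assumes common choices: softmax cross-entropy for the ID loss, and an InfoNCE-style loss between an L2-normalized feature and per-ID prototypes, with a momentum update keeping the prototypes current. The temperature (0.05), the momentum (0.9), the equal 1:1 loss weighting, and all function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cross_entropy(logits, label):
    """Softmax cross-entropy, used here as the ID (classification) loss."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def prototypical_contrastive_loss(feature, prototypes, label, temperature=0.05):
    """InfoNCE between a normalised feature and per-ID prototypes (assumed form)."""
    f = l2_normalize(feature)
    sims = [sum(a * b for a, b in zip(f, l2_normalize(p))) / temperature
            for p in prototypes]
    return cross_entropy(sims, label)


def update_prototype(prototype, feature, momentum=0.9):
    """Hypothetical momentum update; `momentum` is the weight kept on the old prototype."""
    mixed = [momentum * p + (1.0 - momentum) * x
             for p, x in zip(prototype, l2_normalize(feature))]
    return l2_normalize(mixed)


def reid_loss(id_logits, feature, prototypes, label):
    """Assumed total objective: ID loss + prototypical contrastive loss, weighted 1:1."""
    return (cross_entropy(id_logits, label)
            + prototypical_contrastive_loss(feature, prototypes, label))


prototypes = [[1.0, 0.0], [0.0, 1.0]]
print(reid_loss([2.0, -1.0], [0.9, 0.1], prototypes, label=0))
print(update_prototype(prototypes[0], [0.0, 1.0]))
```

The ID loss pushes features toward a fixed classifier, while the prototypical term compares each feature against running per-ID centroids. The small temperature sharpens that comparison, so a feature aligned with the wrong prototype is penalised heavily.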