SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification

Yuhao Wang; Xiang Hu; Lixin Wang; Pingping Zhang; Huchuan Lu

TL;DR

SD-ReID tackles cross-view aerial-ground person re-identification by introducing a two-stage framework that jointly learns identity- and view-aware representations and then uses a Stable Diffusion model to generate view-specific features conditioned on identity and view cues. A memory bank stores global view prototypes to guide inference when instance-level view information is unavailable, while a View-Refined Decoder (VRD) aligns generated features with backbone representations to reduce distribution gaps. The method integrates a condition learner to fuse intermediate identity descriptors with global view cues, enabling robust cross-view generation. Across five AG-ReID benchmarks, SD-ReID achieves state-of-the-art performance, demonstrating the value of explicit view-specific feature generation for cross-view retrieval and its practical potential for real-world surveillance scenarios.

Abstract

Aerial-Ground Person Re-IDentification (AG-ReID) aims to retrieve specific persons across cameras with different viewpoints. Previous works focus on designing discriminative models that maintain identity consistency despite drastic changes in camera viewpoint. The core idea behind these methods is natural, but designing a view-robust model is very challenging. Moreover, they overlook the contribution of view-specific features in enhancing the model's ability to represent persons. To address these issues, we propose a novel generative framework named SD-ReID for AG-ReID, which leverages generative models to mimic the feature distribution of different views while extracting robust identity representations. More specifically, we first train a ViT-based model to extract person representations along with controllable conditions, including identity and view conditions. We then fine-tune the Stable Diffusion (SD) model to enhance person representations guided by these controllable conditions. Furthermore, we introduce the View-Refined Decoder (VRD) to bridge the gap between instance-level and global-level features. Finally, both person representations and all-view features are employed to retrieve target persons. Extensive experiments on five AG-ReID benchmarks (i.e., CARGO, AG-ReIDv1, AG-ReIDv2, LAGPeR and G2APS-ReID) demonstrate the effectiveness of our proposed method. The source code will be made available.
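
The final retrieval step, in which person representations and all-view features are used together, can be illustrated with a minimal sketch. The abstract does not state how the two kinds of features are combined or which distance is used, so the concatenation, the per-part L2 normalization, and the cosine-similarity ranking below are assumptions made for illustration. The function names `l2_normalize`, `build_descriptor`, and `rank_gallery` are hypothetical and do not come from the SD-ReID code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_descriptor(person_feat, view_feats):
    """Concatenate a person representation with one feature per view.

    Assumption of this sketch: each part is normalized separately so that
    no single part dominates the similarity, then the whole descriptor is
    normalized so a dot product equals cosine similarity.
    """
    parts = [l2_normalize(person_feat)] + [l2_normalize(v) for v in view_feats]
    flat = [x for part in parts for x in part]
    return l2_normalize(flat)


def rank_gallery(query_desc, gallery_descs):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [
        sum(q * g for q, g in zip(query_desc, desc)) for desc in gallery_descs
    ]
    return sorted(range(len(gallery_descs)), key=lambda i: scores[i], reverse=True)


# Toy example: 2-D person feature plus two 2-D view features per image.
query = build_descriptor([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]])
gallery = [
    build_descriptor([0.0, 1.0], [[1.0, 0.0], [1.0, 0.0]]),
    build_descriptor([0.9, 0.1], [[0.6, 0.4], [0.1, 0.9]]),
]
ranking = rank_gallery(query, gallery)
```

In this toy example the second gallery entry is ranked first because all three of its parts point in nearly the same direction as the query's. In the actual method the view features would come from the fine-tuned SD model and the VRD rather than being hand-written vectors.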
