SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks
Shining Wang, Yunlong Wang, Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Peng Wang
TL;DR
SeCap addresses cross-view AGPReID by introducing self-calibrating and adaptive prompts within an encoder-decoder transformer. The View Decoupling Transformer (VDT) decouples viewpoint information in the encoder, while the Prompt Re-calibration Module (PRM) and Local Feature Refinement Module (LFRM) in the decoder adapt prompts to the input and refine local features to learn view-invariant representations. The work also contributes two real-world datasets, LAGPeR and G2APS-ReID, and reports state-of-the-art results across multiple AGPReID benchmarks, including strong cross-view performance and robustness to occlusion and viewpoint diversity.
Abstract
The main challenge in Aerial-Ground Person Re-identification (AGPReID) is the significant appearance variation caused by different viewpoints, which makes identity matching difficult. To address this issue, previous methods attempt to reduce the differences between viewpoints by relying on critical attributes and by decoupling the viewpoints. While these methods can mitigate viewpoint differences to some extent, they still face two main issues: (1) difficulty in handling viewpoint diversity and (2) neglect of the contribution of local features. To address these challenges, we design and implement the Self-Calibrating and Adaptive Prompt (SeCap) method for the AGPReID task. The core of this framework is the Prompt Re-calibration Module (PRM), which adaptively re-calibrates prompts based on the input. Combined with the Local Feature Refinement Module (LFRM), SeCap can extract view-invariant features from local features for AGPReID. Meanwhile, given the current scarcity of datasets in the AGPReID field, we further contribute two real-world Large-scale Aerial-Ground Person Re-Identification datasets, LAGPeR and G2APS-ReID. The former is collected and annotated by us independently, covering 4,231 unique identities and containing 63,841 high-quality images; the latter is reconstructed from the person search dataset G2APS. Through extensive experiments on AGPReID datasets, we demonstrate that SeCap is a feasible and effective solution for the AGPReID task. The datasets and source code are available at https://github.com/wangshining681/SeCap-AGPReID.
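
To make the idea of re-calibrating prompts "based on the input" concrete, the following is a minimal sketch, not the authors' PRM. It assumes a single residual cross-attention step with no learned projections: each prompt vector attends over the input's local features and adds the attended context to itself. The function name `recalibrate_prompts`, the toy 2-D vectors, and the scaled dot-product scoring are all illustrative assumptions; the actual module is defined in the paper and the linked repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recalibrate_prompts(prompts, features):
    """Hypothetical input-conditioned prompt update (assumption, not SeCap's PRM).

    prompts:  list of prompt vectors, each of length dim
    features: list of local feature vectors from one image, each of length dim
    Returns one updated prompt per input prompt: prompt + attended context.
    """
    dim = len(prompts[0])
    scale = math.sqrt(dim)
    updated = []
    for prompt in prompts:
        # Scaled dot-product score between this prompt and every local feature.
        scores = [sum(p * f for p, f in zip(prompt, feat)) / scale
                  for feat in features]
        weights = softmax(scores)
        # Attention-weighted average of the local features.
        context = [sum(w * feat[k] for w, feat in zip(weights, features))
                   for k in range(dim)]
        # Residual update: the prompt shifts toward the current input.
        updated.append([p + c for p, c in zip(prompt, context)])
    return updated


# Toy "views": the same prompts end up different for different inputs.
prompts = [[1.0, 0.0], [0.0, 1.0]]
aerial_features = [[2.0, 0.0], [2.0, 0.0]]
ground_features = [[0.0, 2.0], [0.0, 2.0]]

print(recalibrate_prompts(prompts, aerial_features))
print(recalibrate_prompts(prompts, ground_features))
```

In this toy setting the same two prompts become `[[3.0, 0.0], [2.0, 1.0]]` for the first input and `[[1.0, 2.0], [0.0, 3.0]]` for the second, which illustrates the general point that input-conditioned prompts can adapt per image rather than staying fixed across viewpoints.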
