AG-ReID.v2: Bridging Aerial and Ground Views for Person Re-identification
Huy Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes
TL;DR
This work tackles cross-view person re-identification by introducing AG-ReID.v2, a large-scale dataset combining aerial, CCTV, and wearable imagery with 15 soft attributes to capture cross-view variations. It proposes a novel three-stream architecture built on a Vision Transformer backbone: a transformer-based ReID stream, an elevated-view head-focused stream, and an explainable attribute-guided stream, integrated through a metric and attribute-aware loss framework. Key contributions include the expanded AG-ReID.v2 dataset, the Explainable Elevated-View Attention (EP+EVA) architecture, and comprehensive experimental validation showing improvements over state-of-the-art baselines on aerial-ground ReID tasks. The dataset and code are publicly released to accelerate research in cross-domain surveillance and attribute-guided ReID under realistic multimodal conditions.
Abstract
Aerial-ground person re-identification (Re-ID) presents unique challenges in computer vision, stemming from the distinct differences in viewpoints, poses, and resolutions between high-altitude aerial and ground-based cameras. Existing research predominantly focuses on ground-to-ground matching, while aerial matching remains less explored due to a dearth of comprehensive datasets. To address this, we introduce AG-ReID.v2, a dataset specifically designed for person Re-ID in mixed aerial and ground scenarios. The dataset comprises 100,502 images of 1,615 unique individuals, each annotated with matching IDs and 15 soft attribute labels. Data were collected from diverse perspectives using a UAV, a stationary CCTV camera, and a camera integrated into smart glasses, providing a rich variety of intra-identity variations. Additionally, we have developed an explainable attention network tailored for this dataset. This network features a three-stream architecture that efficiently processes pairwise image distances, emphasizes key top-down features, and adapts to variations in appearance due to altitude differences. Comparative evaluations demonstrate the superiority of our approach over existing baselines. We plan to release the dataset and algorithm source code publicly, aiming to advance research in this specialized field of computer vision. For access, please visit https://github.com/huynguyen792/AG-ReID.v2.
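To make the aerial-to-ground matching setting concrete, the following is a minimal sketch of how a query from one view can be ranked against a gallery from another using pairwise feature distances. It is not the released AG-ReID.v2 code: the function names, the toy two-dimensional embeddings, the identity labels, and the choice of Euclidean distance are all assumptions made for illustration. In practice the embeddings would come from a trained Re-ID network.

```python
import numpy as np

def pairwise_euclidean(query, gallery):
    """Return an (n_query, n_gallery) matrix of Euclidean distances."""
    q_sq = np.sum(query ** 2, axis=1, keepdims=True)   # shape (n_query, 1)
    g_sq = np.sum(gallery ** 2, axis=1)[None, :]       # shape (1, n_gallery)
    sq = q_sq + g_sq - 2.0 * query @ gallery.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))

def rank1_accuracy(dist, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery image shares their identity."""
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(gallery_ids[nearest] == query_ids))

# Hypothetical toy embeddings: two aerial queries, three ground gallery images.
aerial_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
ground_feats = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
aerial_ids = np.array([7, 3])      # hypothetical identity labels
ground_ids = np.array([7, 3, 5])

dist = pairwise_euclidean(aerial_feats, ground_feats)
acc = rank1_accuracy(dist, aerial_ids, ground_ids)
```

Here `dist[i, j]` is the distance between aerial query `i` and ground gallery image `j`, so sorting each row yields the ranked list that Re-ID retrieval metrics are computed from.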
