AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification

Huy Nguyen, Kien Nguyen, Akila Pemasiri, Feng Liu, Sridha Sridharan, Clinton Fookes

TL;DR

AG-VPReID is introduced, a new large-scale dataset for aerial-ground video-based person re-identification comprising 6,632 subjects, 32,321 tracklets, and over 9.6 million frames captured by drones, CCTV, and wearable cameras. It offers a real-world benchmark for evaluating robustness to significant viewpoint changes, scale variations, and resolution differences in cross-platform aerial-ground settings.

Abstract

We introduce AG-VPReID, a new large-scale dataset for aerial-ground video-based person re-identification (ReID) that comprises 6,632 subjects, 32,321 tracklets and over 9.6 million frames captured by drones (altitudes ranging from 15-120m), CCTV, and wearable cameras. This dataset offers a real-world benchmark for evaluating the robustness to significant viewpoint changes, scale variations, and resolution differences in cross-platform aerial-ground settings. In addition, to address these challenges, we propose AG-VPReID-Net, an end-to-end framework composed of three complementary streams: (1) an Adapted Temporal-Spatial Stream addressing motion pattern inconsistencies and facilitating temporal feature learning, (2) a Normalized Appearance Stream leveraging physics-informed techniques to tackle resolution and appearance changes, and (3) a Multi-Scale Attention Stream handling scale variations across drone altitudes. We integrate visual-semantic cues from all streams to form a robust, viewpoint-invariant whole-body representation. Extensive experiments demonstrate that AG-VPReID-Net outperforms state-of-the-art approaches on both our new dataset and existing video-based ReID benchmarks, showcasing its effectiveness and generalizability. Nevertheless, the performance gap observed on AG-VPReID across all methods underscores the dataset's challenging nature. The dataset, code and trained models are available at https://github.com/agvpreid25/AG-VPReID-Net.
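To make the idea of combining three streams into one representation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the streams are fused, so this sketch assumes each stream yields a fixed-length embedding per tracklet, fuses them by per-stream L2 normalization followed by concatenation, and ranks a gallery by cosine similarity. The function names (`fuse_streams`, `rank_gallery`) and the toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_streams(temporal, appearance, multiscale):
    """Assumed fusion rule: normalize each stream, concatenate, renormalize.

    Normalizing per stream keeps any single stream from dominating the
    fused descriptor purely because of its magnitude.
    """
    fused = []
    for emb in (temporal, appearance, multiscale):
        fused.extend(l2_normalize(emb))
    return l2_normalize(fused)


def rank_gallery(query, gallery):
    """Return gallery IDs sorted by cosine similarity to the query (best first).

    Inputs are unit vectors, so the dot product equals cosine similarity.
    """
    scores = {
        gid: sum(q * g for q, g in zip(query, emb))
        for gid, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: one aerial query tracklet and two ground gallery tracklets.
query = fuse_streams([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
gallery = {
    "person_a": fuse_streams([0.9, 0.1], [0.1, 1.8], [2.5, 0.2]),
    "person_b": fuse_streams([0.0, 1.0], [2.0, 0.0], [0.0, 3.0]),
}
ranking = rank_gallery(query, gallery)
```

In this toy setup `person_a` agrees with the query in all three streams and is ranked first, while `person_b` is orthogonal to the query in every stream. A real system would replace the hand-written vectors with learned embeddings and might use a different fusion rule.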

Paper Structure

This paper contains 28 sections, 8 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Our AG-VPReID dataset was captured using six cameras of several types, including aerial drones, CCTVs, and GoPros. Sample images and camera locations are illustrated on the right side of the figure. The left side depicts the cross-camera appearance variations of two pedestrians, showcasing differences across various sessions and times of the day.
  • Figure 2: The AG-VPReID dataset presents several key challenges: extreme viewpoints, varying resolutions and subject sizes, pose/illumination variations, occlusions, and similar clothing among subjects.
  • Figure 3: The three-stream AG-VPReID-Net architecture addresses aerial-ground ReID challenges: Temporal-Spatial stream for motion modeling and temporal features, Normalized Appearance for resolution/appearance variations, and Multi-Scale Attention for aerial-ground scale variations.
  • Figure 4: Baseline vs. our method on the AG-VPReID dataset. Green and red mark correct and incorrect labels, respectively. The first image of each tracklet is shown. Rank improvements are shown in bold.
  • Figure 5: Soft-biometric attributes in our AG-VPReID dataset, showing Physical (top) and Appearance (bottom) Traits of a person from a top view. The attributes are categorized into physical characteristics (such as gender, age, height) and appearance details (such as clothing and accessories).
  • ...and 3 more figures