Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

Shizhou Zhang; Wenlong Luo; De Cheng; Qingchun Yang; Lingyan Ran; Yinghui Xing; Yanning Zhang

TL;DR

A new benchmark approach for cross-platform ReID, termed VSLA-CLIP, is proposed. It transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks.

Abstract

In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) a huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID, termed VSLA-CLIP, which transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks. To further reduce the large discrepancy between platforms, we also devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and on our proposed G2A-VReID dataset.
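
As a quick sanity check on the dataset figures quoted in the abstract, the short sketch below derives two averages from them: frames per tracklet and tracklets per identity. The constant and function names are illustrative and do not come from the paper; only the three counts are taken from the abstract.

```python
# Counts quoted in the abstract of the G2A-VReID paper.
NUM_IMAGES = 185_907
NUM_TRACKLETS = 5_576
NUM_IDENTITIES = 2_788


def mean_frames_per_tracklet(images: int, tracklets: int) -> float:
    """Average number of images (frames) in one tracklet."""
    return images / tracklets


def tracklets_per_identity(tracklets: int, identities: int) -> float:
    """Average number of tracklets recorded for one identity."""
    return tracklets / identities


if __name__ == "__main__":
    frames = mean_frames_per_tracklet(NUM_IMAGES, NUM_TRACKLETS)
    per_id = tracklets_per_identity(NUM_TRACKLETS, NUM_IDENTITIES)
    print(f"mean frames per tracklet: {frames:.2f}")
    print(f"tracklets per identity:   {per_id:.1f}")
```

The counts work out to roughly 33 frames per tracklet and exactly two tracklets per identity on average. The latter would be consistent with one ground and one aerial tracklet per person, although the abstract does not state this explicitly.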

Paper Structure (18 sections, 16 equations, 4 figures, 5 tables)

This paper contains 18 sections, 16 equations, 4 figures, 5 tables.

Introduction
Related Works
Dataset
Dataset Collection
Annotation
Characteristics of Our G2A-VReID
Privacy Protection
Approach
Revisiting CLIP-ReID
Visual-Semantic Alignment
Video Set-Level-Adapter for Efficient Model Tuning
Platform-Bridge Prompt
Experiments
Datasets and Evaluation Metrics
Implementation Details
...and 3 more sections

Figures (4)

Figure 1: Visualization of proposed G2A-VReID at different heights.
Figure 2: The distributions of sequence length.
Figure 3: Overview of our proposed framework. ID-specific descriptions and shared text prompts are learned in stage one (left). Video Set-Level-Adapter and PBP are introduced and trained in the second stage (right) while freezing other parameters.
Figure 4: Analysis on the depth and length of PBP on our G2A-VReID.
