Multi-modal Multi-platform Person Re-Identification: Benchmark and Method

Ruiyang Ha, Songyi Jiang, Bin Li, Bikang Pan, Yihang Zhu, Junjie Zhang, Xiatian Zhu, Shaogang Gong, Jingya Wang

TL;DR

This work introduces MP-ReID, the first large-scale benchmark for multi-modality and multi-platform person re-identification, integrating ground RGB/IR and UAV RGB/thermal data across indoor and outdoor scenes for 1,930 identities. It also presents Uni-Prompt ReID, a CLIP-based prompt-learning framework that fuses Specified-ReID, Modality-Aware, and Platform-Aware prompts with a Visual-Enhanced network to address cross-modality and cross-platform gaps. Experiments show Uni-Prompt ReID achieves state-of-the-art performance across cross-modality, cross-platform, and joint tasks on MP-ReID, with ablations quantifying the contribution of each prompt component. The dataset is privacy-preserving and publicly released to encourage robust evaluation in realistic, dynamic environments, facilitating future research in complex ReID scenarios.
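
The three-part prompt can be illustrated with a short sketch. It is only a sketch under stated assumptions and does not reproduce the authors' code. It assumes that the Specified-ReID context is identity-specific, while the Modality-Aware and Platform-Aware contexts are shared by every image from a given modality or platform. The token counts, the embedding width, the random initialisation and the names `PromptBank` and `compose` are all hypothetical. The valid (modality, platform) pairs follow the camera setup described for Figure 1. The CLIP text encoder and the Visual-Enhanced network are not modelled here.

```python
import random

# Hypothetical sizes: the context lengths and embedding width used by
# Uni-Prompt ReID are not stated here, so these are placeholders.
EMBED_DIM = 8
N_ID_TOKENS = 4
N_MOD_TOKENS = 2
N_PLAT_TOKENS = 2

# (modality, platform) pairs present in MP-ReID per Figure 1:
# ground RGB + ground infrared, UAV RGB + UAV thermal.
VALID_SOURCES = {
    ("rgb", "ground"),
    ("infrared", "ground"),
    ("rgb", "uav"),
    ("thermal", "uav"),
}


def _init_tokens(n, dim, rng):
    """Return n context vectors of width dim (random values stand in for learned weights)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)]


class PromptBank:
    """Hypothetical store of learnable context, split into three parts."""

    def __init__(self, num_ids, seed=0):
        rng = random.Random(seed)
        # Sorted so that initialisation order is deterministic.
        modalities = sorted({m for m, _ in VALID_SOURCES})
        platforms = sorted({p for _, p in VALID_SOURCES})
        # Assumed identity-specific Specified-ReID context.
        self.id_ctx = {i: _init_tokens(N_ID_TOKENS, EMBED_DIM, rng) for i in range(num_ids)}
        # Shared by every image of a given modality.
        self.modality_ctx = {m: _init_tokens(N_MOD_TOKENS, EMBED_DIM, rng) for m in modalities}
        # Shared by every image captured from a given platform.
        self.platform_ctx = {p: _init_tokens(N_PLAT_TOKENS, EMBED_DIM, rng) for p in platforms}

    def compose(self, identity, modality, platform):
        """Concatenate Specified-ReID, Modality-Aware and Platform-Aware context tokens."""
        if (modality, platform) not in VALID_SOURCES:
            raise ValueError(f"no {modality} camera on the {platform} platform")
        return (
            self.id_ctx[identity]
            + self.modality_ctx[modality]
            + self.platform_ctx[platform]
        )
```

For example, `PromptBank(num_ids=1930).compose(42, "thermal", "uav")` returns eight context vectors. In a full CLIP-based model, such a sequence would be spliced into a text template and encoded by the text encoder. Because the context is factored into shared modality and platform parts, images of one identity from different sensors differ only in those shared tokens.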

Abstract

Conventional person re-identification (ReID) research is often limited to single-modality sensor data from static cameras, which fails to address the complexities of real-world scenarios where multi-modal signals are increasingly prevalent. For instance, consider an urban ReID system integrating stationary RGB cameras, nighttime infrared sensors, and UAVs equipped with dynamic tracking capabilities. Such systems face significant challenges due to variations in camera perspectives, lighting conditions, and sensor modalities, hindering effective person ReID. To address these challenges, we introduce the MP-ReID benchmark, a novel dataset designed specifically for multi-modality and multi-platform ReID. This benchmark uniquely compiles data from 1,930 identities across diverse modalities, including RGB, infrared, and thermal imaging, captured by both UAVs and ground-based cameras in indoor and outdoor environments. Building on this benchmark, we introduce Uni-Prompt ReID, a framework with specifically designed prompts tailored for cross-modality and cross-platform scenarios. Our method consistently outperforms state-of-the-art approaches, establishing a robust foundation for future research in complex and dynamic ReID environments. Our dataset is available at: https://mp-reid.github.io/.

Paper Structure

This paper contains 28 sections, 6 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Conceptual diagram of the new MP-ReID dataset, which includes six ground RGB cameras, six ground infrared cameras, one UAV RGB camera, and one UAV thermal camera. Data are collected in outdoor and indoor ground settings, as well as from UAV cameras for aerial perspectives. Integrating these diverse sources and modalities yields a comprehensive and versatile dataset aimed at advancing research in multi-modal human perception.
  • Figure 2: Our MP-ReID dataset comprises three distinct modalities and three different scenes, with notable disparities between images captured in different modalities and scenes. We showcase a range of variations to highlight the challenges of person re-identification. From left to right, selected samples from different scenes and modalities illustrate viewpoint differences, low resolution, motion blur, and occlusion, respectively. These examples demonstrate the complex gaps and obstacles within our dataset, emphasizing the diversity and real-world applicability of the MP-ReID benchmark.
  • Figure 3: Uni-Prompt ReID divides the learnable context into three parts: Specified-ReID Prompt, Modality-Aware Prompt, and Platform-Aware Prompt. During training, the specified-ReID prompt is first learned in a warm-up stage and then frozen while the modality-aware and platform-aware prompts are updated. An Inherent Information Embedding Net is also updated along with the prompts.
  • Figure 4: Six examples from MP-ReID, each showing images of the same ID across different scenes and modalities. From left to right: indoor RGB, outdoor RGB, UAV RGB, indoor infrared, outdoor infrared, and UAV thermal.
  • Figure 5: Bounding box analysis of the MP-ReID dataset. (a) shows the proportion of images in different scenes, and (b) shows the proportion of images in different modalities.
  • ...and 4 more figures
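
The two-stage schedule in the Figure 3 caption can be written down as a freeze/unfreeze rule. This is a minimal sketch, not the authors' training code. The component names, the epoch-based `warmup_epochs` boundary and the helper functions are hypothetical. The sketch also assumes that the Inherent Information Embedding Net is updated only in the second stage, together with the modality-aware and platform-aware prompts. The caption does not state whether it is also trained during warm-up, and it does not describe what happens to the CLIP encoders, so those encoders are omitted.

```python
# Hypothetical names for the trainable components shown in Figure 3.
SPECIFIED_REID = "specified_reid_prompt"
MODALITY_AWARE = "modality_aware_prompt"
PLATFORM_AWARE = "platform_aware_prompt"
EMBED_NET = "inherent_information_embedding_net"

ALL_GROUPS = (SPECIFIED_REID, MODALITY_AWARE, PLATFORM_AWARE, EMBED_NET)


def trainable_groups(epoch, warmup_epochs):
    """Return the components that receive gradient updates at `epoch`."""
    if epoch < 0 or warmup_epochs < 0:
        raise ValueError("epochs must be non-negative")
    if epoch < warmup_epochs:
        # Warm-up stage: only the specified-ReID prompt is learned.
        return {SPECIFIED_REID}
    # Main stage: the specified-ReID prompt stays frozen while the
    # modality-aware prompt, platform-aware prompt and embedding net are updated.
    return {MODALITY_AWARE, PLATFORM_AWARE, EMBED_NET}


def requires_grad_flags(epoch, warmup_epochs):
    """Map every component to True (trainable) or False (frozen)."""
    active = trainable_groups(epoch, warmup_epochs)
    return {group: group in active for group in ALL_GROUPS}


if __name__ == "__main__":
    for epoch in (0, 4, 5, 9):
        print(epoch, requires_grad_flags(epoch, warmup_epochs=5))
```

In a PyTorch training loop, each flag would typically be applied with `module.requires_grad_(flag)` on the corresponding module at the stage boundary. Freezing the identity-level prompt after warm-up keeps the identity anchors fixed while the modality and platform contexts adapt.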