
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification

Chen Mao, Chong Tan, Jingqi Hu, Min Zheng

TL;DR

Extensive experiments in real-world scenarios demonstrate that the method effectively uncovers the correlations between heterogeneous data, bridges the gap between visual and signal modalities, significantly expands the sensing range, and improves ReID accuracy across multiple sensors.

Abstract

Person re-identification (ReID), a crucial technology in the field of security, plays a vital role in safety inspections, personnel counting, and related tasks. Most current ReID approaches extract features primarily from images, which makes them vulnerable to conditions such as clothing changes and occlusions. In addition to cameras, we use widely available WiFi routers as sensing devices, capturing pedestrians' gait information through the Channel State Information (CSI) of WiFi signals, and we contribute a multimodal dataset. We employ a two-stream network that handles the video-understanding and signal-analysis tasks separately, and we apply multimodal fusion and contrastive learning to pedestrian video and WiFi data. Extensive experiments in real-world scenarios demonstrate that our method effectively uncovers the correlations between heterogeneous data, bridges the gap between the visual and signal modalities, significantly expands the sensing range, and improves ReID accuracy across multiple sensors.
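
The abstract pairs a video stream with a WiFi (CSI) stream and aligns them with contrastive learning, but it does not state the exact loss. The sketch below assumes a standard symmetric InfoNCE objective of the kind popularized by CLIP, which is not necessarily the authors' formulation: row i of the video embeddings and row i of the WiFi embeddings are treated as the same pedestrian (a positive pair), and every other row in the batch is a negative. The function names, the temperature of 0.07, and the toy embeddings are hypothetical stand-ins for the outputs of the two streams.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(video_emb: np.ndarray, wifi_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style cross-modal loss (an assumption, not the paper's loss).

    Row i of video_emb and row i of wifi_emb belong to the same person;
    all other rows in the batch serve as negatives.
    """
    v = l2_normalize(video_emb)
    w = l2_normalize(wifi_emb)
    logits = v @ w.T / temperature                              # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_v2w = -log_softmax(logits, axis=1)[idx, idx].mean()    # each video picks its WiFi match
    loss_w2v = -log_softmax(logits, axis=0)[idx, idx].mean()    # each WiFi sample picks its video match
    return float(0.5 * (loss_v2w + loss_w2v))


def cross_modal_rank1(query_emb: np.ndarray, gallery_emb: np.ndarray) -> np.ndarray:
    """For each query, return the index of the most similar gallery embedding (cosine)."""
    sims = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T
    return sims.argmax(axis=1)


# Toy usage: random vectors stand in for the two encoders' outputs.
rng = np.random.default_rng(0)
identity = rng.normal(size=(4, 16))                  # per-person signal shared by both modalities
video = identity + 0.1 * rng.normal(size=(4, 16))    # stand-in for video-stream embeddings
wifi = identity + 0.1 * rng.normal(size=(4, 16))     # stand-in for WiFi-stream embeddings

aligned_loss = symmetric_info_nce(video, wifi)
shuffled_loss = symmetric_info_nce(video, np.roll(wifi, 1, axis=0))  # break the true pairing
print(aligned_loss, shuffled_loss)       # the aligned loss should be the smaller one
print(cross_modal_rank1(wifi, video))    # WiFi queries matched against a video gallery
```

Averaging both directions keeps either modality from dominating the alignment, and the same cosine similarity used in training doubles as the matching score at retrieval time, which is what lets a WiFi observation be matched to a camera identity.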

Paper Structure

This paper contains 12 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: A pair of WiFi transceivers and a camera are used to identify the person
  • Figure 2: Example of one video clip of a sample and the corresponding matrix of WiFi CSI data frame.
  • Figure 3: The overall architecture diagram of ViFi-ReID.
  • Figure 4: WiFi Data Preprocessing in WiFormer.
  • Figure 5: Visualizations of pedestrian feature clustering in two-dimensional space and ROC curve.
  • ...and 1 more figure