Event-based Video Person Re-identification via Cross-Modality and Temporal Collaboration

Renkai Li, Xin Yuan, Wei Liu, Xin Xu

TL;DR

This work addresses privacy concerns in video person ReID by switching from RGB video to event-based data, preserving motion cues while reducing visible identity details. It introduces CMTC, a network that uses EventNet to generate auxiliary information from events and fuses it with the event stream through Cross-Modality and Temporal Collaboration, capturing both appearance-like cues and inter-frame motion. Key contributions include the EventNet design, the modality collaboration (MC) and temporal collaboration (TC) modules, and ablation evidence showing their additive benefits across multiple event-converted datasets. The approach yields improved Rank-1 and mAP on PRID-2011, iLIDS-VID, and MARS, highlighting the potential of privacy-preserving, event-based ReID in surveillance contexts.

Abstract

Video-based person re-identification (ReID) has become increasingly important due to its applications in video surveillance. By employing events in video-based person ReID, more motion information can be provided between consecutive frames to improve recognition accuracy. Previous approaches have introduced event data to assist the video person ReID task, but they still cannot avoid the privacy leakage problem caused by RGB images. To avoid privacy attacks while retaining the benefits of event data, we consider using only event data. To make full use of the information in the event stream, we propose a Cross-Modality and Temporal Collaboration (CMTC) network for event-based video person ReID. First, we design an event transform network to obtain corresponding auxiliary information from the raw input events. Additionally, we propose a differential modality collaboration module to balance the roles of events and auxiliaries so that they complement each other. Furthermore, we introduce a temporal collaboration module to exploit motion information and appearance cues. Experimental results demonstrate that our method outperforms others in the task of event-based video person ReID.
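
The experimental comparison is reported as Rank-1 and mAP. As a rough illustration of what these two numbers measure, the sketch below computes them from a query-to-gallery distance matrix. It is not the authors' evaluation code: the function name `rank1_and_map` and its arguments are hypothetical, and the sketch assumes a simplified protocol that omits the same-camera gallery filtering commonly applied in ReID benchmarks.

```python
from typing import List, Sequence, Tuple


def rank1_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (Rank-1, mAP) for a query-by-gallery distance matrix.

    dist[i][j] is the distance between query i and gallery item j
    (smaller means more similar). Hypothetical helper for illustration;
    same-camera filtering is intentionally left out.
    """
    rank1_hits = 0
    ap_sum = 0.0
    valid_queries = 0

    for i, qid in enumerate(query_ids):
        # Rank gallery items from most to least similar.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        matches: List[bool] = [gallery_ids[j] == qid for j in order]

        n_relevant = sum(matches)
        if n_relevant == 0:
            continue  # a query with no true match cannot be scored
        valid_queries += 1

        # Rank-1: is the top-ranked gallery item the right identity?
        if matches[0]:
            rank1_hits += 1

        # Average precision: mean of precision at each correct match.
        found = 0
        precision_sum = 0.0
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precision_sum += found / rank
        ap_sum += precision_sum / n_relevant

    if valid_queries == 0:
        raise ValueError("no query has a matching gallery identity")
    return rank1_hits / valid_queries, ap_sum / valid_queries


if __name__ == "__main__":
    # Toy data: two queries, three gallery items.
    toy_dist = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.3]]
    r1, m_ap = rank1_and_map(toy_dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
    print(f"Rank-1 = {r1:.3f}, mAP = {m_ap:.3f}")
```

Rank-1 only rewards a correct top match, whereas mAP also accounts for how highly every correct gallery item is ranked, which is why the two are usually reported together.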

Paper Structure (13 sections, 12 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: (a) Auxiliary information generated from the original events by our EventNet. (b) Two specific cases, each showing the original event, the corresponding auxiliary information, the feature map of AGW, and the feature map of our CMTC.
  • Figure 2: The overview of our Cross-Modality and Temporal Collaboration Network (CMTC).
  • Figure 3: The t-SNE visualization results for ten randomly selected identities on the PRID-2011 dataset, with different colors representing different identities.
  • Figure 4: Ranking results of CMTC. Red and green bounding boxes indicate incorrect and correct results, respectively.