Cross-Modality Gait Recognition: Bridging LiDAR and Camera Modalities for Human Identification

Rui Wang, Chuanfu Shen, Manuel J. Marin-Jimenez, George Q. Huang, Shiqi Yu

TL;DR

This paper tackles cross-modality gait recognition by bridging LiDAR point clouds and camera silhouettes, a scenario common in real-world multi-sensor environments. It proposes CrossGait, a two-branch framework that learns modality-specific features and a modality-shared representation using a Prototypical Modality-shared Attention Module (PMAM) and a Cross-modality Feature Adapter (CMFA) to enable cross-modality matching with a targeted contrastive alignment objective. The method achieves notable cross-modality retrieval performance on the SUSTech1K dataset and demonstrates generalization to other camera-based representations, while maintaining strong single-modality performance. The work advances practical multi-sensor gait recognition and suggests future extensions to additional modalities such as infrared or event-based data.

Abstract

Current gait recognition research mainly focuses on identifying pedestrians captured by the same type of sensor, neglecting the fact that individuals may be captured by different sensors suited to different environments. A more practical approach involves cross-modality matching across different sensors. This paper therefore investigates cross-modality gait recognition, with the objective of accurately identifying pedestrians across diverse vision sensors. We present CrossGait, which is inspired by the feature alignment strategy and can retrieve pedestrians across diverse data modalities. Specifically, we approach the cross-modality recognition task by first extracting features within each modality and then aligning these features across modalities. To further enhance cross-modality performance, we propose a Prototypical Modality-shared Attention Module that learns modality-shared features from the two sets of modality-specific features. Additionally, we design a Cross-modality Feature Adapter that transforms the learned modality-specific features into a unified feature space. Extensive experiments on the SUSTech1K dataset demonstrate the effectiveness of CrossGait: (1) it shows promising cross-modality ability in retrieving pedestrians across modalities from different sensors in diverse scenes, and (2) it not only learns modality-shared features for cross-modality gait recognition but also maintains modality-specific features for single-modality recognition.
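
The retrieval setting described above can be illustrated with a minimal sketch. It assumes that LiDAR and camera sequences have already been embedded into one shared feature space, and it scores a LiDAR probe against a camera gallery by cosine similarity. The function names, the toy three-dimensional embeddings, and the choice of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank1_accuracy(probe, gallery):
    """Fraction of probes whose most similar gallery entry has the same identity.

    probe, gallery: lists of (identity, embedding) pairs from two modalities,
    assumed to live in the same (already aligned) feature space.
    """
    hits = 0
    for probe_id, probe_vec in probe:
        best_id, _ = max(
            gallery, key=lambda item: cosine_similarity(probe_vec, item[1])
        )
        hits += best_id == probe_id
    return hits / len(probe)

# Toy data (hypothetical): LiDAR probes matched against a camera gallery.
lidar_probe = [("A", [0.9, 0.1, 0.0]), ("B", [0.1, 0.8, 0.2])]
camera_gallery = [
    ("A", [1.0, 0.0, 0.1]),
    ("B", [0.0, 1.0, 0.1]),
    ("C", [0.0, 0.1, 1.0]),
]
accuracy = rank1_accuracy(lidar_probe, camera_gallery)
```

Swapping the roles of the two lists gives the camera-to-LiDAR direction; the single-modality case corresponds to drawing probe and gallery from the same sensor.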

Paper Structure (15 sections, 7 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Comparison between single-modality and cross-modality gait recognition. The majority of methods concentrate on pedestrian retrieval within single-modality settings, where both probe and gallery subjects are captured by the same visual sensor (green arrow for camera-based, blue arrow for LiDAR-based gait recognition). Our objective is to expand recognition across different sensors, such as LiDAR and RGB camera, as illustrated by the orange arrow.
  • Figure 2: One-stream and two-stream cross-modality framework.
  • Figure 3: Feature alignment strategy for cross-modality gait recognition. (A) illustrates the complexities of cross-modality gait recognition, including modality and intra-modality variations in raw data (represented by two identities $i$ and $k$). (B) represents the process of feature extraction into two isolated modality-specific feature spaces. (C) aligns two isolated spaces into a unified modality-shared space.
  • Figure 4: The pipeline of our proposed CrossGait. Our method involves learning both modality-specific features and modality-shared prototypes using a Prototypical Modality-shared Attention Module (PMAM), followed by a Cross-modality Feature Adapter (CMFA) to transform the modality-specific features into a unified feature space that encodes the modality-shared features.
  • Figure 5: The structure of Prototypical Modality-shared Attention Module (PMAM).
  • ...and 2 more figures
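
The alignment idea in Figure 3, pulling same-identity features from the two modalities together while pushing different identities apart, can be sketched with a generic cross-modal triplet loss. This is one common form of alignment objective and is not claimed to be the loss used by CrossGait; the function names, the margin value, and the toy two-dimensional features for identities i and k are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cross_modal_triplet_loss(anchors, others, margin=0.2):
    """Mean hinge loss over all (anchor, positive, negative) triplets.

    anchors: (identity, feature) pairs from one modality, e.g. LiDAR.
    others:  (identity, feature) pairs from the other modality, e.g. camera.
    Positives share the anchor's identity; negatives do not.
    """
    losses = []
    for anchor_id, anchor_vec in anchors:
        positives = [vec for ident, vec in others if ident == anchor_id]
        negatives = [vec for ident, vec in others if ident != anchor_id]
        for pos in positives:
            for neg in negatives:
                gap = euclidean(anchor_vec, pos) - euclidean(anchor_vec, neg)
                losses.append(max(0.0, margin + gap))
    return sum(losses) / len(losses) if losses else 0.0

# Toy features (hypothetical) for two identities, i and k.
lidar = [("i", [0.0, 0.0]), ("k", [1.0, 1.0])]
camera_aligned = [("i", [0.0, 0.05]), ("k", [1.0, 0.95])]
camera_misaligned = [("i", [1.0, 0.95]), ("k", [0.0, 0.05])]

loss_aligned = cross_modal_triplet_loss(lidar, camera_aligned)
loss_misaligned = cross_modal_triplet_loss(lidar, camera_misaligned)
```

When the two feature spaces are already aligned, every triplet satisfies the margin and the loss is zero; when identities land in the wrong place across modalities, the loss is positive, and that gap is what a training signal of this kind would reduce.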