Walking Further: Semantic-aware Multimodal Gait Recognition Under Long-Range Conditions

Zhiyang Lu; Wen Jiang; Tianren Wu; Zhichao Wang; Changwang Zhang; Siqi Shen; Ming Cheng

Abstract

Gait recognition is an emerging biometric technology that enables non-intrusive and hard-to-spoof human identification. However, most existing methods are confined to short-range, unimodal settings and fail to generalize to long-range and cross-distance scenarios under real-world conditions. To address this gap, we present LRGait, the first LiDAR-Camera multimodal benchmark designed for robust long-range gait recognition across diverse outdoor distances and environments. We further propose EMGaitNet, an end-to-end framework tailored for long-range multimodal gait recognition. To bridge the modality gap between RGB images and point clouds, we introduce a semantic-guided fusion pipeline. A CLIP-based Semantic Mining (SeMi) module first extracts human body-part-aware semantic cues, which are then employed to align 2D and 3D features via a Semantic-Guided Alignment (SGA) module within a unified embedding space. A Symmetric Cross-Attention Fusion (SCAF) module hierarchically integrates visual contours and 3D geometric features, and a Spatio-Temporal (ST) module captures global gait dynamics. Extensive experiments on various gait datasets validate the effectiveness of our method.
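
The abstract names a Symmetric Cross-Attention Fusion (SCAF) module but does not give its equations here. As a rough illustration of the general idea of symmetric cross-attention between two modalities, the sketch below lets image tokens attend over point-cloud tokens and vice versa, using plain scaled dot-product attention. It is not the authors' implementation: the function names, the toy token values, the residual addition, and the absence of learned projections and hierarchical stages are all assumptions made for brevity.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query token attends over all context tokens.

    Uses scaled dot-product attention with the context serving as both
    keys and values (no learned projections; an assumption of this sketch).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        attended = [sum(w * k[j] for w, k in zip(weights, context)) for j in range(d)]
        out.append(attended)
    return out

def symmetric_cross_attention(img_tokens, pts_tokens):
    """Run cross-attention in both directions and add a residual.

    Hypothetical helper: image tokens query point tokens, point tokens
    query image tokens, and each modality keeps its own features via a sum.
    """
    img_from_pts = cross_attend(img_tokens, pts_tokens)
    pts_from_img = cross_attend(pts_tokens, img_tokens)
    img_fused = [[a + b for a, b in zip(t, u)] for t, u in zip(img_tokens, img_from_pts)]
    pts_fused = [[a + b for a, b in zip(t, u)] for t, u in zip(pts_tokens, pts_from_img)]
    return img_fused, pts_fused

# Toy tokens in a shared 2-D embedding space (values are made up).
img_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pts_tokens = [[0.5, 0.5], [1.0, -1.0]]
img_fused, pts_fused = symmetric_cross_attention(img_tokens, pts_tokens)
```

Each modality keeps its own token count after fusion, so the two fused streams can be processed further or pooled separately. The sketch assumes both token sets already live in a common embedding space, which is the role the abstract assigns to the SGA module.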

Paper Structure (29 sections, 21 equations, 11 figures, 5 tables)

Introduction
Related Works
Gait Recognition Methods
Gait Recognition Benchmark
Long-Range Gait Benchmark
Overall
Data Collection
Annotations and Representations
Statistics and Evaluation Metrics
Semantic-Guided Multimodal Gait Recognition
Problem Definition
Feature Extraction
CLIP-Based Semantic Mining
Semantic-Guided Alignment Module
Symmetric Cross-Attention Fusion Module
...and 14 more sections

Figures (11)

Figure 1: Visualization of our proposed multimodal dataset LRGait at various and long-range distances.
Figure 2: Visualizations under daytime and nighttime.
Figure 3: Statistics of the proposed LRGait.
Figure 4: Illustration of the proposed framework.
Figure 5: Details of the proposed SGA module.
...and 6 more figures
