HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections

Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi

TL;DR

HorGait addresses LiDAR-based gait recognition by combining a Transformer framework with CNN-based segmentation in the LiDAR Hybrid Module (LHM), which provides input adaptation and high-order spatial interactions on planar projections of LiDAR point clouds. The method reduces dumb patches by using a recursive gated convolution ($g^{\Omega}$Conv) and large-kernel CNN blocks, while still carrying out the full Transformer process. Experiments on SUSTech1K show that HorGait outperforms state-of-the-art Transformer methods for LiDAR gait recognition, with the best results obtained from the $[1,1,3,3]$ configuration on the planar projection, although CNN methods still hold an advantage on some overall metrics. The work highlights the potential of high-order Transformer interactions and projection-aware design for robust, privacy-preserving gait recognition in challenging environments.

Abstract

Gait recognition is a remote biometric technology that uses the dynamic characteristics of human movement to identify individuals, even under extreme lighting conditions. 2D gait representations are inherently limited in spatial perception, whereas LiDAR can directly capture 3D gait features and represent them as point clouds, reducing environmental and lighting interference in recognition while significantly improving privacy protection. For complex 3D representations, shallow networks fail to achieve accurate recognition, which has made vision Transformers the most prevalent approach. However, the prevalence of dumb patches has limited the widespread use of the Transformer architecture in gait recognition. This paper proposes a method named HorGait, which uses a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR. Specifically, it employs a hybrid model structure called the LHM Block to achieve the input adaptation, long-range interaction, and high-order spatial interaction of the Transformer architecture. Additionally, it uses CNNs with large convolutional kernels to segment the input representation, replacing attention windows to reduce dumb patches. We conducted extensive experiments, and the results show that HorGait achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and performs better on point cloud planar projections. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.
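
To make the notion of dumb patches concrete, the following minimal sketch counts how many non-overlapping patches of a sparse binary image contain no foreground at all, following the description of dumb patches in the Figure 2 caption as patches over information-free areas. The function name `dumb_patch_fraction`, the toy image, and the patch sizes are illustrative assumptions and are not taken from the paper.

```python
def dumb_patch_fraction(image, patch):
    """Fraction of non-overlapping patch x patch tiles with no foreground pixel.

    `image` is a list of rows of 0/1 values (hypothetical toy input).
    Tiles that would run past the image border are ignored.
    """
    rows, cols = len(image), len(image[0])
    total = 0
    dumb = 0
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            total += 1
            has_foreground = any(
                image[r + i][c + j]
                for i in range(patch)
                for j in range(patch)
            )
            if not has_foreground:
                dumb += 1
    return dumb / total if total else 0.0


# Toy 8x8 "projection": a two-pixel-wide vertical figure on an empty background.
toy = [[1 if c in (3, 4) else 0 for c in range(8)] for _ in range(8)]

for size in (1, 2, 4):
    print(size, dumb_patch_fraction(toy, size))
```

On this toy input, smaller patches leave a larger share of tiles empty, which is one way to see why a fixed fine-grained patch grid over a sparse gait projection yields many uninformative tokens.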

Paper Structure

This paper contains 17 sections, 7 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: The main representations of gait include silhouettes obtained through cameras, manually extracted 3D bones, and 3D point clouds obtained directly by using LiDAR. LiDAR allows for the preservation of the target's original 3D features while also protecting privacy.
  • Figure 2: Gait recognition datasets often contain a large number of information-free areas, leading to numerous 'dumb patches' in the Transformer architecture. This significantly increases the risk of generating useless or invalid gradients during the self-attention calculation process. The convolution kernel's movement in convolutional networks can help reduce the number of these dumb patches. To address this issue, current solutions include integrating CNNs into the Transformer framework.
  • Figure 3: The position and radius of the reference sphere influence the spherical projection. The z-axis height of the reference sphere determines the height of the compression center in the projection; too high or too low will result in missing points.
  • Figure 4: (a) The CNN network structure in LHM Block comprises two convolutional networks with varying convolution kernel sizes. (b) The FFN layer in LHM Block consists of two linear layers and a GELU activation layer.
  • Figure 5: (a) The network structure of SwinGait, where the backbone consists of four stages. (b) The structure of the Swin Transformer V2 block. (c) The HorNet block. (d) The LHM block. This work uses these blocks to replace the stages in the SwinGait network structure to construct the Transformer method and the hybrid model method.
  • ...and 4 more figures
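
The effect described in the Figure 3 caption, where a badly placed projection center causes missing points, can be illustrated with a generic spherical (range-image) projection. This is a sketch under stated assumptions, not the paper's formulation: the function `spherical_project`, the image size, and the vertical field of view are hypothetical, and the radius of the reference sphere is not modeled.

```python
import math


def spherical_project(points, center, width, height,
                      v_fov=(-math.pi / 4, math.pi / 4)):
    """Project 3D points onto a height x width range image around `center`.

    Generic range-image projection (an assumption, not the paper's method):
    columns index azimuth, rows index elevation, and each cell stores the
    distance of the nearest point that falls into it (0.0 means empty).
    Points outside the vertical field of view `v_fov` are dropped.
    """
    cx, cy, cz = center
    v_min, v_max = v_fov
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # a point at the center has no direction
        azimuth = math.atan2(dy, dx)       # in [-pi, pi]
        elevation = math.asin(dz / r)      # in [-pi/2, pi/2]
        if not (v_min <= elevation <= v_max):
            continue  # outside the vertical field of view: a missing point
        u = min(int((azimuth + math.pi) / (2 * math.pi) * width), width - 1)
        v = min(int((v_max - elevation) / (v_max - v_min) * height), height - 1)
        if image[v][u] == 0.0 or r < image[v][u]:
            image[v][u] = r
    return image


def count_projected(image):
    """Number of non-empty cells in the range image."""
    return sum(1 for row in image for value in row if value > 0.0)


# Hypothetical toy "person": points stacked along z at x = 1.
person = [(1.0, 0.0, z / 10.0) for z in range(-5, 6)]

centered = spherical_project(person, (0.0, 0.0, 0.0), width=16, height=16)
too_high = spherical_project(person, (0.0, 0.0, 5.0), width=16, height=16)
print(count_projected(centered), count_projected(too_high))
```

With the center at the height of the toy figure, its points land inside the image; raising the center far above the figure pushes every point below the vertical field of view, so the projection comes out empty.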