HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections
Jiaxing Hao, Yanxi Wang, Zhigang Chang, Hongmin Gao, Zihao Cheng, Chen Wu, Xin Zhao, Peiye Fang, Rachmat Muwardi
TL;DR
HorGait addresses LiDAR-based gait recognition by combining a Transformer framework with CNN-based segmentation in the LiDAR Hybrid Module (LHM), which provides input adaptation and high-order spatial interactions on planar LiDAR projections. The method mitigates dumb patches with a recursive gated convolution ($g^{\Omega}$Conv) and large-kernel CNN blocks while still completing the full Transformer process. Experiments on SUSTech1K show HorGait to be the first LiDAR gait model to outperform state-of-the-art Transformer methods; the best results come from the $[1,1,3,3]$ configuration on planar projections, although CNN methods still hold advantages in some overall metrics. The work underscores the potential of high-order Transformer interactions and projection-aware design for robust, privacy-preserving gait recognition in challenging environments.
Abstract
Gait recognition is a remote biometric technology that uses the dynamic characteristics of human movement to identify individuals, even under extreme lighting conditions. Because 2D gait representations are inherently limited in spatial perception, LiDAR is used to capture 3D gait features directly and represent them as point clouds, which reduces environmental and lighting interference in recognition while improving privacy protection. For complex 3D representations, shallow networks fail to achieve accurate recognition, making vision Transformers the most prevalent method. However, the prevalence of dumb patches has limited the widespread use of the Transformer architecture in gait recognition. This paper proposes HorGait, a hybrid model with a Transformer architecture for gait recognition on the planar projection of 3D point clouds from LiDAR. Specifically, it employs a hybrid structure called the LHM Block to achieve the input adaptation, long-range interaction, and high-order spatial interaction of the Transformer architecture. Additionally, it uses CNNs with large convolutional kernels to segment the input representation, replacing attention windows to reduce dumb patches. Extensive experiments show that HorGait achieves state-of-the-art performance among Transformer architecture methods on the SUSTech1K dataset, verifying that the hybrid model can complete the full Transformer process and performs better on point cloud planar projections. The outstanding performance of HorGait offers new insights for the future application of the Transformer architecture in gait recognition.
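
To make "planar projection of a 3D point cloud" concrete, the following is a minimal sketch of one possible projection, not the projection used by HorGait. It assumes an orthographic projection onto the y-z plane, with x taken as positive depth from the sensor and 0.0 marking empty pixels; the function name `project_to_depth_image` and the image size are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed convention: (x, y, z) in metres, with x > 0 the depth from the sensor.
Point = Tuple[float, float, float]


def project_to_depth_image(
    points: Sequence[Point], height: int = 64, width: int = 64
) -> List[List[float]]:
    """Orthographically project points onto the y-z plane.

    Each pixel stores the smallest depth (x) of the points that fall into it;
    0.0 marks a pixel that received no point.
    """
    image = [[0.0] * width for _ in range(height)]
    if not points:
        return image

    ys = [p[1] for p in points]
    zs = [p[2] for p in points]
    y_min, z_max = min(ys), max(zs)
    # Fall back to a span of 1.0 when all points share a coordinate.
    y_span = (max(ys) - y_min) or 1.0
    z_span = (z_max - min(zs)) or 1.0

    for x, y, z in points:
        # Columns grow with y; rows grow downward, so the highest z is row 0.
        col = min(int((y - y_min) / y_span * (width - 1) + 0.5), width - 1)
        row = min(int((z_max - z) / z_span * (height - 1) + 0.5), height - 1)
        if image[row][col] == 0.0 or x < image[row][col]:
            image[row][col] = x  # keep the nearest surface
    return image
```

The resulting 2D depth map is the kind of image-like input that a CNN or Transformer backbone can consume; a range-view (spherical) projection would be an equally plausible alternative.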
