GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling
Huantao Ren, Jiajing Chen, Senem Velipasalar
TL;DR
GaitPoint+ addresses the robustness gap in gait recognition under appearance changes by fusing silhouette-based features with skeleton information modeled as a 3D point cloud. It uses a lightweight PointNet-based skeleton module and introduces Recycling Max-Pooling (RMP) to reclaim points that max-pooling would otherwise discard. Training uses a multi-term loss that combines triplet objectives with a refinement component: $L = L_{c\_tp} + L_{p\_tp} + L_{g\_tp} + L_{rmp}$. Empirical results on CASIA-B show consistent improvements across state-of-the-art silhouette baselines, particularly in the challenging BG and CL scenarios, with additional gains when RMP is applied; OUMVLP experiments indicate that the approach generalizes, with dataset-dependent effects. The work demonstrates that 3D point-cloud processing of skeletal keypoints can be efficiently integrated with CNN-based silhouette methods to yield more discriminative and robust gait representations, paving the way for broader use of lightweight point-cloud modules in biometric recognition.
Abstract
Gait is a behavioral biometric modality that can be used to recognize individuals from a distance by the way they walk. Most existing gait recognition approaches rely on either silhouettes or skeletons, while their joint use is underexplored. Features from silhouettes and skeletons can provide complementary information for more robust recognition against appearance changes or pose estimation errors. To exploit the benefits of both silhouette and skeleton features, we propose a new gait recognition network, referred to as GaitPoint+. Our approach models skeleton key points as a 3D point cloud and employs a computational-complexity-conscious 3D point processing approach to extract skeleton features, which are then combined with silhouette features for improved accuracy. Since silhouette- or CNN-based methods already require a considerable amount of computational resources, it is preferable that the key point learning module be fast and lightweight. We present a detailed analysis of the utilization of every human key point after traditional max-pooling, and show that while elbow and ankle points are used most commonly, many useful points are discarded by max-pooling. We therefore present a Recycling Max-Pooling module that recycles some of the discarded points during the processing of skeleton point clouds, and achieve further performance improvement. We provide a comprehensive set of experimental results showing that (i) incorporating skeleton features obtained by a point-based 3D point cloud processing approach boosts the performance of three different state-of-the-art silhouette- and CNN-based baselines; and (ii) recycling the discarded points increases the accuracy further. Ablation studies are also provided to show the effectiveness and contribution of the different components of our approach.
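To make the point-utilization idea concrete, the following is a minimal sketch, not the authors' implementation. It shows how one might count which points "win" a channel under standard max-pooling, and then run a second max-pool over the points that won nothing. The function names, the 17-joint by 30-frame layout, the 64-channel feature width, and the random features are all illustrative assumptions; the actual Recycling Max-Pooling module and its loss $L_{rmp}$ may differ.

```python
import numpy as np

def max_pool_with_usage(features):
    """Max-pool over points and count how many channels each point wins.

    features: array of shape (num_points, num_channels).
    Returns (pooled, usage) where pooled has shape (num_channels,) and
    usage[i] is the number of channels whose maximum comes from point i.
    """
    winners = features.argmax(axis=0)  # winning point index per channel
    pooled = features.max(axis=0)
    usage = np.bincount(winners, minlength=features.shape[0])
    return pooled, usage

def recycle_max_pool(features, usage):
    """Hypothetical 'recycling' step: max-pool only over discarded points.

    Points with usage == 0 contributed nothing to the first pooled vector.
    Returns None if every point was already used.
    """
    discarded = usage == 0
    if not discarded.any():
        return None
    return features[discarded].max(axis=0)

# Assumed setup: 17 joints over 30 frames treated as one point cloud,
# with 64-dimensional per-point features (random stand-ins here).
rng = np.random.default_rng(0)
num_points, num_channels = 17 * 30, 64
features = rng.normal(size=(num_points, num_channels))

pooled, usage = max_pool_with_usage(features)
recycled = recycle_max_pool(features, usage)

print("points used by max-pooling:", int((usage > 0).sum()), "of", num_points)
print("recycled feature shape:", recycled.shape)
```

Because each channel has exactly one winner, at most 64 of the 510 points can contribute to the first pooled vector in this setup, which is why a second pass over the remaining points has something to recover. How the recycled feature is then used (for example, as an auxiliary training signal) is left open here.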
