PSGait: Gait Recognition using Parsing Skeleton

Hangrui Xu, Chuanrui Zhang, Zhengxian Wu, Peng Jiao, Haoqian Wang

TL;DR

This work tackles the challenge of robust gait recognition in unconstrained environments by introducing Parsing Skeleton, a high-entropy, part-aware representation that converts pose-guided body parts into dense, CNN-friendly images. Building on this, PSGait fuses Parsing Skeletons with silhouettes to capture both fine-grained part dynamics and global shape, achieving state-of-the-art performance with improved efficiency across multiple benchmarks. The approach demonstrates strong generalization and plug-and-play compatibility with existing gait models, offering a practical pathway to deploy reliable gait systems in the wild. The authors also discuss limitations under extreme occlusion and propose future directions in adaptive fusion and architecture customization for Parsing Skeleton-based gait analysis.

Abstract

Gait recognition has emerged as a robust biometric modality due to its non-intrusive nature and resilience to occlusion. Conventional gait recognition methods typically rely on silhouettes or skeletons. Despite their success in controlled laboratory environments, these methods usually fail in real-world scenarios because their gait representations carry limited information entropy. To achieve accurate gait recognition in the wild, we propose a novel gait representation, named Parsing Skeleton. This representation introduces a skeleton-guided human parsing method to capture fine-grained body dynamics, so it carries much higher information entropy and encodes the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the Parsing Skeleton representation, we propose a novel Parsing Skeleton-based gait recognition framework, named PSGait, which takes Parsing Skeletons and silhouettes as input. The two modalities are fused, and the resulting image sequences are fed into gait recognition models for enhanced individual differentiation. We conduct comprehensive benchmarks on various datasets to evaluate our model. PSGait outperforms existing state-of-the-art multimodal methods that utilize both skeleton and silhouette inputs while significantly reducing computational resources. Furthermore, as a plug-and-play method, PSGait leads to a maximum improvement of 10.9% in Rank-1 accuracy across various gait recognition models. These results demonstrate that Parsing Skeleton offers a lightweight, effective, and highly generalizable representation for gait recognition in the wild.
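
For readers unfamiliar with the Rank-1 metric quoted above, the basic idea can be sketched as follows. This is a generic illustration and not the paper's evaluation code: the function name `rank1_accuracy`, the Euclidean distance, and the toy 2-D features are assumptions, and real gait benchmarks add dataset-specific protocol rules (for example, view or condition splits) that are omitted here.

```python
import math

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature has the same identity."""
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Distance from this probe to every gallery sample (Euclidean, assumed).
        dists = [math.dist(feat, g) for g in gallery_feats]
        nearest = dists.index(min(dists))
        hits += gallery_ids[nearest] == pid
    return hits / len(probe_ids)

# Toy data: three enrolled identities, four probe sequences.
gallery_feats = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
gallery_ids = ["A", "B", "C"]
probe_feats = [(0.1, 0.0), (0.9, 1.2), (2.0, 2.0), (4.0, 4.5)]
probe_ids = ["A", "B", "C", "C"]

acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
print(acc)  # 0.75: the third probe is closest to "B", so it counts as a miss
```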

Paper Structure

This paper contains 26 sections, 14 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Comparison of various gait representations. Only a single frame is displayed for brevity. The result shows the average Rank-1 accuracy on CCPG [ccpg] for different representation inputs to GaitBase [opengait].
  • Figure 2: Overview of the PSGait framework. During data preprocessing, the silhouettes and skeletons are generated from video frames. In multimodal fusion, after the parsing section, the Parsing Skeletons are fused with silhouettes. The fused data is then fed into the gait recognition model for individual differentiation. The parsing section refines body part representations through selected points, creating circular and linear representations to enhance the robustness and accuracy of gait recognition in the wild. Both the data preprocessing and gait recognition models are interchangeable.
  • Figure 3: Visualization of different gait representations. (a) Raw RGB frames from a walking sequence. (b) Binary silhouette images extracted via background subtraction. (c) 2D skeleton keypoints estimated from the RGB frames. (d) Our proposed Parsing Skeleton, which integrates part-level semantics and structural connectivity, providing richer spatial and motion cues for gait recognition. (e) Parsing Skeleton combined with silhouettes: the final fused representation used in PSGait, which pairs part-level semantics with body contours to enhance discriminability and structural consistency.
  • Figure 4: Illustration of the Parsing Skeleton construction process. (a) Original pose keypoints in coordinate form. (b) Skeleton image generated by connecting joints with lines and circles. (c) Parsing Skeleton with semantic part colors. (d) Final Parsing Skeleton fused with silhouette, enriching body structure details with spatial and semantic cues.
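
The construction steps described in the Figure 4 caption can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the joint names, the part grouping in `PARTS`, the integer part labels (standing in for semantic colors), the drawing radius, and the fusion rule in `fuse` (part labels take priority, silhouette foreground fills the remaining pixels) are all hypothetical choices made for the example.

```python
# Hypothetical sketch: rasterise pose keypoints into a part-labelled image
# using lines and circles, then overlay it on a binary silhouette.

def draw_disc(img, cx, cy, r, label):
    """Paint a filled circle of `label` centred at (cx, cy), clipped to the image."""
    h, w = len(img), len(img[0])
    for y in range(max(0, cy - r), min(h, cy + r + 1)):
        for x in range(max(0, cx - r), min(w, cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                img[y][x] = label

def draw_limb(img, p, q, r, label):
    """Paint a thick line from p to q by stamping discs along the segment."""
    steps = max(abs(q[0] - p[0]), abs(q[1] - p[1]), 1)
    for i in range(steps + 1):
        x = round(p[0] + (q[0] - p[0]) * i / steps)
        y = round(p[1] + (q[1] - p[1]) * i / steps)
        draw_disc(img, x, y, r, label)

# Assumed part definition: label -> limbs given as (joint_a, joint_b) pairs.
PARTS = {
    2: [("neck", "hip")],                            # torso
    3: [("hip", "l_knee"), ("l_knee", "l_ankle")],   # left leg
    4: [("hip", "r_knee"), ("r_knee", "r_ankle")],   # right leg
}
HEAD_LABEL = 5

def parsing_skeleton(keypoints, height, width, radius=1):
    """Render keypoints (x, y) into a label image: 0 is background."""
    img = [[0] * width for _ in range(height)]
    for label, limbs in PARTS.items():
        for a, b in limbs:
            draw_limb(img, keypoints[a], keypoints[b], radius, label)
    # The head is drawn last as a single, slightly larger circle.
    draw_disc(img, *keypoints["head"], radius + 1, HEAD_LABEL)
    return img

def fuse(parsing, silhouette, sil_label=1):
    """Assumed fusion: keep the part label where present, else silhouette foreground."""
    return [[p if p else (sil_label if s else 0) for p, s in zip(prow, srow)]
            for prow, srow in zip(parsing, silhouette)]

# Toy frame: 22 rows by 16 columns, keypoints given as (x, y).
keypoints = {"head": (8, 2), "neck": (8, 5), "hip": (8, 11),
             "l_knee": (6, 15), "l_ankle": (5, 19),
             "r_knee": (10, 15), "r_ankle": (11, 19)}
H, W = 22, 16
ps = parsing_skeleton(keypoints, H, W)
silhouette = [[1 if 4 <= x <= 12 else 0 for x in range(W)] for _ in range(H)]
fused = fuse(ps, silhouette)

for row in fused:
    print("".join(str(v) if v else "." for v in row))
```

Stamping discs along each segment keeps the sketch dependency-free; in practice an image library's line and circle primitives would serve the same purpose.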