It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment

Jinkai Zheng, Xinchen Liu, Boyue Zhang, Chenggang Yan, Jiyong Zhang, Wu Liu, Yongdong Zhang

TL;DR

XGait is a novel cross-granularity alignment method for gait recognition that combines gait representations of different granularity, exploiting the advantages of both silhouette and parsing while overcoming their limitations.

Abstract

Existing studies on gait recognition have primarily used sequences of either binary silhouettes or human parsing to encode the shapes and dynamics of people as they walk. Silhouettes offer accurate segmentation and robustness to environmental variations, but their low information entropy may result in sub-optimal performance. In contrast, human parsing provides fine-grained part segmentation with higher information entropy, but its segmentation quality may deteriorate in complex environments. To exploit the advantages of silhouette and parsing while overcoming their limitations, this paper proposes a novel cross-granularity alignment gait recognition method, named XGait, which unleashes the power of gait representations of different granularity. XGait first uses two branches of backbone encoders to map the silhouette sequences and the parsing sequences into two latent spaces, respectively. Moreover, to explore the complementary knowledge across the features of the two representations, we design the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM) after the two encoders. In particular, the GCM aims to enhance the quality of parsing features by leveraging global features from silhouettes, while the PCM aligns the dynamics of human parts between silhouette and parsing features using the high information entropy in parsing sequences. In addition, to effectively guide the part-level alignment of the two representations with different granularity, an elaborately designed learnable division mechanism is proposed for the parsing features. Comprehensive experiments on two large-scale gait datasets not only show the superior performance of XGait, with Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG, but also reflect the robustness of the learned features even under challenging conditions such as occlusions and clothing changes.
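The Rank-1 accuracy quoted in the abstract is a retrieval metric: a probe counts as correct when its nearest gallery sample has the same identity. The following is a minimal sketch of that computation, not the official Gait3D or CCPG evaluation protocol. The function name `rank1_accuracy`, the use of Euclidean distance, and the toy embeddings are all assumptions made for illustration; the real benchmarks may use a different distance and additional filtering rules.

```python
import math


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery sample shares their identity."""
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Euclidean distance from this probe to every gallery embedding
        # (the choice of distance is an assumption of this sketch).
        dists = [math.dist(feat, g) for g in gallery_feats]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        hits += gallery_ids[nearest] == pid
    return hits / len(probe_feats)


# Toy 2-D embeddings (hypothetical data, for illustration only).
gallery_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery_ids = ["a", "b", "c"]
probe_feats = [[0.1, -0.1], [4.0, 4.5], [0.4, 0.2]]
probe_ids = ["a", "c", "b"]

acc = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
print(f"Rank-1 accuracy: {acc:.3f}")
```

In this toy setup the third probe lies closer to identity "a" than to its true identity "b", so two of the three probes are retrieved correctly.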

Paper Structure

This paper contains 25 sections, 6 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Comparisons of different gait recognition methods, i.e., GaitSet, MTSGait, GaitBase, DyGait, and ParsingGait, on the Gait3D dataset in terms of Rank-1 accuracy. (Best viewed in color.)
  • Figure 2: The architecture of our XGait. In the Preprocessing stage, the silhouette sequence $S \in \mathbb{R}^{T \times C \times H \times W}$ and the parsing sequence $P \in \mathbb{R}^{T \times C \times H \times W}$ are extracted from the RGB sequences by a segmentation method and a human parsing model, respectively. In the Encoding stage, we employ two separate ResNet-like backbones $F_S(\cdot)$ and $F_P(\cdot)$ to extract mid-level features from the silhouette sequence and the parsing sequence, respectively. In the Cross-granularity stage, the Global Cross-granularity Module (GCM) and the Part Cross-granularity Module (PCM) are proposed to explore the complementary knowledge between the features of these two granularities at the global and part levels. GAP denotes Global Average Pooling. FC denotes the Fully Connected layer. CA refers to the Cross-granularity Alignment module. FMH represents the Feature Mapping Head.
  • Figure 3: Illustration of the learnable division. $\gamma$ is the learnable parameter used to modulate the weight of the $i$-th body part. (Best viewed in color.)
  • Figure 4: The pipeline of the Part Cross-granularity Module (PCM). CA-Upper, CA-Middle, and CA-Down are three independent Cross-granularity Alignment modules for extracting fine-grained mutual information from various human body parts. (Best viewed in color.)
  • Figure 5: The heatmaps of $\mathbf{F}_s$, $\mathbf{F}_p$, $\mathbf{F}_{ga}$, and $\mathbf{F}_{pa}$ in our XGait framework. $Sil.$ denotes Silhouette. (Best viewed in color.)
  • ...and 4 more figures
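
The part-level idea in the captions of Figures 3 and 4 (a feature map divided into upper, middle, and lower body parts, with a learnable parameter weighting each part) can be illustrated with a small toy sketch. This is not the paper's learnable division mechanism, whose details are not given in this section. The fixed row boundaries, the softmax normalization of `gamma`, the pooling of each part to a single scalar, and every function name below are assumptions introduced purely for illustration.

```python
import math


def global_average_pool(feature_map):
    """Mean over all spatial positions of an H x W map (GAP)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def split_parts(feature_map, boundaries):
    """Split an H x W map into horizontal strips at the given row indices."""
    edges = [0, *boundaries, len(feature_map)]
    return [feature_map[a:b] for a, b in zip(edges, edges[1:])]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_part_descriptor(feature_map, boundaries, gamma):
    """Pool each part, then scale it by a weight derived from gamma.

    Treating gamma as softmax logits is an assumption of this sketch,
    not something stated in the source.
    """
    parts = split_parts(feature_map, boundaries)
    weights = softmax(gamma)
    return [w * global_average_pool(p) for w, p in zip(weights, parts)]


# Toy 6 x 2 map: upper rows hold 1.0, middle rows 2.0, lower rows 3.0.
feature_map = [[1.0, 1.0]] * 2 + [[2.0, 2.0]] * 2 + [[3.0, 3.0]] * 2
descriptor = weighted_part_descriptor(feature_map, boundaries=[2, 4], gamma=[0.0, 0.0, 0.0])
print(descriptor)
```

With all entries of `gamma` equal, each part receives weight 1/3; in a trained model these values would be updated by gradient descent so that more informative parts contribute more.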