GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

Yuxiang Guo, Anshul Shah, Jiang Liu, Ayush Gupta, Rama Chellappa, Cheng Peng

TL;DR

We propose a novel, keypoint-based Contour-Pose representation that compactly encodes both body shape and body-part information. It significantly reduces the complexity of the attention operation and improves both efficiency and performance.

Abstract

Gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body shape and body-part information. We further propose a local-to-global architecture, called GaitContour, to leverage this novel representation and efficiently compute a subject embedding in two stages. The first stage consists of a local transformer that extracts features from five different body regions. The second stage then aggregates the regional features to estimate a global human gait representation. Such a design significantly reduces the complexity of the attention operation and improves efficiency and performance simultaneously. Through large-scale experiments, GaitContour is shown to perform significantly better than previous point-based methods, while also being significantly more efficient than silhouette-based methods. On challenging datasets with significant distractors, GaitContour can even outperform silhouette-based methods.
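To make the complexity claim concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes that full self-attention scores every query-key pair, that each of the five regions is summarized as a single token for the global stage, and that each region holds 20 points; the region sizes and the function names `attention_pairs` and `local_to_global_pairs` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


def local_to_global_pairs(region_sizes: Sequence[int]) -> int:
    """Pairs scored when attention runs per region, then once across regions.

    Assumption: the global stage sees one token per region.
    """
    local = sum(attention_pairs(n) for n in region_sizes)
    global_stage = attention_pairs(len(region_sizes))
    return local + global_stage


# Hypothetical point counts for head, left arm, right arm, left leg, right leg.
region_sizes = [20, 20, 20, 20, 20]

full = attention_pairs(sum(region_sizes))          # one attention over all points
two_stage = local_to_global_pairs(region_sizes)    # five local + one global

print(full, two_stage)  # 10000 2025
```

Under these assumed sizes, splitting the points into five regions cuts the number of scored pairs by roughly a factor of five, which is the kind of saving the local-to-global design targets.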

Paper Structure (20 sections, 6 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Comparison of our proposed Contour-Pose and other gait representations. Bubble size denotes the number of parameters. GaitTR w/CP denotes Contour-Pose (CP) features extracted with GaitTR. GaitContour achieves a good balance between efficiency and accuracy.
  • Figure 2: (a) An overview of GaitContour. The Contour-Pose is partitioned into five regions, i.e., head, left arm, right arm, left leg, and right leg. Local-CPT extracts features from each region separately. GaitContour combines these local features into an identity embedding through a Global Pose-Feature Transformer. This local-to-global design enhances both the efficiency and the effectiveness of GaitContour. (b) The structure of the Temporal Transformer Layer. It extracts the spatiotemporal correlation between points and serves as the basic block for Local-CPT and Global-PFT.
  • Figure 3: The construction of Contour-Pose. The pose is combined with contour points sampled from the silhouette edge. In particular, contour points are sampled based on their distances from neighboring pose keypoints. As shown in the zoomed area, Contour-Pose consists of each pose keypoint connected to its $n$ nearby contour points.
  • Figure 4: The effect of temporal information during training. Results are evaluated on the BRIAR dataset.
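
The Contour-Pose construction described in the Figure 3 caption can be sketched as follows. This is an illustrative approximation, not the authors' code: it assumes 2D coordinates, plain Euclidean distance, and that each pose keypoint simply takes its $n$ closest contour points. The function name `nearest_contour_points` and the sample coordinates are hypothetical.

```python
import numpy as np


def nearest_contour_points(keypoints: np.ndarray, contour: np.ndarray, n: int) -> np.ndarray:
    """For each keypoint, return its n nearest contour points.

    keypoints: (K, 2) array of pose keypoints.
    contour:   (M, 2) array of points on the silhouette edge, with M >= n.
    Returns a (K, n, 2) array ordered from nearest to farthest.
    """
    # Pairwise Euclidean distances between keypoints and contour points: (K, M).
    diff = keypoints[:, None, :] - contour[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Indices of the n smallest distances per keypoint: (K, n).
    idx = np.argsort(dist, axis=1)[:, :n]
    return contour[idx]


# Hypothetical toy data: two keypoints and five contour points.
keypoints = np.array([[0.0, 0.0], [10.0, 0.0]])
contour = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 0.0], [10.0, 3.0], [5.0, 5.0]])

contour_pose = nearest_contour_points(keypoints, contour, n=2)
print(contour_pose.shape)  # (2, 2, 2)
```

In this simplified version a contour point may be selected by more than one keypoint; whether the actual sampling avoids such overlap is not stated in the caption.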