GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Haijun Xiong, Yunze Deng, Bin Feng, Xinggang Wang, Wenyu Liu

TL;DR

GaitGS addresses a gap in gait recognition by jointly modeling temporal information across multiple granularity levels and temporal spans. It introduces the Multi-Granularity Feature Extractor (MGFE) to capture micro- and macro-motion and the Multi-Span Feature Extractor (MSFE) to extract local and global temporal cues, complemented by Prior Information Embedding Generation (PIEG) and a transformer-based Global-information Capture Module (GCM) with grouped-convolution positional encoding. The approach achieves state-of-the-art results on CASIA-B and OU-MVLP, demonstrating robustness to variations in speed and appearance and offering strong cross-view performance. This work advances practical gait recognition by integrating multi-dimensional temporal cues, and the authors state that source code will be released to support reproducibility.

Abstract

Gait recognition, a growing area of biometric recognition, identifies individuals by their distinct walking patterns. However, existing methods do not adequately incorporate temporal information. To realize the full potential of gait recognition, we advocate considering temporal features at varying granularities and spans. This paper introduces GaitGS, a novel framework that aggregates temporal features simultaneously in both the granularity and span dimensions. Specifically, the Multi-Granularity Feature Extractor (MGFE) captures micro-motion and macro-motion information at the fine and coarse levels, respectively, while the Multi-Span Feature Extractor (MSFE) generates local and global temporal representations. In extensive experiments on two datasets, our method demonstrates state-of-the-art performance, achieving Rank-1 accuracy of 98.2%, 96.5%, and 89.7% on CASIA-B under different conditions, and 97.6% on OU-MVLP. The source code will be available at https://github.com/Haijun-Xiong/GaitGS.
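
For readers unfamiliar with the metric, Rank-1 accuracy is the fraction of probe sequences whose nearest gallery sequence carries the same identity. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `rank1_accuracy`, the use of Euclidean distance, and the toy two-dimensional features are all assumptions made for illustration.

```python
import math


def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature has the same identity.

    Distance is Euclidean here (an assumption; other metrics are possible).
    """
    if not probe_feats or not gallery_feats:
        raise ValueError("probe and gallery sets must be non-empty")
    hits = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Index of the gallery feature closest to this probe.
        nearest = min(
            range(len(gallery_feats)),
            key=lambda j: math.dist(feat, gallery_feats[j]),
        )
        if gallery_ids[nearest] == pid:
            hits += 1
    return hits / len(probe_ids)


# Toy data (hypothetical): two gallery identities, three probes.
gallery_feats = [[0.0, 0.0], [10.0, 10.0]]
gallery_ids = ["a", "b"]
probe_feats = [[1.0, 0.0], [9.0, 9.0], [6.0, 6.0]]
probe_ids = ["a", "b", "a"]
accuracy = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
```

In this toy example the third probe lies closer to the gallery sample of identity "b" than to its own identity "a", so two of the three probes are matched correctly.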

Paper Structure (13 sections, 9 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Motivation: (a) Human body parts display distinct movement patterns during walking, with variations in motion across different regions. (b) Accurate discrimination of gait sequences requires both local and global temporal clues.
  • Figure 2: Comparison of temporal modeling methods between previous gait recognition and our method in terms of temporal granularity and span.
  • Figure 3: (a) Overview of GaitGS. The Multi-Granularity Feature Extractor (MGFE) extracts both fine-level and coarse-level features from the initial shallow feature. The Multi-Span Feature Extractor (MSFE) generates local and global temporal information at both the fine and coarse levels. (b) Details of MGFE, comprising the Fine Branch Feature Extractor and the Coarse Branch Feature Extractor. The Unit Temporal Aggregation (UTA) operation fuses fine-level information into coarse-level features. (c) Details of MSFE, consisting of MCM (Fan et al., CVPR 2020) and the Global-information Capture Module (GCM), shown for the fine level as an example. MCM captures local temporal details, while GCM focuses on global temporal clues.
  • Figure 4: Details of PIEG. The maximum score is marked by the red box, and the selected embedding $E_{prior}$ is indicated by the red arrow.
  • Figure 5: Details of GCM, taking the generation of the fine-level global temporal feature $S_{f}^g$ as an example.
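
The selection step described in the Figure 4 caption, where the candidate with the maximum score yields $E_{prior}$, can be sketched as a simple argmax over candidate embeddings. This is only an illustration under stated assumptions: the captions do not say how the scores are computed, so they are taken as given inputs here, and the function name `select_prior_embedding` and the toy values are hypothetical.

```python
def select_prior_embedding(scores, embeddings):
    """Return (index, embedding) of the candidate with the highest score.

    `scores` and `embeddings` are parallel lists; how the scores are
    produced is not covered by this sketch (assumed to be given).
    """
    if not scores or len(scores) != len(embeddings):
        raise ValueError("scores and embeddings must be non-empty and aligned")
    # Argmax over the scores; on ties the first maximum is chosen.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, embeddings[best]


# Toy usage with three hypothetical candidate embeddings.
candidate_scores = [0.1, 0.7, 0.2]
candidate_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
best_index, e_prior = select_prior_embedding(candidate_scores, candidate_embeddings)
```

Here the second candidate has the largest score, so it is the one returned as the selected embedding.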