
GaitSnippet: Gait Recognition Beyond Unordered Sets and Ordered Sequences

Saihui Hou, Chenye Wang, Wenpeng Lang, Zhengxiang Lan, Yongzhen Huang

TL;DR

This work proposes a new perspective that conceptualizes human gait as a composition of individualized actions, and introduces a non-trivial solution for snippet-based gait recognition built on two key components: Snippet Sampling and Snippet Modeling.

Abstract

Recent advancements in gait recognition have significantly enhanced performance by treating silhouettes as either an unordered set or an ordered sequence. However, both set-based and sequence-based approaches exhibit notable limitations. Specifically, set-based methods tend to overlook short-range temporal context for individual frames, while sequence-based methods struggle to capture long-range temporal dependencies effectively. To address these challenges, we draw inspiration from human identification and propose a new perspective that conceptualizes human gait as a composition of individualized actions. Each action is represented by a series of frames randomly selected from a continuous segment of the sequence, which we term a snippet. Fundamentally, the collection of snippets for a given sequence enables the incorporation of multi-scale temporal context, facilitating more comprehensive gait feature learning. Moreover, we introduce a non-trivial solution for snippet-based gait recognition, focusing on Snippet Sampling and Snippet Modeling as key components. Extensive experiments on four widely used gait datasets validate the effectiveness of our proposed approach and, more importantly, highlight the potential of gait snippets. For instance, our method achieves a rank-1 accuracy of 77.5% on Gait3D and 81.7% on GREW using a 2D convolution-based backbone.
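The snippet construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The partition into segments of length L with a random first-segment length L_1, and the M/N notation, follow the Figure 2 caption. The function names, the uniform draw of L_1 from [1, L], and the with-replacement fallbacks for short sequences are hypothetical choices made for this sketch.

```python
import random
from typing import List


def split_into_segments(num_frames: int, seg_len: int, rng: random.Random) -> List[List[int]]:
    """Partition frame indices 0..num_frames-1 into consecutive segments.

    Assumption (hypothetical): the first segment length L_1 is drawn uniformly
    from [1, seg_len]; later segments have length seg_len (the last may be shorter).
    """
    first_len = rng.randint(1, min(seg_len, num_frames))
    segments = [list(range(0, first_len))]
    start = first_len
    while start < num_frames:
        end = min(start + seg_len, num_frames)
        segments.append(list(range(start, end)))
        start = end
    return segments


def sample_snippets(num_frames: int, seg_len: int, num_snippets: int,
                    frames_per_snippet: int, seed: int = 0) -> List[List[int]]:
    """Return M snippets (num_snippets), each holding N frame indices (frames_per_snippet)."""
    rng = random.Random(seed)
    segments = split_into_segments(num_frames, seg_len, rng)
    # Assumption: pick M distinct segments when enough exist, otherwise sample
    # with replacement; sorting restores temporal order across snippets.
    if len(segments) >= num_snippets:
        chosen = sorted(rng.sample(range(len(segments)), num_snippets))
    else:
        chosen = sorted(rng.choices(range(len(segments)), k=num_snippets))
    snippets = []
    for idx in chosen:
        seg = segments[idx]
        # Randomly select N frames inside the segment (with replacement if it is too short).
        if len(seg) >= frames_per_snippet:
            frames = rng.sample(seg, frames_per_snippet)
        else:
            frames = rng.choices(seg, k=frames_per_snippet)
        snippets.append(sorted(frames))  # keep frames in temporal order within a snippet
    return snippets
```

For example, `sample_snippets(60, 8, 4, 4)` returns four sorted lists of four frame indices, each drawn from a single segment of at most eight consecutive frames. The random L_1 shifts the segment boundaries from one call to the next, which is the source of the extra sampling diversity mentioned in the Figure 2 caption.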

Paper Structure

This paper contains 37 sections, 3 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: Illustration of gait snippets in comparison to unordered sets and ordered sequences. Best viewed in color.
  • Figure 2: Snippet sampling for training. $\{G_{1}, \cdots, G_{K}\}$ represent all segments of a sequence, where $L$ is the segment length and the length $L_1$ of the first segment is a random integer that enhances sampling diversity. $\{G_{1}^{'}, \cdots, G_{M}^{'}\}$ represent the sampled segments. $M$ and $N$ denote the number of sampled snippets per sequence and the number of sampled frames per snippet, respectively.
  • Figure 3: Illustration of GaitSnippet. (1) The Residual Snippet Block, which integrates Intra-Snippet Modeling as shown in Figure 4(b), serves as the basic building block of the backbone. (2) At the end of the backbone, we first apply Intra-Snippet Gathering (the Gathering step of Intra-Snippet Modeling) to derive snippet-level representations and then perform Cross-Snippet Modeling to obtain sequence-level representations. (3) In addition to sequence-level supervision, an auxiliary branch enforces supervision on snippet-level features during training only.
  • Figure 4: (a) Snippet Block. (b) Residual Snippet Block. $M$ and $N$ denote the number of snippets and the number of frames per snippet in a sequence, while $C$, $H$, and $W$ represent the dimensions of channel, height, and width.
  • Figure 5: Computation cost in terms of parameters and FLOPs. The statistics are obtained on Gait3D, following the methodology of wang2023hih; huang2025occluded.
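The shape flow described in the Figure 3 and Figure 4 captions can be illustrated with a minimal NumPy sketch. It assumes that the 2D backbone emits per-frame feature maps of shape [M*N, C, H, W], ordered snippet by snippet. Max pooling over the N frames stands in for Intra-Snippet Gathering, and mean pooling over the M snippets stands in for Cross-Snippet Modeling. Both operators and the function name `gather_and_aggregate` are hypothetical stand-ins, not the paper's definitions.

```python
import numpy as np


def gather_and_aggregate(frame_feats: np.ndarray, num_snippets: int, frames_per_snippet: int):
    """Sketch of the shape flow at the end of a snippet-based backbone.

    frame_feats: per-frame feature maps [M*N, C, H, W], assumed ordered snippet-major
    (frames 0..N-1 belong to snippet 0, and so on).
    Returns (snippet_feats [M, C, H, W], sequence_feat [C, H, W]).
    Assumption: max/mean pooling are placeholders for the paper's actual operators.
    """
    mn, c, h, w = frame_feats.shape
    if mn != num_snippets * frames_per_snippet:
        raise ValueError("expected M*N frames in the leading dimension")
    x = frame_feats.reshape(num_snippets, frames_per_snippet, c, h, w)  # [M, N, C, H, W]
    snippet_feats = x.max(axis=1)                # gather over the N frames of each snippet
    sequence_feat = snippet_feats.mean(axis=0)   # combine the M snippet-level features
    return snippet_feats, sequence_feat
```

Keeping the intermediate [M, C, H, W] tensor is what allows snippet-level supervision. The auxiliary branch in Figure 3 can attach a loss to these features during training, while inference relies only on the sequence-level output.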