On Denoising Walking Videos for Gait Recognition

Dongyang Jin, Chao Fan, Jingzhe Ma, Jingkai Zhou, Weihua Chen, Shiqi Yu

TL;DR

This work tackles RGB-based gait recognition by suppressing identity-irrelevant cues such as clothing texture and background. It introduces DenoisingGait, which combines knowledge-driven denoising based on generative diffusion models with a geometry-driven Feature Matching module to produce a Gait Feature Field with a static appearance component and a dynamic motion component. The diffusion timestep $t$ (with $t \in \{1, \dots, T\}$ and typically $T=1000$) is tuned to balance coarse structure against fine detail so that gait-irrelevant RGB content is filtered out, while within-frame and cross-frame matching based on Assignment of Direction (AoD) yields two-channel gait fields that are texture-invariant and discriminative. Experiments on CCPG, CASIA-B*, and SUSTech1K show state-of-the-art results in most cases for both within- and cross-domain settings, and ablations confirm the contributions of diffusion-based denoising, geometry-driven feature matching, background removal, and texture suppression.
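
To make the role of the timestep $t$ concrete, the following is a minimal sketch of the standard DDPM forward noising step, $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. It is not the authors' implementation: the linear beta schedule, its endpoints, and the helper names `make_alpha_bar` and `add_noise` are assumptions chosen for illustration. It only shows that a small $t$ keeps most of the input signal (fine detail), while a large $t$ leaves mostly noise (at best coarse structure).

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.

    t is 1-indexed, i.e. t in {1, ..., T}, matching the notation in the text.
    """
    a = alpha_bar[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    frame = rng.random((3, 64, 44))  # hypothetical RGB frame, (C, H, W)
    for t in (1, 250, 500, 1000):
        x_t = add_noise(frame, t, alpha_bar, rng)
        # Fraction of the original signal's variance that survives at step t.
        print(t, float(alpha_bar[t - 1]), x_t.shape)
```

Under this assumed schedule the retained signal fraction $\bar{\alpha}_t$ decreases monotonically with $t$, which is why the choice of $t$ acts as a knob between fine texture and coarse body structure.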

Abstract

Excluding identity-irrelevant cues in walking videos, such as clothing texture and color, so as to capture individual gait patterns remains a persistent challenge for vision-based gait recognition. Traditional silhouette- and pose-based methods, though theoretically effective at removing such distractions, often fall short of high accuracy due to their sparse and less informative inputs. Emerging end-to-end methods address this by directly denoising RGB videos using human priors. Building on this trend, we propose DenoisingGait, a novel gait denoising method. Inspired by the philosophy that "what I cannot create, I do not understand", we turn to generative diffusion models, uncovering how they partially filter out irrelevant factors for gait understanding. Additionally, we introduce a geometry-driven Feature Matching module, which, combined with background removal via human silhouettes, condenses the multi-channel diffusion features at each foreground pixel into a two-channel direction vector. Specifically, the proposed within- and cross-frame matching respectively capture the local vectorized structures of gait appearance and motion, producing a novel flow-like gait representation termed Gait Feature Field, which further reduces residual noise in diffusion features. Experiments on the CCPG, CASIA-B*, and SUSTech1K datasets demonstrate that DenoisingGait achieves new state-of-the-art (SoTA) performance in most cases for both within- and cross-domain evaluations. Code is available at https://github.com/ShiqiYu/OpenGait.
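
The idea of condensing multi-channel features at each foreground pixel into a two-channel direction vector can be illustrated with a rough sketch. The paper's actual Feature Matching and AoD formulation is not given in this summary, so everything below is an assumption: the hypothetical `direction_field` function compares each pixel with its 3x3 neighborhood by cosine similarity, turns the similarities into softmax weights with an assumed temperature `tau`, and averages the neighbor offsets. Matching a frame against itself (center excluded) stands in for within-frame matching, and matching against the next frame stands in for cross-frame matching.

```python
import numpy as np

# 3x3 neighborhood offsets as (dy, dx); an assumed search window.
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def direction_field(src, tgt, mask, skip_center, tau=0.1):
    """Return a (2, H, W) field of (dy, dx) directions for foreground pixels.

    src, tgt: (C, H, W) feature maps; mask: (H, W) boolean silhouette.
    skip_center=True excludes the trivial self-match (within-frame use).
    """
    C, H, W = src.shape
    m = mask.astype(src.dtype)

    def unit(f):
        # Background removal first, then L2-normalize along channels.
        f = f * m
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)

    s = unit(src)
    t = np.pad(unit(tgt), ((0, 0), (1, 1), (1, 1)))  # zero-pad the borders
    offsets = [o for o in OFFSETS if not (skip_center and o == (0, 0))]
    # Cosine similarity between each source pixel and each shifted target.
    sims = np.stack([
        (s * t[:, 1 + dy:1 + dy + H, 1 + dx:1 + dx + W]).sum(axis=0)
        for dy, dx in offsets
    ])  # (K, H, W)
    w = np.exp((sims - sims.max(axis=0, keepdims=True)) / tau)
    w /= w.sum(axis=0, keepdims=True)
    offs = np.array(offsets, dtype=src.dtype)  # (K, 2)
    field = np.einsum("khw,kc->chw", w, offs)  # similarity-weighted offset
    return field * m  # zero out background pixels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_t = rng.standard_normal((16, 8, 8))   # hypothetical diffusion features
    feat_t1 = np.roll(feat_t, 1, axis=2)       # next frame: shifted right by 1
    sil = np.ones((8, 8), dtype=bool)
    g_static = direction_field(feat_t, feat_t, sil, skip_center=True)
    g_dynamic = direction_field(feat_t, feat_t1, sil, skip_center=False)
    print(g_static.shape, g_dynamic.shape)
```

In this toy setup the cross-frame field points toward positive `dx` for interior pixels, since the second frame is the first one shifted right by one pixel. The output behaves like a small optical-flow-style map, which is the sense in which the abstract calls the representation flow-like.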

Paper Structure

This paper contains 15 sections, 9 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The proposed knowledge-driven denoising, derived from generative diffusion models, and the geometry-driven denoising, enforced by Feature Matching.
  • Figure 2: (a) A simple baseline on diffusion models for gait representation learning. (b) The rank-1 accuracy of our baseline with varying timestep $t$. (c) The pipeline of the proposed DenoisingGait.
  • Figure 3: (a) Our Feature Matching module. (b) Assignment of Direction (AoD). Background removal operation uses pre-extracted silhouettes to mask background regions.
  • Figure 4: (a) Raw RGB images. (b) Static gait feature field, $G^{\text{Static}}$. (c) Dynamic gait feature field, $G^{\text{Dynamic}}$. (d) Activation focus on $G^{\text{Static}}$. (e) Activation focus on $G^{\text{Dynamic}}$. (For optimal viewing, please refer to the color version and zoom in.)