
Gait Recognition via Collaborating Discriminative and Generative Diffusion Models

Haijun Xiong, Bin Feng, Bang Wang, Xinggang Wang, Wenyu Liu

TL;DR

CoD2 addresses the limitations of purely discriminative gait recognition by integrating diffusion-based generative modeling with semantic feature learning. It introduces a Multi-level Conditional Control that uses high-level identity semantics and low-level appearance/motion details to guide generation, while generated sequences reinforce discriminative learning through identity-consistent feedback. The approach achieves state-of-the-art results on four benchmarks and proves broadly compatible with existing gait extractors, offering improved robustness with modest training overhead. This collaboration between generative priors and discriminative features has practical implications for more reliable gait-based identification in varied real-world conditions.
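Identity-consistent feedback can be illustrated with a small numerical sketch. The summary does not give the form of the discriminative loss, so the cosine-based penalty below is a hypothetical example, not the paper's objective. The names `cosine_similarity` and `identity_consistency_loss` are also hypothetical, and NumPy is assumed. The sketch shows one way the identity feature of a generated sequence could be pulled toward the feature of its source sequence.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Row-wise cosine similarity between two batches of feature vectors."""
    a_norm = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_norm = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.sum(a_norm * b_norm, axis=-1)


def identity_consistency_loss(f_real: np.ndarray, f_gen: np.ndarray) -> float:
    """Hypothetical feedback loss: 1 - mean cosine similarity between the
    identity features of real sequences (f_I) and of their generated
    counterparts (hat f_I). Not taken from the paper."""
    return float(np.mean(1.0 - cosine_similarity(f_real, f_gen)))


rng = np.random.default_rng(0)
f_real = rng.normal(size=(4, 256))  # batch of 4 identity features (illustrative size)
f_same = f_real.copy()              # a "perfect" generation keeps the identity
f_rand = rng.normal(size=(4, 256))  # unrelated features
print(identity_consistency_loss(f_real, f_same))  # close to 0
print(identity_consistency_loss(f_real, f_rand))  # near 1 for unrelated features
```

A loss of this kind is zero only when the generated sequence maps to the same identity direction as its source. That matches the intent of identity-consistent generation, whatever form the actual objective takes.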

Abstract

Gait recognition offers a non-intrusive biometric solution by identifying individuals through their walking patterns. Although discriminative models have achieved notable success in this domain, the potential of generative models remains largely unexplored. In this paper, we introduce CoD$^2$, a novel framework that combines the data-distribution modeling capability of diffusion models with the semantic representation learning strengths of discriminative models to extract robust gait features. We propose a Multi-level Conditional Control strategy that incorporates both high-level, identity-aware semantic conditions and low-level visual details. Specifically, the high-level condition, extracted by the discriminative extractor, guides the generation of identity-consistent gait sequences, while low-level visual details, such as appearance and motion, are preserved to enhance consistency. In turn, the generated sequences support the discriminative extractor's learning, enabling it to capture more comprehensive high-level semantic features. Extensive experiments on four datasets (SUSTech1K, CCPG, GREW, and Gait3D) demonstrate that CoD$^2$ achieves state-of-the-art performance and can be seamlessly integrated with existing discriminative methods, yielding consistent improvements.
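The following sketch shows how low-level visual details could be retained while generating. It composes a partially noised sequence along the lines of the Figure 2 caption, where the noise sequence ${\bm{X}}_t$ combines Gaussian noise ${\bm{X}}_t^n \sim \mathcal{N}(0, I)$ with content ${\bm{X}}_0^m$ sampled from the input ${\bm{X}}_0$. Several details are assumptions made for illustration:

- the frame-level sampling,
- the `num_visible` parameter,
- the function name `compose_noisy_sequence`,
- the use of NumPy.

The paper may sample or combine the two parts differently.

```python
import numpy as np


def compose_noisy_sequence(x0: np.ndarray, num_visible: int, rng: np.random.Generator):
    """Hypothetical frame-level composition of X_t.

    x0: clean gait sequence with frames on axis 0, e.g. shape (T, H, W).
    num_visible: number of frames copied from x0 as the low-level condition X_0^m.
    Returns the composed sequence and a boolean mask marking the copied frames.
    """
    num_frames = x0.shape[0]
    if not 0 <= num_visible <= num_frames:
        raise ValueError("num_visible must lie in [0, T]")
    keep = np.zeros(num_frames, dtype=bool)
    keep[rng.choice(num_frames, size=num_visible, replace=False)] = True
    noise = rng.standard_normal(x0.shape)  # X_t^n ~ N(0, I)
    # Broadcast the per-frame mask over all remaining axes of the sequence.
    mask = keep.reshape((num_frames,) + (1,) * (x0.ndim - 1))
    x_t = np.where(mask, x0, noise)  # kept frames come from x0, the rest are noise
    return x_t, keep


rng = np.random.default_rng(0)
x0 = rng.random((30, 64, 44))  # 30 silhouette frames of size 64x44 (illustrative)
x_t, keep = compose_noisy_sequence(x0, num_visible=6, rng=rng)
print(keep.sum())  # 6
```

Copying whole frames keeps appearance (body shape in each kept frame) and a sparse sample of motion (poses at the kept time steps). The denoiser then fills in the noised frames.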

Paper Structure

This paper contains 21 sections, 11 equations, 3 figures, 14 tables.

Figures (3)

  • Figure 1: Comparison of different methods for gait recognition. (a) Naive discriminative methods, such as GaitSet; (b) Generative-assisted methods, such as DenoisingGait; (c) Our proposed CoD2, which integrates collaborating discriminative and generative models.
  • Figure 2: Overview of our proposed method. The discriminative extractor ${\mathcal{D}}$ (e.g., GaitSet, GaitGL, GaitBase, or DeepGaitV2) first extracts the identity feature ${\bm{f}}_I$ from the input gait sequence ${\bm{X}}_0$. This feature serves as a high-level semantic condition that guides the generative diffusion model ${\mathcal{G}}$ during sequence generation. The noise sequence ${\bm{X}}_t$ is composed of Gaussian noise ${\bm{X}}_t^n \sim \mathcal{N}(0, I)$ and low-level visual information ${\bm{X}}_0^m$ sampled from ${\bm{X}}_0$. The generated gait sequence $\hat{{\bm{X}}}_0$ is then processed by ${\mathcal{D}}$ to extract its identity feature $\hat{{\bm{f}}}_I$. Finally, ${\mathcal{D}}$ and ${\mathcal{G}}$ are jointly optimized with the losses ${\mathcal{L}}_D$ and ${\mathcal{L}}_G$. ${\mathcal{G}}$ is used only during training, whereas ${\mathcal{D}}$ is used for both training and inference.
  • Figure 3: Details of the High-level Control Module. S-FC denotes a separate fully connected layer, and $\boldsymbol{\lambda} \in \mathbb{R}^{C'}$ is a learnable channel-wise control vector that regulates the adjustment intensity across feature channels.
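The High-level Control Module in Figure 3 can be read as a channel-wise modulation of the denoiser's features. The sketch below is a guess at that structure under stated assumptions, not the authors' module. It assumes:

- part-level identity features of shape (P, C);
- a separate fully connected layer (S-FC) with one weight matrix per part;
- averaging over parts;
- an additive update of a (C', H, W) feature map scaled by $\boldsymbol{\lambda}$.

The function names, shapes, and NumPy usage are all hypothetical.

```python
import numpy as np


def separate_fc(f_parts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Separate FC (assumed form): part p is projected by its own matrix.

    f_parts: (P, C) part-level identity features.
    weights: (P, C, C_out), one weight matrix per part.
    Returns (P, C_out).
    """
    return np.einsum("pc,pco->po", f_parts, weights)


def high_level_control(h: np.ndarray, f_parts: np.ndarray,
                       weights: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Hypothetical modulation of denoiser features h with shape (C_out, H, W).

    lam: (C_out,) channel-wise control vector scaling the identity condition.
    """
    cond = separate_fc(f_parts, weights).mean(axis=0)  # (C_out,), averaged over parts
    return h + (lam * cond)[:, None, None]             # broadcast over H and W


rng = np.random.default_rng(0)
P, C, C_out, H, W = 16, 256, 64, 8, 8  # illustrative sizes
f_parts = rng.normal(size=(P, C))
weights = rng.normal(size=(P, C, C_out)) / np.sqrt(C)
h = rng.normal(size=(C_out, H, W))
lam = np.zeros(C_out)  # lambda = 0 leaves h unchanged
out = high_level_control(h, f_parts, weights, lam)
print(np.allclose(out, h))  # True
```

Starting $\boldsymbol{\lambda}$ at zero is an assumption here, not something the paper states. With that choice, the module is an identity mapping at initialization, and the identity condition takes effect only as each channel's entry of $\boldsymbol{\lambda}$ is learned.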