Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition

Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

TL;DR

This work tackles the persistent problem of ID feature entanglement with non-ID confounders in gait recognition by introducing CLTD, a causality-inspired discriminative learning module. CLTD integrates a Cross Pixel-wise Attention Generator and a Fourier Projection Head to eliminate confounders across spatial, temporal, and spectral domains, supervised by a Factual and Counterfactual Loss that leverages InfoNCE and Total Direct Effect concepts. By deploying CLTD at multiple stages of a gait backbone, the method achieves state-of-the-art performance across multiple datasets (OU-MVLP, CASIA-B, GREW, Gait3D) and demonstrates strong robustness on wild data, with notable improvements over baselines. The approach offers a versatile, plug-and-play training paradigm that can enhance diverse gait recognition models and potentially extend to other computer vision tasks requiring robust, confounder-free representations.

Abstract

Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods struggle to extract identity features accurately because these features often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to eliminate the influence of confounders in three domains, i.e., spatial, temporal, and spectral. Specifically, we use the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in the spatial and temporal domains. We then introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational cost. Additionally, we employ a contrastive-learning-based optimization method to enforce semantic consistency across sequences from the same subject. Our approach yields significant performance improvements on challenging datasets, demonstrating its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.
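
The TL;DR notes that the training loss draws on InfoNCE. As a rough illustration only, the sketch below computes the standard InfoNCE loss for a single anchor embedding with NumPy. It is not the paper's Factual and Counterfactual Loss: the function name `info_nce`, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor (hypothetical helper, not from the paper).

    anchor:    (d,)   embedding of the query sequence
    positive:  (d,)   embedding of another sequence from the same subject
    negatives: (n, d) embeddings of sequences from other subjects
    """
    def l2_normalize(x):
        # Unit-length vectors so the dot product is cosine similarity.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a = l2_normalize(anchor)
    # Row 0 holds the positive; the remaining rows hold the negatives.
    candidates = l2_normalize(np.vstack([positive[None, :], negatives]))
    logits = candidates @ a / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_prob_positive = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_positive

# Toy 2-D embeddings (assumed values, for illustration only).
anchor = np.array([1.0, 0.0])
same_subject = np.array([0.9, 0.1])
other_subjects = np.array([[0.0, 1.0], [-1.0, 0.0]])

loss_aligned = info_nce(anchor, same_subject, other_subjects)
# Swap roles: the "positive" is now dissimilar to the anchor.
loss_misaligned = info_nce(anchor, other_subjects[0],
                           np.vstack([same_subject, other_subjects[1]]))
print(loss_aligned < loss_misaligned)
```

The loss is small when the anchor is closer to the same-subject embedding than to the others, which is the sense in which a contrastive objective encourages consistent features across sequences of one subject.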

Paper Structure (14 sections, 8 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Motivation. Illustration of entanglement between non-ID and ID clues. With our approach, the impact of non-ID clues is systematically eliminated.
  • Figure 2: Overview of our approach. We illustrate CLTD using DyGait [wang2023dygait] as the backbone; the experiments section shows its versatility across various gait recognition models. Multiple CLTDs are placed along the DyGait backbone. Each CLTD consists of two branches: the Factual Branch and the Counterfactual Branch, which generate the factual feature $\boldsymbol{x}_f$ and the counterfactual feature $\boldsymbol{x}_{cf}$, respectively. Notably, CLTDs are used only during training and are excluded at test time.
  • Figure 3: Details of FPH. The symbol ⓒ denotes the concatenation operation.
  • Figure 4: t-SNE visualization of feature distributions for the baseline and our approach on CASIA-B, OU-MVLP, GREW, and Gait3D, respectively. Different colors denote distinct identities. Best viewed in color and zoomed in.
  • Figure 5: Visualization of heatmaps representing counterfactual and factual features after CLTD. Best viewed in color. (a) Original input. (b) Counterfactual features. (c) Factual features.