BigGait: Learning Gait Representation You Want by Large Vision Models

Dingqiang Ye, Chao Fan, Jingzhe Ma, Xiaoming Liu, Shiqi Yu

TL;DR

The paper tackles the high annotation cost of task-specific gait representations by leveraging all-purpose knowledge from large vision models (LVMs) to learn gait representations without external supervision. It introduces BigGait, which uses a frozen DINOv2 upstream, a Gait Representation Extractor (GRE) with mask, appearance, and denoising branches, and a downstream GaitBase for metric learning, aided by a Pad-and-Resize trick. Empirically, BigGait achieves state-of-the-art results on CCPG and strong cross-domain transfer on CASIA-B* and SUSTech1K, supported by visualizations and ablations that show effective suppression of gait-irrelevant texture and robust foreground cues. The work highlights both the practical potential of LVM-based gait representations and future challenges around interpretability and purity, offering a foundation for broader adoption and further research in LVM-based gait recognition, with code available at https://github.com/ShiqiYu/OpenGait.
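The TL;DR mentions a Pad-and-Resize trick, but the paper's exact procedure is not reproduced here. The block below is a minimal sketch of the general idea, assuming the goal is to bring a person crop to a fixed input size without distorting body proportions. It first pads the crop symmetrically to the target aspect ratio, then resizes it. The function names, the zero fill value, the grayscale list-of-rows image format, and the nearest-neighbour resizing are all illustrative assumptions, not BigGait's implementation.

```python
from typing import List

# Hypothetical image format for this sketch: grayscale, a list of pixel rows.
Image = List[List[float]]


def pad_to_aspect(img: Image, target_h: int, target_w: int, fill: float = 0.0) -> Image:
    """Pad symmetrically so the image reaches the aspect ratio target_h:target_w."""
    h, w = len(img), len(img[0])
    # Compare h/w with target_h/target_w by cross-multiplying (exact integer math).
    if h * target_w > w * target_h:
        # Too tall for the target: add columns on the left and right.
        new_w = -(-h * target_w // target_h)  # ceiling division
        left = (new_w - w) // 2
        right = new_w - w - left
        return [[fill] * left + row + [fill] * right for row in img]
    # Too wide, or already matching: add rows on top and bottom (possibly zero).
    new_h = -(-w * target_h // target_w)
    top = (new_h - h) // 2
    bottom = new_h - h - top
    return ([[fill] * w for _ in range(top)]
            + [row[:] for row in img]
            + [[fill] * w for _ in range(bottom)])


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize (a stand-in for any interpolation method)."""
    h, w = len(img), len(img[0])
    return [[img[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def pad_and_resize(img: Image, out_h: int, out_w: int, fill: float = 0.0) -> Image:
    """Pad to the output aspect ratio first, so resizing never stretches the body."""
    return resize_nearest(pad_to_aspect(img, out_h, out_w, fill), out_h, out_w)


if __name__ == "__main__":
    tall_person = [[1.0, 1.0] for _ in range(4)]  # a 4x2 "silhouette"
    for row in pad_and_resize(tall_person, 4, 4):
        print(row)  # each row: [0.0, 1.0, 1.0, 0.0]
```

Padding before resizing means a tall, narrow crop gains empty margins instead of being stretched horizontally. Directly resizing the 4x2 crop to 4x4 would double the apparent body width.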

Abstract

Gait recognition is one of the most important remote identification technologies, and its use is steadily expanding across research and industry. However, existing gait recognition methods rely heavily on task-specific upstream models trained with supervised learning to provide explicit gait representations such as silhouette sequences, which inevitably introduces expensive annotation costs and potential error accumulation. Departing from this trend, this work explores effective gait representations built on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) within BigGait draws on design principles from established gait representations to transform all-purpose knowledge into implicit gait representations without requiring third-party supervision signals. Experiments on CCPG, CASIA-B* and SUSTech1K show that, in most cases, BigGait significantly outperforms previous methods in both within-domain and cross-domain tasks, and that it offers a more practical paradigm for learning next-generation gait representations. Finally, we discuss open challenges and promising directions in LVM-based gait recognition, aiming to inspire future work on this emerging topic. The source code is available at https://github.com/ShiqiYu/OpenGait.

Paper Structure (18 sections, 7 equations, 7 figures, 13 tables)

Figures (7)

  • Figure 1: The upstream and downstream parts of existing gait recognition methods are responsible for gait representation and metric learning, respectively.
  • Figure 2: (a) Illustration of the body proportion preservation trick, which is detailed in the Supplementary Material. (b)-(d) Visualizations of the intermediate representations generated by the (b) mask, (c) appearance, and (d) denoising branches.
  • Figure 3: The workflow of BigGait. The upstream model is instantiated as DINOv2 to produce all-purpose features. The central Gait Representation Extractor (GRE) has three branches, responsible for background removal, feature transformation, and feature refinement, respectively. Finally, a modified GaitBase is used for gait metric learning.
  • Figure 4: Visualization of unsupervised mask learning.
  • Figure 5: Visualization of intermediate representations generated by BigGait vs. three traditional gait representations. Red boxes indicate regions with strong texture patterns.
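The Figure 3 caption assigns background removal to the GRE's mask branch, and Figure 4 visualizes how that mask is learned without supervision. The block below is a toy sketch of soft foreground gating, one common way to realize such a branch. Each patch feature is projected to two logits (background, foreground), and the softmax foreground probability then scales the feature, so likely-background patches are suppressed. This is an assumption-laden illustration, not BigGait's code. The 2-D toy features stand in for DINOv2 patch tokens, the projection weights are set by hand rather than learned, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable softmax over two logits."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    s = ea + eb
    return ea / s, eb / s


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """y = W x + b, with W given as a list of output rows."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def soft_foreground_mask(patches: List[Vector], w: List[Vector],
                         b: Vector) -> Tuple[List[float], List[Vector]]:
    """Return (foreground probability per patch, gated patch features).

    Hypothetical mask branch: two logits per patch (background, foreground);
    the foreground probability multiplies the patch feature.
    """
    probs, masked = [], []
    for f in patches:
        bg_logit, fg_logit = linear(f, w, b)
        _, p_fg = softmax2(bg_logit, fg_logit)
        probs.append(p_fg)
        masked.append([p_fg * v for v in f])
    return probs, masked


if __name__ == "__main__":
    # Toy features: the first dimension loosely encodes "person-ness".
    patches = [[5.0, 1.0], [-5.0, 1.0]]
    w = [[0.0, 0.0],   # background logit row
         [1.0, 0.0]]   # foreground logit row
    b = [0.0, 0.0]
    probs, masked = soft_foreground_mask(patches, w, b)
    print(probs)   # first patch close to 1, second close to 0
    print(masked)  # the second patch's features are scaled toward zero
```

In a trained model, the projection would be optimized jointly with the downstream metric-learning loss. No mask annotations are required, which matches the paper's emphasis on avoiding third-party supervision signals.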