HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

Yilan Dong, Chunlin Yu, Ruiyang Ha, Ye Shi, Yuexin Ma, Lan Xu, Yanwei Fu, Jingya Wang

TL;DR

This work tackles cloth-changing gait recognition in the wild by introducing CCGait, the first dataset to capture long-term appearance changes across time and space with multi-modal cues. It proposes HybridGait, a three-branch SMPL-aided framework that fuses appearance features from silhouettes, temporal 3D dynamics encoded by the Canonical Alignment Spatial-Temporal Transformer (CA-STT), and 2D projections of 3D meshes obtained through the Silhouette-guided Deformation with 3D-2D Appearance Projection (SilD) strategy. The approach achieves state-of-the-art performance on both CCGait and Gait3D, demonstrating the benefit of integrating 3D mesh priors with 2D silhouette information for robust gait recognition under clothing and viewpoint variations. The dataset and method advance practical gait recognition for real-world surveillance, where clothing and lighting changes are common and challenging.

Abstract

Existing gait recognition benchmarks mostly include minor clothing variations in laboratory environments, but lack persistent changes in appearance over time and space. In this paper, we propose the first in-the-wild benchmark CCGait for cloth-changing gait recognition, which incorporates diverse clothing changes, indoor and outdoor scenes, and multi-modal statistics over 92 days. To further address the coupling effect of clothing and viewpoint variations, we propose a hybrid approach HybridGait that exploits both temporal dynamics and the projected 2D information of 3D human meshes. Specifically, we introduce a Canonical Alignment Spatial-Temporal Transformer (CA-STT) module to encode human joint position-aware features, and fully exploit 3D dense priors via a Silhouette-guided Deformation with 3D-2D Appearance Projection (SilD) strategy. Our contributions are twofold: we provide a challenging benchmark CCGait that captures realistic appearance changes across expanded time and space, and we propose a hybrid framework HybridGait that outperforms prior works on the CCGait and Gait3D benchmarks. Our project page is available at https://github.com/HCVLab/HybridGait.

Paper Structure (12 sections, 9 equations, 2 figures, 5 tables)

This paper contains 12 sections, 9 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Example challenging frames from the CCGait dataset. (a) Long-term clothing changes for the same individual. (b) Occlusions caused by other people and by objects. (c) Diverse indoor and outdoor scenes. (d) Variations in lighting conditions.
  • Figure 2: The proposed framework comprises three components. Component (B) is the basic appearance branch, which extracts body representations from silhouettes but is susceptible to contextual perturbations. Component (A) is the temporal branch, which uses the CA-STT module to capture 3D temporal dynamics. Component (C) is the projection branch, which uses silhouettes projected from SMPL models (3D-2D projection silhouettes) to strengthen the body representation.
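
The three-branch layout in Figure 2 can be illustrated with a toy retrieval sketch. This is not the authors' implementation. The fusion rule (concatenating L2-normalized branch embeddings), the Euclidean ranking, and every name below (`fuse_branches`, `rank_gallery`, the toy vectors) are assumptions made only for illustration. The summary above does not specify how the branches are actually combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_branches(appearance, temporal, projection):
    """Hypothetical late fusion: concatenate the normalized branch embeddings.

    appearance -> stand-in for silhouette features (component B)
    temporal   -> stand-in for CA-STT features (component A)
    projection -> stand-in for projected-silhouette features (component C)
    """
    fused = []
    for branch in (appearance, temporal, projection):
        fused.extend(l2_normalize(branch))
    return fused


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_gallery(probe, gallery):
    """Return gallery identities sorted by ascending distance to the probe."""
    return sorted(gallery, key=lambda pid: euclidean(probe, gallery[pid]))


# Toy data: 2-D embeddings per branch, invented for this example only.
probe = fuse_branches([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
gallery = {
    "id_a": fuse_branches([0.9, 0.1], [0.1, 1.8], [2.7, 0.2]),
    "id_b": fuse_branches([0.0, 1.0], [2.0, 0.0], [0.0, 3.0]),
}
ranking = rank_gallery(probe, gallery)
```

Normalizing each branch before concatenation keeps any single branch from dominating the distance through scale alone. A learned fusion layer would be an equally plausible design.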