Edges Are All You Need: Robust Gait Recognition via Label-Free Structure

Chao Zhang, Zhuang Zheng, Ruixin Li, Zhanyong Mei

TL;DR

SketchGait is a hierarchically disentangled multi-modal framework that pairs two independent streams for modality-specific learning with a lightweight early-stage fusion branch that captures structural complementarity.

Abstract

Gait recognition is a non-intrusive biometric technique for security applications, yet existing studies are dominated by silhouette- and parsing-based representations. Silhouettes are sparse and miss internal structural details, limiting discriminability. Parsing enriches silhouettes with part-level structures, but relies heavily on the quality of upstream human parsers (e.g., label granularity and boundary precision), leading to unstable performance across datasets and sometimes to results inferior even to those of silhouettes. We revisit gait representations from a structural perspective and describe a design space defined by edge density and supervision form: silhouettes use sparse boundary edges with weak single-label supervision, while parsing uses denser cues with strong semantic priors. In this space, we identify an underexplored paradigm, dense part-level structure without explicit semantic labels, and introduce Sketch as a new visual modality for gait recognition. Sketch extracts high-frequency structural cues (e.g., limb articulations and self-occlusion contours) directly from RGB images via edge-based detectors in a label-free manner. We further show that label-guided parsing and label-free sketch are semantically decoupled and structurally complementary. Based on this, we propose SketchGait, a hierarchically disentangled multi-modal framework with two independent streams for modality-specific learning and a lightweight early-stage fusion branch to capture structural complementarity. Extensive experiments on SUSTech1K and CCPG validate the proposed modality and framework: SketchGait achieves 92.9% Rank-1 on SUSTech1K and 93.1% mean Rank-1 on CCPG.
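
To make the label-free idea concrete, the following is a minimal sketch of edge extraction from a single frame. It is not the detector used in the paper: a classical Sobel filter is swapped in for the learned edge detector purely for illustration, and the function name `sobel_sketch`, the `threshold` value, and the toy frame are all assumptions. The point it illustrates is that an edge detector responds to internal structure that a binary silhouette flattens away, without using any part labels.

```python
import numpy as np


def sobel_sketch(gray: np.ndarray, threshold: float = 0.25) -> np.ndarray:
    """Return a binary edge map for a grayscale frame.

    Sobel gradients are an illustrative stand-in for a learned edge
    detector; no semantic labels are involved at any point.
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = gray.shape
    # Replicate border pixels so the output keeps the input size.
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Accumulate the 3x3 filter response one kernel tap at a time.
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak > 0:
        magnitude = magnitude / peak  # normalise to [0, 1]
    return (magnitude >= threshold).astype(np.uint8)


# Toy frame (hypothetical): a bright "body" with a darker internal stripe.
frame = np.zeros((8, 8))
frame[2:6, 2:6] = 1.0
frame[2:6, 4] = 0.5

silhouette = (frame > 0).astype(np.float64)   # internal stripe is lost
sketch = sobel_sketch(frame)                  # edges of the raw frame
outline = sobel_sketch(silhouette)            # edges of the silhouette only

print("edge pixels from raw frame: ", int(sketch.sum()))
print("edge pixels from silhouette:", int(outline.sum()))
```

In this toy case the pixel next to the internal stripe is marked as an edge in `sketch` but not in `outline`, which mirrors the abstract's argument that silhouettes miss internal structural details.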

Paper Structure (23 sections, 8 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: (A) Edge richness increases from boundary edges to sketch edges, where boundary/semantic edges are obtained from M2FP [yang2024deep] and sketch edges are extracted by TEED [soria2023tiny]. (B) Parsing-derived (M2FP) and label-free sketch (TEED) representations under self-occlusion, where TEED-based sketches provide complementary structural cues. (C) Parsing labels suppress clothing logos and texture patterns, providing cleaner structural supervision.
  • Figure 2: Overall pipeline. (A) Sketch generation. (B) The SketchGait framework with two modality-specific input branches and one fusion branch. CBS denotes a $3\times3$ convolution layer followed by BatchNorm and ReLU. Stages 1-4 correspond to the stages of the DeepGaitV2 backbone. The HEAD module consists of temporal max pooling, horizontal pyramid pooling (HPP), and separate fully connected (FC) layers for feature projection.
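
The HEAD module described in the Figure 2 caption can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the caption names only the three steps, so the pyramid scales `bins=(1, 2, 4)`, the max-plus-mean strip summary, the `(C, T, H, W)` feature layout, and every function name here are hypothetical choices.

```python
import numpy as np


def temporal_max_pool(feats: np.ndarray) -> np.ndarray:
    """Collapse the time axis of (C, T, H, W) features with a max."""
    return feats.max(axis=1)


def horizontal_pyramid_pool(fmap: np.ndarray, bins=(1, 2, 4)) -> np.ndarray:
    """Pool a (C, H, W) map into horizontal strips at several scales.

    Each strip is summarised as max + mean (an assumed choice).
    Returns an array of shape (P, C) with P = sum(bins).
    """
    c, h, w = fmap.shape
    parts = []
    for b in bins:
        if h % b != 0:
            raise ValueError(f"height {h} is not divisible by {b} strips")
        # Group consecutive rows into b strips and flatten each strip.
        strips = fmap.reshape(c, b, (h // b) * w)
        parts.append(strips.max(axis=-1) + strips.mean(axis=-1))  # (C, b)
    return np.concatenate(parts, axis=1).T


def separate_fc(parts: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Project each strip with its own weight matrix.

    parts: (P, C), weights: (P, C, D) -> output: (P, D).
    """
    if weights.shape[:2] != parts.shape:
        raise ValueError("weights must have shape (P, C, D)")
    return np.einsum("pc,pcd->pd", parts, weights)


def gait_head(feats: np.ndarray, weights: np.ndarray, bins=(1, 2, 4)) -> np.ndarray:
    """Temporal max pooling -> HPP -> separate FC, for one sequence."""
    pooled = temporal_max_pool(feats)
    parts = horizontal_pyramid_pool(pooled, bins)
    return separate_fc(parts, weights)


# Hypothetical sizes: 3 channels, 5 frames, 4x2 spatial map, 2-dim output.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 5, 4, 2))
weights = rng.standard_normal((7, 3, 2))  # 7 strips = 1 + 2 + 4
embedding = gait_head(feats, weights)
print(embedding.shape)
```

Giving each strip its own projection matrix, instead of sharing one, lets the head weight channels differently for, say, head-level and leg-level strips; that is the usual motivation for "separate" FC layers, though the caption itself does not spell it out.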