Exploring More from Multiple Gait Modalities for Human Identification

Dongyang Jin, Chao Fan, Weihua Chen, Shiqi Yu

TL;DR

This paper addresses gait-based person identification by analyzing three RGB-derived modalities (silhouette, human parsing, and optical flow) under fair, controlled comparisons. It introduces the C$^2$Fusion strategy and builds the MultiGait++ framework on it: a three-stream model that extracts shared features while emphasizing each modality's unique information, reaching state-of-the-art performance on Gait3D, GREW, CCPG, and SUSTech1K with a modest increase in computation. Through extensive ablations and modality fusion studies, the authors show that silhouette and parsing are largely homogeneous and benefit from input-level fusion, whereas optical flow provides complementary motion cues that are best exploited at higher fusion levels. The work offers a practical pathway for robust multimodal gait identification and motivates incorporating additional modalities in future research, with public code to foster reproducibility.

Abstract

Gait, as a soft biometric characteristic, reflects the distinct walking patterns of individuals at a distance, making it a promising technique for unrestrained human identification. By largely excluding gait-unrelated cues hidden in RGB videos, the silhouette and skeleton, though visually compact, have long served as two of the most prevalent gait modalities. Recently, several attempts have been made to introduce more informative data forms, such as human parsing and optical flow images, to capture gait characteristics, along with multi-branch architectures. However, owing to inconsistencies in model designs and experimental settings, we argue that a comprehensive and fair comparative study of these popular gait modalities, covering both representational capacity and fusion strategy, is still lacking. From the perspectives of fine- vs. coarse-grained shape and whole vs. pixel-wise motion modeling, this work presents an in-depth investigation of three popular gait representations, i.e., silhouette, human parsing, and optical flow, with various fusion evaluations, and experimentally exposes their similarities and differences. Based on the obtained insights, we further develop the C$^2$Fusion strategy and build our new framework, MultiGait++, on it. C$^2$Fusion preserves commonalities while highlighting differences to enrich the learning of gait features. To verify our findings and conclusions, we conduct extensive experiments on Gait3D, GREW, CCPG, and SUSTech1K. The code is available at https://github.com/ShiqiYu/OpenGait.
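
The fusion levels mentioned above can be illustrated with a small sketch. It is not the authors' C$^2$Fusion or any part of MultiGait++; it only contrasts input-level fusion (stacking two maps before encoding) with feature-level fusion (concatenating per-stream features), and shows one simple way to split stream features into a shared part and per-stream residuals. All names (`input_level_fusion`, `toy_encoder`, `feature_level_fusion`, `common_and_differences`) and the toy data are hypothetical, and the row-mean "encoder" merely stands in for a learned backbone.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Image = List[List[float]]  # H x W, single channel


def input_level_fusion(a: Image, b: Image) -> List[List[List[float]]]:
    """Stack two same-sized single-channel maps into an H x W x 2 input."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("inputs must share the same height and width")
    return [[[pa, pb] for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def toy_encoder(img: Image) -> Vector:
    """Hypothetical stand-in for a learned backbone: one mean per row."""
    return [sum(row) / len(row) for row in img]


def feature_level_fusion(features: Sequence[Vector]) -> Vector:
    """Concatenate per-stream feature vectors (fusion after encoding)."""
    return [x for f in features for x in f]


def common_and_differences(
    features: Sequence[Vector],
) -> Tuple[Vector, List[Vector]]:
    """Split equal-length features into a shared mean and per-stream residuals.

    This mean/residual split is an illustrative assumption, not the paper's method.
    """
    n = len(features)
    common = [sum(vals) / n for vals in zip(*features)]
    diffs = [[x - c for x, c in zip(f, common)] for f in features]
    return common, diffs


# Toy 2 x 2 inputs (made-up values, for illustration only).
silhouette = [[0.0, 1.0], [1.0, 1.0]]
parsing = [[0.0, 2.0], [3.0, 3.0]]    # toy body-part labels
flow_mag = [[0.0, 0.5], [0.5, 1.0]]   # toy optical-flow magnitude

# Shape cues: fuse silhouette and parsing at the input, then encode once.
shape_input = input_level_fusion(silhouette, parsing)
# Collapse the two channels by summing so the toy encoder can consume them.
shape_feat = toy_encoder([[sum(px) for px in row] for row in shape_input])

# Motion cue: encode flow in its own stream, then fuse at the feature level.
flow_feat = toy_encoder(flow_mag)
fused = feature_level_fusion([shape_feat, flow_feat])
common, diffs = common_and_differences([shape_feat, flow_feat])
```

In this toy setup the two shape maps share one encoder pass, while the flow stream stays separate until its features are combined, which mirrors the summary's observation only schematically.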

Paper Structure

This paper contains 17 sections, 4 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Top: comparing three typical gait modalities, i.e., the binary silhouette, body parsing, and optical flow images. Bottom: the comparison between three common fusion strategies with our C$^2$Fusion.
  • Figure 2: The architecture of the MultiGait series. Here the symbols * and # can be instantiated with any of the employed gait modalities, such as the silhouette, human parsing, and optical flow, in theory.
  • Figure 3: Left: Our pipeline of MultiGait++. Right: The architecture of MultiGait$^{s+p+f}$.
  • Figure 4: Visualization of silhouette, human parsing, and flow.
  • Figure 5: The heatmaps (zhou2016learning) of MultiGait$^s$ vs. MultiGait$^p$ and MultiGait$^f$. Each row corresponds to the same modality, while each column is sourced from the same RGB image.