ZipGait: Bridging Skeleton and Silhouette with Diffusion Model for Advancing Gait Recognition

Fanxu Min, Qing Cai, Shaoxiang Guo, Yang Yu, Hao Fan, Junyu Dong

TL;DR

ZipGait bridges skeleton and silhouette information for gait recognition by reconstructing dense body shapes from sparse skeletons using a diffusion-based DiffGait module. A two-stage Perceptual Gait Integration (PGI) then fuses reconstructed silhouettes with skeleton cues to produce robust hybrid gait representations, enabling a lightweight yet effective model-based framework. Across four public benchmarks, ZipGait achieves state-of-the-art performance in both intra- and cross-domain settings and yields notable plug-and-play improvements when embedded into existing skeleton-based methods. The approach demonstrates the potential of cross-modal gait modeling to close the gap with appearance-based methods while maintaining efficiency and flexibility for real-world deployment.

Abstract

Current gait recognition research predominantly focuses on extracting appearance features effectively, but the performance is severely compromised by the vulnerability of silhouettes under unconstrained scenes. Consequently, numerous studies have explored how to harness information from various models, particularly by sufficiently utilizing the intrinsic information of skeleton sequences. While these model-based methods have achieved significant performance, there is still a huge gap compared to appearance-based methods, which implies the potential value of bridging silhouettes and skeletons. In this work, we make the first attempt to reconstruct dense body shapes from discrete skeleton distributions via the diffusion model, demonstrating a new approach that connects cross-modal features rather than focusing solely on intrinsic features to improve model-based methods. To realize this idea, we propose a novel gait diffusion model named DiffGait, which has been designed with four specific adaptations suitable for gait recognition. Furthermore, to effectively utilize the reconstructed silhouettes and skeletons, we introduce Perceptual Gait Integration (PGI) to integrate different gait features through a two-stage process. Incorporating these modifications leads to an efficient model-based gait recognition framework called ZipGait. Through extensive experiments on four public benchmarks, ZipGait demonstrates superior performance, outperforming the state-of-the-art methods by a large margin under both cross-domain and intra-domain settings, while achieving significant plug-and-play performance improvements.

Paper Structure

This paper contains 22 sections, 15 equations, 18 figures, 10 tables, and 2 algorithms.

Figures (18)

  • Figure 1: Comparison with alternative methods on the Gait3D test set, showing the performance disparity between the two types of methods in real-world scenarios. DiffGait enhances model-based methods in a plug-and-play manner, reducing the performance gap. Meanwhile, ZipGait achieves the best performance among methods using skeletons.
  • Figure 2: Differences between our approach and related gait recognition methods. Comparison of our cross-modal insight with previous works that primarily focused on skeleton extraction.
  • Figure 3: Overall pipeline of the proposed ZipGait framework. It consists of two fundamental improvements: DiffGait and Perceptual Gait Integration. The entire denoising process of DiffGait is summarized as Diffusion Silhouette Reconstruction. T&H represents the horizontal mapping [29] and temporal aggregation.
  • Figure 4: The upper half of the diagram delineates the specific network architecture and modules of DiffGait, while the lower half illustrates the forward process of DiffGait. The entire figure provides a detailed explanation of the feature flow process within DiffGait.
  • Figure 5: Visualization of the reverse process of (a) our method and (b) the previous method.
  • ...and 13 more figures