The Paradox of Motion: Evidence for Spurious Correlations in Skeleton-based Gait Recognition Models
Andy Cătrună, Adrian Cosma, Emilian Rădoi
TL;DR
The paper investigates whether skeleton-based gait recognition models rely on motion patterns or instead exploit appearance and anthropometric information embedded in pose data. It applies normalization-based ablations that suppress height and screen-position cues across multiple architectures, and introduces a single-pose spatial transformer to test appearance-only recognition, with both evaluated on CASIA-B and GREW. The key findings are that removing height and position cues degrades performance on the controlled benchmark, that a single pose can achieve notable accuracy, and that in-the-wild data (GREW) are less prone to such shortcuts, which underscores the need to disentangle motion from appearance and to curate diverse datasets. The work highlights privacy concerns and methodological biases in current benchmarks, advocating for robust, diverse gait datasets and a balanced use of appearance and motion cues in practical gait analysis.
Abstract
Gait, an unobtrusive biometric, is valued for its capability to identify individuals at a distance, across changes in clothing and environmental conditions. This study challenges the prevailing assumption that vision-based gait recognition, in particular skeleton-based gait recognition, relies primarily on motion patterns, revealing a significant role of the implicit anthropometric information encoded in the walking sequence. We show through a comparative analysis that removing height information leads to notable performance degradation across three models and two benchmarks (CASIA-B and GREW). Furthermore, we propose a spatial transformer model that processes individual poses, disregarding any temporal information, and that achieves unreasonably good accuracy, emphasizing the bias towards appearance information and indicating spurious correlations in existing benchmarks. These findings underscore the need for a nuanced understanding of the interplay between motion and appearance in vision-based gait recognition, prompting a reevaluation of the methodological assumptions in this field. Our experiments indicate that "in-the-wild" datasets are less prone to spurious correlations, which motivates more diverse and large-scale datasets for advancing the field.
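To make the idea of suppressing height and screen-position cues concrete, the following is a minimal sketch of one possible per-frame normalization. It is not the authors' exact procedure: the function name `normalize_sequence`, the `(T, J, 2)` input layout, centering on the joint centroid, and scaling by the vertical extent of the skeleton are all assumptions made for illustration.

```python
import numpy as np


def normalize_sequence(poses, eps=1e-8):
    """Remove screen position and apparent height from a 2D skeleton sequence.

    poses: array of shape (T, J, 2) holding (x, y) pixel coordinates of
    J joints over T frames. This layout is an assumption of the sketch.
    """
    poses = np.asarray(poses, dtype=np.float64)

    # Suppress screen position: subtract each frame's joint centroid.
    centered = poses - poses.mean(axis=1, keepdims=True)

    # Suppress apparent height: divide by each frame's vertical extent,
    # so every skeleton spans one unit from its lowest to highest joint.
    y = centered[..., 1]
    height = y.max(axis=1) - y.min(axis=1)  # shape (T,)
    return centered / (height[:, None, None] + eps)
```

Under these assumptions, two sequences that differ only by a uniform scale and a translation map to (almost exactly) the same normalized sequence, so a model trained on the output can no longer read a subject's apparent height or image location directly from the coordinates. Relative limb proportions are preserved, which is one reason a single normalized pose may still carry identity information.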
