A Dual-Use Framework for Clinical Gait Analysis: Attention-Based Sensor Optimization and Automated Dataset Auditing
Hamidreza Sadeghsalehi
TL;DR
The paper tackles quantitative gait analysis with wearable IMUs by addressing two core challenges: identifying minimal, task-specific sensor configurations and auditing datasets for hidden biases. It introduces an interpretable, attention-based multi-stream CNN that learns sensor-specific features, computes attention weights $\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}$ that are reported at cohort level, and forms a context vector $c = \sum_i \alpha_i v_i$ that drives binary classification (here $e_i$ is read as the attention score and $v_i$ as the feature vector of sensor stream $i$). Applied to Voisard et al.'s multi-cohort gait dataset across four tasks (PD screening, OA screening, CVA asymmetry, PD vs CVA differential), the framework reveals a severe right-foot laterality confound in the OA and CVA cohorts and proposes data-driven sensor synergies for optimized protocols. Beyond sensor optimization, the method functions as an automated data auditor: it flags hidden confounds and can guide future dataset design to improve the robustness and clinical applicability of gait analyses.
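The two formulas in the TL;DR can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function name `attention_pool`, the toy scores and feature vectors, and the max-subtraction step (a standard numerical-stability trick that leaves the softmax unchanged) are all assumptions made for illustration.

```python
import numpy as np


def attention_pool(scores, features):
    """Softmax attention over sensor streams (illustrative sketch only).

    scores:   shape (n_sensors,), the raw attention scores e_i
    features: shape (n_sensors, d), the per-sensor feature vectors v_i
    Returns (alpha, c): the weights alpha_i and the context vector c.
    """
    scores = np.asarray(scores, dtype=float)
    features = np.asarray(features, dtype=float)
    # Subtracting the max does not change the softmax but avoids overflow.
    shifted = scores - scores.max()
    exp_scores = np.exp(shifted)
    alpha = exp_scores / exp_scores.sum()   # alpha_i = exp(e_i) / sum_j exp(e_j)
    context = alpha @ features              # c = sum_i alpha_i * v_i
    return alpha, context


# Hypothetical toy values: three sensor streams, two features each.
scores = np.array([2.0, 0.0, -1.0])
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
weights, context = attention_pool(scores, features)
print(weights, context)
```

Because the weights are non-negative and sum to one, each $\alpha_i$ can be read as the share of the context vector contributed by sensor stream $i$, which is what makes the weights usable for interpretation.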
Abstract
Objective gait analysis using wearable sensors and AI is critical for managing neurological and orthopedic conditions. However, models are vulnerable to hidden dataset biases, and task-specific sensor optimization remains a challenge. We propose a multi-stream attention-based deep learning framework that functions as both a sensor optimizer and an automated data auditor. Applied to the Voisard et al. (2025) multi-cohort gait dataset on four clinical tasks (PD, OA, CVA screening; PD vs CVA differential), the model's attention mechanism quantitatively discovered a severe dataset confound. For OA and CVA screening, tasks where bilateral assessment is clinically essential, the model assigned more than 70 percent attention to the Right Foot while effectively ignoring the Left Foot (less than 0.1 percent attention, 95 percent CI [0.0, 0.1]). This was not a clinical finding but a direct reflection of a severe laterality bias (for example, 15 of 15 right-sided OA) in the public dataset. The primary contribution of this work is methodological, demonstrating that an interpretable framework can automatically audit dataset integrity. As a secondary finding, the model proposes novel, data-driven sensor synergies (for example, Head plus Foot for PD screening) as hypotheses for future optimized protocols.
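The auditing idea described in the abstract can be sketched as a simple check that cross-references attention weights with cohort metadata. This is a hedged illustration, not the paper's procedure: the function names, the sensor keys, both thresholds, and the attention values in the example are hypothetical; only the "15 of 15 right-sided OA" count and the "more than 70 percent" and "less than 0.1 percent" figures come from the abstract.

```python
from collections import Counter


def laterality_share(affected_sides):
    """Return the most common affected side and its share of the cohort."""
    counts = Counter(affected_sides)
    side, n = counts.most_common(1)[0]
    return side, n / sum(counts.values())


def flag_laterality_confound(attention, affected_sides,
                             attn_threshold=0.7, share_threshold=0.9):
    """Flag a possible confound (hypothetical rule, assumed thresholds).

    The flag is raised when one sensor dominates the attention weights,
    the cohort is almost entirely affected on one side, and that side
    matches the dominant sensor's name.
    """
    top_sensor = max(attention, key=attention.get)
    side, share = laterality_share(affected_sides)
    suspicious = (attention[top_sensor] >= attn_threshold
                  and share >= share_threshold
                  and side.lower() in top_sensor.lower())
    return suspicious, top_sensor, side, share


# Illustrative values only; sensor names and exact weights are made up.
attention = {"right_foot": 0.72, "left_foot": 0.001,
             "head": 0.15, "lower_back": 0.129}
affected_sides = ["right"] * 15   # 15 of 15 right-sided OA, per the abstract
result = flag_laterality_confound(attention, affected_sides)
print(result)
```

A flag from a rule like this would be a prompt to inspect the dataset's composition, not evidence of a clinical effect, which mirrors the abstract's reading of the Right Foot result.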
