D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation
Youssef Mourchid, Rim Slama
TL;DR
The paper tackles home-based rehabilitation assessment by proposing D-STGCNT, a Dense Spatio-Temporal Graph Conv-GRU Network augmented with a Transformer encoder to handle variable-length skeleton sequences. The approach combines dense STGC-GRU blocks, which pair multi-hop spatial graph convolutions with ConvGRU temporal modeling, and a transformer that captures long-range dependencies, together with position encoding and multiple regression losses. Empirical results on KIMORE and UI-PRMD show state-of-the-art accuracy as measured by MAD, RMSE, and MAPE, with lower computation time than prior methods, along with qualitative joint-attention feedback that identifies the key joints driving the scores. The work enables accurate, real-time quality scoring and interpretable feedback for rehabilitation, potentially facilitating home-based therapy with clinician-like guidance.
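As an illustration of the three error metrics named above, the following minimal sketch computes MAD, RMSE, and MAPE between clinician scores and predicted scores. It assumes the common textbook definitions (mean absolute deviation, root mean squared error, mean absolute percentage error); the paper's exact formulation may differ, and the function names and score values are hypothetical.

```python
import math

def mad(y_true, y_pred):
    """Mean absolute deviation between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true score is zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up scores for illustration only; not taken from KIMORE or UI-PRMD.
scores_true = [40.0, 25.0, 50.0, 10.0]
scores_pred = [38.0, 27.0, 45.0, 11.0]

print(mad(scores_true, scores_pred))   # absolute errors 2, 2, 5, 1 averaged
print(rmse(scores_true, scores_pred))
print(mape(scores_true, scores_pred))
```

Lower values are better for all three metrics. MAPE is scale-free, which helps when the score ranges differ between datasets.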
Abstract
This paper tackles the challenge of automatically assessing physical rehabilitation exercises for patients who perform them without clinician supervision. The objective is to provide a quality score that helps ensure correct performance and the desired results. To this end, a new graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with Transformer, is introduced. The model combines a modified version of STGCN with a transformer architecture for efficient handling of spatio-temporal data. The key idea is to treat skeleton data as a graph, which respects its non-linear structure, and to detect the joints that play the main role in each rehabilitation exercise. Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs and effectively model temporal dynamics. The transformer encoder's attention mechanism focuses on the relevant parts of the input sequence, which makes it well suited to evaluating rehabilitation exercises. Evaluated on the KIMORE and UI-PRMD datasets, the proposed approach surpasses state-of-the-art methods in both accuracy and computational time, resulting in faster and more accurate learning and assessment of rehabilitation exercises. Additionally, the model provides valuable feedback through qualitative illustrations that highlight the significance of individual joints in specific exercises.
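To make the "skeleton as a graph" idea concrete, the sketch below builds a toy skeleton graph and derives the k-hop adjacency matrices that multi-hop spatial graph convolutions typically aggregate over. It is an illustrative assumption, not the authors' implementation: the 5-joint layout, the edge list, and the function names are hypothetical, and the real datasets use larger skeletons.

```python
from collections import deque

# Hypothetical toy skeleton: 0=hip, 1=spine, 2=neck, 3=left shoulder, 4=right shoulder
NUM_JOINTS = 5
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]  # bones connecting the joints

def hop_distances(num_joints, edges):
    """Shortest path length (in bones) between every pair of joints, via BFS."""
    neighbors = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

def k_hop_adjacency(dist, k):
    """Binary matrix with 1 where two joints are exactly k bones apart."""
    n = len(dist)
    return [[1 if dist[i][j] == k else 0 for j in range(n)] for i in range(n)]

dist = hop_distances(NUM_JOINTS, EDGES)
A1 = k_hop_adjacency(dist, 1)  # directly connected joints
A2 = k_hop_adjacency(dist, 2)  # joints two bones apart, e.g. the two shoulders

for row in A2:
    print(row)
```

In a graph convolution, each such matrix would usually be normalized and paired with its own learnable weights, so that a joint can aggregate features from both near and more distant joints in a single layer.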
