D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitation

Youssef Mourchid, Rim Slama

TL;DR

The paper tackles home-based rehabilitation assessment by proposing D-STGCNT, a Dense Spatio-Temporal Graph Conv-GRU Network augmented with a Transformer encoder to handle variable-length skeleton sequences. The approach combines dense STGC-GRU blocks with multi-hop spatial graph convolutions, ConvGRU temporal modeling, and a transformer for long-range dependencies, accompanied by position encoding and multiple regression losses. Empirical results on KIMORE and UI-PRMD show state-of-the-art accuracy (MAD, RMSE, MAPE) and faster computation, along with qualitative joint-attention feedback that identifies key joints driving scores. This work enables accurate, real-time quality scoring and interpretable feedback for rehabilitation, potentially facilitating home-based therapy with clinician-like guidance.

Abstract

This paper tackles the challenge of automatically assessing physical rehabilitation exercises for patients who perform the exercises without clinician supervision. The objective is to provide a quality score to ensure correct performance and achieve desired results. To achieve this goal, a new graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with Transformer, is introduced. This model combines a modified version of STGCN and transformer architectures for efficient handling of spatio-temporal data. The key idea is to treat skeleton data as a graph, respecting its non-linear structure, and to detect the joints that play the main role in each rehabilitation exercise. Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs and effectively model temporal dynamics. The transformer encoder's attention mechanism focuses on relevant parts of the input sequence, making it useful for evaluating rehabilitation exercises. The evaluation of our proposed approach on the KIMORE and UI-PRMD datasets highlighted its potential, surpassing state-of-the-art methods in terms of accuracy and computational time. This resulted in faster and more accurate learning and assessment of rehabilitation exercises. Additionally, our model provides valuable feedback through qualitative illustrations, effectively highlighting the significance of joints in specific exercises.
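
The three error metrics named in the TL;DR (MAD, RMSE, MAPE) compare predicted quality scores against clinician-assigned scores. The sketch below uses the common textbook definitions of these metrics, which is an assumption; the paper's exact formulas are not reproduced in this summary. The function names and the sample scores are illustrative only and are not taken from KIMORE or UI-PRMD.

```python
import math
from typing import Sequence


def mad(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute deviation between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAD does."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every reference score is non-zero.
    """
    ratios = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)


# Made-up scores, purely for illustration.
reference = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
print(mad(reference, predicted), rmse(reference, predicted), mape(reference, predicted))
```

Lower values are better for all three. MAPE is scale-free, which makes it the easiest of the three to compare across datasets whose score ranges differ.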

Paper Structure

This paper contains 29 sections, 19 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Physical rehabilitation exercises process overview.
  • Figure 2: Flowchart of the proposed approach.
  • Figure 3: STGC-GRU block details.
  • Figure 4: The first skeleton shows the 25 joints of a skeleton from the KIMORE dataset. The other skeletons show the nodes (in green) involved in the computation of the $1^{st}$, $2^{nd}$ and $k^{th}$ hop order with respect to a given joint (in dark red).
  • Figure 5: An illustration of the attention values calculated by our approach, showing the involvement of the joints depending on the corresponding activities. On the left, the calculated joint importance across 5 exercises from the KIMORE dataset (lifting arms, arms extension, trunk rotation, pelvis rotation, squatting); on the right, the 25 joints of the human body skeleton as represented in the KIMORE dataset.
  • ...and 1 more figures
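
The hop orders described in the Figure 4 caption can be illustrated with a breadth-first search over the skeleton graph. The sketch below is a minimal illustration, not the paper's implementation: `hop_levels` and `TOY_EDGES` are hypothetical names, and the 5-joint graph is a toy example rather than the 25-joint KIMORE skeleton. It groups joints by their exact shortest-path distance from a chosen joint; whether the model aggregates joints at exactly hop k or within hop k is not stated in this summary.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def hop_levels(edges: List[Tuple[int, int]], num_joints: int, source: int) -> Dict[int, Set[int]]:
    """Group joints by shortest-path distance (hop order) from `source`."""
    # Build an undirected adjacency list from the bone list.
    adjacency: List[List[int]] = [[] for _ in range(num_joints)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Breadth-first search records the first (shortest) distance to each joint.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)

    # Collect joints per hop order, skipping the source joint itself.
    levels: Dict[int, Set[int]] = {}
    for joint, hops in distance.items():
        if hops > 0:
            levels.setdefault(hops, set()).add(joint)
    return levels


# Toy skeleton: a chain 0-1-2-3 with an extra limb 2-4 (not the KIMORE layout).
TOY_EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
levels = hop_levels(TOY_EDGES, num_joints=5, source=1)
print(levels)
```

For joint 1 in this toy graph, joints 0 and 2 are first-hop neighbors and joints 3 and 4 are second-hop neighbors, mirroring the green nodes around the dark red joint in the figure.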