Rehabilitation Exercise Quality Assessment through Supervised Contrastive Learning with Hard and Soft Negatives

Mark Karlov, Ali Abedi, Shehroz S. Khan

TL;DR

This work tackles rehabilitation exercise quality assessment across multiple exercise types when data per exercise type are scarce. It proposes a single Spatial-Temporal Graph Convolutional Network (ST-GCN) trained with supervised contrastive learning using hard and soft negatives, so that all available data contribute to one model. A reference representation per exercise type enables inference through cosine similarity, yielding robust, generalizable quality assessments on UI-PRMD, IRDS, and KIMORE. The approach achieves state-of-the-art accuracy and AUC while replacing per-type models with a single model, and it demonstrates effective transfer learning to KIMORE. These results advance practical home-based virtual rehabilitation by enabling a unified, scalable assessment framework with strong cross-dataset performance and potential for further interpretability and multitask extensions.
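The cosine-similarity inference step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the representations are plain lists of floats standing in for ST-GCN outputs, the exercise-type names are made up, and the `threshold` that turns a similarity score into a correct/incorrect decision is a hypothetical parameter that this summary does not specify.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def assess_quality(test_repr: Sequence[float],
                   reference_by_type: Dict[str, Sequence[float]],
                   exercise_type: str,
                   threshold: float = 0.5) -> Tuple[float, bool]:
    """Compare a test sample only with the reference of its own exercise type.

    `threshold` is a hypothetical cut-off, not a value taken from the paper.
    Returns (similarity score, predicted-correct flag).
    """
    reference = reference_by_type[exercise_type]
    score = cosine_similarity(test_repr, reference)
    return score, score >= threshold


# Illustrative references; the names and values are invented for this sketch.
refs = {"squat": [1.0, 0.0], "lunge": [0.0, 1.0]}
score, is_correct = assess_quality([0.9, 0.1], refs, "squat")
```

A similarity score alone is sufficient for computing AUC, which is threshold-free; a threshold is only needed for an accuracy-style binary decision.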

Abstract

Exercise-based rehabilitation programs have been shown to improve quality of life and to reduce mortality and rehospitalization rates. AI-driven virtual rehabilitation, which allows patients to complete exercises independently at home, uses AI algorithms to analyze exercise data, provide feedback to patients, and update clinicians on their progress. These programs commonly prescribe a variety of exercise types, which creates a distinct challenge for rehabilitation exercise assessment datasets: although they contain many training samples overall, they often have only a few samples for each individual exercise type. This small per-type sample size hampers existing approaches in training generalizable models. To address this issue, this paper introduces a novel supervised contrastive learning framework with hard and soft negative samples that uses the entire dataset to train a single model applicable to all exercise types. This model, built on a Spatial-Temporal Graph Convolutional Network (ST-GCN) architecture, demonstrated improved generalizability across exercises and lower overall complexity. In extensive experiments on three publicly available rehabilitation exercise assessment datasets, UI-PRMD, IRDS, and KIMORE, our method outperformed existing methods, setting a new benchmark in rehabilitation exercise quality assessment.
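The training objective can be sketched as a supervised contrastive (SupCon) loss over composite labels. This sketch assumes, based on the Fig. 1(b) caption, that a pair is positive when both samples share exercise type and correctness, a hard negative when they share exercise type but differ in correctness, and a soft negative when their exercise types differ. The exact labeling rule and any special treatment of hard negatives in the paper are not given in this summary, so `hard_weight` is a hypothetical knob; at its default of 1.0 the function is the standard SupCon loss.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label structure: (exercise type, is_correct).
Label = Tuple[str, bool]


def pair_kind(a: Label, b: Label) -> str:
    """Classify a pair of samples under the assumed labeling scheme."""
    same_type = a[0] == b[0]
    if same_type and a[1] == b[1]:
        return "positive"        # same exercise type, same correctness
    if same_type:
        return "hard_negative"   # same exercise type, opposite correctness
    return "soft_negative"       # different exercise type


def _l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def supcon_loss(embeddings: List[Sequence[float]], labels: List[Label],
                temperature: float = 0.1, hard_weight: float = 1.0) -> float:
    """Supervised contrastive loss averaged over anchors with >= 1 positive.

    hard_weight (hypothetical) scales hard-negative terms in the denominator;
    hard_weight=1.0 recovers the standard supervised contrastive loss.
    """
    z = [_l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if pair_kind(labels[i], labels[j]) == "positive"]
        if not positives:
            continue  # anchors without a positive contribute nothing
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in others}
        # Weighted log-sum-exp over all non-anchor samples, shifted by the max for stability.
        m = max(logits.values())
        weighted = sum(
            (hard_weight if pair_kind(labels[i], labels[j]) == "hard_negative" else 1.0)
            * math.exp(logits[j] - m)
            for j in others
        )
        log_denom = m + math.log(weighted)
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0
```

Because every sample in the mini-batch contributes either as a positive or as a (hard or soft) negative, samples of all exercise types shape a single shared embedding space, which is the property the abstract relies on.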

Paper Structure

This paper contains 19 sections, 6 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: (a) Variety of samples in a rehabilitation exercise training mini-batch, featuring different exercise types where each sample may be correct or incorrect, with the leftmost sample designated as the anchor. (b) From left to right, a positive sample pair and its corresponding hard negative sample pair and two soft negative sample pairs.
  • Figure 2: (a) Using all training exercise samples, organized into mini-batches as described in Fig. 1, and the supervised contrastive loss function given in the paper's method section, the spatial-temporal graph convolutional network encoder $f(\cdot)$ and the fully-connected projection head $g(\cdot)$ are trained. (b) The trained $f(\cdot)$ and $g(\cdot)$ are used to generate learned representations for all correct training samples of exercise type $c$. Weighted averaging of these representations yields an exercise-type-specific reference representation for type $c$. (c) Inference is made by computing the similarity between the learned representation of a type $c$ test sample and the reference representation for type $c$.
  • Figure 3: t-SNE visualization of representations learned through the proposed supervised contrastive learning approach for (a) UI-PRMD and (b) IRDS datasets. Representations are color-coded by exercise type, and "+" marks the reference representations, i.e., the cluster centers.
  • Figure 4: Training and validation Mean Squared Error (MSE) loss curves of the proposed method across consecutive epochs for the five exercise types in the KIMORE dataset: (a) lifting of arms, (b) trunk lateral tilt, (c) trunk rotation, (d) pelvis rotation, and (e) squatting. The training was performed in two different settings: training an untrained ST-GCN encoder from scratch, and fine-tuning an ST-GCN encoder pre-trained on the IRDS dataset.
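Fig. 2(b) builds one reference representation per exercise type by weighted averaging of the learned representations of that type's correct training samples. A minimal sketch follows. The weighting scheme is not described in this summary, so the optional `weights` argument is a hypothetical input that defaults to uniform weights, and the representations are plain lists standing in for the outputs of $g(f(\cdot))$.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def reference_representation(reprs: List[Vector],
                             weights: Optional[List[float]] = None) -> List[float]:
    """Weighted average of representations; uniform weights are an assumption."""
    if not reprs:
        raise ValueError("need at least one representation")
    if weights is None:
        weights = [1.0] * len(reprs)  # hypothetical default: equal weights
    if len(weights) != len(reprs):
        raise ValueError("need exactly one weight per representation")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(reprs[0])
    return [sum(w * r[k] for w, r in zip(weights, reprs)) / total for k in range(dim)]


def build_references(samples: List[Tuple[str, bool, Vector]]) -> Dict[str, List[float]]:
    """One reference per exercise type, built only from its correct samples."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for exercise_type, is_correct, rep in samples:
        if is_correct:  # incorrect samples are excluded, as in Fig. 2(b)
            grouped[exercise_type].append(rep)
    return {t: reference_representation(reps) for t, reps in grouped.items()}
```

Because cosine similarity ignores vector length, whether the reference is additionally L2-normalized does not change the inference scores in Fig. 2(c).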