Rehabilitation Exercise Quality Assessment and Feedback Generation Using Large Language Models with Prompt Engineering

Jessica Tang, Ali Abedi, Tracey J. F. Colella, Shehroz S. Khan

TL;DR

This work addresses the challenge of delivering actionable rehabilitation feedback in home-based settings by leveraging pre-trained large language models (LLMs) guided through carefully designed prompts. It introduces a framework that fuses exercise-specific joint features with zero- to few-shot prompting, reasoning elicitation (Chain-of-Thought, certainty, probability), and role-play prompts to enable GPT-4o to assess movement quality and generate natural-language feedback. Experiments on UI-PRMD and REHAB24-6 show that feature-based prompts and three-shot configurations yield strong classification performance, with reasoning-enabled prompts further improving interpretability, though LLM overconfidence remains an issue. The study demonstrates the practical potential of LLM-driven feedback within virtual rehabilitation platforms, while acknowledging limitations such as the lack of ground-truth textual feedback datasets and reproducibility challenges, and outlining directions for data collection and potential model fine-tuning to enhance reliability and applicability.

Abstract

Exercise-based rehabilitation improves quality of life and reduces morbidity, mortality, and rehospitalization, though transportation constraints and staff shortages lead to high dropout rates from rehabilitation programs. Virtual platforms enable patients to complete prescribed exercises at home, while AI algorithms analyze performance, deliver feedback, and update clinicians. Although many studies have developed machine learning and deep learning models for exercise quality assessment, few have explored the use of large language models (LLMs) for feedback, and those that have are limited by the lack of rehabilitation datasets containing textual feedback. In this paper, we propose a new method in which exercise-specific features are extracted from the skeletal joints of patients performing rehabilitation exercises and fed into pre-trained LLMs. Using a range of prompting techniques, such as zero-shot, few-shot, chain-of-thought, and role-play prompting, LLMs are leveraged to evaluate exercise quality and provide feedback in natural language to help patients improve their movements. The method was evaluated through extensive experiments on two publicly available rehabilitation exercise assessment datasets (UI-PRMD and REHAB24-6) and showed promising results in exercise assessment, reasoning, and feedback generation. This approach can be integrated into virtual rehabilitation platforms to help patients perform exercises correctly, support recovery, and improve health outcomes.
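
The prompting strategy described above can be illustrated with a minimal sketch of how a feature sequence, labeled examples, a role, and a chain-of-thought cue might be assembled into one text prompt. This is not the authors' implementation: the `LabeledExample` class, the `build_prompt` function, the prompt wording, and all numeric values are hypothetical and chosen only for illustration. The sketch stops at the prompt string and makes no call to any LLM API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LabeledExample:
    """One labeled repetition used as a few-shot example (hypothetical structure)."""
    features: Sequence[float]
    label: str  # "correct" or "incorrect"


def format_sequence(values: Sequence[float]) -> str:
    """Render a feature sequence as comma-separated values with one decimal."""
    return ", ".join(f"{v:.1f}" for v in values)


def build_prompt(
    exercise: str,
    feature_name: str,
    query: Sequence[float],
    examples: Sequence[LabeledExample] = (),
    role: Optional[str] = None,
    chain_of_thought: bool = False,
) -> str:
    """Assemble a zero- or few-shot prompt; the wording is an assumption."""
    lines = []
    if role:
        # Role-play prompting: ask the model to answer in a given persona.
        lines.append(f"You are {role}.")
    lines.append(f"Exercise: {exercise}.")
    lines.append(
        f"Each repetition is described by a sequence of {feature_name} values."
    )
    # Few-shot prompting: an empty `examples` gives the zero-shot case.
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}: [{format_sequence(ex.features)}] -> {ex.label}")
    lines.append(f"Repetition to assess: [{format_sequence(query)}]")
    if chain_of_thought:
        # Chain-of-thought cue: request reasoning before the final label.
        lines.append("Think step by step about the trend in the values before answering.")
    lines.append(
        "Answer with 'correct' or 'incorrect', then give one sentence of feedback for the patient."
    )
    return "\n".join(lines)


# Made-up values for illustration; they are not taken from UI-PRMD or REHAB24-6.
examples = [
    LabeledExample([10.0, 45.0, 88.0, 44.0, 11.0], "correct"),
    LabeledExample([10.0, 30.0, 52.0, 29.0, 12.0], "incorrect"),
    LabeledExample([9.0, 47.0, 91.0, 46.0, 10.0], "correct"),
]
prompt = build_prompt(
    exercise="shoulder abduction",
    feature_name="shoulder angle (degrees)",
    query=[11.0, 33.0, 55.0, 31.0, 12.0],
    examples=examples,
    role="a physiotherapist",
    chain_of_thought=True,
)
print(prompt)
```

Passing an empty `examples` sequence yields the zero-shot variant, and three examples give the three-shot setting mentioned in the TL;DR. The resulting string would then be sent to a pre-trained LLM through whichever client is in use.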

Paper Structure

This paper contains 16 sections, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Either body joint data or exercise-specific features extracted from body joints, combined with engineered prompts and exercise type, are fed into a pre-trained LLM. The LLM then generates exercise quality assessments and provides textual feedback.
  • Figure 2: Rehabilitation exercise quality classification accuracy varies with the number of labeled examples included in the prompts, evaluated on body joint data from UI-PRMD (orange dashed line), feature sequences extracted from UI-PRMD (orange solid line), and feature sequences extracted from REHAB24-6 (blue solid line).
  • Figure 3: Role-playing prompts for feedback generation following classification and reasoning, applied to (a) shoulder abduction, (b) leg lunge, and (c) squat exercise. The corresponding classification results are presented in the classification results table (tab:classification_results). The LLM generates textual feedback in the specified role’s style, incorporating suggestions derived from data trends (highlighted in green) and insights from its prior knowledge (highlighted in red).