
Enhancing Box and Block Test with Computer Vision for Post-Stroke Upper Extremity Motor Evaluation

David Robinson, Animesh Gupta, Elizabeth Clark, Olga Melnik, Qiushi Fu, Mubarak Shah

Abstract

Standard clinical assessments of upper-extremity motor function after stroke rely either on ordinal scoring, which lacks sensitivity, or on time-based task metrics, which do not capture movement quality. In this work, we present a computer vision-based framework for analyzing upper-extremity movement during the Box and Block Test (BBT) through world-aligned joint angles of the fingers, arm, and trunk, without depth sensors or calibration objects. We apply this framework to a dataset of 136 BBT recordings collected from 48 healthy individuals and 7 individuals post-stroke. Using unsupervised dimensionality reduction of joint-angle features, we analyze movement patterns without relying on expert clinical labels. The resulting embeddings show separation between healthy movement patterns and stroke-related movement deviations. Importantly, some patients with the same BBT scores can be distinguished by their different postural patterns. These results show that world-aligned joint angles can capture meaningful information about upper-extremity function beyond standard time-based BBT scores, with no effort from the clinician beyond a monocular video recording of the patient made with a phone or camera. This work highlights the potential of a camera-based, calibration-free framework to measure movement quality in clinical assessments without changing the widely adopted clinical routine.
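The joint-angle features described above can be illustrated with a minimal sketch: given three 3D keypoints from a monocular pose estimator, the angle at the middle joint follows from the dot product of the two limb vectors. This is a generic three-point joint-angle computation, not the authors' exact implementation; the keypoint coordinates below are toy values for illustration.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by 3D keypoints a-b-c."""
    u = a - b
    v = c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Toy example: elbow flexion from shoulder-elbow-wrist keypoints.
shoulder = np.array([0.0, 0.3, 0.0])
elbow = np.array([0.0, 0.0, 0.0])
wrist = np.array([0.3, 0.0, 0.0])
print(joint_angle(shoulder, elbow, wrist))  # 90.0
```

Per-frame angles of this kind, computed for finger, arm, and trunk joints, form the feature vectors that the unsupervised dimensionality reduction operates on.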

Paper Structure

This paper contains 11 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of our world-aligned 3D joint angle estimation pipeline. Video frames from the Box and Block Test are segmented to recover the box orientation, which is used to estimate camera pitch through surface normals. Monocular 3D keypoints are then extracted and rotated to align with gravity, enabling measurement of finger and arm joint angles independent of camera orientation.
  • Figure 2: Comparison of 3D body pose estimation methods during the grasp and transport phases of the Box and Block Test. Side-view visualizations of 3D keypoints are shown for PromptHMR, SMPLer-X, and SAM 3D Body, illustrating differences in depth consistency and joint articulation relative to the observed arm pose in the image.
  • Figure 3: UMAP embeddings of joint angles. Healthy embeddings are shown in all panels to provide a common reference for visual comparison. The separation between healthy and patient distributions is observed without using supervision or clinical labels.
  • Figure 4: Comparison of 3D hand pose estimation methods during the grasp and transport phases of the Box and Block Test. Visualizations of 3D keypoints are shown for WiLoR and SAM 3D Body, illustrating differences in joint articulation relative to the observed hand pose in the image.
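The world-alignment step in Figure 1 can be sketched as follows: if the box's top surface is horizontal in the world, its normal observed in camera coordinates reveals the camera pitch, and rotating the monocular 3D keypoints by that pitch aligns them with gravity. This is a simplified sketch assuming zero camera roll and a normal lying in the camera's y-z plane; function names and conventions are illustrative, not the paper's exact pipeline.

```python
import numpy as np

def pitch_from_normal(n):
    """Camera pitch (radians) from the box top-surface normal in camera
    coordinates. Assumes no camera roll, so the normal lies in the y-z plane."""
    n = n / np.linalg.norm(n)
    return float(np.arctan2(n[2], n[1]))  # angle from the camera's y (up) axis

def align_to_gravity(points, pitch):
    """Rotate Nx3 keypoints about the camera x axis by -pitch so that the
    world 'up' direction maps to +y (gravity-aligned frame)."""
    c, s = np.cos(-pitch), np.sin(-pitch)
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, c, -s],
                   [0.0, s, c]])
    return points @ Rx.T

# Toy check: a camera pitched down by 0.3 rad sees the world-up normal tilted.
theta = 0.3
normal = np.array([0.0, np.cos(theta), np.sin(theta)])
pts = align_to_gravity(normal.reshape(1, 3), pitch_from_normal(normal))
print(pts)  # ~[[0, 1, 0]]: the normal is restored to world-up
```

After this rotation, joint angles measured against the vertical axis (e.g., trunk lean, arm elevation) become independent of how the phone or camera was tilted during recording.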