Egocentric Video: A New Tool for Capturing Hand Use of Individuals with Spinal Cord Injury at Home

Jirapat Likitlersuang, Elizabeth R. Sumitro, Tianshi Cao, Ryan J. Visee, Sukhvinder Kalsi-Ryan, Jose Zariffa

TL;DR

This work addresses the shortage of quantitative hand-function measures in home and community settings for individuals with cervical spinal cord injury (cSCI) by introducing a wearable egocentric camera system. A three-stage computer-vision pipeline detects hands, segments hand regions, and identifies hand–object interactions during activities of daily living, producing frame-by-frame interaction decisions. Evaluation on nine participants with cSCI yields a mean F1-score of $0.74 \pm 0.15$ for the left hand and $0.73 \pm 0.15$ for the right hand, with three derived functional metrics showing moderate correlations to manual labels ($\rho = 0.40$, $0.54$, $0.55$). These findings demonstrate the feasibility of capturing objective, home-based measures of hand use, enabling new outcome metrics to assess independence in everyday tasks and supporting future home-environment validation and algorithmic improvements.

Abstract

Current upper extremity outcome measures for persons with cervical spinal cord injury (cSCI) lack the ability to directly collect quantitative information in home and community environments. A wearable first-person (egocentric) camera system is presented that can monitor functional hand use outside of clinical settings. The system is based on computer vision algorithms that detect the hand, segment the hand outline, distinguish the user's left or right hand, and detect functional interactions of the hand with objects during activities of daily living. The algorithm was evaluated using egocentric video recordings from 9 participants with cSCI, obtained in a home simulation laboratory. The system produces a binary hand-object interaction decision for each video frame, based on features reflecting motion cues of the hand, hand shape and colour characteristics of the scene. This output was compared with a manual labelling of the video, yielding F1-scores of $0.74 \pm 0.15$ for the left hand and $0.73 \pm 0.15$ for the right hand. From the resulting frame-by-frame binary data, functional hand use measures were extracted: the amount of total interaction as a percentage of testing time, the average duration of interactions in seconds, and the number of interactions per hour. Moderate and significant correlations were found when comparing these output measures to the results of the manual labelling, with $\rho = 0.40$, $0.54$ and $0.55$ respectively. These results demonstrate the potential of a wearable egocentric camera for capturing quantitative measures of hand use at home.
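To make the evaluation and the three hand use measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes the per-frame decisions are available as a list of 0/1 values for one hand and that the video frame rate is known; the function names `f1_score` and `hand_use_metrics`, the `fps` parameter, and the choice to count each unbroken run of interaction frames as one interaction are all illustrative assumptions.

```python
from itertools import groupby


def f1_score(predicted, actual):
    """F1-score of per-frame binary predictions against manual labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hand_use_metrics(frames, fps):
    """Derive the three measures from a per-frame 0/1 interaction sequence.

    Assumption: one "interaction" is one unbroken run of frames labelled 1.
    `fps` (frames per second) is a hypothetical parameter, not a value
    taken from the paper.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    total_seconds = len(frames) / fps
    # Length (in frames) of every run of consecutive interaction frames.
    runs = [sum(1 for _ in group) for value, group in groupby(frames) if value]
    interaction_frames = sum(runs)
    return {
        "percent_interaction": 100 * interaction_frames / len(frames),
        "mean_duration_s": interaction_frames / fps / len(runs) if runs else 0.0,
        "interactions_per_hour": len(runs) * 3600 / total_seconds,
    }


if __name__ == "__main__":
    predicted = [0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
    actual = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
    print(f1_score(predicted, actual))
    print(hand_use_metrics(predicted, fps=2))
```

Applying `hand_use_metrics` to both the algorithm output and the manual labels gives paired values per participant and hand, which is the kind of data the reported correlations compare.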

Paper Structure

This paper contains 23 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A simplified flowchart of the algorithmic framework showing the developed sequential preprocessing steps as well as input and output format for each step
  • Figure 2: Example frames describing the methodology in each of the processing steps. (a) Hand detection step, where the left image is the output bounding box of the hand from the R-CNN, the centre image is the Haar-like feature rotating around the bounding box centroid, and the right image is the final detection output. (b) Hand segmentation step, where the left image is the hand contour identification generated by combining skin colour information (in black and white) with edge detection of hand contours (in purple), and the right image shows the re-centering and selection of the final hand contour. (c) Regions involved in the interaction detection step, where the left image is the hand region, the centre image is the boxed neighbourhood of the hand, and the right image is the background region
  • Figure 3: Hand use metrics. Scatter plots comparing the interaction metrics predicted from the algorithm (y-axis) with the actual value from the human observer (x-axis), for each of the three proposed metrics in both hands (left and right hand). (a) Proportion of interaction over total recording time, (b) average duration of interactions (seconds), and (c) number of interactions per hour. The result of a Pearson correlation is shown for (a) and (c) because the data were normally distributed, while (b) was calculated with a Spearman correlation
  • Figure 4: Example binary hand-object interaction graphs of 3 participants. The graphs compare the predicted interactions from the algorithm output to the actual interactions from the manually labelled data, after applying the moving average filter. Example frames of the activities in different segments of videos are shown underneath. (a) Participant #2, (b) Participant #5, (c) Participant #9. Note that in some cases the videos were briefly paused in between the activities shown
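
The Figure 4 caption mentions a moving average filter applied to the binary interaction sequences. The sketch below shows one plausible form of such a filter under stated assumptions: a centred window that is truncated at the sequence edges, followed by re-thresholding to a binary decision. The window length, the threshold of 0.5, and the function name `smooth_binary` are illustrative and are not taken from the paper.

```python
def smooth_binary(frames, window=5, threshold=0.5):
    """Smooth a 0/1 sequence with a centred moving average, then re-binarize.

    Assumptions (not from the paper): the window is centred on each frame,
    shrinks at the start and end of the sequence, and a frame is kept as an
    interaction when the windowed mean is at least `threshold`.
    """
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo = max(0, i - half)
        hi = min(len(frames), i + half + 1)
        mean = sum(frames[lo:hi]) / (hi - lo)
        smoothed.append(1 if mean >= threshold else 0)
    return smoothed


if __name__ == "__main__":
    noisy = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    # The isolated 1 at index 2 is removed and the gap at index 8 is filled.
    print(smooth_binary(noisy, window=3))
```

A filter of this kind suppresses single-frame flickers in the per-frame decisions, which would otherwise inflate the number of interactions and shorten their average duration.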