Reducing Robotic Upper-Limb Assessment Time While Maintaining Precision: A Time Series Foundation Model Approach

Faranak Akbarifar, Nooshin Maghsoodi, Sean P Dukelow, Stephen Scott, Parvin Mousavi

TL;DR

The study addresses the time burden of Kinarm Visually Guided Reaching (VGR) by testing forecast-augmented sessions in which unrecorded trials are replaced with forecasts from time-series foundation models. It compares ARIMA, MOMENT, and Chronos on 461 stroke and 599 control participants across 4- and 8-target protocols, showing that Chronos, when conditioned on movement direction and using Monte Carlo dropout, recovers $ICC(2,1)$ values close to those of full-length references with only a fraction of the trials, reducing session time by 75–88%. The results demonstrate substantial improvements in reliability for all four Kinarm parameters, with Chronos outperforming MOMENT and ARIMA across cohorts and protocols. This approach promises efficient, scalable robotic evaluations for assessing motor impairments after stroke, preserving precision while enhancing throughput and patient comfort.

Abstract

Purpose: Visually Guided Reaching (VGR) on the Kinarm robot yields sensitive kinematic biomarkers but requires 40–64 reaches, imposing time and fatigue burdens. We evaluate whether time-series foundation models can forecast the unrecorded trials from an early subset of reaches while preserving the reliability of standard Kinarm parameters. Methods: We analyzed VGR speed signals from 461 stroke and 599 control participants across 4- and 8-target reaching protocols. We withheld all but the first 8 or 16 reaching trials and used ARIMA, MOMENT, and Chronos models, fine-tuned on 70% of subjects, to forecast synthetic trials. We recomputed four kinematic features of reaching (reaction time, movement time, posture speed, maximum speed) on the combined recorded and forecasted trials and compared them to full-length references using ICC(2,1). Results: Chronos forecasts restored ICC ≥ 0.90 for all parameters with only 8 recorded trials plus forecasts, matching the reliability of 24–28 recorded reaches (ΔICC ≤ 0.07). MOMENT yielded intermediate gains, while ARIMA improvements were minimal. Across cohorts and protocols, synthetic trials replaced recorded reaches without materially compromising feature reliability. Conclusion: Foundation-model forecasting can greatly shorten Kinarm VGR assessment time. For the most impaired stroke survivors, sessions drop from 4–5 minutes to about 1 minute while preserving kinematic precision. This forecast-augmented paradigm promises efficient robotic evaluations for assessing motor impairments following stroke.
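The abstract's reliability metric, ICC(2,1), is the standard two-way random-effects, absolute-agreement, single-measurement intraclass correlation. The sketch below computes it in plain Python for illustration only; it is not the authors' code. The function name `icc_2_1` and the toy reaction-time values are hypothetical. Each row is one subject and each column one measurement condition (for example, the shortened session and the full-length reference).

```python
from statistics import mean


def icc_2_1(rows):
    """ICC(2,1) for a subjects-by-conditions table (list of equal-length rows)."""
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of conditions (e.g. short vs. full session)
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(row[j] for row in rows) for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subjects mean square
    msc = ss_cols / (k - 1)              # between-conditions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical per-subject reaction times in seconds:
# [value from shortened session, value from full-length session]
toy = [
    [0.31, 0.30],
    [0.42, 0.45],
    [0.28, 0.27],
    [0.55, 0.52],
    [0.36, 0.38],
]
print(round(icc_2_1(toy), 3))
```

Because ICC(2,1) measures absolute agreement, a constant offset between the two conditions lowers the score even when subjects are ranked identically, which is why it is a stricter criterion than a plain correlation.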

Paper Structure

This paper contains 21 sections, 2 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Apparatus and task. (a) Participant in the Kinarm Exoskeleton Lab. (b) Virtual workspace for the Visually Guided Reaching (VGR) task: the hand cursor moves from the central start target to one of the eight peripheral targets that appear in random order. (c) Representative control-trial hand-speed trace illustrating computation of the four selected Kinarm parameters: posture speed, reaction time (TARGET ON→Movement Onset), movement time (Movement Onset→Movement Offset), and max speed. TARGET ON, Movement Onset, and Movement Offset are marked. (d) Study pipeline: an 8-trial context is fed into a fine-tuned foundation model (Chronos or MOMENT) with Monte Carlo dropout to generate additional synthetic trials; real + forecasted trials are then used to recompute KST-defined speed parameters and their intraclass correlation coefficients (ICC).
  • Figure 2: Session duration with all trials vs. first eight, by cohort and protocol. (a) Control and (b) Stroke: normalized histograms of total VGR session time (sum of trial durations), overlaid by protocol (4-target vs. 8-target). Within each panel, solid outlines show all recorded trials and hatched bars show first eight trials only; binning is shared and percentages sum to 100% per cohort. (c) Empirical cumulative distribution (ECDF) of total session time by cohort $\times$ protocol. Solid curves use all trials; dashed curves use the first eight. Across cohorts and protocols, the first-8 condition shifts distributions leftward and steepens the ECDF, indicating markedly shorter sessions, while preserving the expected ordering (8-target $>$ 4-target; stroke $>$ control).
  • Figure 3: Reaction time reliability (ICC) as a function of total trials for the Chronos model. (a) Control, 8-target; (b) Control, 4-target; (c) Stroke, 8-target; (d) Stroke, 4-target. Blue curve: ICC between X-trial RT (median of X recorded trials) and complete-trial RT; error bars from subject bootstrap (B=1000). Squares: ICC for forecast-augmented protocols (8 recorded + k forecasts); error bars incorporate subject bootstrap and forecast-selection repeats.
  • Figure 4: Intraclass Correlation Coefficient (ICC) for reaction time (RT), movement time (MT), posture speed (PS), and max speed (MS), plotted as a function of the number of forecasted trials added to a fixed context. Each subplot corresponds to one of four task configurations: (a) 8-trial context, 8-target protocol; (b) 8-trial context, 4-target protocol; (c) 16-trial context, 8-target protocol; (d) 16-trial context, 4-target protocol. Solid lines with circular markers represent control participants; dashed lines with square markers represent participants with stroke. Colored lines indicate different kinematic parameters: blue = RT, orange = MT, green = PS, red = MS. The x-axis spans from 0 (context only) to 56 forecasted trials added in increments of 8, maintaining a consistent scale across all panels. Y-axis values denote ICC calculated from real + forecasted trials, with error bars representing standard deviation. The plots illustrate how forecasted trials affect reliability across groups, parameters, and task designs, with higher ICC generally achieved as more forecasted trials are included—though the extent of improvement varies by parameter and cohort.
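
The Figure 3 caption mentions error bars from a subject bootstrap with B=1000. A minimal sketch of that idea follows, assuming subjects are resampled with replacement and the ICC is recomputed on each resample. The function names, the 95% percentile interval, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import random
from statistics import mean, stdev


def icc_2_1(rows):
    """Compact ICC(2,1): two-way random effects, absolute agreement."""
    n, k = len(rows), len(rows[0])
    grand = mean(v for r in rows for v in r)
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(r[j] for r in rows) - grand) ** 2 for j in range(k))
    ss_err = sum((v - grand) ** 2 for r in rows for v in r) - ss_rows - ss_cols
    msr, msc = ss_rows / (n - 1), ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bootstrap_icc(rows, n_boot=1000, seed=0):
    """Resample subjects (rows) with replacement; summarize the ICC spread."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(rows)
    values = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            values.append(icc_2_1(sample))
        except ZeroDivisionError:
            # Degenerate resample with no variance at all; skip it.
            continue
    values.sort()
    lo = values[int(0.025 * len(values))]       # assumed 95% percentile interval
    hi = values[int(0.975 * len(values)) - 1]
    return mean(values), stdev(values), lo, hi


# Hypothetical per-subject values: [shortened session, full-length session]
toy = [[0.31, 0.30], [0.42, 0.45], [0.28, 0.27], [0.55, 0.52],
       [0.36, 0.38], [0.47, 0.44], [0.33, 0.35], [0.60, 0.58]]
avg, sd, lo, hi = bootstrap_icc(toy, n_boot=1000, seed=42)
print(f"ICC mean={avg:.3f} sd={sd:.3f} 95% interval=({lo:.3f}, {hi:.3f})")
```

Resampling whole subjects, rather than individual trials, keeps each subject's paired measurements together, so the spread reflects between-subject sampling variability.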