Fine-tuning Myoelectric Control through Reinforcement Learning in a Game Environment

Kilian Freitag, Yiannis Karayiannidis, Jan Zbinden, Rita Laezza

TL;DR

This work investigates the potential of Reinforcement Learning (RL) to further improve the decoding of human motion intent by incorporating usage-based data and achieves significant improvements in accuracy and robustness.

Abstract

Objective: Enhancing the reliability of myoelectric controllers that decode motor intent is a pressing challenge in the field of bionic prosthetics. State-of-the-art research has mostly focused on Supervised Learning (SL) techniques to tackle this problem. However, obtaining high-quality labeled data that accurately represents muscle activity during daily usage remains difficult. We investigate the potential of Reinforcement Learning (RL) to further improve the decoding of human motion intent by incorporating usage-based data. Methods: The starting point of our method is an SL control policy, pretrained on a static recording of electromyographic (EMG) ground truth data. We then apply RL to fine-tune the pretrained classifier with dynamic EMG data obtained during interaction with a game environment developed for this work. We conducted real-time experiments to evaluate our approach and achieved significant improvements in human-in-the-loop performance. Results: The method effectively predicts simultaneous finger movements, leading to a two-fold increase in decoding accuracy during gameplay and a 39% improvement in a separate motion test. Conclusion: By employing RL and incorporating usage-based EMG data during fine-tuning, our method achieves significant improvements in accuracy and robustness. Significance: These results showcase the potential of RL for enhancing the reliability of myoelectric controllers, of particular importance for advanced bionic limbs. See our project page for visual demonstrations: https://sites.google.com/view/bionic-limb-rl

Paper Structure

This paper contains 25 sections, 11 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Selected finger movements, grouped by number of simultaneous DOFs. Top row consists of finger extension movements, while bottom row consists of finger flexion movements. Each movement is labeled as $m_i$ with $i=0,\ldots,12$ and with $m_0$ referring to 'Rest'.
  • Figure 2: EMG recording setup, with sliding window over 8 input channels from surface electrodes. The Hudgins features [hudgins1993new], namely mean absolute value (MAV), waveform length in time-domain (TWL), number of zero crossings (ZC) and slope changes (SLPCH), are extracted for each recorded channel and stacked in a one-dimensional vector.
  • Figure 3: The proposed RL framework obtains EMG signals from users and passes them to the policy, which performs actions in an environment. The environment then returns a reward based on how successful an action was. Note that before interacting with the environment, the policy is pretrained through SL, using data from a recording session. Note also that the reward signal is used only during RL training, not once the policy is deployed.
  • Figure 4: Game interface. Each vertical line refers to a controlled DOF: Thumb (red), Index (yellow) and Middle (blue). The arrows pointing up or down refer to extension and flexion, respectively. Desired movements are shown along the vertical lines, whereas predictions are displayed on the diamonds by short arrows, indicating the direction and DOF that is activated. When the agent executes the desired movement, a green arrow appears over the diamond of the specific DOF (left). Conversely, when the movement is incorrect the arrows are shown in white (right).
  • Figure 5: Normalized average cumulative reward over all subjects for RL training repetitions. The first and last repetitions are done with the initial pretrained SL policy $\pi_0$, so RL training is only done between repetitions 0 and 8 using the most recent policy $\pi_i$. For one participant our method did not seem to find patterns and thus performed poorly. The lower outliers in most repetitions belong to this participant. The outliers in repetitions 7 and 8 belong to another participant whose initial policy was underperforming, but for whom RL did increase performance. Additionally, there are some over-performing outliers. For one participant pretraining worked exceptionally well, as seen in repetitions 0 and 9. The outliers in repetitions 1 and 2 belong to another participant for whom RL training improved motor decoding faster than usual. The normalized average cumulative reward significantly increases in most repetitions. The differences between repetitions 3-4, 5-6, and 7-8 were not statistically significant.
  • ...and 8 more figures
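
The feature extraction described in the caption of Figure 2 can be illustrated with a minimal sketch. It uses the commonly cited definitions of the four Hudgins features and is not taken from the authors' code. The function names, the `threshold` parameter, the window and step sizes, and the channel-major stacking order are all assumptions made for illustration; the caption only states that the features are computed per channel over a sliding window and stacked into a one-dimensional vector.

```python
from typing import List, Sequence


def hudgins_features(x: Sequence[float], threshold: float = 0.0) -> List[float]:
    """Return [MAV, TWL, ZC, SLPCH] for one window of a single EMG channel.

    `threshold` is a hypothetical noise threshold (not given in the source):
    zero crossings and slope changes only count if the amplitude step exceeds it.
    """
    n = len(x)
    # Mean absolute value.
    mav = sum(abs(v) for v in x) / n
    # Waveform length: cumulative absolute difference between samples.
    twl = sum(abs(x[i + 1] - x[i]) for i in range(n - 1))
    # Zero crossings: consecutive samples with opposite sign.
    zc = sum(
        1
        for i in range(n - 1)
        if x[i] * x[i + 1] < 0 and abs(x[i] - x[i + 1]) > threshold
    )
    # Slope changes: interior samples that are local extrema.
    slpch = sum(
        1
        for i in range(1, n - 1)
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0
        and max(abs(x[i] - x[i - 1]), abs(x[i] - x[i + 1])) > threshold
    )
    return [mav, twl, float(zc), float(slpch)]


def sliding_feature_vectors(
    channels: Sequence[Sequence[float]],
    window: int,
    step: int,
    threshold: float = 0.0,
) -> List[List[float]]:
    """Slide a window over all channels and stack per-channel features.

    `channels` holds one equally long sample sequence per electrode.
    Each returned vector has 4 * len(channels) entries, ordered channel
    by channel (an assumed ordering).
    """
    n_samples = len(channels[0])
    vectors = []
    for start in range(0, n_samples - window + 1, step):
        vec: List[float] = []
        for ch in channels:
            vec.extend(hudgins_features(ch[start:start + window], threshold))
        vectors.append(vec)
    return vectors


# Toy usage: 8 channels with 6 samples each, window of 4 samples, step of 2.
toy = [[1.0, -1.0, 1.0, -1.0, 1.0, -1.0] for _ in range(8)]
features = sliding_feature_vectors(toy, window=4, step=2)
print(len(features), len(features[0]))  # 2 windows, 32 features each
```

With 8 channels and 4 features per channel, each window yields a 32-dimensional vector, which would serve as the input to the policy. Real recordings would use much longer windows and a nonzero threshold tuned to the noise floor of the electrodes.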