Generating Realistic Arm Movements in Reinforcement Learning: A Quantitative Comparison of Reward Terms and Task Requirements

Jhon P. F. Charaja; Isabell Wochner; Pierre Schumacher; Winfried Ilg; Martin Giese; Christophe Maufroy; Andreas Bulling; Syn Schmitt; Georg Martius; Daniel F. B. Haeufle

TL;DR

This study addresses how to generate realistic, human-like arm movements in reinforcement learning (RL) by systematically examining three factors: task requirements, execution noise, and optimality principles. Using a musculoskeletal arm model with two degrees of freedom (DoF) in MuJoCo, the authors train 48 RL agents across configurations that combine these factors and evaluate them with four movement-realism metrics: straight hand trajectory, bell-shaped velocity profile, triphasic muscle activation pattern, and Fitts's law. They find that imposing velocity and acceleration requirements (pos-vel-acc), combined with rewards that minimize mechanical work, hand jerk, and muscle stimulation, and including execution noise yields the most human-like movements across multiple difficulty levels; higher indices of difficulty (IDs) further sharpen the velocity profiles and extend the third muscle phase. Overall, the work highlights how the combination of task demands, stochasticity, and multi-term optimality principles drives the emergence of realistic kinematics and muscle activation patterns, offering guidance for predictive models used in wearable assistive devices.
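The TL;DR uses Fitts's law as one realism metric, with the index of difficulty (ID) as the difficulty variable. The summary does not state which ID formulation or fit statistic the paper uses, so the sketch below assumes the original Fitts formulation ID = log2(2D/W), with D the target distance and W the target width, fits the movement time MT = a + b·ID by ordinary least squares, and reports R² as a stand-in for the $R_F$ metric. The function names and all numbers in the usage lines are hypothetical, not data from the study.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Original Fitts formulation (assumed here): ID = log2(2D / W), in bits."""
    return math.log2(2.0 * distance / width)


def fit_fitts_law(ids: list[float], times: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit of MT = a + b * ID; returns (a, b, R^2)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_t = sum(times) / n
    s_xy = sum((x - mean_id) * (y - mean_t) for x, y in zip(ids, times))
    s_xx = sum((x - mean_id) ** 2 for x in ids)
    b = s_xy / s_xx
    a = mean_t - b * mean_id
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, times))
    ss_tot = sum((y - mean_t) ** 2 for y in times)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical example: fixed distance of 0.3 m, target width halved each time
widths = [0.15, 0.075, 0.0375, 0.01875]
ids = [index_of_difficulty(0.3, w) for w in widths]  # approximately 2, 3, 4, 5 bits
times = [0.40, 0.52, 0.61, 0.73]                     # made-up movement times in seconds
a, b, r2 = fit_fitts_law(ids, times)
print(f"MT = {a:.3f} + {b:.3f} * ID, R^2 = {r2:.3f}")
```

With the distance held fixed, halving the target width raises ID by one bit; under Fitts's law the movement time should then grow roughly linearly with ID, which a high R² reflects.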

Abstract

Mimicking human-like arm movement characteristics requires considering three factors during control policy synthesis: (a) the chosen task requirements, (b) the inclusion of noise during movement execution, and (c) the chosen optimality principles. Previous studies showed that, when these factors (a-c) are considered individually, it is possible to synthesize arm movements that either kinematically match experimental data or reproduce the stereotypical triphasic muscle activation pattern. However, no quantitative comparison has yet been made of how realistic the arm movements generated by each factor are, nor of whether a partial or total combination of all factors results in arm movements with both human-like kinematic characteristics and a triphasic muscle pattern. To investigate this, we used reinforcement learning to learn a control policy for a musculoskeletal arm model, aiming to determine which combination of factors (a-c) results in realistic arm movements according to four frequently reported stereotypical characteristics. Our findings indicate that incorporating velocity and acceleration requirements into the reaching task, employing reward terms that encourage minimization of mechanical work, hand jerk, and control effort, and including noise during movement execution together lead to the emergence of realistic human arm movements in reinforcement learning. We expect these insights to help better predict desired arm movements and corrective forces in wearable assistive devices.
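The abstract combines three ingredients: noise during movement execution, velocity and acceleration requirements on the reaching task, and reward terms for mechanical work, hand jerk, and control effort. The summary gives neither the paper's reward equation nor its noise model, weights, or thresholds, so everything below is an illustrative assumption: the helper names `add_execution_noise`, `reached_target`, and `step_reward`, the signal-dependent Gaussian noise, the values in `WEIGHTS`, and the thresholds passed to `reached_target` are all hypothetical. The sketch only shows one plausible way the three factors could enter a single environment step.

```python
import random

# Hypothetical penalty weights; not taken from the paper.
WEIGHTS = {"work": 0.1, "jerk": 1e-4, "effort": 0.5}


def add_execution_noise(stim, rng, noise_scale=0.1):
    """Add signal-dependent Gaussian noise (std = noise_scale * command), clip to [0, 1].

    The noise model is an assumption made for this sketch.
    """
    return [min(1.0, max(0.0, u + rng.gauss(0.0, noise_scale * u))) for u in stim]


def reached_target(dist, speed, accel, radius, v_max, a_max):
    """Hypothetical 'pos-vel-acc' check: hand inside the target, with low speed and low acceleration magnitude."""
    return dist < radius and speed < v_max and accel < a_max


def step_reward(dist, torques, joint_vels, hand_jerk, stim, dt, w=WEIGHTS):
    """Illustrative dense reward: negative hand-target distance minus weighted optimality penalties."""
    work = dt * sum(abs(tau * qd) for tau, qd in zip(torques, joint_vels))  # absolute joint work over one step
    jerk = sum(j * j for j in hand_jerk)                                     # squared hand-jerk magnitude
    effort = sum(u * u for u in stim) / len(stim)                            # mean squared stimulation
    return -dist - w["work"] * work - w["jerk"] * jerk - w["effort"] * effort


# Usage: six commands, one agonist/antagonist pair each for S, B and E (ordering assumed)
rng = random.Random(0)
commanded = [0.2, 0.8, 0.0, 0.5, 0.1, 0.3]
executed = add_execution_noise(commanded, rng)  # the plant receives the noisy commands
r = step_reward(dist=0.05, torques=[2.0, -1.0], joint_vels=[0.5, 1.0],
                hand_jerk=[10.0, 0.0], stim=executed, dt=0.01)
print(executed, r)
print(reached_target(0.005, 0.01, 0.05, radius=0.01, v_max=0.05, a_max=0.1))
```

In this sketch the effort penalty is computed on the executed (noisy) stimulation rather than the commanded one; penalizing the commanded values instead would be an equally valid design choice.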

Paper Structure

This paper contains 12 sections, 4 equations, 6 figures, 1 table.

Introduction
Methods
  Muscle stimulation commands
  Simulation of human arm dynamics
  Reward formulation
  Metrics for goal-oriented movements
Results
  Straight line deviation ($p_\mathrm{line}$)
  Bell-shaped velocity profile ($v_\mathrm{bell}$)
  Triphasic muscle pattern ($u_\mathrm{triphasic}$)
  Fitts's law ($R_F$)
Discussion

Figures (6)

Figure 1: Framework to systematically combine three factors that generate arm movements: (a) different task requirements, (b) inclusion of noise during movement execution, and (c) optimality principles grounded in the minimization of mechanical work, hand jerk, and muscle stimulation commands (effort). Each combination creates a unique learning environment with distinctive challenges and movement priorities: execution noise modifies the control commands, while optimality principles and additional task requirements shape the reward. Shown below are the metrics for four documented stereotypical characteristics of human arm movement: (i) a roughly straight hand trajectory, (ii) a bell-shaped velocity profile, (iii) a triphasic muscle activation pattern, and (iv) Fitts's law.
Figure 2: Hand trajectories generated by all models for the three possible task requirements at index of difficulty $\mathrm{ID}=5$.
Figure 3: Velocity profiles generated by each model with velocity and acceleration requirements added to the main task, at index of difficulty $\mathrm{ID}=5$. The dashed line represents the fitted Gaussian model.
Figure 4: Muscle activation pattern of the hybrid model for the three possible task requirements. The agonist-antagonist muscle pairs of the arm are denoted as monoarticular shoulder (S), biarticular elbow-shoulder (B), and monoarticular elbow (E). Blue and red lines represent the activation of agonist and antagonist muscles, respectively.
Figure 5: Muscle activation pattern of the monoarticular elbow muscle (E) for the hybrid model with the pos-vel-acc task requirement. The duration of the third phase increases with the index of difficulty (ID), i.e., with higher endpoint accuracy.
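Figures 2 and 3 assess two kinematic properties: how straight the hand path is and how closely the speed profile matches a fitted Gaussian. This summary does not reproduce the paper's definitions of $p_\mathrm{line}$ and $v_\mathrm{bell}$, so the sketch below rests on assumptions: the hypothetical `straight_line_deviation` takes the largest perpendicular distance of the hand path from the start-to-end chord, divided by the chord length, and the hypothetical `gaussian_fit_r2` fits a Gaussian by matching its mean, variance, and area to the speed profile (a moment-matching shortcut swapped in for nonlinear least-squares curve fitting) and reports R² as the goodness of fit.

```python
import math


def straight_line_deviation(path):
    """Max perpendicular distance from the start-end chord, divided by chord length (assumed definition)."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    dx, dy = x1 - x0, y1 - y0
    chord = math.hypot(dx, dy)
    # |cross(chord, point - start)| / |chord| is the perpendicular distance to the chord line
    max_dev = max(abs(dx * (y - y0) - dy * (x - x0)) / chord for x, y in path)
    return max_dev / chord


def gaussian_fit_r2(t, v):
    """Moment-matched Gaussian fit of a speed profile; returns (mu, sigma, R^2)."""
    dt = t[1] - t[0]  # assumes uniform sampling
    area = sum(v) * dt
    mu = sum(ti * vi for ti, vi in zip(t, v)) * dt / area
    var = sum((ti - mu) ** 2 * vi for ti, vi in zip(t, v)) * dt / area
    sigma = math.sqrt(var)
    amp = area / (sigma * math.sqrt(2.0 * math.pi))  # Gaussian with the same area as the profile
    fitted = [amp * math.exp(-0.5 * ((ti - mu) / sigma) ** 2) for ti in t]
    mean_v = sum(v) / len(v)
    ss_res = sum((vi - fi) ** 2 for vi, fi in zip(v, fitted))
    ss_tot = sum((vi - mean_v) ** 2 for vi in v)
    return mu, sigma, 1.0 - ss_res / ss_tot


# Synthetic check: normalized minimum-jerk speed profile sampled at 101 points
t = [i / 100 for i in range(101)]
v = [30 * ti**2 * (1 - ti) ** 2 for ti in t]
mu, sigma, r2 = gaussian_fit_r2(t, v)
print(f"mu={mu:.3f}, sigma={sigma:.3f}, R^2={r2:.3f}")
print(straight_line_deviation([(0.0, 0.0), (0.5, 0.05), (1.0, 0.0)]))
```

The minimum-jerk speed profile used as input is bell-shaped but not exactly Gaussian, so its R² comes out close to, but below, 1; a skewed or multi-peaked profile would score lower.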
