Generating Realistic Arm Movements in Reinforcement Learning: A Quantitative Comparison of Reward Terms and Task Requirements
Jhon P. F. Charaja, Isabell Wochner, Pierre Schumacher, Winfried Ilg, Martin Giese, Christophe Maufroy, Andreas Bulling, Syn Schmitt, Georg Martius, Daniel F. B. Haeufle
TL;DR
This study addresses how to generate realistic, human-like arm movements in reinforcement learning by systematically examining three factors: task requirements, execution noise, and optimality principles. Using a musculoskeletal two-DoF arm in MuJoCo, the authors train 48 RL agents across configurations that combine these factors and evaluate them with four movement-realism metrics: straight-line accuracy, bell-shaped velocity, triphasic muscle patterns, and Fitts's law. They find that imposing velocity and acceleration requirements on the reaching task (pos-vel-acc), coupled with rewards that minimize mechanical work, hand jerk, and muscle stimulation, and allowing execution noise, yields the most human-like movements across multiple difficulty levels. Higher indices of difficulty (IDs) further sharpen the velocity profiles and extend the third muscle phase. Overall, the work highlights how a careful combination of task demands, stochasticity, and multi-term optimality drives the emergence of realistic kinematics and muscle activation patterns, offering guidance for predictive models used in wearable assistive devices.
Abstract
Mimicking human-like arm movement characteristics requires considering three factors during control policy synthesis: (a) the chosen task requirements, (b) the inclusion of noise during movement execution, and (c) the chosen optimality principles. Previous studies showed that when these factors (a-c) are considered individually, it is possible to synthesize arm movements that either kinematically match the experimental data or reproduce the stereotypical triphasic muscle activation pattern. However, to date no quantitative comparison has been made of how realistic the arm movement generated by each factor is, or of whether a partial or total combination of the factors results in arm movements with human-like kinematic characteristics and a triphasic muscle pattern. To investigate this, we used reinforcement learning to learn a control policy for a musculoskeletal arm model, aiming to discern which combination of factors (a-c) results in realistic arm movements according to four frequently reported stereotypical characteristics. Our findings indicate that incorporating velocity and acceleration requirements into the reaching task, employing reward terms that encourage minimization of mechanical work, hand jerk, and control effort, and including noise during movement lead to the emergence of realistic human arm movements in reinforcement learning. We expect that the gained insights will help in the future to better predict desired arm movements and corrective forces in wearable assistive devices.
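Fitts's law, one of the stereotypical characteristics commonly used to judge reaching movements, relates movement time (MT) to an index of difficulty (ID) through a linear model, MT = a + b * ID. The sketch below is illustrative only and is not taken from the paper: it assumes the Shannon formulation ID = log2(D / W + 1), which may differ from the formulation the authors used, and the function names `index_of_difficulty` and `fit_fitts_law` are hypothetical.

```python
import math

import numpy as np


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, Shannon formulation (an assumption here).

    distance: distance from the start position to the target centre.
    width:    target width, in the same unit as distance.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def fit_fitts_law(ids, movement_times):
    """Least-squares fit of MT = a + b * ID.

    Returns (a, b, r_squared). An r_squared close to 1 means the
    movement times scale linearly with ID, as Fitts's law predicts.
    """
    ids = np.asarray(ids, dtype=float)
    mts = np.asarray(movement_times, dtype=float)
    b, a = np.polyfit(ids, mts, 1)  # coefficients, highest degree first
    predicted = a + b * ids
    ss_res = np.sum((mts - predicted) ** 2)
    ss_tot = np.sum((mts - mts.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot
```

In use, one would compute the ID of each reaching target, record the movement time an agent needs to reach it, and pass both sequences to `fit_fitts_law`; a strong linear fit suggests the agent's speed-accuracy trade-off is consistent with Fitts's law.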
