Learning Human Reaching Optimality Principles from Minimal Observation Inverse Reinforcement Learning

Sarmad Mehrdad; Maxime Sabbah; Vincent Bonnet; Ludovic Righetti

TL;DR

The paper addresses how humans dynamically balance multiple cost criteria during reaching, moving beyond fixed-cost models. It extends Minimal Observation Inverse Reinforcement Learning (MO-IRL) to learn time-varying cost weights across movement phases, represented by a weight matrix $\boldsymbol{\omega}\in\mathbb{R}^{N_{\Phi}\times N_w}$, and applies it to a planar two-link arm model with velocity-aware features. Using a direct optimal control formulation with seven candidate costs and phase segmentation, the method achieves joint-angle RMSEs of about $6.4^{\circ}$ and $5.6^{\circ}$ for 6- and 8-section weights (versus $10.4^{\circ}$ for a single section) and about $8^{\circ}$ in inter-subject validation, indicating robust generalization and data-efficient learning. The learned weights highlight acceleration minimization at movement onset and end as a key feature of human reaching, with potential applications in bio-inspired robotics and rehabilitation.

Abstract

This paper investigates the application of Minimal Observation Inverse Reinforcement Learning (MO-IRL) to model and predict human arm-reaching movements with time-varying cost weights. Using a planar two-link biomechanical model and high-resolution motion-capture data from subjects performing a pointing task, we segment each trajectory into multiple phases and learn phase-specific combinations of seven candidate cost functions. MO-IRL iteratively refines cost weights by scaling observed and generated trajectories in the maximum entropy IRL formulation, greatly reducing the number of required demonstrations and the convergence time compared to classical IRL approaches. Training on ten trials per posture yields average joint-angle Root Mean Squared Errors (RMSE) of 6.4 deg and 5.6 deg for six- and eight-segment weight divisions, respectively, versus 10.4 deg using a single static weight. Cross-validation on the remaining trials and, for the first time, inter-subject validation on an unseen subject's 20 trials demonstrate comparable predictive accuracy, around 8 deg RMSE, indicating robust generalization. The learned weights emphasize joint acceleration minimization during movement onset and termination, aligning with smoothness principles observed in biological motion. These results suggest that MO-IRL can efficiently uncover dynamic, subject-independent cost structures underlying human motor control, with potential applications for humanoid robots.
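
To make the idea of phase-specific cost weights concrete, the following is a minimal sketch, not the authors' implementation. It assumes the trajectory is split into equal-length sections (the paper's actual segmentation rule is not given here), that the weight matrix is stored with one row per candidate cost and one column per section, and that the hypothetical helpers `section_index`, `trajectory_cost`, and `rmse_deg` stand in for whatever the real pipeline uses.

```python
import math


def section_index(t, n_steps, n_sections):
    """Map time step t to a section, assuming equal-length sections."""
    return min(t * n_sections // n_steps, n_sections - 1)


def trajectory_cost(features, weights):
    """Total cost of a trajectory under section-dependent weights.

    features: list of T rows, each holding the N_phi cost features at one step.
    weights:  N_phi rows by N_w columns (assumed layout), one column per section.
    """
    n_steps = len(features)
    n_sections = len(weights[0])
    total = 0.0
    for t, phi_t in enumerate(features):
        s = section_index(t, n_steps, n_sections)
        # Weighted sum of the candidate costs, using the weights of section s.
        total += sum(weights[i][s] * phi_t[i] for i in range(len(phi_t)))
    return total


def rmse_deg(predicted, observed):
    """Root mean squared error over all joints and time steps (degrees in, degrees out)."""
    squared = [
        (p - o) ** 2
        for p_row, o_row in zip(predicted, observed)
        for p, o in zip(p_row, o_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: two features, four time steps, two sections.
toy_features = [[1.0, 2.0]] * 4
toy_weights = [[1.0, 0.0],   # feature 0 is active only in section 0
               [0.0, 1.0]]   # feature 1 is active only in section 1
toy_cost = trajectory_cost(toy_features, toy_weights)
```

With a single column in `weights`, the same function reduces to the static-weight case that the abstract uses as its baseline.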
