Learning Human Reaching Optimality Principles from Minimal Observation Inverse Reinforcement Learning

Sarmad Mehrdad, Maxime Sabbah, Vincent Bonnet, Ludovic Righetti

TL;DR

The paper addresses how humans dynamically balance multiple cost criteria during reaching by moving beyond fixed-cost models. It extends Minimal Observation Inverse Reinforcement Learning (MO-IRL) to learn time-varying cost weights across movement phases, represented by a weight matrix $\boldsymbol{\omega}\in\mathbb{R}^{N_\Phi\times N_w}$, and solved on a planar two-link arm with velocity-aware features. Using a direct optimal control formulation with seven candidate costs and phase segmentation, the method achieves RMSEs of about $6.4^{\circ}$ and $5.6^{\circ}$ for 6- and 8-section weights (vs $10.4^{\circ}$ for a single section) and ~$8^{\circ}$ in inter-subject validation, indicating robust generalization and data-efficient learning. These results highlight acceleration minimization at movement onset and end as a key feature of human reaching, with potential applications for bio-inspired robotics and rehabilitation.

Abstract

This paper investigates the application of Minimal Observation Inverse Reinforcement Learning (MO-IRL) to model and predict human arm-reaching movements with time-varying cost weights. Using a planar two-link biomechanical model and high-resolution motion-capture data from subjects performing a pointing task, we segment each trajectory into multiple phases and learn phase-specific combinations of seven candidate cost functions. MO-IRL iteratively refines cost weights by scaling observed and generated trajectories in the maximum entropy IRL formulation, greatly reducing the number of required demonstrations and convergence time compared to classical IRL approaches. Training on ten trials per posture yields average joint-angle Root Mean Squared Errors (RMSE) of 6.4 deg and 5.6 deg for six- and eight-segment weight divisions, respectively, versus 10.4 deg using a single static weight. Cross-validation on remaining trials and, for the first time, inter-subject validation on an unseen subject's 20 trials, demonstrates comparable predictive accuracy, around 8 deg RMSE, indicating robust generalization. Learned weights emphasize joint acceleration minimization during movement onset and termination, aligning with smoothness principles observed in biological motion. These results suggest that MO-IRL can efficiently uncover dynamic, subject-independent cost structures underlying human motor control, with potential applications for humanoid robots.
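To make the phase-specific weighting and the RMSE metric concrete, here is a minimal sketch, not the authors' implementation. It assumes the total cost is a weighted sum of per-step feature values, that the trajectory is split into equal-length sections (the paper's actual segmentation may differ), and that joint angles are stored in radians. The function names `trajectory_cost` and `rmse_deg` are hypothetical.

```python
import numpy as np

def trajectory_cost(features, omega):
    """Weighted cost of one trajectory under section-wise weights.

    features: array (T, N_phi), value of each candidate cost at each time step.
    omega:    array (N_phi, N_w), one weight column per section.
    Assumption: the T steps are split into N_w equal-length sections.
    """
    T = features.shape[0]
    n_w = omega.shape[1]
    # Section index active at each time step, e.g. [0,0,0,0,1,1,1,1] for T=8, N_w=2.
    sections = np.minimum(np.arange(T) * n_w // T, n_w - 1)
    # Weight vector applied at each step, shape (T, N_phi).
    w_t = omega[:, sections].T
    return float(np.sum(w_t * features))

def rmse_deg(q_pred, q_meas):
    """Root mean squared error in degrees between joint trajectories in radians."""
    err = np.rad2deg(np.asarray(q_pred) - np.asarray(q_meas))
    return float(np.sqrt(np.mean(err ** 2)))

# Toy example with seven candidate costs and two sections (made-up numbers).
features = np.ones((8, 7))
omega = np.ones((7, 2))
omega[:, 1] = 2.0  # second half of the movement weighted twice as much
cost = trajectory_cost(features, omega)

q_meas = np.full((8, 2), np.deg2rad(3.0))
q_pred = np.zeros((8, 2))
error = rmse_deg(q_pred, q_meas)
```

With a single section ($N_w = 1$) this reduces to the static-weight case that the abstract uses as a baseline.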

Paper Structure

This paper contains 9 sections, 4 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: (a) Biomechanical model definition, showing the beginning and the end of the pointing task. (b) Five different initial postures for the pointing task [berret2011].
  • Figure 2: Illustration of MO-IRL prediction against the actual human task execution. The cost weights are divided into 8 sections. The training data for both joint positions and velocities are shown in yellow. The dotted lines are the real trajectory performed by the human, and the solid lines are the MO-IRL predictions.
  • Figure 3: Normalized weights learned by MO-IRL for each posture given 1, 6, and 8 sections.
  • Figure 4: Inter-subject cross-validation of the weights learned by MO-IRL (8 sections) for initial postures 2 and 4. The top row shows the overlaid measured and predicted joint values for the second subject, whose trials were not used for MO-IRL training; $q_1$ and $q_2$ are shown in blue and green, respectively. Predictions (DOC solutions) and measured trajectories are shown with solid and dashed lines, respectively. The bottom row shows the joint velocities corresponding to the top-row trajectories. The trajectories are normalized in length for clearer presentation.
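
The length normalization mentioned in the Figure 4 caption can be sketched as a resampling of each trajectory onto a common number of samples. This is an illustrative assumption about how such a normalization might be done (linear interpolation over normalized time); the paper does not specify the method, and `resample_to_length` is a hypothetical helper.

```python
import numpy as np

def resample_to_length(traj, n_samples=100):
    """Resample a (T, n_joints) trajectory to n_samples rows.

    Assumption: samples are uniformly spaced in time, so normalized time
    runs linearly from 0 to 1 over the rows of `traj`.
    """
    traj = np.asarray(traj, dtype=float)
    s_old = np.linspace(0.0, 1.0, traj.shape[0])
    s_new = np.linspace(0.0, 1.0, n_samples)
    # Interpolate each joint independently over normalized time.
    columns = [np.interp(s_new, s_old, traj[:, j]) for j in range(traj.shape[1])]
    return np.column_stack(columns)

# Two joints, three samples, resampled to five samples.
short = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
resampled = resample_to_length(short, n_samples=5)
```

Resampling measured and predicted trajectories to the same length lets trials of different durations be overlaid on one axis.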