Foundational Policy Acquisition via Multitask Learning for Motor Skill Generation
Satoshi Yamamori, Jun Morimoto
TL;DR
The paper tackles rapid motor-skill generation under implicitly changing tasks by introducing a three-phase multitask reinforcement learning framework that learns a foundational policy via encoder-based context representation. It formalizes contextual MDPs through a variational RL lens, linking entropy regularization to KL-divergence minimization and leveraging a dedicated three-stage workflow: foundational policy acquisition, policy selection, and skill generation, with latent variable optimization via derivative-free methods. Empirical results show superior performance against established meta-RL baselines on standard multi-locomotion tasks and successful novel skill generation on a monopod heading task, including an overhead kicking capability not explicitly trained. The work demonstrates how latent context embedding and policy selection enable efficient adaptation to unseen tasks and environments, offering a path toward scalable, transferable motor skills in robotics, with potential extensions to multi-agent scenarios.
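As a rough illustration of the skill-generation phase described above, the sketch below searches over a latent context vector with a derivative-free optimizer while the policy itself stays frozen. The summary only says "derivative-free methods", so the cross-entropy method (CEM) used here is an assumption, not necessarily the authors' optimizer. The names `cem_latent_search`, `episode_return`, and `toy_return` are hypothetical, and the quadratic toy objective stands in for the return of a real policy rollout conditioned on the latent.

```python
import numpy as np


def cem_latent_search(episode_return, latent_dim, iterations=30,
                      population=64, elite_frac=0.25, seed=0):
    """Cross-entropy method over a latent vector z (assumed optimizer).

    `episode_return` is a hypothetical callable mapping a latent z to a
    scalar return; in practice it would roll out the frozen policy.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(latent_dim)
    std = np.ones(latent_dim)
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate latents around the current search distribution.
        candidates = mean + std * rng.standard_normal((population, latent_dim))
        returns = np.array([episode_return(z) for z in candidates])
        # Keep the highest-return candidates and refit the distribution.
        elite = candidates[np.argsort(returns)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps exploration alive
    return mean


# Toy stand-in for a rollout: return peaks when z matches a hidden target.
target = np.array([0.5, -1.0])


def toy_return(z):
    return -float(np.sum((z - target) ** 2))


best_z = cem_latent_search(toy_return, latent_dim=2)
```

Because only return values are needed, the same loop applies when the reward or physical parameters change in ways that are not differentiable with respect to the latent.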
Abstract
In this study, we propose a multitask reinforcement learning algorithm for foundational policy acquisition to generate novel motor skills. Learning a rich representation of the multitask policy is a challenge in dynamic movement generation tasks because the policy needs to cope with changes in goals or environments with different reward functions or physical parameters. Inspired by human sensorimotor adaptation mechanisms, we developed a learning pipeline that constructs encoder-decoder networks and performs network selection to facilitate foundational policy acquisition under multiple situations. First, we compared the proposed method with previous multitask reinforcement learning methods on the standard multi-locomotion tasks. The results showed that the proposed approach outperformed the baseline methods. Then, we applied the proposed method to a ball heading task using a monopod robot model to evaluate skill generation performance. The results showed that the proposed method was able not only to adapt to novel target positions or unseen ball restitution coefficients but also to acquire a foundational policy network, originally learned for heading motion, that can generate an entirely new overhead kicking skill.
