AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning
Lucas N. Alegre, Agon Serifi, Ruben Grandia, David Müller, Espen Knoop, Moritz Bächer
TL;DR
AMOR is a context- and weight-conditioned multi-objective reinforcement learning (MORL) framework that trains a single policy to track varied reference motions, so that reward trade-offs can be changed without retraining. By sampling reward weights from the simplex $\Delta^m$ during training and using a multi-objective PPO update, AMOR captures a Pareto front of behaviors whose weights can be tuned after training; a motion-context vector, encoded by a VAE, enables task-specific adaptation. A hierarchical extension uses a high-level policy to adjust the weights in real time, trained with a discriminator-based reward, which enables fine-grained control and improves realism. The approach improves sim-to-real transfer for dynamic robot motions and provides interpretable, adaptable control suited to complex, multi-motion tasks.
Abstract
Reinforcement learning (RL) has significantly advanced the control of physics-based and robotic characters that track kinematic reference motion. However, methods typically rely on a weighted sum of conflicting reward functions, requiring extensive tuning to achieve a desired behavior. Due to the computational cost of RL, this iterative process is a tedious, time-intensive task. Furthermore, for robotics applications, the weights need to be chosen such that the policy performs well in the real world, despite inevitable sim-to-real gaps. To address these challenges, we propose a multi-objective reinforcement learning framework that trains a single policy conditioned on a set of weights, spanning the Pareto front of reward trade-offs. Within this framework, weights can be selected and tuned after training, significantly speeding up iteration time. We demonstrate how this improved workflow can be used to perform highly dynamic motions with a robot character. Moreover, we explore how weight-conditioned policies can be leveraged in hierarchical settings, using a high-level policy to dynamically select weights according to the current task. We show that the multi-objective policy encodes a diverse spectrum of behaviors, facilitating efficient adaptation to novel tasks.
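As a rough illustration of the weight-conditioning idea described above, the sketch below draws a weight vector from the simplex, appends it to the policy input, and combines a vector of per-objective rewards into a scalar. It is a minimal sketch under stated assumptions, not the authors' implementation: the uniform (flat Dirichlet) sampling distribution, the direct weighted sum of rewards, and the names `sample_simplex_weights`, `scalarize`, and `policy_input` are all hypothetical choices made for illustration, and the reward values are made up.

```python
import random


def sample_simplex_weights(m, rng):
    """Draw w uniformly from the (m-1)-simplex: w_i >= 0, sum(w) == 1.

    Assumption: a flat Dirichlet(1, ..., 1), obtained by normalizing
    independent Exp(1) draws. The source only says weights are sampled
    from the simplex, not which distribution is used.
    """
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(rewards, weights):
    """Weighted sum of per-objective rewards (one simple way to combine them)."""
    return sum(w * r for w, r in zip(weights, rewards))


def policy_input(observation, weights):
    """Condition the policy on the weights by appending them to the observation."""
    return list(observation) + list(weights)


rng = random.Random(0)
weights = sample_simplex_weights(3, rng)

# Hypothetical per-objective rewards, e.g. tracking, smoothness, effort.
rewards = [0.9, 0.4, 0.1]
scalar_reward = scalarize(rewards, weights)

# Hypothetical 4-dimensional observation, extended with the 3 weights.
conditioned_obs = policy_input([0.0, 0.1, -0.2, 0.3], weights)
```

Because the weights are part of the policy input, a different trade-off can be requested after training simply by passing a different `weights` vector, which is the workflow the abstract describes.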
