AMOR: Adaptive Character Control through Multi-Objective Reinforcement Learning

Lucas N. Alegre, Agon Serifi, Ruben Grandia, David Müller, Espen Knoop, Moritz Bächer

TL;DR

AMOR is a context- and weight-conditioned multi-objective reinforcement learning (MORL) framework that trains a single policy to track varied reference motions, so that reward trade-offs can be changed without retraining. By sampling reward weights from the simplex $\Delta^m$ and using a multi-objective PPO update, AMOR captures a Pareto front of behaviors that can be tuned after training; a motion-context vector obtained via a VAE enables task-specific adaptation. A hierarchical extension uses a high-level policy to adjust the weights in real time, trained with a discriminator-based reward, which enables fine-grained control and improves realism. The approach demonstrates improved sim-to-real transfer for dynamic robot motions and provides interpretable, adaptable control for complex, multi-motion tasks.

Abstract

Reinforcement learning (RL) has significantly advanced the control of physics-based and robotic characters that track kinematic reference motion. However, methods typically rely on a weighted sum of conflicting reward functions, requiring extensive tuning to achieve a desired behavior. Due to the computational cost of RL, this iterative process is a tedious, time-intensive task. Furthermore, for robotics applications, the weights need to be chosen such that the policy performs well in the real world, despite inevitable sim-to-real gaps. To address these challenges, we propose a multi-objective reinforcement learning framework that trains a single policy conditioned on a set of weights, spanning the Pareto front of reward trade-offs. Within this framework, weights can be selected and tuned after training, significantly speeding up iteration time. We demonstrate how this improved workflow can be used to perform highly dynamic motions with a robot character. Moreover, we explore how weight-conditioned policies can be leveraged in hierarchical settings, using a high-level policy to dynamically select weights according to the current task. We show that the multi-objective policy encodes a diverse spectrum of behaviors, facilitating efficient adaptation to novel tasks.
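The two ingredients named above, weights drawn from a simplex and a weighted sum of per-objective rewards, can be illustrated with a minimal sketch. It assumes uniform sampling via a Dirichlet distribution with all concentration parameters equal to 1 and a plain linear scalarization; the function names, the three-objective setup, and the random reward values are hypothetical, and the sketch does not reproduce the paper's multi-objective PPO update.

```python
import numpy as np


def sample_simplex_weights(rng, num_objectives, batch_size):
    """Draw weight vectors uniformly from the probability simplex.

    Assumption: a Dirichlet distribution with all concentrations equal to 1,
    which is uniform over the simplex. The paper's sampling scheme may differ.
    """
    return rng.dirichlet(np.ones(num_objectives), size=batch_size)


def scalarize(reward_vectors, weights):
    """Linear scalarization: weighted sum of per-objective rewards per sample."""
    return np.sum(reward_vectors * weights, axis=-1)


rng = np.random.default_rng(0)
weights = sample_simplex_weights(rng, num_objectives=3, batch_size=4)

# Hypothetical per-objective rewards, one row per environment step.
rewards = rng.uniform(0.0, 1.0, size=(4, 3))
scalar_rewards = scalarize(rewards, weights)
```

With a fixed-weight policy, changing the trade-off means changing `weights` and training again. A weight-conditioned policy instead receives `weights` as an input, so the same change becomes a different input at inference time.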

Paper Structure

This paper contains 30 sections, 9 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: AMOR Overview. AMOR optimizes for multiple objectives conditioned on state, motion-context, and reward weights, where reward weights are sampled from a multi-dimensional simplex. The environment provides a vector of rewards, which are then used by a multi-objective PPO algorithm together with the weights to update the policy.
  • Figure 2: High-Level Policy Overview. We learn a high-level policy (HLP) that generates reward weights for a pretrained AMOR based on the current motion context. In this stage, a discriminator is trained to distinguish between reference and simulated motions, with its output serving as an implicit reward for the HLP.
  • Figure 3: Pareto Fronts (PFs). Visualization of selected PFs generated by tracking three distinct motion types—Idle, Walking, and Dancing—using the humanoid controlled by AMOR's policy $\underline{\pi}$. x-markers indicate performance under equal weight configuration, corresponding to a fixed-reward policy.
  • Figure 4: Weight Influence. (Top) Kinematic reference. (Middle) Visual performance when prioritizing tracking reward weights. (Bottom) Visual performance when prioritizing smoothness reward terms.
  • Figure 5: Prioritizing Objectives. Distributions of the unweighted cumulative smoothness reward for three distinct weight values, each evaluated on 32768 episodes.
  • ...and 4 more figures
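
The Pareto fronts referred to in Figure 3 consist of outcomes that no other outcome dominates. As a minimal sketch, assuming that every objective is a return to be maximized, the filter below extracts the non-dominated points from a set of evaluated weight settings. The function name and the (tracking, smoothness) numbers are hypothetical and are not results from the paper.

```python
import numpy as np


def pareto_front_mask(returns):
    """Return a boolean mask of non-dominated rows (higher is better)."""
    returns = np.asarray(returns, dtype=float)
    mask = np.ones(returns.shape[0], dtype=bool)
    for i in range(returns.shape[0]):
        # Row j dominates row i if it is >= in every objective and > in at least one.
        at_least_as_good = np.all(returns >= returns[i], axis=1)
        strictly_better = np.any(returns > returns[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask


# Hypothetical (tracking, smoothness) returns for five weight settings.
returns = np.array([
    [0.9, 0.2],
    [0.7, 0.6],
    [0.5, 0.5],
    [0.3, 0.9],
    [0.2, 0.1],
])
front = returns[pareto_front_mask(returns)]
```

Here the third and fifth rows are dominated by the second, so three points remain on the front. The pairwise check is quadratic in the number of points, which is adequate for small evaluation sets.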