PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

Huanyu Li, Dewei Wang, Xinmiao Wang, Xinzhe Liu, Peng Liu, Chenjia Bai, Xuelong Li

Abstract

Humanoid robots often need to balance competing objectives, such as maximizing speed while minimizing energy consumption. Current reinforcement learning (RL) methods can master complex skills such as fall recovery and perceptive locomotion, but they are constrained by fixed weighting strategies that produce a single, suboptimal policy rather than a diverse set of solutions for sophisticated multi-objective control. In this paper, we propose a novel framework that leverages Multi-Objective Reinforcement Learning (MORL) to achieve Preference-Conditioned Humanoid Control (PCHC). Unlike conventional methods that require training a series of policies to approximate the Pareto front, our framework enables a single preference-conditioned policy to exhibit a wide spectrum of behaviors. To condition the policy on these preferences, we introduce a Beta-distribution-based alignment mechanism in which preference vectors modulate a Mixture-of-Experts (MoE) module. We validate our approach on two representative humanoid tasks. Extensive simulations and real-world experiments demonstrate that the proposed framework allows the robot to adaptively shift its objective priorities in real time according to the input preference condition.

Paper Structure (21 sections, 15 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: PCHC integrates MORL with humanoid control tasks, enabling the robot to perform different preference-aligned multi-objective behaviors given a preference vector $\bm \lambda$. For example, PCHC enables the robot to balance stride length and disturbance resistance in the humanoid walking task.
  • Figure 2: Overview of the proposed PCHC framework. PCHC adopts multiple critics and employs a Preference Condition Injection module based on a preference-parameterized Beta distribution to achieve multi-objective control on two humanoid control tasks.
  • Figure 3: In a two-objective space: (a) Hypervolume is the shaded area bounded by the Pareto points and the reference point. (b) Sparsity measures the average squared distance between consecutive points.
  • Figure 4: The performance of our policy with different preference vectors $\bm \lambda$ on the Fall Recovery and Walking tasks.
  • Figure 5: Demonstration of dynamic preference switching during task execution. The preference vector $\bm \lambda$ is instantaneously adjusted from [0.0, 1.0] to [1.0, 0.0].
  • ...and 3 more figures
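
The two metrics named in the Figure 3 caption can be illustrated with a small two-objective sketch. This is not the authors' evaluation code. It assumes both objectives are maximized, that the reference point lies below every point in both objectives, and that sparsity follows the commonly used per-objective definition (sum of squared gaps between consecutive sorted values, divided by the number of points minus one). The function names `pareto_front`, `hypervolume_2d`, and `sparsity` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both maximized (assumption)


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first objective."""
    front: List[Point] = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated and p not in front:
            front.append(p)
    return sorted(front)


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    hv = 0.0
    best_f2 = ref[1]
    # Sweep from the largest first objective to the smallest; each point that
    # improves the second objective adds one new horizontal strip of area.
    for f1, f2 in sorted(points, key=lambda p: p[0], reverse=True):
        if f1 > ref[0] and f2 > best_f2:
            hv += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return hv


def sparsity(front: List[Point]) -> float:
    """Average squared gap between consecutive front points, per objective."""
    n = len(front)
    if n < 2:
        return 0.0
    total = 0.0
    for j in range(2):
        vals = sorted(p[j] for p in front)
        total += sum((b - a) ** 2 for a, b in zip(vals, vals[1:]))
    return total / (n - 1)


if __name__ == "__main__":
    # Illustrative values only; (1.0, 1.0) is dominated and gets filtered out.
    candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 1.0)]
    front = pareto_front(candidates)
    print(front)
    print(hypervolume_2d(front, ref=(0.0, 0.0)))
    print(sparsity(front))
```

A larger hypervolume indicates a front that dominates more of the objective space, while a lower sparsity indicates more evenly and densely spaced trade-off points, so the two metrics are usually read together.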