
CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots

Shihao Ma, Hongjin Chen, Zijun Xu, Yi Zhao, Ke Wu, Ruichen Yang, Leyao Zou, Zhongxue Gan, Wenchao Ding

TL;DR

CMoE is a novel single-stage reinforcement learning framework that uses contrastive learning to refine expert activation distributions. It maximizes the consistency of expert activations within the same terrain and minimizes their similarity across different terrains, which encourages experts to specialize in distinct terrain types.

Abstract

For effective deployment in real-world environments, humanoid robots must autonomously navigate a diverse range of complex terrains with abrupt transitions. While the vanilla mixture-of-experts (MoE) framework is theoretically capable of modeling diverse terrain features, in practice its gating network produces nearly uniform expert activations across different terrains, which weakens expert specialization and limits the model's expressive power. To address this limitation, we introduce CMoE, a novel single-stage reinforcement learning framework that integrates contrastive learning to refine expert activation distributions. By imposing contrastive constraints, CMoE maximizes the consistency of expert activations within the same terrain while minimizing their similarity across different terrains, thereby encouraging experts to specialize in distinct terrain types. We validate our approach on the Unitree G1 humanoid robot through a series of challenging experiments. Results demonstrate that CMoE enables the robot to traverse continuous steps up to 20 cm high and gaps up to 80 cm wide while maintaining a robust and natural gait across diverse mixed terrains, surpassing the limits of existing methods. To support further research and foster community development, we release our code publicly.
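The contrastive constraint described in the abstract can be illustrated with a small sketch. This is not the paper's loss: it is a generic supervised-contrastive-style objective that uses integer terrain labels as a stand-in for the encoded environmental data mentioned in the Figure 3 caption. The function names, the cosine similarity measure, and the temperature value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_gate_loss(activations, terrain_ids, temperature=0.1):
    """Hypothetical contrastive loss over gating activations.

    activations: list of per-sample expert activation vectors.
    terrain_ids: list of integer terrain labels (an assumed stand-in
                 for the paper's encoded environment representation).
    The loss is low when samples from the same terrain have similar
    activations and samples from different terrains do not.
    """
    n = len(activations)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and terrain_ids[j] == terrain_ids[i]]
        if not positives:
            continue  # an anchor needs at least one same-terrain sample
        logits = {j: cosine(activations[i], activations[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all other samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Nearly uniform gates (the failure mode described for vanilla MoE)
# versus gates that specialize by terrain.
terrain_ids = [0, 0, 1, 1]
uniform_gates = [[0.5, 0.5]] * 4
specialized_gates = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(contrastive_gate_loss(uniform_gates, terrain_ids))
print(contrastive_gate_loss(specialized_gates, terrain_ids))
```

Under these assumptions the specialized gates yield a much lower loss than the uniform ones, so using such a term as an auxiliary objective would push the gating network away from uniform activations.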

Paper Structure (21 sections, 9 equations, 7 figures, 3 tables)

This paper contains 21 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: We propose a mixture-of-experts-based architecture that enables a single humanoid policy to navigate a variety of challenging terrains. We validate our strategy on complex mixed terrains and on environments not seen during training.
  • Figure 2: t-SNE visualization of the expert activations of Vanilla MoE and our method across different terrains. ("*" indicates a simple version of the terrain.)
  • Figure 3: Overview of our framework. We encode historical information and the elevation map into explicit and implicit representations, which are fused with the current observation to form the system observation. These are then fed into the MoE structure, consisting of multiple experts and a gating network, which outputs expert activations and performs contrastive learning with the encoded environmental data.
  • Figure 4: Training curves for the multi-terrain training phase, showing how the reward and the terrain level change over training iterations.
  • Figure 5: The robot passes through four types of terrain: hurdles, uphill, downstairs, and gaps. The plots show how the expert activation levels change over time for the Vanilla MoE method and for our method.
  • ...and 2 more figures
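
The MoE structure described in the Figure 3 caption (multiple experts plus a gating network that outputs expert activations) can be sketched as follows. This is a minimal illustration, assuming linear experts, a linear gate followed by a softmax, and an action formed as the activation-weighted sum of expert outputs; none of these details are confirmed by the text above, and all names are hypothetical.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [v / s for v in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def moe_forward(x, gate_W, expert_Ws):
    """Return (action, activations) for one observation x.

    gate_W:    one row per expert (assumed linear gate).
    expert_Ws: one weight matrix per expert (assumed linear experts).
    """
    activations = softmax(matvec(gate_W, x))
    outputs = [matvec(W, x) for W in expert_Ws]
    action_dim = len(outputs[0])
    # Blend expert outputs using the gate activations as weights.
    action = [sum(a * out[k] for a, out in zip(activations, outputs))
              for k in range(action_dim)]
    return action, activations


# A zero gate gives uniform activations, so the action is the plain
# average of the expert outputs.
x = [1.0, 2.0]
gate_W = [[0.0, 0.0], [0.0, 0.0]]
expert_Ws = [[[1.0, 0.0]], [[0.0, 1.0]]]
action, activations = moe_forward(x, gate_W, expert_Ws)
print(action, activations)
```

The `activations` vector returned here is the quantity that a contrastive objective would act on, and it is the kind of per-expert signal that Figures 2 and 5 visualize.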