PPF: Pre-training and Preservative Fine-tuning of Humanoid Locomotion via Model-Assumption-based Regularization

Hyunyoung Jung, Zhaoyuan Gu, Ye Zhao, Hae-Won Park, Sehoon Ha

TL;DR

The paper tackles robust humanoid locomotion under hybrid, high-dimensional dynamics and sim-to-real transfer challenges. It introduces PPF, a two-stage framework that pre-trains by imitating a model-based controller and then fine-tunes with reinforcement learning, augmented by model-assumption-based regularization (MAR) to prevent forgetting and adapt when model assumptions are violated. MAR dynamically weights the regularization based on state-wise model-assumption violations, enabling preservation of learned motion while allowing improvements in challenging scenarios. Extensive sim-to-real validation on the Digit humanoid shows that PPF achieves a forward speed up to $1.5$ m/s across diverse terrains, outperforming baselines and demonstrating robust, adaptable locomotion with zero-shot deployment.

Abstract

Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.
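As a rough illustration of how MAR might down-weight the imitation term in states where the model assumption is violated, the sketch below assumes each state comes with a non-negative scalar violation score and maps it to a weight through an exponential decay. The functional form, the names `mar_weight` and `mar_regularizer`, and the parameter `beta` are all hypothetical and are not taken from the paper; only the general idea (regularize toward the model-based controller's action where the assumption holds, relax it elsewhere) comes from the abstract.

```python
import math


def mar_weight(violation, beta=5.0):
    """Map a violation score (>= 0) to a weight in (0, 1].

    Assumed form: exp(-beta * violation). A score of 0 (assumption holds)
    gives weight 1; larger scores push the weight toward 0.
    """
    if violation < 0:
        raise ValueError("violation score must be non-negative")
    return math.exp(-beta * violation)


def mar_regularizer(policy_actions, mbc_actions, violations, beta=5.0):
    """Mean weighted squared error between policy and controller actions.

    policy_actions, mbc_actions: lists of action vectors, one per state.
    violations: list of per-state violation scores (hypothetical input).
    """
    if len(violations) == 0:
        raise ValueError("need at least one state")
    total = 0.0
    for pi_a, mbc_a, v in zip(policy_actions, mbc_actions, violations):
        sq_err = sum((p - m) ** 2 for p, m in zip(pi_a, mbc_a))
        total += mar_weight(v, beta) * sq_err
    return total / len(violations)


# Usage: the same action error is penalized fully on a state where the
# assumption holds, and almost not at all where it is strongly violated.
holds = mar_regularizer([[1.0, 0.0]], [[0.0, 0.0]], [0.0])
violated = mar_regularizer([[1.0, 0.0]], [[0.0, 0.0]], [2.0])
```

In a fine-tuning loop, a term like this would be added to the reinforcement learning objective with some coefficient, so the policy stays close to the model-based controller on nominal states while remaining free to deviate on, for example, slippery or uneven terrain.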

Paper Structure

This paper contains 28 sections, 9 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Our Pre-training and Preservative Fine-tuning (PPF) framework achieves a forward velocity of 1.5 m/s while successfully traversing a whiteboard covered with poppy seeds or olive oil, as well as diverse outdoor terrains, including hills, uneven surfaces, and sand.
  • Figure 2: Motion forgetting example. Unlike MBC, IFM is trained to swing its foot inwards initially and step with narrower foot placement, optimizing for lateral tracking accuracy and reduced energy consumption.
  • Figure 3: Overview of Model-Assumption-based Regularization (MAR). Our framework automatically adjusts the supervised loss regularization based on the assumption violation of MBC.
  • Figure 4: Testing terrains for the robustness tests in MuJoCo.
  • Figure 5: Terrain level reached by each controller in the MuJoCo robustness test. PPF successfully traverses all terrains within the time limit. IFM stumbles over its own feet on uneven terrain. FullReg fails to complete the final terrain due to poor tracking performance.
  • ...and 3 more figures