PPF: Pre-training and Preservative Fine-tuning of Humanoid Locomotion via Model-Assumption-based Regularization

Hyunyoung Jung; Zhaoyuan Gu; Ye Zhao; Hae-Won Park; Sehoon Ha

TL;DR

The paper tackles robust humanoid locomotion, which is difficult because of hybrid, high-dimensional dynamics and the sim-to-real gap. It introduces PPF, a two-stage framework that first pre-trains a policy by imitating a model-based controller and then fine-tunes it with reinforcement learning. Fine-tuning is augmented with model-assumption-based regularization (MAR), which weights the regularization toward the controller's actions state by state according to how strongly the model assumptions are violated. This preserves the pre-trained motion where the assumptions hold and lets the policy adapt where they do not. Sim-to-real validation on the Digit humanoid shows that PPF reaches a forward speed of up to 1.5 m/s and walks across diverse terrains, outperforming baselines with zero-shot deployment.

Abstract

Humanoid locomotion is a challenging task due to its inherent complexity and high-dimensional dynamics, as well as the need to adapt to diverse and unpredictable environments. In this work, we introduce a novel learning framework for effectively training a humanoid locomotion policy that imitates the behavior of a model-based controller while extending its capabilities to handle more complex locomotion tasks, such as more challenging terrain and higher velocity commands. Our framework consists of three key components: pre-training through imitation of the model-based controller, fine-tuning via reinforcement learning, and model-assumption-based regularization (MAR) during fine-tuning. In particular, MAR aligns the policy with actions from the model-based controller only in states where the model assumption holds to prevent catastrophic forgetting. We evaluate the proposed framework through comprehensive simulation tests and hardware experiments on a full-size humanoid robot, Digit, demonstrating a forward speed of 1.5 m/s and robust locomotion across diverse terrains, including slippery, sloped, uneven, and sandy terrains.
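The idea behind MAR can be illustrated with a minimal sketch. This is not the authors' formulation: the scalar `violation` measure, the sigmoid gate, and the names `mar_weight`, `mar_loss`, `threshold`, and `sharpness` are all hypothetical, chosen only to show a regularizer that pulls the policy toward the model-based controller's actions where the model assumption holds and fades out where it is violated.

```python
import numpy as np

def mar_weight(violation, threshold=0.1, sharpness=50.0):
    """Hypothetical state-wise gate: close to 1 when the model-assumption
    violation is below `threshold`, close to 0 when it is well above."""
    violation = np.asarray(violation, dtype=float)
    # Clip the exponent so np.exp cannot overflow for very large violations.
    z = np.clip(sharpness * (violation - threshold), -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(z))

def mar_loss(policy_actions, controller_actions, violation,
             threshold=0.1, sharpness=50.0):
    """Weighted squared action error, averaged over a batch of states.

    policy_actions, controller_actions: arrays of shape (batch, action_dim)
    violation: array of shape (batch,), an assumed per-state scalar measure
    """
    pa = np.asarray(policy_actions, dtype=float)
    ca = np.asarray(controller_actions, dtype=float)
    per_state = np.sum((pa - ca) ** 2, axis=-1)
    weights = mar_weight(violation, threshold, sharpness)
    return float(np.mean(weights * per_state))

# Two states with the same action error; only the first satisfies the assumption.
policy = [[1.0, 0.0], [1.0, 0.0]]
controller = [[0.0, 0.0], [0.0, 0.0]]
loss = mar_loss(policy, controller, violation=[0.0, 1.0])
```

In a fine-tuning loop, a term like `mar_loss` would be added to the reinforcement learning objective with some coefficient. In this sketch the second state contributes almost nothing, so the policy is free to deviate from the controller there.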
