Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies

Zixuan Chen, Xialin He, Yen-Jen Wang, Qiayuan Liao, Yanjie Ze, Zhongyu Li, S. Shankar Sastry, Jiajun Wu, Koushil Sreenath, Saurabh Gupta, Xue Bin Peng

TL;DR

This work tackles the challenge of transferring robust, smooth locomotion policies from simulation to real humanoid robots. It introduces Lipschitz-Constrained Policies (LCP), a differentiable gradient-penalty regularizer that enforces a Lipschitz constraint on the policy with respect to its observations, offering an alternative to non-differentiable smoothing methods such as low-pass filters and smoothness rewards. In extensive simulation and real-world experiments across multiple humanoid platforms, LCP produces smooth, robust walking and achieves competitive task performance without heavy manual tuning. Because the same regularizer carries over to distinct robots with little per-platform tuning, it offers a practical route to smooth sim-to-real locomotion, and the authors release their code and checkpoints to support adoption.

Abstract

Reinforcement learning combined with sim-to-real transfer offers a general framework for developing locomotion controllers for legged robots. To facilitate successful deployment in the real world, smoothing techniques, such as low-pass filters and smoothness rewards, are often employed to develop policies with smooth behaviors. However, these techniques are non-differentiable and usually involve a large set of hyperparameters, so they tend to require extensive manual tuning for each robotic platform. To address this challenge and establish a general technique for enforcing smooth behaviors, we propose a simple and effective method that imposes a Lipschitz constraint on a learned policy, which we refer to as Lipschitz-Constrained Policies (LCP). We show that the Lipschitz constraint can be implemented in the form of a gradient penalty, which provides a differentiable objective that can be easily incorporated into automatic differentiation frameworks. We demonstrate that LCP effectively replaces the need for smoothing rewards or low-pass filters and can be easily integrated into training frameworks for many distinct humanoid robots. We extensively evaluate LCP in simulation and on real-world humanoid robots, producing smooth and robust locomotion controllers. All simulation and deployment code, along with complete checkpoints, is available on our project page: https://lipschitz-constrained-policy.github.io.
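To make the idea of a gradient penalty concrete, here is a minimal sketch under stated assumptions. It penalizes the batch-mean squared Frobenius norm of the Jacobian of the policy's action with respect to its observation. Since the spectral norm of a matrix never exceeds its Frobenius norm, driving this term down also shrinks the local Lipschitz constant. The abstract says only that the constraint is implemented as a gradient penalty; the exact quantity the paper differentiates (for example the action mean or the action log-probability) is not stated here. The one-layer tanh policy, the function names `policy`, `jacobian_sq_norm`, `gradient_penalty` and `total_loss`, and the `lambda_gp` value are all hypothetical. In a real setup an autodiff framework would compute the Jacobian (in PyTorch, for instance, via `torch.autograd.grad` with `create_graph=True`, so the penalty itself stays differentiable with respect to the parameters). Here it is derived by hand so the example runs with the standard library alone.

```python
import math
import random

# Hypothetical toy policy: action = tanh(W @ obs + b).
# Real locomotion policies are deeper MLPs trained with an autodiff framework;
# here the Jacobian is derived by hand so the sketch needs only the stdlib.


def policy(W, b, obs):
    """Return the action vector tanh(W obs + b) for one observation."""
    return [
        math.tanh(sum(w_ij * o_j for w_ij, o_j in zip(row, obs)) + b_i)
        for row, b_i in zip(W, b)
    ]


def jacobian_sq_norm(W, b, obs):
    """Squared Frobenius norm of d(action)/d(obs) at one observation.

    With a_i = tanh(z_i) and z = W obs + b, the Jacobian entries are
    J_ij = (1 - a_i**2) * W_ij, so ||J||_F^2 = sum_i (1 - a_i**2)**2 * sum_j W_ij**2.
    """
    actions = policy(W, b, obs)
    return sum(
        (1.0 - a_i ** 2) ** 2 * sum(w * w for w in row)
        for a_i, row in zip(actions, W)
    )


def gradient_penalty(W, b, obs_batch):
    """Batch mean of the squared Jacobian norm (the gradient-penalty term)."""
    return sum(jacobian_sq_norm(W, b, obs) for obs in obs_batch) / len(obs_batch)


def total_loss(task_loss, W, b, obs_batch, lambda_gp):
    """Task loss (e.g. an RL surrogate loss) plus the weighted gradient penalty."""
    return task_loss + lambda_gp * gradient_penalty(W, b, obs_batch)


if __name__ == "__main__":
    rng = random.Random(0)
    # 3 observation dimensions mapped to 2 action dimensions.
    W = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
    b = [0.0, 0.0]
    obs_batch = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
    # lambda_gp = 0.01 is a placeholder, not a value taken from the paper.
    print(total_loss(1.0, W, b, obs_batch, lambda_gp=0.01))
```

The penalty weight trades smoothness against task performance. A very large weight flattens the policy's response to observations and can stall learning, which matches the trend the paper reports for large $\lambda_{\text{gp}}$.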

Paper Structure

This paper contains 31 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Lipschitz continuity is a method of quantifying the smoothness of functions. A Lipschitz continuous function is one whose rate of change is bounded by a constant $K$.
  • Figure 2: Gradient of policies trained with and without smoothness rewards. Policies with smoother behaviors also exhibit smaller gradient magnitudes.
  • Figure 3: Smoothness metrics recorded over the course of training. LCP produces smooth behaviors that are comparable to policies that are trained with explicit smoothness rewards.
  • Figure 4: Task returns of different smoothing methods. LCP provides an effective alternative to other techniques.
  • Figure 5: Task returns of LCP with different $\lambda_{\text{gp}}$. Excessively large $\lambda_{\text{gp}}$ may hinder policy learning.
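Figure 1 describes Lipschitz continuity as a bound $K$ on a function's rate of change. One simple way to see this numerically is to sample pairs of inputs and record the largest ratio between the change in output and the change in input. Each sampled ratio is a lower bound on the true Lipschitz constant, so their maximum is too. The sketch below is purely illustrative: it is not a smoothness metric from the paper, and the function name `empirical_lipschitz` is hypothetical.

```python
import math
import random


def empirical_lipschitz(f, low, high, num_pairs=10_000, seed=0):
    """Largest observed |f(x) - f(y)| / |x - y| over random pairs in [low, high].

    Every sampled ratio is a lower bound on the true Lipschitz constant K of f
    on the interval, so the maximum over samples is also a lower bound.
    """
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_pairs):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        if x != y:  # skip degenerate pairs to avoid division by zero
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best


if __name__ == "__main__":
    # |d/dx sin(x)| = |cos(x)| <= 1, so K = 1: the estimate never exceeds 1 (up to rounding).
    print(empirical_lipschitz(math.sin, -math.pi, math.pi))
    # |d/dx x**2| = |2x| <= 2 on [-1, 1], so K = 2 there: the estimate is bounded by 2.
    print(empirical_lipschitz(lambda x: x * x, -1.0, 1.0))
```

This also mirrors the observation in Figure 2. A function with small gradient magnitudes everywhere cannot change quickly between nearby inputs, so its sampled ratios stay small.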

Theorems & Definitions (1)

  • Definition III.1: Lipschitz Continuity
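The definition itself is not reproduced in this outline. For reference, here is the standard form, followed by the mean value inequality that links a gradient bound to a Lipschitz bound and an illustrative penalized objective of the kind the abstract describes. In the last line, $J(\pi)$ (expected return) and $\mathcal{D}$ (observations collected during rollouts) are assumed notation, and this is a sketch of a gradient-penalty relaxation, not the paper's exact equation.

```latex
% Standard definition: f is K-Lipschitz if, for some constant K >= 0,
\|f(x) - f(y)\| \le K \, \|x - y\| \qquad \forall\, x, y.

% Mean value inequality: for differentiable f on a convex domain
% (with \|\nabla f(x)\| the operator norm of the Jacobian),
\sup_{x} \|\nabla f(x)\| \le K \;\Longrightarrow\; f \text{ is } K\text{-Lipschitz}.

% Illustrative gradient-penalty relaxation for a policy \pi (notation assumed):
\max_{\pi}\; J(\pi) \;-\; \lambda_{\text{gp}}\, \mathbb{E}_{o \sim \mathcal{D}}\!\left[\|\nabla_{o}\, \pi(o)\|^{2}\right]
```

Replacing a hard bound on $\sup_x \|\nabla f(x)\|$ with an expected squared-norm penalty gives up the strict guarantee. In exchange, the objective is differentiable and can be estimated from sampled observations, which is what lets it plug into an automatic differentiation framework.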