Learning H-Infinity Locomotion Control

Junfeng Long; Wenye Yu; Quanyi Li; Zirui Wang; Dahua Lin; Jiangmiao Pang

TL;DR

This work tackles robust locomotion for quadruped robots by addressing a limitation of existing training schemes: disturbances are sampled from a fixed distribution. It introduces a state-conditioned disturber and an $H_{\infty}$-inspired constraint that stabilizes the adversarial interaction between the locomotion policy and the disturber, and it optimizes both via PPO with dual gradient descent. Theoretical support is provided through an $\eta$-optimality condition, and experiments in simulation and on Unitree robots, including real-world deployment, demonstrate improved robustness. The approach shows promise for safer, more reliable locomotion in uncertain environments and may extend to other robust control tasks.

Abstract

Stable locomotion in precipitous environments is an essential task for quadruped robots, requiring the ability to resist various external disturbances. Recent neural policies enhance robustness against disturbances by learning to resist external forces sampled from a fixed distribution in the simulated environment. However, because the force generation process does not consider the robot's current state, it is difficult to identify the most effective direction and magnitude that push the robot to the most unstable but recoverable state. As a result, the buffer contains too few challenging cases to optimize robustness. In this paper, we propose to model robust locomotion learning as an adversarial interaction between the locomotion policy and a learnable disturber that is conditioned on the robot state and generates appropriate external forces. To stabilize the joint optimization, our novel $H_{\infty}$ constraint bounds the ratio between the cost and the intensity of the external forces. We verify the robustness of our approach in both simulated environments and real-world deployment, on quadrupedal locomotion tasks and on a more challenging task in which the quadruped walks on its hind legs only. Training and deployment code will be made public.
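As a rough illustration of the constraint described in the abstract, the sketch below measures how far a batch violates a bound of the form "cost at most $\eta$ times the force magnitude" and takes one projected dual ascent step on a Lagrange multiplier. The function names, the batch layout, and the exact form of the violation term are assumptions made for illustration; they are not taken from the authors' code.

```python
import math

def constraint_violation(costs, forces, eta):
    """Batch mean of C_t - eta * ||d_t||_2.

    A value <= 0 means the cost stays within eta times the force
    magnitude on average (hypothetical form of the bound).
    """
    total = 0.0
    for cost, force in zip(costs, forces):
        force_norm = math.sqrt(sum(x * x for x in force))
        total += cost - eta * force_norm
    return total / len(costs)

def dual_step(lam, violation, lr):
    """One projected gradient ascent step on the multiplier (kept >= 0)."""
    return max(0.0, lam + lr * violation)

# Toy batch: two time steps with 2-D external forces.
costs = [1.0, 2.0]
forces = [[3.0, 4.0], [0.0, 0.0]]
violation = constraint_violation(costs, forces, eta=0.5)
lam = dual_step(0.0, violation, lr=0.1)
```

The multiplier grows while the bound is violated and shrinks toward zero once it is satisfied, which is the usual behavior of dual gradient methods for constrained policy optimization.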

Paper Structure (21 sections, 1 theorem, 14 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Learning $H_{\infty}$ Locomotion Control
Problem Definition
Method
$\eta$-optimality
Practical Implementations
Experimental Results
Can our method and its variants handle continuous disturbances as well as the baseline?
Can all methods handle the challenges of sudden extreme disturbances?
Can all methods resist deliberate disturbances that intentionally attack the policy?
Is our method applicable to other tasks that require stronger robustness?
Can our method be deployed to real robots?
Conclusion
...and 6 more sections

Key Result

Theorem 1

If $\mathbf{C}_{\pi}(s) - \eta\|\mathbf{d}(s)\|_2 < \mathbb{E}_{s' \sim P(\cdot|\pi, s)}(V_{\pi}^{cost}(s) - V_{\pi}^{cost}(s'))$ for $s \in \mathbf{S}$ with $\beta_{\pi}(s) > 0$, the policy $\pi$ is $\eta$-optimal.
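To make the inequality in Theorem 1 concrete, the following sketch checks it for a single state, replacing the expectation over next states with a sample mean. The function name and arguments are hypothetical, and the cost values are assumed to come from some already trained cost critic.

```python
import math

def eta_condition_holds(cost, disturbance, v_s, v_next_samples, eta):
    """Check C(s) - eta * ||d(s)||_2 < E[V(s) - V(s')] for one state.

    The expectation is approximated by averaging over sampled
    next-state cost values (an assumption of this sketch).
    """
    d_norm = math.sqrt(sum(x * x for x in disturbance))
    lhs = cost - eta * d_norm
    rhs = sum(v_s - v_next for v_next in v_next_samples) / len(v_next_samples)
    return lhs < rhs

# Toy numbers: ||d|| = 5, so lhs = 1.0 - 0.5 * 5 = -1.5,
# and rhs = mean(2.0 - 2.5, 2.0 - 3.0) = -0.75.
ok = eta_condition_holds(1.0, [3.0, 4.0], 2.0, [2.5, 3.0], eta=0.5)
```

With a smaller $\eta$ (for example 0.1) the left-hand side becomes 0.5 and the same state no longer satisfies the condition, which shows how $\eta$ trades off cost against disturbance intensity.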

Figures (8)

Figure 1: We deploy the policy trained by our method to real robots. Whether in quadrupedal or bipedal states, the robots successfully resist disturbances under various conditions.
Figure 2: Overview of the $H_{\infty}$ locomotion control method. At every time step during training, we perform a simulation step based on the robot's action and the external force generated by the disturber. The agent thus moves in the rewarded direction while resisting the disturbance. During optimization, values are computed for batched training samples, and the $H_{\infty}$ policy gradient is carried out by optimizing the PPO loss of the actor subject to the novel constraint $L^{H_{inf}}$. The value estimators (critics) are also updated to approximate the state values.
Figure 3: Tracking curve of our method and baselines under continuous random forces.
Figure 4: Tracking curve of our method and baselines under sudden large forces.
Figure 5: Tracking curve for all methods tested with disturbers trained to intentionally attack them.
...and 3 more figures

Theorems & Definitions (1)

Theorem 1
