Real-World Humanoid Locomotion with Reinforcement Learning

Ilija Radosavovic; Tete Xiao; Bike Zhang; Trevor Darrell; Jitendra Malik; Koushil Sreenath

TL;DR

A fully learning-based approach to real-world humanoid locomotion: a causal transformer takes the history of proprioceptive observations and actions as input and predicts the next action.

Abstract

Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist the elderly at home, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesize that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in-context, without updating its weights. We train our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deploy it to the real world zero-shot. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.
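To make the controller's interface concrete, here is a minimal NumPy sketch of a causal transformer policy that maps an observation-action history to the next action. It is an illustration only, not the authors' implementation: the single attention layer, the way each timestep's observation is paired with an action into one token, the random weights, and the names `causal_attention`, `predict_next_action`, and `init_params` are all assumptions made for this example.

```python
import numpy as np

def causal_attention(x, wq, wk, wv):
    """Single-head self-attention over T tokens with a causal mask."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Token t may attend only to tokens 0..t, never to later ones.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def predict_next_action(obs_hist, act_hist, params):
    """Map histories of shape (T, obs_dim) and (T, act_dim) to one action."""
    # Assumption: one token per timestep, built from that step's
    # observation and action, then linearly embedded.
    tokens = np.concatenate([obs_hist, act_hist], axis=-1) @ params["embed"]
    # One residual attention layer stands in for a full transformer stack.
    hidden = tokens + causal_attention(
        tokens, params["wq"], params["wk"], params["wv"]
    )
    # The next action is read out from the most recent token.
    return hidden[-1] @ params["head"]

def init_params(obs_dim, act_dim, hidden_dim, seed=0):
    """Random placeholder weights; a real policy would be trained with RL."""
    rng = np.random.default_rng(seed)
    shapes = {
        "embed": (obs_dim + act_dim, hidden_dim),
        "wq": (hidden_dim, hidden_dim),
        "wk": (hidden_dim, hidden_dim),
        "wv": (hidden_dim, hidden_dim),
        "head": (hidden_dim, act_dim),
    }
    return {name: rng.normal(scale=0.1, size=shape)
            for name, shape in shapes.items()}
```

Because the weights stay fixed at deployment, any adaptation in such a model has to come from what the attention layers read out of the history window, which is the in-context mechanism the abstract hypothesizes.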

Paper Structure (34 sections, 2 equations, 8 figures)

Introduction
Results
Discussion
Materials and Methods
Acknowledgments

Figures (8)

Figure 1: Deployment to outdoor environments. We deploy our model to a number of outdoor environments. Example videos are shown in https://youtu.be/Wd1q8KaNuME. We find that our controller is able to traverse a range of everyday environments including plazas, sidewalks, tracks, and grass fields.
Figure 2: Indoor and simulation experiments. We test the robustness of our controller to (A) external disturbances, (B) different terrains, and (C) payloads. Videos are shown in https://youtu.be/cdbWFNvT72c. We find that our controller is able to tackle these scenarios successfully, including those that are considerably out of the training distribution. (D) We find that our controller outperforms the state-of-the-art company controller across three different settings in simulation. The gains are larger for harder terrains, like steps and unstable ground. We replicate a subset of the scenarios on hardware and observe consistent behaviors, which can be seen in examples from https://youtu.be/MUgey-1j5tE.
Figure 3: Omnidirectional walking. Our learning-based controller is able to accurately follow a range of velocity commands to perform omni-directional locomotion, including (A) walking forward, (B) backward, and (C) turning. Video examples are shown in https://youtu.be/7bChPZWTAig.
Figure 4: Arm swing and fast walking. (A) The learned humanoid locomotion in our experiments exhibits human-like arm swing behaviors in coordination with leg movements, i.e., a contralateral relationship between the arms and the legs. (B) Our controller is able to perform fast walking on hardware. The video is shown in https://youtu.be/gD9Y-hvfBic.
Figure 5: Gait changes based on terrain type. (A) We command the robot to walk forward over a course consisting of three sections: flat, downward slope, and flat again. We observe that our controller adapts its behavior based on terrain, changing the gait from natural walking on flat terrain, to small steps on the downward slope, to natural walking on flat terrain again. Video is shown in https://youtu.be/ByEk-D3TevM. This type of adaptation based on context is emergent and was not pre-specified during training. (B) We analyze the hidden state of the last layer of our neural network controller and find that certain neuron responses correlate with the gait patterns observed over different terrain sections. (C) In addition, some of the neuron responses correlate with changes in the terrain and are high for flat sections and low for the slope section. (D) To analyze the neural responses in aggregate, we project the 192-dimensional hidden states to two dimensions using PCA and t-SNE. Each data point corresponds to one timestep and is color-coded by the terrain section. We see that the hidden states get grouped into clear clusters based on the terrain type.
...and 3 more figures
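
The projection described in Figure 5 (D) can be illustrated with a short NumPy sketch of PCA applied to 192-dimensional hidden states. This is not the authors' analysis code: the helper `pca_project` is a name chosen for this example, and the data below are synthetic placeholders (three artificial groups standing in for terrain sections), not measurements from the robot. The t-SNE step is omitted.

```python
import numpy as np

def pca_project(hidden_states, n_components=2):
    """Project (T, D) hidden states onto their top principal components."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-in for logged hidden states: 100 timesteps for each of
# three hypothetical terrain sections, each scattered around its own center.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 192))
labels = np.repeat(np.arange(3), 100)
hidden = centers[labels] + rng.normal(size=(300, 192))

projected = pca_project(hidden)  # shape (300, 2), one row per timestep
```

Plotting `projected` with one color per entry in `labels` would give a scatter of the kind the caption describes, with one point per timestep.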