Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models

Jacob Levy, Tyler Westenbroek, David Fridovich-Keil

TL;DR

This work develops an ensemble of probabilistic models that estimate external forces, conditioned on historical observations and actions, and integrates these predictions using known Lagrangian dynamics. Building on this model, it proposes Semi-Structured Reinforcement Learning (SSRL), a simple model-based learning framework that pushes the sample complexity boundary for real-world learning.

Abstract

Traditionally, model-based reinforcement learning (MBRL) methods exploit neural networks as flexible function approximators to represent $\textit{a priori}$ unknown environment dynamics. However, training data are typically scarce in practice, and these black-box models often fail to generalize. Modeling architectures that leverage known physics can substantially reduce the complexity of system identification, but break down in the face of complex phenomena such as contact. We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems that seamlessly integrates structured first-principles modeling techniques with black-box auto-regressive models. Specifically, we develop an ensemble of probabilistic models to estimate external forces, conditioned on historical observations and actions, and integrate these predictions using known Lagrangian dynamics. With this semi-structured approach, we can make accurate long-horizon predictions with substantially less data than prior methods. We leverage this capability and propose Semi-Structured Reinforcement Learning ($\texttt{SSRL}$), a simple model-based learning framework that pushes the sample complexity boundary for real-world learning. We validate our approach on a real-world Unitree Go1 quadruped robot, learning dynamic gaits -- from scratch -- on both hard and soft surfaces with just a few minutes of real-world data. Video and code are available at: https://sites.google.com/utexas.edu/ssrl
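
The core idea in the abstract, a learned estimate of external forces integrated through known Lagrangian dynamics, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `semi_structured_step`, `mass_matrix`, and `bias_forces`, the random choice of one ensemble member per step, the semi-implicit Euler integrator, and the one-degree-of-freedom toy system are all assumptions made for illustration.

```python
import numpy as np

def semi_structured_step(q, qd, u, history, ensemble, mass_matrix, bias_forces, dt, rng):
    """One prediction step: known rigid-body terms plus a learned external-force estimate."""
    # Pick one ensemble member at random (an assumption; averaging members is another option).
    member = ensemble[rng.integers(len(ensemble))]
    mean, std = member(history)
    # Sample the external generalized force from that member's Gaussian prediction.
    tau_ext = mean + std * rng.standard_normal(mean.shape)
    # Known structure: M(q) qdd + h(q, qd) = u + tau_ext, solved for the acceleration qdd.
    qdd = np.linalg.solve(mass_matrix(q), u + tau_ext - bias_forces(q, qd))
    # Semi-implicit Euler integration (an assumed integrator).
    qd_next = qd + dt * qdd
    q_next = q + dt * qd_next
    return q_next, qd_next

# Hypothetical toy system: a 2 kg mass under gravity, with stand-in "learned" members
# that predict a ground reaction force exactly cancelling gravity, with zero variance.
mass_matrix = lambda q: np.array([[2.0]])
bias_forces = lambda q, qd: np.array([2.0 * 9.81])
ensemble = [lambda history: (np.array([2.0 * 9.81]), np.array([0.0]))] * 3

rng = np.random.default_rng(0)
history = np.zeros((4, 3))  # placeholder for past observations and actions
q_next, qd_next = semi_structured_step(
    np.array([0.3]), np.array([0.0]), np.array([0.0]),
    history, ensemble, mass_matrix, bias_forces, dt=0.01, rng=rng,
)
```

In this toy case the predicted external force balances gravity, so the mass stays at rest. In a real model the ensemble members would be trained networks, and only the external-force term would need to be learned from data.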

Paper Structure

This paper contains 25 sections, 11 equations, 12 figures, 5 tables, 2 algorithms.

Figures (12)

  • Figure 1: Unitree Go1 quadruped learning to walk from scratch using SSRL on hard ground (left) and memory foam (right).
  • Figure 2: The SSRL framework. A deterministic policy is used to collect data from the real world while a stochastic policy is utilized in conjunction with the learned dynamics model to "hallucinate" short synthetic rollouts which branch from this data. The model incorporates Lagrangian dynamics and encodes previous state predictions, which are fed to external torque and noise estimators to predict future states. The synthetic data is used with a model-free RL algorithm to update the policies.
  • Figure 3: Control architecture. The policy takes in a history of observations and outputs parameters to a gait generator and offsets to the gait. The resulting foot positions are sent to an inverse kinematics solver which computes desired joint angles for joint level PD controllers.
  • Figure 4: Real-world results. Left: SSRL efficiently performs policy optimization, even when data is scarce. Center: With our approach, the quadruped steadily learns to walk faster. Right: Predicted and real external vertical force acting on the robot base over one second of real-world data. Real forces are estimated by finite differences. The predictions add noticeable smoothing to the real-world data.
  • Figure 5: Left: SSRL achieves better policy performance compared to a baseline using black-box models. Right: Prediction error for 20-step synthetic rollouts in an unseen environment showcases our method's superior ability to generalize.
  • ...and 7 more figures
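
The last two stages of the control architecture described in the Figure 3 caption, inverse kinematics followed by joint-level PD control, can be sketched as follows. This is a simplified illustration rather than the paper's controller: it assumes a planar two-link leg instead of the robot's full leg kinematics, and the link lengths, foot target, and gains are hypothetical values.

```python
import numpy as np

def two_link_ik(x, z, l1, l2):
    """Joint angles (hip, knee) placing a planar two-link leg's foot at (x, z)."""
    c = (x**2 + z**2 - l1**2 - l2**2) / (2.0 * l1 * l2)
    knee = np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against unreachable targets
    hip = np.arctan2(z, x) - np.arctan2(l2 * np.sin(knee), l1 + l2 * np.cos(knee))
    return np.array([hip, knee])

def two_link_fk(angles, l1, l2):
    """Foot position for given joint angles; used here only to check the IK."""
    hip, knee = angles
    return np.array([
        l1 * np.cos(hip) + l2 * np.cos(hip + knee),
        l1 * np.sin(hip) + l2 * np.sin(hip + knee),
    ])

def pd_torque(q_des, q, qd, kp, kd):
    """Joint-level PD law: torque toward the desired angle, damped by joint velocity."""
    return kp * (q_des - q) - kd * qd

L1, L2 = 0.2, 0.2                      # hypothetical link lengths in metres
foot_target = np.array([0.05, -0.30])  # hypothetical foot position below the hip
q_des = two_link_ik(foot_target[0], foot_target[1], L1, L2)
tau = pd_torque(q_des, q=np.zeros(2), qd=np.zeros(2), kp=20.0, kd=0.5)
```

Running the forward kinematics on `q_des` recovers `foot_target`, which is a quick way to sanity-check an inverse kinematics routine before connecting it to a controller.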

Theorems & Definitions (1)

  • Remark 1