Learning and Adapting Agile Locomotion Skills by Transferring Experience

Laura Smith, J. Chase Kew, Tianyu Li, Linda Luu, Xue Bin Peng, Sehoon Ha, Jie Tan, Sergey Levine

TL;DR

This work tackles the challenge of learning agile legged locomotion by leveraging transfer learning to bootstrap RL with suboptimal source controllers. The proposed method, TWiRL, integrates source-policy data into off-policy learning via a balanced replay buffer mix and a high update-to-data ratio, enabling learning across tasks and environments without requiring optimal demonstrations. Empirical results on the A1 quadruped show successful jumping over obstacles and hind-leg walking to goals in simulation and real-world trials, with strong sim-to-real transfer and cross-domain adaptability. The approach reduces exploration difficulty, demonstrates practical agility in dynamic scenarios, and highlights avenues for automated curricula and broader task generalization in future work.
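The summary above names two ingredients: mixing source-policy data with online data in each training batch, and performing many gradient updates per environment step. The following is a minimal sketch of that idea, not the authors' implementation. The 50/50 split, the `utd_ratio` value, and the names `ReplayBuffer`, `sample_balanced_batch`, `train`, `env_step`, and `update` are all illustrative assumptions; the off-policy learner itself is left as a caller-supplied `update` callback.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions (hypothetical helper)."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, n, rng):
        # Uniform sampling with replacement.
        return [self.storage[rng.randrange(len(self.storage))] for _ in range(n)]

    def __len__(self):
        return len(self.storage)


def sample_balanced_batch(source_buffer, online_buffer, batch_size, rng,
                          source_fraction=0.5):
    """Draw a batch mixing source-policy and online transitions.

    source_fraction=0.5 is an assumed reading of "balanced"; the exact
    ratio used in the paper is not stated in this summary.
    """
    n_source = int(round(batch_size * source_fraction))
    if len(online_buffer) == 0:
        # No online data yet: fall back to source data only.
        n_source = batch_size
    n_online = batch_size - n_source
    batch = source_buffer.sample(n_source, rng)
    if n_online > 0:
        batch += online_buffer.sample(n_online, rng)
    rng.shuffle(batch)
    return batch


def train(env_step, update, source_buffer, online_buffer, num_env_steps,
          utd_ratio=4, batch_size=8, seed=0):
    """Collect one transition per step, then run utd_ratio updates on mixed batches.

    env_step() returns one transition from the target task; update(batch)
    applies one gradient step of any off-policy learner. Both are supplied
    by the caller. Returns the total number of updates performed.
    """
    rng = random.Random(seed)
    num_updates = 0
    for _ in range(num_env_steps):
        online_buffer.add(env_step())
        for _ in range(utd_ratio):
            update(sample_balanced_batch(source_buffer, online_buffer,
                                         batch_size, rng))
            num_updates += 1
    return num_updates
```

In this sketch the source buffer would be filled once by rolling out the existing (possibly suboptimal) controller, and it is never overwritten by online data, so every batch keeps a fixed share of source experience throughout training.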

Abstract

Legged robots have enormous potential in their range of capabilities, from navigating unstructured terrains to high-speed running. However, designing robust controllers for highly agile dynamic motions remains a substantial challenge for roboticists. Reinforcement learning (RL) offers a promising data-driven approach for automatically training such controllers. However, exploration in these high-dimensional, underactuated systems remains a significant hurdle for enabling legged robots to learn performant, naturalistic, and versatile agility skills. We propose a framework for training complex robotic skills by transferring experience from existing controllers to jumpstart learning new tasks. To leverage controllers we can acquire in practice, we design this framework to be flexible in terms of their source -- that is, the controllers may have been optimized for a different objective under different dynamics, or may require different knowledge of the surroundings -- and thus may be highly suboptimal for the target task. We show that our method enables learning complex agile jumping behaviors, navigating to goal locations while walking on hind legs, and adapting to new environments. We also demonstrate that the agile behaviors learned in this way are graceful and safe enough to deploy in the real world.

Paper Structure

This paper contains 29 sections, 9 equations, 13 figures, 3 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Agile skills learned with our proposed method enable the A1 robot to jump repeatedly (left) and walk to a goal location on its hind legs (right).
  • Figure 2: Overview of three different applications of our system. We identify that several fundamental challenges in learning agile locomotion skills can be ameliorated by casting them as transfer learning problems and then applying a generic method that requires only a simple modification to off-the-shelf off-policy algorithms. We show that our framework is versatile: with the same method, we can (top) generalize a policy trained to track a reference motion of a jumping dog to learn to jump over randomly placed obstacles; (middle) take a policy that is trained to kick up onto the robot's hind legs and extend it to use bipedal locomotion to navigate to randomly sampled goals; and (bottom) enable efficient fine-tuning in new environments.
  • Figure 3: Examples of our policy (outlined in yellow), which incorporates both online training and data from a motion imitation policy, compared to two policies (outlined in blue) trained from scratch with the same reward function. While naïvely optimizing for the task either exploits the simulator to learn an unnatural motion (middle) or fails completely (bottom), the policy trained by incorporating prior data exhibits a graceful jump.
  • Figure 4: Examples of our policy (outlined in yellow), trained with data from a robot that can already stand on its hind legs, compared to a baseline policy (outlined in blue) trained from scratch. Without this added bias, the baseline policy learns to scoot toward the goal on its knees. Our policy gracefully kicks up to standing and navigates to the goal on two legs.
  • Figure 5: The diverse environments to which we adapt the source policy (trained in the environment labeled 'none', indicating no modification). In clockwise order: the default, non-randomized environment; a sloped terrain; a bumpy terrain; a low-gravity environment; a stochastic environment simulating motor weakening; and a simulated ice rink.
  • ...and 8 more figures