Training Directional Locomotion for Quadrupedal Low-Cost Robotic Systems via Deep Reinforcement Learning

Peter Böhm, Archie C. Chapman, Pauline Pounds

TL;DR

This work demonstrates real-world deep reinforcement learning to achieve directional locomotion on a low-cost quadrupedal robot by randomizing the heading at episode resets to promote diverse action–state exploration. The authors introduce a GRU-based sequence encoder (R-TD3) and compare three heading-reset strategies, showing that normally distributed random resets enable robust performance on complex trajectories like figure eights. Training directly on commodity hardware reduces the sim-to-real gap and demonstrates that high-end simulators are not strictly necessary for learning versatile locomotion policies. The approach yields policies capable of following straight lines, circles, and intricate trajectories with minimal human intervention, highlighting practical benefits for scalable, inexpensive legged robots.

Abstract

In this work we present Deep Reinforcement Learning (DRL) training of directional locomotion for low-cost quadrupedal robots in the real world. In particular, we exploit randomization of the heading that the robot must follow to foster exploration of the action-state transitions most useful for learning both forward locomotion and course adjustments. Changing the heading at episode resets to the current yaw plus a random value drawn from a normal distribution yields policies able to follow complex trajectories involving frequent turns in both directions as well as long straight-line stretches. By repeatedly changing the heading, this method keeps the robot moving within the training platform and thus reduces human involvement and the need for manual resets during training. Real-world experiments on a custom-built, low-cost quadruped demonstrate the efficacy of our method, with the robot successfully navigating all validation tests. When trained with other approaches, the robot succeeds only in the forward locomotion test and fails when turning is required.
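The heading-reset idea described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the standard deviation `sigma`, and the range used for the uniform variant are hypothetical choices made for illustration, and the exact definition of the "uniform" and "reset to yaw" baselines is an assumption based on how they are named in the figure captions.

```python
import math
import random


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def reset_heading(current_yaw, strategy="normal", sigma=math.pi / 4, rng=random):
    """Pick a new target heading at an episode reset.

    strategy:
      "yaw"     - target equals the current yaw (no turn requested)
      "uniform" - current yaw plus an offset drawn uniformly from
                  [-pi, pi] (assumed range, not taken from the paper)
      "normal"  - current yaw plus an offset drawn from N(0, sigma^2)
                  (sigma is a hypothetical value)
    """
    if strategy == "yaw":
        offset = 0.0
    elif strategy == "uniform":
        offset = rng.uniform(-math.pi, math.pi)
    elif strategy == "normal":
        offset = rng.gauss(0.0, sigma)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return wrap_angle(current_yaw + offset)


def heading_error(target_heading, yaw):
    """Signed shortest angular difference from the yaw to the target."""
    return wrap_angle(target_heading - yaw)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    yaw = 0.3
    for strategy in ("yaw", "uniform", "normal"):
        target = reset_heading(yaw, strategy, rng=rng)
        print(strategy, round(target, 3), round(heading_error(target, yaw), 3))
```

With a normal distribution centred on the current yaw, most resets request small course corrections while occasional draws request sharp turns in either direction, which matches the abstract's description of learning both straight-line walking and turning.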

Paper Structure

This paper contains 21 sections, 1 equation, and 5 figures.

Figures (5)

  • Figure 1: Quadrupedal robot platform.
  • Figure 2: Leg consisting of thigh and knee links, actuated by commodity servos.
  • Figure 3: Quasi-circle and figure 8: trajectories used to validate the trained policies.
  • Figure 4: Learning curves for the different heading-reset strategies, showing the last 100k steps of training.
  • Figure 5: Trajectories recorded during the validation tests. The arrows mark points where the robot stopped moving and required an adjustment of heading. Neither the policy trained with uniform resets nor the policy trained with resets to yaw succeeded on the figure 8 course.