Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning

Xin Liu, Jinze Wu, Yinghui Li, Chenkun Qi, Yufei Xue, Feng Gao

TL;DR

The paper addresses the challenge of achieving natural, robust hexapod locomotion on complex terrains using only proprioception. It develops a motion-prior–based reinforcement learning framework in which trajectory-optimized priors are used to train an adversarial discriminator that guides an asymmetric Actor-Critic policy, enabling zero-shot sim-to-real transfer. A tripod-style gait is encouraged via the discriminator, with extensive simulations and real-robot experiments demonstrating natural gaits and strong robustness across indoor stairs, slopes, and outdoor uneven surfaces. The work advances practical blind hexapod locomotion by combining motion priors, adversarial imitation, and proprioceptive control for reliable real-world deployment.

Abstract

Multi-legged robots offer enhanced stability on complex terrains because their multiple legs interact with the environment. However, effectively coordinating multiple legs in a larger action exploration space to generate natural and robust movements remains a key issue. In this paper, we introduce a motion prior-based approach and successfully apply deep reinforcement learning algorithms to a real hexapod robot. We generate a dataset of optimized motion priors and train an adversarial discriminator on these priors to guide the hexapod robot to learn natural gaits. The learned policy is then transferred to a real hexapod robot, where it demonstrates natural gait patterns and remarkable robustness in complex terrains without visual information. This is the first time that a reinforcement learning controller has been used to achieve complex terrain walking on a real hexapod robot.
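
As a rough illustration of how a discriminator can guide a policy toward natural gaits, the sketch below maps a discriminator score to a bounded style reward. The mapping used here is the least-squares form from the original adversarial motion priors (AMP) formulation; the abstract does not state which form this paper uses, so the formula, the function name `style_reward`, and the argument `d_score` are assumptions made for illustration only.

```python
def style_reward(d_score: float) -> float:
    """Map a discriminator score to a style reward in [0, 1].

    Assumption: least-squares AMP form, r = max(0, 1 - 0.25 * (d - 1)^2),
    where a score near +1 means "looks like the motion-prior dataset"
    and a score near -1 means "looks like policy-generated motion".
    """
    return max(0.0, 1.0 - 0.25 * (d_score - 1.0) ** 2)


if __name__ == "__main__":
    # Scores closer to +1 earn more reward; scores at or below -1 earn none.
    for score in (1.0, 0.0, -1.0):
        print(score, style_reward(score))
```

Under this assumed form, the reward is bounded, so transitions that the discriminator cannot tell apart from the motion priors are rewarded most, while clearly unnatural transitions contribute nothing.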

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The hexapod robot showcases its ability to achieve natural and robust locomotion across diverse terrains.
  • Figure 2: The asymmetric Actor-Critic reinforcement learning framework. We formulate three types of rewards to facilitate tripod gait styles. The style-specific reward is given by the discriminator of adversarial motion priors. During deployment, the desired joint position, calculated by summing the policy output with the default joint position, is sent to the CSP controller to calculate the torque.
  • Figure 3: Comparison of three policies in terms of ability to track sinusoidal velocity commands in the simulation. (a)-(c) Base velocity tracking in x, y, yaw directions. (d)-(f) Base velocity deviations in z-axis, and orientation deviations along the x, y axes. (g) Locomotion guided by $r^g_t+r^s_t$. (h) Locomotion guided by $r^g_t+r^l_t$. (i) Locomotion guided by $r^g_t+r^s_t+r^l_t$.
  • Figure 4: Comparison of policies in terms of ability to traverse different terrains.
  • Figure 5: Success rates of different controllers in different terrains.
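
The captions for Figures 2 and 3 describe two simple computations: the joint target sent to the low-level controller is the policy output added to the default joint position, and the full policy is trained on the sum $r^g_t+r^s_t+r^l_t$. The sketch below writes both out, assuming plain Python lists for the joint vectors. The function names, the `action_scale` parameter (set to 1.0 so that it matches the plain sum in the caption), and the three-joint example values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def joint_targets(
    policy_output: Sequence[float],
    default_pos: Sequence[float],
    action_scale: float = 1.0,  # assumption: the caption describes a plain sum
) -> List[float]:
    """Desired joint positions = default joint position + policy output."""
    if len(policy_output) != len(default_pos):
        raise ValueError("policy output and default pose must have equal length")
    return [q0 + action_scale * a for a, q0 in zip(policy_output, default_pos)]


def total_reward(r_g: float, r_s: float, r_l: float) -> float:
    """Sum of the three reward terms named in the Figure 3 caption."""
    return r_g + r_s + r_l


if __name__ == "__main__":
    # Hypothetical three-joint leg; the real robot has more joints.
    default_pose = [0.0, 0.5, -1.0]
    action = [0.25, -0.25, 0.5]
    print(joint_targets(action, default_pose))
    print(total_reward(0.5, 0.25, 0.125))
```

Expressing the action as an offset from a default pose keeps a zero policy output equal to a neutral stance, which is a common choice for learned legged controllers.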