Visual CPG-RL: Learning Central Pattern Generators for Visually-Guided Quadruped Locomotion

Guillaume Bellegarda, Milad Shafiee, Auke Ijspeert

TL;DR

The results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations are all beneficial for energy efficiency, for robustness to noise and to sensory delays of 90 ms, and for tracking performance, enabling successful sim-to-real transfer for navigation tasks.

Abstract

We present a framework for learning visually-guided quadruped locomotion by integrating exteroceptive sensing and central pattern generators (CPGs), i.e. systems of coupled oscillators, into the deep reinforcement learning (DRL) framework. Through both exteroceptive and proprioceptive sensing, the agent learns to coordinate rhythmic behavior among different oscillators to track velocity commands, while at the same time overriding these commands to avoid collisions with the environment. We investigate several open robotics and neuroscience questions: 1) What is the role of explicit interoscillator couplings, and can such coupling make sim-to-real navigation more robust? 2) What are the effects of using a memory-enabled vs. a memory-free policy network with respect to robustness, energy efficiency, and tracking performance in sim-to-real navigation tasks? 3) How do animals manage to tolerate high sensorimotor delays, yet still produce smooth and robust gaits? To answer these questions, we train our perceptive locomotion policies in simulation and perform sim-to-real transfers to the Unitree Go1 quadruped, where we observe robust navigation in a variety of scenarios. Our results show that the CPG, explicit interoscillator couplings, and memory-enabled policy representations are all beneficial for energy efficiency, for robustness to noise and to sensory delays of 90 ms, and for tracking performance, enabling successful sim-to-real transfer for navigation tasks. Video results can be found at https://youtu.be/wpsbSMzIwgM.
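To make the role of explicit interoscillator couplings concrete, the following Python sketch integrates four coupled oscillators (ordered FL, FR, HL, HR) at 1 kHz. It assumes a commonly used amplitude-controlled phase oscillator: each amplitude $r_i$ converges, critically damped, to a set point $\mu_i$, and each phase $\theta_i$ advances at $\omega_i$ plus coupling terms $r_j w_{ij} \sin(\theta_j - \theta_i - \phi_{ij})$. The gain `a`, the coupling weights, the trot phase biases $\phi_{ij}$, and the helper names (`cpg_step`, `simulate`, `phase_error`) are hypothetical, and the paper's own equations and values may differ.

```python
import numpy as np

# Hypothetical sketch, not the authors' code: gains, weights and biases are illustrative.
# Leg order: FL, FR, HL, HR. Trot: diagonal pairs (FL, HR) and (FR, HL) in phase,
# the two pairs half a cycle apart.
TROT_OFFSETS = np.array([0.0, np.pi, np.pi, 0.0])


def trot_phase_biases():
    """phi[i, j] = desired phase of leg j minus desired phase of leg i."""
    return TROT_OFFSETS[None, :] - TROT_OFFSETS[:, None]


def cpg_step(r, dr, theta, mu, omega, w, phi, a=150.0, dt=0.001):
    """One Euler step of four amplitude-controlled phase oscillators."""
    # Amplitude: critically damped second-order convergence of r towards mu.
    ddr = a * (a / 4.0 * (mu - r) - dr)
    # Phase: intrinsic frequency plus couplings r_j * w_ij * sin(theta_j - theta_i - phi_ij).
    diff = theta[None, :] - theta[:, None] - phi
    dtheta = omega + np.sum(r[None, :] * w * np.sin(diff), axis=1)
    dr = dr + ddr * dt
    r = r + dr * dt  # semi-implicit Euler: uses the updated amplitude velocity
    theta = np.mod(theta + dtheta * dt, 2.0 * np.pi)
    return r, dr, theta


def simulate(theta0, mu, omega, w, phi, steps=3000, dt=0.001):
    """Integrate the network for `steps` ticks (3 s at 1 kHz by default)."""
    r, dr, theta = np.ones(4), np.zeros(4), np.array(theta0, dtype=float)
    for _ in range(steps):
        r, dr, theta = cpg_step(r, dr, theta, mu, omega, w, phi, dt=dt)
    return r, theta


def phase_error(theta):
    """Wrapped error of each leg's phase relative to FL, compared with the trot pattern."""
    rel = theta - theta[0] - (TROT_OFFSETS - TROT_OFFSETS[0])
    return np.abs(np.angle(np.exp(1j * rel)))


mu = np.full(4, 1.5)                 # amplitude set points (chosen by the policy in the paper)
omega = np.full(4, 2 * np.pi * 1.5)  # 1.5 Hz gait frequency, illustrative
phi = trot_phase_biases()
theta0 = TROT_OFFSETS + np.array([0.0, 0.4, -0.3, 0.2])  # perturbed trot

r_c, theta_c = simulate(theta0, mu, omega, 1.0 * (1 - np.eye(4)), phi)  # coupled
r_u, theta_u = simulate(theta0, mu, omega, np.zeros((4, 4)), phi)       # uncoupled
print(phase_error(theta_c).max())  # near zero: the couplings restore the trot pattern
print(phase_error(theta_u).max())  # about 0.4 rad: the initial perturbation persists
```

With couplings, the perturbed phases are pulled back to the trot pattern within a fraction of a second; without them, each oscillator keeps whatever phase offset it started with. This is only a toy illustration of what couplings contribute, not a reproduction of the paper's experiments.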

Paper Structure

This paper contains 20 sections, 2 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Visual CPG-RL perceptive locomotion on Unitree Go1.
  • Figure 2: (a): Control architecture for learning central pattern generators for visually-guided quadruped locomotion. The observation consists of velocity commands, exteroceptive measurements, proprioceptive measurements, and the current CPG states, from which the policy network selects the CPG parameters $\mu_x$, $\mu_y$, and $\omega$ for each leg $i$ (Front Left (FL), Front Right (FR), Hind Left (HL), Hind Right (HR)). The resulting CPG states are mapped to desired foot positions $\bm{p}_d$, which are converted to desired joint angles via inverse kinematics and finally tracked by joint PD control to produce torques $\bm{\tau}$. The control policy selects actions at 100 Hz, while all other blocks operate at 1 kHz. (b): Mapping CPG states to Cartesian foot positions. Left: foot paths during the swing and stance phases. Top right, in the (vertical) XZ-plane: ground clearance ($g_c$), ground penetration ($g_p$), and maximum step length ($d_{step}$) are design parameters, whereas the CPG states $r_x$ and $\theta$ control amplitude and phase. Bottom right, in the (horizontal) XY-plane: coordination of omnidirectional motion in the leg frame (the arrow shows swing-phase motion) with converged amplitude set points $\mu_x=2$ and $\mu_y=1.25$, representing the full $d_{step}$ and $\frac{1}{2}d_{step}$, respectively.
  • Figure 3: Sim-to-real tracking performance for a commanded forward velocity of $v_{b,x}^{*} = 0.35$ m/s in a wide corridor (dashed line) and in a test navigation environment involving both left and right turns (solid lines), for both MLP and LSTM policies trained with varying interoscillator coupling weights $w_{i,j}$ (from the CPG phase equation). Each data point is the mean of 10 trials. From top to bottom: the mean Cost of Transport (COT), the mean base velocity of the quadruped, the mean frequency across all oscillators ($\dot{\theta}$), the mean amplitude $r_x$ (which correlates with the mean step length), and the success rate, i.e. the fraction of trials without obstacle collisions or falls.
  • Figure 4: CPG states during omnidirectional commands: $v_{b,y}^{*} = 0.4$ m/s from 10-14 s, $v_{b,y}^{*} = -0.4$ m/s from 14-18 s, $\omega_{b,z}^{*} = 0.7$ rad/s from 18-23 s, and $\omega_{b,z}^{*} = -0.7$ rad/s from 23-28 s. The $y$ amplitudes $r_y$ generate the lateral motion in response to the lateral velocity commands. For turning in place, we observe coordination between the $x$ and $y$ amplitudes.
  • Figure 5: Simulation test environment with both left and right turns, as well as a turn around an obstacle, matching the hardware experiments.
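The timing described in Figure 2 (policy at 100 Hz, all other blocks at 1 kHz) can be illustrated with a minimal Python sketch. It assumes a single leg's phase as a stand-in for the full CPG, a zero-order hold on the policy output between decisions, and a hypothetical vertical foot mapping in which the swing phase ($\sin\theta > 0$) lifts the foot by up to $g_c$ and the stance phase pushes it down by up to $g_p$. The names `LegPhase`, `foot_height`, `pd_torque`, and `run`, as well as all numeric values, are illustrative assumptions rather than the paper's implementation.

```python
import math

# Hypothetical sketch of the multi-rate scheme in Figure 2 (not the authors' code).
CONTROL_DT = 0.001                          # inner loop period: 1 kHz
POLICY_DT = 0.01                            # policy period: 100 Hz
DECIMATION = round(POLICY_DT / CONTROL_DT)  # inner ticks per policy action (10)


def foot_height(theta, h=0.30, g_c=0.07, g_p=0.01):
    """Hypothetical z mapping: swing when sin(theta) > 0 (lift by up to g_c),
    stance otherwise (press up to g_p below the nominal height h). Metres, illustrative."""
    s = math.sin(theta)
    return -h + (g_c if s > 0 else g_p) * s


def pd_torque(q_des, q, dq, kp=100.0, kd=2.0):
    """Joint PD law tracking q_des with zero desired joint velocity; gains are illustrative."""
    return kp * (q_des - q) - kd * dq


class LegPhase:
    """Single-leg phase integrator standing in for the full CPG (hypothetical)."""

    def __init__(self):
        self.theta = 0.0

    def step(self, action):
        # omega is held constant between policy updates (zero-order hold).
        self.theta = (self.theta + action["omega"] * CONTROL_DT) % (2 * math.pi)
        return foot_height(self.theta)


def run(seconds, policy, leg):
    """Query `policy` every DECIMATION ticks and advance `leg` on every tick."""
    ticks = round(seconds / CONTROL_DT)
    action, policy_calls, heights = None, 0, []
    for k in range(ticks):
        if k % DECIMATION == 0:
            action = policy(k * CONTROL_DT)  # in the paper: new mu_x, mu_y, omega per leg
            policy_calls += 1
        heights.append(leg.step(action))     # inverse kinematics and pd_torque would follow here
    return policy_calls, heights


calls, z = run(1.0, policy=lambda t: {"omega": 2 * math.pi}, leg=LegPhase())
print(calls)         # 100 policy decisions per simulated second
print(min(z), max(z))
```

Running the inner loop ten times per policy decision keeps the foot trajectory smooth even though the network only updates the oscillator parameters every 10 ms; in this sketch the foot height sweeps between roughly $-h - g_p$ and $-h + g_c$ over each 1 Hz cycle.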