
Learning Whole-Body Control for a Salamander Robot

Mengze Tian, Qiyuan Fu, Chuanfang Ning, Javier Jia Jie Pey, Auke Ijspeert

Abstract

Amphibious legged robots inspired by salamanders are promising for applications in complex amphibious environments. However, despite the significant success of training controllers that achieve diverse locomotion behaviors on conventional quadrupedal robots, most salamander robots have relied on central-pattern-generator (CPG)-based and model-based coordination strategies for locomotion control. Learning unified joint-level whole-body control that reliably transfers from simulation to highly articulated physical salamander robots remains relatively underexplored. In addition, few legged robots have been tested with learning-based controllers in amphibious environments. In this work, we employ reinforcement learning to map proprioceptive observations and commanded velocities to joint-level actions, allowing coordinated locomotor behaviors to emerge. To deploy these policies on hardware, we adopt a system-level real-to-sim matching and sim-to-real transfer strategy. The learned controller achieves stable and coordinated walking on both flat and uneven terrains in the real world. Beyond terrestrial locomotion, the framework enables transitions between walking and swimming in simulation, highlighting a phenomenon of interest for understanding locomotion across distinct physical modes.

Paper Structure

This paper contains 17 sections, 8 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Overview of the learned salamander locomotion across diverse terrains. Top row: traversal from cobblestones to grass. Second row: traversal of stepped terrain, descending a height drop onto flat ground. Third row: transition from land to water in simulation. Bottom row: locomotion on slippery sand (left) and rough terrain in the lab (right).
  • Figure 2: Control architecture of the learned joint-space controller. The policy receives commanded planar velocity $\mathbf{v}_{\mathrm{cmd}}$ and a proprioceptive observation $\mathbf{o}_t = (\mathbf{q}, \Delta\mathbf{q}, \boldsymbol{\omega}, \mathbf{g}, \boldsymbol{\phi})$, together with a short history of past actions. The policy outputs joint angle residuals $\Delta \mathbf{q}_{\mathrm{des}}$, which are added to a nominal joint angle $\mathbf{q}_{\mathrm{nominal}}$ to form the desired joint angle $\mathbf{q}_{\mathrm{des}}$. A low-level joint-space PD controller tracks $\mathbf{q}_{\mathrm{des}}$ and produces joint torques $\boldsymbol{\tau}$.
  • Figure 3: Training terrains used for locomotion learning. The robot has a maximum ground clearance of $10.2\,\mathrm{cm}$ when all tibia links are vertical. (i) Hill terrain with a maximum slope angle of $15^\circ$. (ii) Rugged terrain with a maximum height variation of $4\,\mathrm{cm}$. (iii) Valley terrain with a maximum slope angle of $15^\circ$.
  • Figure 4: Kinematic modeling of structural backlash. In addition to the actuated joint rotation about the $z$-axis (blue), two auxiliary passive rotational degrees of freedom about the $x$-axis and $y$-axis (orange and green) are introduced at the front and back girdle connections to model accumulated mechanical clearances. All other joints in the model remain rigid and do not include additional passive degrees of freedom.
  • Figure 5: Training environment for the transition task. Blue regions indicate water areas.
  • ...and 6 more figures
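
The control loop described in the Figure 2 caption (policy residual added to a nominal pose, then tracked by a joint-space PD controller) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `action_scale` factor, the gain values, and the choice of a zero target joint velocity in the PD law are all assumptions made for illustration, since the caption does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PDGains:
    """Hypothetical PD gains; the source does not state the values used."""
    kp: float  # proportional gain on joint angle error
    kd: float  # derivative gain on joint velocity


def desired_joint_angles(
    q_nominal: Sequence[float],
    dq_des: Sequence[float],
    action_scale: float = 1.0,  # assumed scaling of the policy output
) -> List[float]:
    """Form q_des = q_nominal + action_scale * dq_des, joint by joint."""
    if len(q_nominal) != len(dq_des):
        raise ValueError("q_nominal and dq_des must have the same length")
    return [qn + action_scale * d for qn, d in zip(q_nominal, dq_des)]


def pd_torques(
    q_des: Sequence[float],
    q: Sequence[float],
    q_dot: Sequence[float],
    gains: PDGains,
) -> List[float]:
    """Joint torques tau = kp * (q_des - q) - kd * q_dot.

    Assumes a zero target joint velocity, a common choice that the
    caption neither confirms nor rules out.
    """
    if not (len(q_des) == len(q) == len(q_dot)):
        raise ValueError("q_des, q and q_dot must have the same length")
    return [
        gains.kp * (qd - qi) - gains.kd * qdi
        for qd, qi, qdi in zip(q_des, q, q_dot)
    ]


if __name__ == "__main__":
    # Two-joint toy example with made-up numbers.
    q_nominal = [0.1, -0.2]   # nominal joint angles [rad]
    residual = [0.05, 0.1]    # stand-in for the policy output
    q_des = desired_joint_angles(q_nominal, residual)
    tau = pd_torques(q_des, q=[0.1, -0.1], q_dot=[0.0, 0.5],
                     gains=PDGains(kp=10.0, kd=1.0))
    print(q_des, tau)
```

Having the policy output residuals around a nominal pose keeps a zero action equal to the nominal pose, which is one plausible reason for the residual formulation shown in the figure.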