Saving the Limping: Fault-tolerant Quadruped Locomotion via Reinforcement Learning

Dikai Liu, Tianwei Zhang, Jianxiong Yin, Simon See

TL;DR

We propose a novel methodology to train and test hardware fault-tolerant controllers for quadruped locomotion, both in simulation and in the physical world. It adopts the teacher-student reinforcement learning framework to train the controller with close-to-reality joint-locking failures in simulation.

Abstract

Modern quadrupeds are skillful at traversing, or even sprinting on, uneven terrain in remote, uncontrolled environments. However, survival in the wild requires not only maneuverability but also the ability to handle potentially critical hardware failures. How to grant quadrupeds this ability is rarely investigated. In this paper, we propose a novel methodology to train and test hardware fault-tolerant controllers for quadruped locomotion, both in simulation and in the physical world. We adopt the teacher-student reinforcement learning framework to train the controller with close-to-reality joint-locking failures in simulation; the resulting controller transfers zero-shot to the physical robot without any fine-tuning. Extensive experiments show that our fault-tolerant controller can efficiently and stably control a quadruped when it faces joint failures during locomotion.

Paper Structure (17 sections, 5 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Physical robot and its simulated counterpart. The Unitree A1 is equipped with our joint locking mechanism. Its official URDF model is used in the Isaac Gym simulator (makoviychuk2021isaac), with the body links of the locked joint shown in red.
  • Figure 2: Overview of our methodology. We adopt the reinforcement learning architecture with the teacher-student framework from (kumar2021rma, margolisyang2022rapid) to train the policy. The architecture consists of a teacher network $\mu$, a student network $\phi$, and a policy network $\pi$. During training, synthetic data from the simulator are used to compute the latent representations $z_t$ and $\hat{z}_t$ of the teacher and the student, respectively. By fusing the latent information, we train all three networks jointly, which gives fast convergence in the early stage and an optimized student policy at the end. The policy and student models are deployed directly on the physical robot without any further offline training or fine-tuning. During deployment, the policy network takes only $\hat{z}_t$ from the student network as the latent representation.
  • Figure 3: The 3D-printed joint locking mechanism assembled on the physical robot (fig:locking_assemble). It consists of two mounts, one for the thigh link and one for the calf link (fig:locking_3d), joined by a rod to lock the joint.
  • Figure 4: Reward return in training different teacher and student policies.
  • Figure 5: Joint distribution of the failed joint in the worst cases of the FailureEnv agent and the BaseEnv agent.
  • ...and 3 more figures
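
The data flow described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the single-layer linear networks, all dimensions, the class name `Controller`, and the blending coefficient `alpha` used to fuse $z_t$ and $\hat{z}_t$ are assumptions made for illustration only, since the caption does not specify how the latents are fused.

```python
import random


def linear(weights, x):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def init(rows, cols, rng):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def latent_loss(z, z_hat):
    """Mean squared error that pulls the student latent toward the teacher latent."""
    return sum((a - b) ** 2 for a, b in zip(z, z_hat)) / len(z)


class Controller:
    """Teacher mu, student phi, and policy pi as single linear layers (assumed)."""

    def __init__(self, priv_dim, hist_dim, obs_dim, latent_dim, act_dim, seed=0):
        rng = random.Random(seed)
        self.mu = init(latent_dim, priv_dim, rng)           # teacher: privileged sim state -> z_t
        self.phi = init(latent_dim, hist_dim, rng)          # student: observation history -> z_hat_t
        self.pi = init(act_dim, obs_dim + latent_dim, rng)  # policy: (obs, latent) -> action

    def act_train(self, obs, privileged, history, alpha):
        """Training-time action from a fused latent.

        alpha = 1 uses only the teacher latent, alpha = 0 only the student
        latent. The linear blend is an assumption, not the paper's rule.
        """
        z = linear(self.mu, privileged)
        z_hat = linear(self.phi, history)
        fused = [alpha * a + (1.0 - alpha) * b for a, b in zip(z, z_hat)]
        return linear(self.pi, obs + fused), z, z_hat

    def act_deploy(self, obs, history):
        """Deployment-time action: only the student latent is available."""
        z_hat = linear(self.phi, history)
        return linear(self.pi, obs + z_hat)
```

With `alpha` set to 0, `act_train` and `act_deploy` produce the same action, which mirrors the caption's statement that the deployed policy relies only on $\hat{z}_t$ from the student network.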