
AcL: Action Learner for Fault-Tolerant Quadruped Locomotion Control

Tianyu Xu, Yaoyu Cheng, Pinxi Shen, Lin Zhao

TL;DR

This work tackles fault-tolerant quadruped locomotion under multiple joint faults by introducing Action Learner (AcL), a teacher-student reinforcement learning framework. It trains multiple fault-specific teacher policies and distills their guidance into a single encoder–decoder student policy, using style rewards derived from teacher actions and regularization rewards to ensure robust, smooth gait transitions. The encoder identifies fault conditions from history, enabling autonomous switching between normal and limping gaits, while the decoder generates actions; the approach supports up to four faulty joints and maintains stability under disturbances. Real-world tests on a Unitree Go2 validate fault-tolerant walking, seamless gait transitions, and resilience to external perturbations, demonstrating practical viability and potential applicability to broader terrain-adaptive tasks.
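
The style reward can be illustrated with a small sketch. The TL;DR states only that style rewards are derived from teacher actions and are combined with regularization rewards. The Gaussian kernel on the squared action difference, the bandwidth sigma, and the names style_reward and total_reward below are assumptions for illustration, not the authors' formulation. The point is that such a reward equals 1 when the student reproduces the teacher's action and decays smoothly as it deviates. This nudges the student toward the teacher's style without forcing exact imitation.

```python
import math
from typing import Iterable, Sequence, Tuple


def style_reward(student_action: Sequence[float],
                 teacher_action: Sequence[float],
                 sigma: float = 0.25) -> float:
    """Gaussian-kernel similarity between student and teacher actions (assumed form).

    Equals 1.0 when the actions match and decays toward 0.0 as they diverge.
    """
    if len(student_action) != len(teacher_action):
        raise ValueError("action vectors must have the same length")
    sq_err = sum((s - t) ** 2 for s, t in zip(student_action, teacher_action))
    return math.exp(-sq_err / sigma ** 2)


def total_reward(student_action: Sequence[float],
                 teacher_action: Sequence[float],
                 regularization_terms: Iterable[Tuple[float, float]],
                 w_style: float = 1.0) -> float:
    """Weighted style reward plus a weighted sum of regularization terms.

    Each regularization term is a hypothetical (weight, value) pair, e.g. a
    negative weight on joint-velocity or action-rate magnitude.
    """
    reg = sum(w * v for w, v in regularization_terms)
    return w_style * style_reward(student_action, teacher_action) + reg
```

For example, style_reward([0.0, 0.0], [0.25, 0.0]) returns exp(-1), about 0.368, with the default sigma = 0.25. A smaller sigma makes the reward stricter about matching the teacher.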

Abstract

Quadrupedal robots can learn versatile locomotion skills but remain vulnerable when one or more joints lose power. In contrast, dogs and cats can adopt limping gaits when injured, demonstrating their remarkable ability to adapt to physical conditions. Inspired by such adaptability, this paper presents Action Learner (AcL), a novel teacher-student reinforcement learning framework that enables quadrupeds to autonomously adapt their gait for stable walking under multiple joint faults. Unlike conventional teacher-student approaches that enforce strict imitation, AcL leverages teacher policies to generate style rewards, guiding the student policy without requiring precise replication. We train multiple teacher policies, each corresponding to a different fault condition, and subsequently distill them into a single student policy with an encoder-decoder architecture. While prior works primarily address single-joint faults, AcL enables quadrupeds to walk with up to four faulty joints across one or two legs, autonomously switching between different limping gaits when faults occur. We validate AcL on a real Go2 quadruped robot under single- and double-joint faults, demonstrating fault-tolerant, stable walking, smooth gait transitions between normal and limping gaits, and robustness against external disturbances.
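
A minimal sketch of how an encoder-decoder student could be wired, using plain-Python linear layers with untrained random weights. The abstract specifies only that the student has an encoder-decoder architecture, and the TL;DR says that the encoder identifies fault conditions from observation history while the decoder generates actions. The class StudentPolicy, its layer sizes, the softmax fault estimate, and the tanh action bound are all hypothetical. The 11 output classes mirror the 11 teacher cases mentioned in Figure 3. The 12 outputs match the Go2's three actuated joints per leg.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class StudentPolicy:
    """Hypothetical encoder-decoder student with untrained random weights."""

    def __init__(self, obs_dim: int, history_len: int,
                 n_cases: int = 11, n_joints: int = 12, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.history_len = history_len
        # Encoder: flattened observation history -> logits over fault cases.
        self.enc_w, self.enc_b = random_layer(n_cases, obs_dim * history_len, rng)
        # Decoder: [latest observation, fault estimate] -> one action per joint.
        self.dec_w, self.dec_b = random_layer(n_joints, obs_dim + n_cases, rng)

    def encode(self, history: List[Vector]) -> Vector:
        if len(history) != self.history_len:
            raise ValueError("unexpected history length")
        flat = [v for obs in history for v in obs]
        return softmax(linear(self.enc_w, self.enc_b, flat))

    def act(self, history: List[Vector]) -> Vector:
        fault_estimate = self.encode(history)
        decoder_input = history[-1] + fault_estimate  # list concatenation
        # tanh keeps each normalized action in (-1, 1).
        return [math.tanh(v) for v in linear(self.dec_w, self.dec_b, decoder_input)]
```

With all-zero inputs and zero biases, every encoder logit is 0, so the fault estimate is uniform (1/11 per case). A trained encoder would instead concentrate probability on the active fault case, which is what lets the decoder switch gaits.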

Paper Structure

This paper contains 20 sections, 5 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: We propose a deep reinforcement learning controller that ensures stable and robust locomotion for a quadruped robot, even when multiple joints experience control failures. Joint faults are introduced by setting the control torque of the affected joints to zero (in the case shown, the thigh and knee joints of the left-rear leg). (a) The quadruped seamlessly transitions between a normal gait and a fault-tolerant limping gait, depending on whether the affected joints are functioning or faulted. (b) The quadruped adaptively transitions not only between normal and limping gaits but also among different limping gaits. (c) The limping gait is smooth enough to keep an unattached load stable. (d) The approach remains adaptive and robust under external disturbances during the limping gait.
  • Figure 2: Overview of the proposed AcL framework, a teacher-student approach for learning multiple gaits within a single policy network. (a) The teacher policies are trained separately, each corresponding to a different fault scenario. (b) The student policy backbone uses an encoder-decoder architecture. The encoder is pre-trained with supervised learning on datasets collected while training the teacher policies, while the decoder is pre-trained via reinforcement learning. The encoder and decoder are then trained together online, with separate parameter updates, to further improve performance. (c) The rewards consist of two components: style rewards, based on the similarity between the teacher and student policies, and regularization rewards, which promote robust locomotion. The trained agent can autonomously and smoothly switch between different fault scenarios.
  • Figure 3: Quadruped with teacher policies deployed in Gazebo for all 11 cases.
  • Figure 4: Style reward evolution during locomotion under fault conditions, indicating switches in gait pattern. Blue curves: style rewards evaluated between the learned policy and the teacher policy of the corresponding faulty case; red curves: style rewards evaluated between the learned policy and the teacher policy of the normal case. Faults are introduced at step 300 and removed at step 600. Each leg has six possible fault scenarios (one or two faulty joints). Reward values are generated for all scenarios, and the mean values are shown as bold curves.
  • Figure 5: Reward design for teacher policy training. Color intensity represents the relative significance of each reward in shaping the gait; darker shades indicate higher importance and larger weights.
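
Figure 1 states that joint faults are introduced by setting the control torque of the affected joints to zero, and Figure 4 counts six fault scenarios per leg (one or two faulty joints out of three). The sketch below reproduces both ideas under stated assumptions: a joint-level PD controller, a leg order of FR, FL, RR, RL, and joint names hip, thigh, knee. These conventions, the gains, and the helper names joint_index, per_leg_fault_scenarios, and pd_torques_with_faults are hypothetical.

```python
from itertools import combinations
from typing import AbstractSet, FrozenSet, List, Sequence

LEGS = ("FR", "FL", "RR", "RL")      # assumed leg order
JOINTS = ("hip", "thigh", "knee")     # three actuated joints per leg


def joint_index(leg: str, joint: str) -> int:
    """Flat index of a joint under the assumed leg-major ordering."""
    return LEGS.index(leg) * len(JOINTS) + JOINTS.index(joint)


def per_leg_fault_scenarios(leg: str) -> List[FrozenSet[int]]:
    """All sets of one or two faulty joints on a single leg: 3 + 3 = 6."""
    indices = [joint_index(leg, j) for j in JOINTS]
    return [frozenset(c) for k in (1, 2) for c in combinations(indices, k)]


def pd_torques_with_faults(q_des: Sequence[float], q: Sequence[float],
                           dq: Sequence[float], kp: float, kd: float,
                           faulty: AbstractSet[int]) -> List[float]:
    """Joint-level PD torques; a faulty joint receives zero torque (unpowered)."""
    torques = []
    for i, (target, pos, vel) in enumerate(zip(q_des, q, dq)):
        healthy_torque = kp * (target - pos) - kd * vel
        torques.append(0.0 if i in faulty else healthy_torque)
    return torques
```

For the Figure 1 case (thigh and knee of the left-rear leg), the faulty set is {joint_index("RL", "thigh"), joint_index("RL", "knee")}, i.e. indices 10 and 11 under the assumed ordering. Masking torques after the PD law, rather than altering the policy's action, models a joint that has lost power regardless of what the controller commands.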