TOLEBI: Learning Fault-Tolerant Bipedal Locomotion via Online Status Estimation and Fallibility Rewards

Hokyun Lee; Woo-Jeong Baek; Junhyeok Cha; Jaeheung Park

TL;DR

TOLEBI presents the first learning-based fault-tolerant framework for bipedal locomotion by integrating online joint-status estimation, fault-injected training, curriculum learning, and phase modulation to adapt gait timing under faults. The method achieves robust sim-to-real transfer through domain and dynamics randomization and demonstrates practical fault tolerance on the TOCABI humanoid during flat-ground walking and stair descent. Key contributions include a GRU-based online fault estimator, a fallibility reward that preserves nominal gait while enabling fault resilience, and an effective training curriculum that bridges simulation to real-world deployment. The results show substantial performance gains over baselines across joint-locking and power-loss fault scenarios, highlighting TOLEBI's potential for resilient humanoid locomotion in dynamic, real-world environments.

Abstract

With the growing use of learning algorithms in robotic applications, reinforcement learning for bipedal locomotion has become a central research topic in humanoid robotics. While recent contributions achieve high success rates in locomotion tasks, little attention has been devoted to methods that handle hardware faults occurring during locomotion. In real-world settings, however, environmental disturbances or sudden hardware faults can have severe consequences. To address these issues, this paper presents TOLEBI (A faulT-tOlerant Learning framEwork for Bipedal locomotIon), which handles faults on the robot during operation. Specifically, joint locking, power loss, and external disturbances are injected in simulation to learn fault-tolerant locomotion strategies. In addition to transferring the learned policy to the real robot via sim-to-real transfer, an online joint status module is incorporated. This module classifies joint conditions from the actual observations at runtime under real-world conditions. Validation experiments conducted both in simulation and in the real world with the humanoid robot TOCABI highlight the applicability of the proposed approach. To our knowledge, this manuscript provides the first learning-based fault-tolerant framework for bipedal locomotion, thereby fostering the development of efficient learning methods in this field.

Paper Structure (31 sections, 8 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Reinforcement Learning for Bipedal Locomotion
Fault-tolerant Locomotion
Model-based Methods
Learning-based Methods
Preliminaries
Reinforcement Learning
Motor Failures
TOLEBI - A Fault-Tolerant Learning Framework for Bipedal Locomotion
Overview
Reinforcement Learning for Biped Locomotion
State Space
Action Space
Reward Function
...and 16 more sections

Figures (5)

Figure 1: TOLEBI, a framework for learning fault-tolerant bipedal locomotion. First, motor failure events are injected during training in simulation to learn a fault-tolerant locomotion policy. Next, the policy is transferred to the real humanoid robot.
Figure 2: Schematic description of the framework TOLEBI (faulT-tOlerant Learning framEwork for Bipedal locomotIon). A joint status estimator processes proprioceptive observations to infer joint status, storing the results in the observation history for policy training. During simulation, motor failure scenarios mask the corresponding actions, enabling robust fault-tolerant policy learning. The trained policy is deployed on the real humanoid robot for fault-tolerant locomotion.
Figure 3: Effect of the fallibility rewards. The reward mitigates early-contact impacts under motor failures and reduces impulsive forces that can reach up to 2000 N on the 100 kg humanoid robot TOCABI in real-world experiments.
Figure 4: Comparison of linear and angular velocity tracking performance in real-world experiments under different motor failure scenarios. The plots show base linear velocity (left) and base angular velocity (right) over time for the commanded velocity, healthy case, and motor failure cases (locked joint, power loss) with and without the proposed method. The results demonstrate that the proposed approach maintains stable velocity tracking even under motor failures.
Figure 5: Validation of stair descent under motor failure conditions. TOLEBI enables the humanoid robot TOCABI to successfully perform stair descent in both MuJoCo simulation and real-world experiments.
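
The action masking mentioned in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the names `JointStatus`, `sample_fault`, and `mask_torques` are hypothetical, and the fault models are assumptions made for illustration (power loss as zero torque, a locked joint as an externally supplied holding torque, and at most one faulty joint per episode).

```python
import random
from enum import Enum


class JointStatus(Enum):
    """Hypothetical joint condition labels, named after the faults in the abstract."""
    HEALTHY = 0
    LOCKED = 1
    POWER_LOSS = 2


def sample_fault(num_joints, fault_prob, rng):
    """Draw a per-episode fault assignment.

    Assumption: at most one joint fails per episode, chosen uniformly.
    """
    status = [JointStatus.HEALTHY] * num_joints
    if rng.random() < fault_prob:
        joint = rng.randrange(num_joints)
        status[joint] = rng.choice([JointStatus.LOCKED, JointStatus.POWER_LOSS])
    return status


def mask_torques(policy_torques, status, lock_torques):
    """Override the policy's command on faulty joints.

    Assumptions: a joint with power loss produces zero torque, and a locked
    joint receives a holding torque computed elsewhere (for example by a
    stiff position controller in the simulator).
    """
    masked = []
    for tau, joint_status, tau_lock in zip(policy_torques, status, lock_torques):
        if joint_status is JointStatus.POWER_LOSS:
            masked.append(0.0)
        elif joint_status is JointStatus.LOCKED:
            masked.append(tau_lock)
        else:
            masked.append(tau)
    return masked


if __name__ == "__main__":
    rng = random.Random(0)
    status = sample_fault(num_joints=12, fault_prob=0.5, rng=rng)
    torques = mask_torques([1.0] * 12, status, lock_torques=[0.0] * 12)
    print([s.name for s in status])
    print(torques)
```

In a training loop of this shape, `sample_fault` would be called at episode reset and `mask_torques` at every control step, so the policy only ever observes the consequences of its masked commands.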
