TOLEBI: Learning Fault-Tolerant Bipedal Locomotion via Online Status Estimation and Fallibility Rewards
Hokyun Lee, Woo-Jeong Baek, Junhyeok Cha, Jaeheung Park
TL;DR
TOLEBI presents what the authors describe as the first learning-based fault-tolerant framework for bipedal locomotion. It integrates online joint-status estimation, fault-injected training, curriculum learning, and phase modulation to adapt gait timing under faults. The method achieves robust sim-to-real transfer through domain and dynamics randomization and demonstrates practical fault tolerance on the TOCABI humanoid during flat-ground walking and stair descent. Key contributions include a GRU-based online fault estimator, a fallibility reward that preserves nominal gait while enabling fault resilience, and a training curriculum that bridges simulation and real-world deployment. The results show substantial performance gains over baselines across joint-locking and power-loss fault scenarios, highlighting TOLEBI's potential for resilient humanoid locomotion in dynamic, real-world environments.
Abstract
With the growing use of learning algorithms in robotic applications, reinforcement learning for bipedal locomotion has become a central topic in humanoid robotics. While recent contributions achieve high success rates in locomotion tasks, little attention has been devoted to methods that can handle hardware faults occurring during locomotion. In real-world settings, however, environmental disturbances or sudden hardware faults can have severe consequences. To address these issues, this paper presents TOLEBI (A faulT-tOlerant Learning framEwork for Bipedal locomotIon), which handles faults on the robot during operation. Specifically, joint locking, power loss, and external disturbances are injected in simulation to learn fault-tolerant locomotion strategies. In addition to transferring the learned policy to the real robot via sim-to-real transfer, the framework incorporates an online joint status module. This module classifies joint conditions from the actual observations at runtime under real-world conditions. Validation experiments conducted both in simulation and in the real world with the humanoid robot TOCABI highlight the applicability of the proposed approach. To our knowledge, this manuscript provides the first learning-based fault-tolerant framework for bipedal locomotion, thereby fostering the development of efficient learning methods in this field.
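
To make the fault injection mentioned above concrete, the following is a minimal Python sketch of how joint locking and power loss might be applied to a policy's torque commands in simulation. It is not the authors' implementation: the abstract does not state how the faults are modeled, so the stiff PD hold used for a locked joint, the zero-torque model for power loss, the one-fault-per-episode sampling, and all names (`JointStatus`, `sample_joint_status`, `apply_faults`, `kp`, `kd`) are assumptions made for illustration.

```python
import random
from enum import Enum


class JointStatus(Enum):
    NORMAL = 0
    LOCKED = 1      # assumed model: joint held at a fixed lock angle
    POWER_LOSS = 2  # assumed model: actuator delivers no torque


def sample_joint_status(num_joints, fault_prob, rng):
    """Sample per-joint status for one episode.

    Assumption of this sketch: at most one joint is faulty per episode.
    """
    status = [JointStatus.NORMAL] * num_joints
    if rng.random() < fault_prob:
        joint = rng.randrange(num_joints)
        status[joint] = rng.choice([JointStatus.LOCKED, JointStatus.POWER_LOSS])
    return status


def apply_faults(torques, positions, velocities, status, lock_positions,
                 kp=2000.0, kd=50.0):
    """Return the torques that actually reach the simulated joints."""
    applied = []
    for i, tau in enumerate(torques):
        if status[i] is JointStatus.POWER_LOSS:
            # The policy command is discarded; the joint moves passively.
            applied.append(0.0)
        elif status[i] is JointStatus.LOCKED:
            # Stiff PD hold at the lock angle, ignoring the policy command.
            hold = kp * (lock_positions[i] - positions[i]) - kd * velocities[i]
            applied.append(hold)
        else:
            applied.append(tau)
    return applied
```

In a training loop, `sample_joint_status` would be called at episode reset and `apply_faults` at every control step before the torques are passed to the simulator. The sampled status could also serve as the supervised label for an online joint status classifier, although how TOLEBI trains its estimator is not described in this section.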
