FRASA: An End-to-End Reinforcement Learning Agent for Fall Recovery and Stand Up of Humanoid Robots

Clément Gaspard, Marc Duclusaud, Grégoire Passault, Mélodie Daniel, Olivier Ly

TL;DR

FRASA presents a unified deep reinforcement learning agent for simultaneous fall recovery and stand up in humanoid robots, addressing the limitations of MPC and traditional key-framed approaches. The method uses CrossQ-enabled end-to-end learning in a symmetry-exploiting, sim-to-real framework with extensive domain randomization, achieving rapid training and robust performance on Sigmaban. Empirical results show FRASA outperforming a RoboCup Rhoban KFB baseline in both stand-up speed and disturbance rejection, while maintaining safe, adaptable behaviors. The work demonstrates practical impact by delivering a fast, transferable recovery-and-stand-up policy that reduces reliance on expert tuning and enables more resilient humanoid locomotion.

Abstract

Humanoid robotics faces significant challenges in achieving stable locomotion and recovering from falls in dynamic environments. Traditional methods, such as Model Predictive Control (MPC) and Key Frame Based (KFB) routines, either require extensive fine-tuning or lack real-time adaptability. This paper introduces FRASA, a Deep Reinforcement Learning (DRL) agent that integrates fall recovery and stand-up strategies into a unified framework. Leveraging the CrossQ algorithm, FRASA significantly reduces training time and offers a versatile recovery strategy that adapts to unpredictable disturbances. Comparative tests on Sigmaban humanoid robots demonstrate FRASA's superior performance over the KFB method deployed at RoboCup 2023 by the Rhoban Team, world champion of the KidSize League.

Paper Structure

This paper contains 23 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: FRASA's adaptive response to backward and forward disturbances on the Sigmaban platform. The recovery behavior uses the arms to accelerate the return to a stable position while minimizing the risk of damage.
  • Figure 2: Left: target posture for the recovery movement. Right: components of the pose vector $\psi_{target}$, including the trunk pitch $\theta$ and the 5 DoFs shown explicitly.
  • Figure 3: Approximation of the robot using primitive shapes dedicated to simulating collisions
  • Figure 4: Reward variability observed during the training of 40 distinct FRASA agents, each trained over 575,000 steps within 37 minutes using the CrossQ+SAC algorithm.
  • Figure 5: Experimental setup for inducing repeatable disturbances
  • ...and 1 more figure