Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots

Boyuan Deng, Luca Rossini, Jin Wang, Weijie Wang, Dimitrios Kanoulas, Nikolaos Tsagarakis

TL;DR

Robust post-fall recovery in wheeled-legged robots is challenging due to discontinuous contacts and terrain variability. This work introduces Episode-based Dynamic Reward Shaping (ED) with curriculum learning inside an asymmetric PPO framework that leverages privileged critic information, along with explicit wheel–leg coordination to accelerate training and enhance robustness. ED enables broad exploration in early training and precise posture refinement later, achieving recovery success rates above $97\%$ across KYON and Unitree platforms and reducing joint torques by approximately $15$–$26\%$ due to energy transfer from wheel rolling, with successful sim-to-real transfer. The results demonstrate effective cross-platform generalization and practical potential for autonomous post-fall recovery in diverse environments.
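
As a rough illustration of the idea that ED favors broad exploration early and posture refinement later, the sketch below blends two reward terms with weights that shift as a progress value moves from 0 to 1. The linear schedule, the function names `ed_weights` and `shaped_reward`, and the two-term split are all assumptions made for illustration; this summary does not give the paper's actual functional form, nor whether progress is measured within an episode or across training.

```python
def ed_weights(progress: float) -> tuple[float, float]:
    """Return (exploration_weight, refinement_weight) for progress in [0, 1].

    Hypothetical linear schedule: exploration dominates at progress 0,
    posture refinement dominates at progress 1.
    """
    p = min(max(progress, 0.0), 1.0)  # clamp out-of-range inputs
    return 1.0 - p, p


def shaped_reward(explore_term: float, posture_term: float, progress: float) -> float:
    """Blend an exploration-friendly term and a posture term (both hypothetical)."""
    w_explore, w_refine = ed_weights(progress)
    return w_explore * explore_term + w_refine * posture_term
```

Under these assumptions, `shaped_reward(2.0, 4.0, 0.5)` weights both terms equally, while a progress of 1.0 returns the posture term alone.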

Abstract

Adaptive recovery from fall incidents is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics, or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
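
The asymmetric actor-critic setup with noise-injected observations can be pictured with a minimal sketch: the actor sees only noisy onboard measurements, while the critic additionally receives simulator-only (privileged) quantities. The function names, the Gaussian noise model, and the example privileged values are assumptions for illustration, not details taken from the paper.

```python
import random


def actor_observation(proprio: list[float], noise_std: float,
                      rng: random.Random) -> list[float]:
    """Noisy proprioceptive observation for the actor (assumed Gaussian noise)."""
    return [x + rng.gauss(0.0, noise_std) for x in proprio]


def critic_observation(proprio: list[float], privileged: list[float]) -> list[float]:
    """Clean proprioception plus privileged simulator state, used only in training."""
    return list(proprio) + list(privileged)


rng = random.Random(0)
proprio = [0.1, -0.2, 0.3]       # e.g. joint positions (hypothetical values)
privileged = [0.8, 1.0]          # e.g. friction, contact flag (hypothetical)
actor_obs = actor_observation(proprio, noise_std=0.01, rng=rng)
critic_obs = critic_observation(proprio, privileged)
```

At deployment only `actor_observation` would be needed, since the privileged inputs exist only in simulation.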

Paper Structure

This paper contains 17 sections, 4 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Wheel and joint coordination for joint reset and Center-of-Mass adjustment during the recovery process.
  • Figure 2: (a)-(c) correspond to the initial states of different episodes. KYON Legged-wheeled Robot model: The presented model corresponds to a new robot under development.
  • Figure 3: Asymmetric PPO-Based Reinforcement Learning Training Framework
  • Figure 4: Recovery processes under the same initial posture using two different strategies. (a)–(d) show the ED strategy recovering successfully. (e)–(h) illustrate the baseline strategy, which adjusts joint positions and base height but fails to optimize base orientation, becoming trapped in a local optimum and not fully recovering. Full videos can be found on the webpage.
  • Figure 5: PCA-based comparison of single-episode action distributions for ED-policy and baseline-policy across 2048 environments.
  • ...and 8 more figures