Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots
Boyuan Deng, Luca Rossini, Jin Wang, Weijie Wang, Dimitrios Kanoulas, Nikolaos Tsagarakis
TL;DR
Robust post-fall recovery in wheeled-legged robots is challenging due to discontinuous contacts and terrain variability. This work introduces Episode-based Dynamic Reward Shaping (ED) with curriculum learning inside an asymmetric PPO framework that leverages privileged critic information, along with explicit wheel-leg coordination to accelerate training and enhance robustness. ED enables broad exploration in early training and precise posture refinement later, achieving recovery success rates above 97% across KYON and Unitree platforms and reducing joint torques by approximately 15–26% due to energy transfer from wheel rolling, with successful sim-to-real transfer. The results demonstrate effective cross-platform generalization and practical potential for autonomous post-fall recovery in diverse environments.
Abstract
Adaptive recovery from falls is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods that rely on preplanned recovery motions, simplified dynamics, or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms show recovery success rates of up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
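
As a rough illustration of the idea behind balancing exploration with posture refinement over the course of training, the sketch below blends two reward terms with weights that depend on the episode index. It is only a minimal sketch under stated assumptions: the logistic schedule, the `sharpness` parameter, and the names `ed_weights` and `shaped_reward` are hypothetical and are not taken from the paper, whose actual shaping rule may differ.

```python
import math


def ed_weights(episode: int, total_episodes: int, sharpness: float = 10.0):
    """Return (exploration_weight, posture_weight) for a given episode.

    Hypothetical schedule (an assumption, not the paper's formula): a
    logistic curve over training progress, so exploration dominates early
    and posture refinement dominates late. The two weights sum to 1.
    """
    if total_episodes <= 0:
        raise ValueError("total_episodes must be positive")
    # Training progress clamped to [0, 1].
    progress = min(max(episode / total_episodes, 0.0), 1.0)
    # Logistic ramp centered at the midpoint of training.
    posture_w = 1.0 / (1.0 + math.exp(-sharpness * (progress - 0.5)))
    return 1.0 - posture_w, posture_w


def shaped_reward(explore_term: float, posture_term: float,
                  episode: int, total_episodes: int) -> float:
    """Blend an exploration reward and a posture reward for this episode."""
    w_explore, w_posture = ed_weights(episode, total_episodes)
    return w_explore * explore_term + w_posture * posture_term


if __name__ == "__main__":
    # Print how the (hypothetical) weights shift across training.
    for ep in (0, 250, 500, 750, 1000):
        w_e, w_p = ed_weights(ep, 1000)
        print(f"episode {ep:4d}: explore={w_e:.3f} posture={w_p:.3f}")
```

With this kind of schedule, a reward term that encourages varied recovery maneuvers carries most of the weight early in training, while a term that penalizes deviation from the target standing posture takes over later.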
