Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey
Lingfan Bao, Joseph Humphreys, Tianhu Peng, Chengxu Zhou
TL;DR
This survey analyses DRL-based frameworks for bipedal locomotion, comparing end-to-end and hierarchical control schemes and showing that the field remains fragmented, with no unified framework. It covers end-to-end approaches, split into reference-based and reference-free paradigms, and three hierarchical schemes (deep planning hybrid, feedback DRL control hybrid, and learned hierarchy), highlighting sim-to-real transfer and safety concerns. The authors identify three core challenges (generalisation versus precision, the sim-to-real gap, and safety) and propose future directions, including multi-skill learning, perception-conditioned control, motion retargeting, and the use of foundation models. They introduce two conceptual blueprints, Bipedal Foundation Models (BFMs) and Multi-Layer Adaptive Models (MLAMs), as potential pathways toward a generalist, unified locomotion framework with broad real-world impact.
Abstract
Bipedal robots are gaining global recognition owing to their potential applications and to advances in artificial intelligence, particularly Deep Reinforcement Learning (DRL). While DRL has significantly advanced bipedal locomotion, the development of a unified framework capable of handling a wide range of tasks remains an ongoing challenge. This survey systematically categorises, compares, and analyses existing DRL frameworks for bipedal locomotion, organising them into end-to-end and hierarchical control schemes. End-to-end frameworks are evaluated based on their learning approaches, while hierarchical frameworks are examined in terms of layered structures that integrate learning-based or traditional model-based methods. We provide a detailed evaluation of the composition, strengths, limitations, and capabilities of each framework. Additionally, this survey identifies key research gaps and proposes future directions aimed at creating a more integrated and efficient framework for bipedal locomotion, with wide-ranging applications in real-world environments.
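The two-level categorisation described above can be written down as a small lookup structure. The following is only an illustrative sketch: the scheme names are taken from the TL;DR, while the `TAXONOMY` mapping and the `family_of` helper are hypothetical names introduced here and are not part of the survey.

```python
# Illustrative sketch of the survey's taxonomy of DRL frameworks.
# TAXONOMY and family_of are hypothetical names, not from the survey.
TAXONOMY = {
    "end-to-end": [
        "reference-based",
        "reference-free",
    ],
    "hierarchical": [
        "deep planning hybrid",
        "feedback DRL control hybrid",
        "learned hierarchy",
    ],
}


def family_of(scheme: str) -> str:
    """Return the control family that a named scheme belongs to."""
    for family, schemes in TAXONOMY.items():
        if scheme in schemes:
            return family
    raise KeyError(f"unknown scheme: {scheme}")
```

For instance, `family_of("learned hierarchy")` looks up the family for one of the three hierarchical schemes, and an unrecognised name raises `KeyError`.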
