Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey

Lingfan Bao; Joseph Humphreys; Tianhu Peng; Chengxu Zhou

TL;DR

This survey analyses DRL-based frameworks for bipedal locomotion, comparing end-to-end and hierarchical control schemes and showing that the field remains fragmented, with no unified framework. It covers end-to-end approaches, split into reference-based and reference-free paradigms, and three hierarchical schemes (deep planning hybrid, feedback DRL control hybrid, and learned hierarchy), with attention to sim-to-real transfer and safety. The authors identify three core challenges (generalisation versus precision, the sim-to-real gap, and safety) and propose future directions, including multi-skill learning, perception-conditioned control, motion retargeting, and the use of foundation models. They introduce two conceptual blueprints, Bipedal Foundation Models (BFMs) and Multi-Layer Adaptive Models (MLAMs), as potential pathways toward a generalist, unified locomotion framework.

Abstract

Bipedal robots are gaining global recognition due to their potential applications and advancements in artificial intelligence, particularly through Deep Reinforcement Learning (DRL). While DRL has significantly advanced bipedal locomotion, the development of a unified framework capable of handling a wide range of tasks remains an ongoing challenge. This survey systematically categorises, compares, and analyses existing DRL frameworks for bipedal locomotion, organising them into end-to-end and hierarchical control schemes. End-to-end frameworks are evaluated based on their learning approaches, while hierarchical frameworks are examined in terms of layered structures that integrate learning-based or traditional model-based methods. We provide a detailed evaluation of the composition, strengths, limitations, and capabilities of each framework. Additionally, this survey identifies key research gaps and proposes future directions aimed at creating a more integrated and efficient framework for bipedal locomotion, with wide-ranging applications in real-world environments.

Paper Structure (33 sections, 5 figures, 2 tables)

Introduction
End-to-end framework
Reference-based learning
Residual learning
Guided learning
Reference-free learning
Hierarchy framework
Deep planning hybrid scheme
Feedback DRL control hybrid scheme
Learned hierarchy framework
Limitations and Challenges
Generalisation and precision
Challenges in transferring from simulation to reality
Safety-critical locomotion
Future Directions and Opportunities
...and 18 more sections

Figures (5)

Figure 1: Representative bipedal and humanoid robots illustrating the diversity of platforms for locomotion research and development. (a) Cassie: a torque-controlled bipedal robot designed for agile locomotion. (b) Digit: a full-sized humanoid robot evolved from Cassie and actuated by torque control. (c) H1: a full-size, electric, torque-controlled humanoid robot developed by Unitree Robotics. (d) G1: a compact humanoid robot from Unitree featuring lightweight design and high joint backdrivability. (e) Atlas: a fully electric humanoid robot developed by Boston Dynamics.
Figure 2: Classification of DRL-based control schemes. The approaches are broadly categorised into two main paradigms: end-to-end frameworks, which learn a single policy from sensory inputs to motor commands; and hierarchical frameworks, which decompose the control problem into multiple levels. Within the end-to-end paradigm, a key distinction is drawn between reference-free learning (learning from scratch) and reference-based learning (tracking a predefined motion). Hierarchical structures include hybrid control schemes, which synergistically combine learned components with traditional model-based controllers.
Figure 3: Hierarchical control scheme diagram. This figure illustrates a hierarchical control framework for a bipedal robot, comprising a basic scheme and three variations. (1) Basic scheme: The framework begins with a task command, followed by an HL planner and a LL controller, which ultimately drives the robot. Each module can be replaced with a learned policy, introducing adaptability across different control layers. (2) Variations (from left to right): (a) a deep planning hybrid scheme, in which the HL planner is learned; (b) a feedback DRL control hybrid scheme, with a learned LL controller; and (c) a learned hierarchical control scheme, where both layers are learned.
Figure 4: Towards a Unified Framework. This figure illustrates the logical progression from current DRL frameworks to future unified systems. It identifies the limitations of existing end-to-end and hierarchical approaches, which motivate the exploration of specific Future Pathways. These pathways inform the design of two proposed conceptual models, (i) the Multi-Layered Adaptive Model (MLAM) and (ii) the Bipedal Foundation Model (BFM), which represent potential blueprints for achieving a generalist, unified framework.
Figure 5: Diagram for RL algorithms catalogue
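
The hierarchical scheme described in the Figure 3 caption (task command, then HL planner, then LL controller, then robot) can be sketched in a few lines. This is only an illustrative sketch, not the survey's implementation: the class `HierarchicalController`, the four stand-in modules, and the label "model-based baseline" for the case where neither layer is learned are all hypothetical names introduced here, and the "learned" functions are fixed placeholders rather than trained policies.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical signatures, assumed for illustration only.
Planner = Callable[[Sequence[float], Sequence[float]], List[float]]
Controller = Callable[[Sequence[float], Sequence[float]], List[float]]


def model_based_planner(command, state):
    # Placeholder HL planner: pass the task command through as the target.
    return list(command)


def learned_planner(command, state):
    # Stand-in for a trained HL policy; here just a fixed scaling.
    return [0.5 * c for c in command]


def pd_like_controller(plan, state):
    # Placeholder model-based LL controller: proportional tracking of the plan.
    kp = 2.0
    return [kp * (p - s) for p, s in zip(plan, state)]


def learned_controller(plan, state):
    # Stand-in for a trained LL policy.
    return [p - s for p, s in zip(plan, state)]


@dataclass
class HierarchicalController:
    planner: Planner
    controller: Controller
    planner_learned: bool
    controller_learned: bool

    def scheme(self) -> str:
        # Name the variant according to which layers are learned.
        if self.planner_learned and self.controller_learned:
            return "learned hierarchy"
        if self.planner_learned:
            return "deep planning hybrid"
        if self.controller_learned:
            return "feedback DRL control hybrid"
        return "model-based baseline"  # label assumed here, not from the survey

    def step(self, command, state):
        # Task command -> HL plan -> LL action sent to the robot.
        plan = self.planner(command, state)
        return self.controller(plan, state)


if __name__ == "__main__":
    hc = HierarchicalController(learned_planner, pd_like_controller, True, False)
    print(hc.scheme(), hc.step([1.0, 0.0], [0.0, 0.0]))
```

Swapping `learned_controller` in for `pd_like_controller`, or `model_based_planner` in for `learned_planner`, moves the same object between the three variations listed in the caption without changing the control loop in `step`.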
