Learning-based Hierarchical Control: Emulating the Central Nervous System for Bio-Inspired Legged Robot Locomotion

Ge Sun; Milad Shafiee; Peizhuo Li; Guillaume Bellegarda; Auke Ijspeert; Guillaume Sartoretti

TL;DR

The paper presents a two-network hierarchical controller for legged locomotion: a spinal policy that generates basic rhythmic patterns via CPGs, and a descending modulation policy that adapts these rhythms to terrain using sensory feedback. Trained with PPO in two phases, the approach shows that flat-terrain locomotion is primarily governed by the spinal policy, while rough-terrain navigation requires the descending policy to make discrete adjustments for features such as stairs, high obstacles, and gaps. Sensorimotor-delay analyses reveal a division of labor: the spinal policy is more robust to delays, whereas the descending policy relies on timely sensory feedback for terrain-specific maneuvers. The framework supports biological hypotheses about spinal-supraspinal interactions and offers a platform for bio-inspired locomotion controllers with potential hardware deployment.

Abstract

Animals possess a remarkable ability to navigate challenging terrains, achieved through the interplay of various pathways between the brain, central pattern generators (CPGs) in the spinal cord, and musculoskeletal system. Traditional bioinspired control frameworks often rely on a singular control policy that models both higher (supraspinal) and spinal cord functions. In this work, we build upon our previous research by introducing two distinct neural networks: one tasked with modulating the frequency and amplitude of CPGs to generate the basic locomotor rhythm (referred to as the spinal policy, SCP), and the other responsible for receiving environmental perception data and directly modulating the rhythmic output from the SCP to execute precise movements on challenging terrains (referred to as the descending modulation policy). This division of labor more closely mimics the hierarchical locomotor control systems observed in legged animals, thereby enhancing the robot's ability to navigate various uneven surfaces, including steps, high obstacles, and terrains with gaps. Additionally, we investigate the impact of sensorimotor delays within our framework, validating several biological assumptions about animal locomotion systems. Specifically, we demonstrate that spinal circuits play a crucial role in generating the basic locomotor rhythm, while descending pathways are essential for enabling appropriate gait modifications to accommodate uneven terrain. Notably, our findings also reveal that the multi-layered control inherent in animals exhibits remarkable robustness against time delays. Through these investigations, this paper contributes to a deeper understanding of the fundamental principles of interplay between spinal and supraspinal mechanisms in biological locomotion. It also supports the development of locomotion controllers in parallel to biological structures which are ...
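The split described in the abstract can be illustrated with a minimal sketch: one policy sets the frequency and amplitude of a CPG oscillator, and a second policy adds an offset to the resulting foot target. The oscillator form below is a commonly used amplitude-controlled phase oscillator and is an assumption, not necessarily the paper's equations. The names `Oscillator` and `foot_target`, the gain `a`, and the geometry parameters (`step_len`, `nominal_h`, `clearance`) are hypothetical, and the fixed numbers stand in for network outputs.

```python
import math
from dataclasses import dataclass


@dataclass
class Oscillator:
    """One amplitude-controlled phase oscillator (one per leg). Assumed form."""
    r: float = 0.0      # current amplitude
    r_dot: float = 0.0  # rate of change of the amplitude
    theta: float = 0.0  # phase in [0, 2*pi)

    def step(self, mu, omega, dt, a=50.0):
        # mu (target amplitude) and omega (frequency, rad/s) play the role
        # of the spinal policy's actions in this sketch.
        r_ddot = a * (a / 4.0 * (mu - self.r) - self.r_dot)
        self.r_dot += r_ddot * dt
        self.r += self.r_dot * dt
        self.theta = (self.theta + omega * dt) % (2.0 * math.pi)


def foot_target(osc, offset=(0.0, 0.0), step_len=0.1, nominal_h=0.3, clearance=0.05):
    """Map the oscillator state to a foot target (x, z), then add the offset.

    `offset` stands in for the descending modulation policy's output;
    a zero offset leaves the purely rhythmic spinal pattern.
    """
    x = -step_len * osc.r * math.cos(osc.theta)
    lift = clearance * max(math.sin(osc.theta), 0.0)  # lift only in half the cycle
    z = -nominal_h + lift
    return x + offset[0], z + offset[1]


if __name__ == "__main__":
    osc = Oscillator()
    for _ in range(1000):  # 1 s at dt = 1 ms, 2 Hz gait (hypothetical values)
        osc.step(mu=1.5, omega=2.0 * math.pi * 2.0, dt=0.001)
    print("spinal only:    ", foot_target(osc))
    print("with descending:", foot_target(osc, offset=(0.02, 0.05)))
```

In this sketch the descending signal is purely additive, so setting the offset to zero recovers the spinal-only rhythm.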

Paper Structure

This paper contains 15 sections, 2 equations, 5 figures, 1 table.

Introduction
Background
Learning Framework
  Spinal Policy
    Action Space
    Observation Space
  Descending Modulation Policy
    Action Space
    Observation Space
  Training Details
Experiments and Discussion
  Flat Terrain Locomotion
  Rough Terrain Locomotion
  Sensorimotor Delay
Conclusion

Figures (5)

Figure 1: Learning-based hierarchical control framework that emulates the mechanisms of the locomotor neural circuits in legged mammals.
Figure 2: The control diagram of our hierarchical control framework replicates the mechanisms found within the locomotor neural circuits of legged mammals. The spinal network, emulating the spinal cord, generates basic rhythmic gait patterns through CPGs based on internal states. The descending modulation network, representing high-level brain functions, produces signals (offset components) that refine these rhythmic movements in response to internal and environmental information (terrain height map), thereby enabling the robot to navigate complex terrains.
Figure 3: Flat Terrain Locomotion Experiment: Initially, both the spinal policy and the descending modulation policy govern the robot's movement on flat terrain. The descending modulation policy is deactivated at t=5s and reactivated at t=10s. The top panel shows the robot's velocity tracking across varying commanded velocities. The middle panel compares the output positions from the spinal policy and the descending modulation policy with the robot's actual executed position (front right leg) in the Z dimension, at a commanded speed of 1.0 m/s. The bottom panel presents the same comparison in the X dimension. The results indicate that the robot's movement on flat terrain is primarily controlled by the spinal policy.
Figure 4: Rough Terrain Locomotion Experiment: The robot is commanded to traverse the environment at 0.9 m/s. The time spent on each terrain type is indicated by a different color in the plot: yellow for upstairs, red for downstairs, blue for high obstacles, and purple for gaps. The plot shows the output positions from the spinal policy (orange line) and the descending modulation policy (blue line), compared to the robot's actual executed position (green dotted line) in both the Z and X dimensions for the front right leg (FR) and the rear left leg (RL). We observe that flat terrain locomotion is mainly governed by the spinal policy, whereas traversal of rough terrains relies mainly on the descending modulation policy.
Figure 5: Success rates for sensorimotor delay experiments: The robot is commanded to travel at 0.9 m/s across six types of terrain: flat, uneven, upstairs, downstairs, high obstacles, and gaps. On each terrain, varying sensory delays are applied to the two control policies: the spinal policy and the descending modulation policy. Each experiment consists of five trials with both policies active, and the robot starts each trial from a different position (for example, a success rate of 0.6 indicates three successes and two failures). Both policies were trained without sensorimotor delays. The figure also shows each delay time and the corresponding percentage of a single gait cycle.
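The quantities in the Figure 5 caption can be made concrete with a small sketch: a fixed-length buffer that delays observations by a number of control steps, the delay expressed as a fraction of one gait cycle, and the success rate over a set of trials. This is an illustration under stated assumptions, not the paper's evaluation code. The names `DelayLine`, `delay_fraction_of_cycle`, and `success_rate` are hypothetical, as are the 50 ms delay and 2 Hz gait frequency in the usage lines.

```python
from collections import deque


class DelayLine:
    """Return the observation from `delay_steps` control steps ago."""

    def __init__(self, delay_steps, initial):
        # Pre-fill so that early reads return the initial observation.
        self.buf = deque([initial] * (delay_steps + 1), maxlen=delay_steps + 1)

    def push(self, obs):
        self.buf.append(obs)  # the oldest entry drops out automatically
        return self.buf[0]    # observation delayed by `delay_steps` steps


def delay_fraction_of_cycle(delay_s, gait_freq_hz):
    """Delay as a fraction of one gait cycle (cycle period = 1 / frequency)."""
    return delay_s * gait_freq_hz


def success_rate(outcomes):
    """Fraction of successful trials, given one boolean per trial."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    line = DelayLine(delay_steps=2, initial=0)
    print([line.push(k) for k in (1, 2, 3, 4)])
    # Hypothetical numbers: a 50 ms delay at a 2 Hz gait.
    print(delay_fraction_of_cycle(0.05, 2.0))
    # Three successes and two failures out of five trials.
    print(success_rate([True, True, True, False, False]))
```

A separate `DelayLine` per policy input would allow the delay to be applied to the spinal and descending observations independently, which is the kind of comparison the caption describes.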
