Hierarchical learning control for autonomous robots inspired by central nervous system

Pei Zhang; Zhaobo Hua; Jinliang Ding

TL;DR

The paper tackles robust autonomous locomotion in heterogeneous, partially observed environments by proposing a hierarchical learning framework inspired by the central nervous system. It combines a low-level passive CPG module with two active levels: a mid-level skill controller that learns diverse, reusable motions and a high-level controller that makes rapid multi-task decisions. The levels are connected via dual descending pathways. Key contributions include a CPG with independent phases, unsupervised pre-training of mid-level skills, and a two-stage high-level learning process with distillation for image-based deployment. All are validated on a hexapod PHAGE with demonstrations of obstacle crossing, fault tolerance, and adaptation to unknown environments. This hierarchical semi-active design improves robustness, generalization, and data efficiency while reducing dependence on sensing, and it leaves room for further enhancements such as local reflexes and integration with large cognitive models.

Abstract

Mammals can generate autonomous behaviors in various complex environments through the coordination and interaction of activities at different levels of their central nervous system. In this paper, we propose a novel hierarchical learning control framework that mimics the hierarchical structure of the central nervous system along with its coordination and interaction behaviors. The framework combines active and passive control systems to improve both the flexibility and reliability of the control system and to achieve more diverse autonomous behaviors of robots. Specifically, the framework has a backbone of independent neural network controllers at different levels and takes a three-level, dual-descending-pathway structure inspired by the functionality of the cerebral cortex, cerebellum, and spinal cord. We comprehensively validated the proposed approach in simulation and in experiments on a hexapod robot in various complex environments, including obstacle crossing and rapid recovery after partial damage. This study reveals the principle that governs autonomous behavior in the central nervous system and demonstrates the effectiveness of the hierarchical control approach, whose salient features are the hierarchical learning control architecture and the combination of active and passive control systems.
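The three-level, dual-pathway structure described in the abstract can be pictured as a short chain of function calls. The sketch below is only an illustration of that data flow under stated assumptions: every function name, the `CPGCommand` container, the constant placeholder outputs, and the reading of mu as a squared amplitude and omega as an angular frequency are hypothetical and are not taken from the paper. The learned networks are replaced by fixed stand-ins.

```python
import math
from dataclasses import dataclass


@dataclass
class CPGCommand:
    mu: float         # assumed: squared oscillation amplitude
    omega: float      # assumed: angular frequency in rad/s
    morphology: dict  # assumed: posture-shaping parameters


def high_level(observation):
    """Hypothetical stand-in for the learned high-level policy."""
    skill = [1.0, 0.0]                  # placeholder skill vector
    morphology = {"body_height": 0.15}  # placeholder morphological parameter
    return skill, morphology


def mid_level(skill):
    """Hypothetical stand-in for the skill-driven network: skill -> (mu, omega)."""
    mu = 1.0
    omega = 2.0 * math.pi * skill[0]    # 1 Hz for a unit first component
    return mu, omega


def cpg_step(cmd, phase, dt):
    """Advance a single oscillator phase and return one rhythmic signal."""
    phase = (phase + cmd.omega * dt) % (2.0 * math.pi)
    signal = math.sqrt(cmd.mu) * math.cos(phase)
    return phase, signal


def control_step(observation, phase, dt=0.01):
    """One pass through the hierarchy: decide, select rhythm, generate signal."""
    skill, morphology = high_level(observation)
    mu, omega = mid_level(skill)              # rhythm parameters
    cmd = CPGCommand(mu, omega, morphology)   # morphology passed alongside
    return cpg_step(cmd, phase, dt)
```

Calling `control_step(None, 0.0)` repeatedly, feeding the returned phase back in, produces a cosine-shaped rhythm. In this sketch the low level keeps oscillating with whatever parameters it last received, which is one way to read the passive role assigned to the CPG module.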

Paper Structure (15 sections, 20 equations, 6 figures, 1 table)

This paper contains 15 sections, 20 equations, 6 figures, 1 table.

Results
Central nervous system inspired hierarchical learning control framework
Results of the CPG module's motion generation
Results of the mid-level controller's skill learning and control
Results of the high-level controller's multi-task learning and decision
Results of rapid recovery of movement in case of limb damage
Results of unknown environment adaption
Discussion
Methods
Half-center rhythm generator layer
Pattern formation layer
Skill learning of the mid-level controller
Multi-task reinforcement learning of the high-level controller
Distillation learning of the high-level controller
Acknowledgements

Figures (6)

Figure 1: Overview of the central nervous system and hierarchical learning control framework. a, Mammalian central nervous system structure, showing the partition of the cerebral cortex, the internal structure of the spinal cord, and the double-layer structure of the CPG neural circuits. b, Schematic diagram of the proposed hierarchical control framework. The gray nodes in a and the gray box in b represent the sensing mechanism in the nervous system and control framework, respectively, and are responsible for acquiring sensing signals. In the nervous system, S1 and the visual cortex are mainly responsible. In the control framework, sensing is provided by sensor measurements. The green nodes and boxes in a and b represent the high-level structures in the nervous system and control framework, respectively, which observe the environment and make decisions. In the nervous system, most cortical regions are responsible for this function. In the control framework, this part is realized by a deep reinforcement learning neural network policy. The yellow nodes and boxes in a and b represent the mid-level structures responsible for coordinating the limbs and generating various motion patterns. In the nervous system, the cerebellum and primary motor cortex are responsible. In the control framework, this part adopts an unsupervised reinforcement learning algorithm and a skill-driven neural network. The purple nodes and boxes in a and b represent the low-level structures responsible for generating and executing motion signals. In the nervous system, the brain stem and spinal cord are responsible. In the control framework, this part is realized by the CPG module, which contains an oscillator and a desired pose solver to provide the desired joint positions and uses the robot's built-in PID feedback loop to control 18 motors.
The solid lines in a connect different nerve regions and represent the flow of information; the thin purple solid line on the right represents the ascending and descending spinal nerves. Dotted lines indicate descending pathway feedback of the CPGs. The solid lines in b represent the relationship between the sensors and the control signals, and the black dotted lines connect each module to its detailed breakdown. c, Four different indoor obstacle terrain crossing tasks. d, Various new obstacle terrain crossing tasks that were never learned.
Figure 2: Motion generation effect of the CPG module. a, The classic tripod gait of insects. There are two power strokes in a cycle. During each stroke, three legs stand on the ground (stance phase, orange legs, black circles), and the other three are suspended in the air (swing phase, gray legs, gray circles). The arrow indicates the direction of movement. b, Gait diagram of the ideal tripod gait, showing the stance (dark) and swing (white) phases of each leg and the stroke number. c, The robot moves forward or backward under the action of the CPG module embedded with the tripod gait phase. d, The robot can take on different shapes by adjusting the morphological parameters of the CPG module through the lateral loop. e, Through the $\bm{\mu},\bm{\omega}$ feedback of the ventromedial loop, the rhythm signals of the RG layer can be adjusted, and the PF layer converts them into leg-end poses in Cartesian coordinates to generate smooth joint signals.
Figure 3: Skill learning and regulation effects of the mid-level controller. a, The mid-level controller is learned using unsupervised reinforcement learning. b, Under different morphological parameters, different skill vectors make the robot generate various motion trajectories in the XoY plane. c, Under one group of morphological parameters, the four skill vectors of length 1 form different gaits; dark is the stance phase, and white is the swing phase.
Figure 4: Learning and decision results of the high-level controller. a, Multi-task reinforcement learning and distillation learning processes. b, Schematic diagram of the four simulation task environments. The robot acquires height field information around its body through sensors. In an obstacle environment, target points are randomly generated within a certain range to provide the robot with target heading directions. c, Success rates of different methods on tasks of different difficulty. Each baseline test runs 100 parallel robots for 60 seconds, and the average over 5 experiments is taken as the final success rate. The asterisk indicates recent work bib22. d, Curves of the morphological parameters (normalized) under the action of the high-level policy while passing through the alley.
Figure 5: The control effect of the robot in the case of a broken limb. a, When the front, middle, or hind leg of the robot is broken, the robot can use the remaining legs to cross the gap (the broken legs are marked in red; the feedback and control signals of all joints of the broken leg are set to a fixed value to simulate the fracture). b, Gait diagram of the healthy robot and the robot with a broken left middle leg crossing the gap within 30 seconds. c, Changes in the X-coordinate of the end of each leg during the process in b.
...and 1 more figure
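The ideal tripod gait described in the Figure 2 caption (two alternating groups of three legs, one power stroke per group in each cycle) can be reproduced with a few lines of phase arithmetic. This is a minimal sketch, not the paper's CPG: the leg labels, the 50% duty factor, and the text rendering are assumptions made for illustration. The grouping of the front and hind legs on one side with the middle leg on the other is the standard insect tripod pattern.

```python
# Leg labels are an assumption: L/R = left/right, 1/2/3 = front/middle/hind.
LEGS = ["L1", "R1", "L2", "R2", "L3", "R3"]

# The two tripods are half a cycle apart (phase expressed as a cycle fraction).
PHASE_OFFSET = {
    "L1": 0.0, "R2": 0.0, "L3": 0.0,   # first tripod
    "R1": 0.5, "L2": 0.5, "R3": 0.5,   # second tripod
}


def in_stance(leg, t, period=1.0, duty=0.5):
    """Return True if `leg` is on the ground at time t."""
    phase = (t / period + PHASE_OFFSET[leg]) % 1.0
    return phase < duty


def gait_diagram(steps=8, period=1.0):
    """Sample one cycle at bin midpoints; '#' marks stance, '.' marks swing."""
    rows = {}
    for leg in LEGS:
        rows[leg] = "".join(
            "#" if in_stance(leg, (k + 0.5) * period / steps, period) else "."
            for k in range(steps)
        )
    return rows


if __name__ == "__main__":
    for leg, row in gait_diagram().items():
        print(leg, row)
```

With the default settings, each leg of the first tripod yields the row `####....` and each leg of the second yields `....####`, so exactly three legs are in stance in every sampled bin, matching the dark and white bands of an ideal tripod gait diagram.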
