A Central Motor System Inspired Pre-training Reinforcement Learning for Robotic Control

Pei Zhang, Zhaobo Hua, Jinliang Ding

TL;DR

We propose a novel pre-training reinforcement learning (PRL) algorithm based on central motor system mechanisms. It discovers diverse, dynamic skills without relying on data or expert knowledge, enabling robots to tackle different types of downstream tasks.

Abstract

The development of intelligent robots requires control policies that can handle dynamic environments and evolving tasks. Pre-training reinforcement learning (PRL) has emerged as an effective approach to these demands by enabling robots to acquire reusable motor skills. However, existing PRL methods often rely on large datasets or expert-designed goal spaces, which limits their adaptability. In addition, they struggle to generate dynamic and diverse skills in high-dimensional state spaces, which reduces their effectiveness on downstream tasks. In this paper, we propose CMS-PRL, a pre-training reinforcement learning method inspired by the Central Motor System (CMS). First, we introduce a fusion reward mechanism that combines a basic motor reward with a mutual information reward, promoting the discovery of dynamic skills during pre-training without relying on external data. Second, we design a skill encoding method inspired by the motor programs of the basal ganglia, which provides rich, continuous skill instructions during pre-training. Finally, we propose a skill activity function that regulates motor skill activity, enabling the generation of skills with different activity levels and thereby enhancing the robot's flexibility in downstream tasks. We evaluate the model on four types of robots across a challenging set of sparse-reward tasks. Experimental results demonstrate that CMS-PRL generates diverse, reusable motor skills that solve various downstream tasks and outperforms baseline methods, particularly on high-degree-of-freedom robots and complex tasks.
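The abstract does not state the exact form of the fusion reward. As an illustration only, the sketch below assumes a weighted sum r = r_motor + lam * r_MI, where the mutual-information term is estimated in the DIAYN style as log q(z|s) - log p(z) from a learned skill discriminator q. The DIAYN-style estimator, the weight `lam`, and the function names are assumptions swapped in for illustration, not the authors' formulation; the discrete skill prior in the example is likewise only a convenient toy case.

```python
import math


def mi_reward(log_q_z_given_s: float, log_p_z: float) -> float:
    """Variational MI reward in the DIAYN style: log q(z|s) - log p(z).

    log_q_z_given_s: log-probability that a (hypothetical) skill discriminator
                     assigns to the active skill z given the current state s
    log_p_z:         log-probability of z under the skill prior
    """
    return log_q_z_given_s - log_p_z


def fusion_reward(r_motor: float, log_q_z_given_s: float, log_p_z: float,
                  lam: float = 1.0) -> float:
    """Assumed fusion rule: basic motor reward plus a weighted MI reward.

    `lam` is a hypothetical trade-off weight; the paper's combination may differ.
    """
    return r_motor + lam * mi_reward(log_q_z_given_s, log_p_z)


# Toy case: 8 equiprobable skills, discriminator is 50% confident in the active skill.
log_p = -math.log(8)
r = fusion_reward(r_motor=0.2, log_q_z_given_s=math.log(0.5), log_p_z=log_p, lam=0.5)
print(round(r, 3))  # 0.893
```

In this toy case the MI term equals log 4: the discriminator is four times more confident than chance, so the skill is rewarded for being distinguishable, while `r_motor` keeps the behavior physically meaningful.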

Paper Structure (20 sections, 24 equations, 9 figures, 5 tables, 2 algorithms)

Figures (9)

  • Figure 1: Overview of the proposed framework architecture. Panel (a) on the left illustrates the structure of the Central Motor System (CMS), using three colors to represent its three hierarchical control centers. The three gray dashed boxes, labeled sequentially, correspond to the functions of the cerebellum, the early-stage skill learning roles of both the cerebellum and the basal ganglia, and the basal ganglia's regulation of voluntary movement. Panel (b) on the right presents the architecture of the proposed CMS-PRL framework, aligned with the three hierarchical levels depicted in panel (a). The dashed boxes indicate the components inspired by the cerebellum (the fusion reward function of the motor primitive model), by skill learning (pre-training), and by the basal ganglia's regulation of movement (the skill activity function).
  • Figure 2: Learning process of the CMS-PRL algorithm. The learning process consists of two phases: In the pre-training phase, a motor primitive module is developed to receive signals from the high-level controller and adjust the robot's movement intentions and posture. During the task training phase, a high-level controller is trained to process external environmental information and generate task-specific skill signals.
  • Figure 3: Schematic of the basal ganglia's internal structure and the skill activation function. Different neurotransmitters within the basal ganglia are represented by three distinct line patterns, with the direct and indirect pathways indicated in red and green, respectively. The skill activation function is depicted as a curve that varies with discrete skills, where different colors and line styles represent the activation function under both healthy and abnormal basal ganglia parameters.
  • Figure 4: Different types of robots in simulation environments. (a) Cheetah. (b) Walker. (c) Quadruped. (d) Humanoid.
  • Figure 5: Different task environments. (a) Gaps: The robot must keep moving forward and cross the magma; the task ends as soon as the robot touches the magma or falls. (b) Stairs: The robot must climb up and down the steps. (c) Hurdles: The robot must adjust its posture to cross the hurdles. (d) Limbos: The robot must lower its center of gravity while moving in order to pass under the barriers. (e) V-track: The robot must track four randomly sampled target speeds, keeping its speed within a specified deviation range of the target. (f) Goals: The robot must touch as many target points as possible within a limited time.
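Figure 2 describes a two-phase scheme in which a pre-trained motor primitive module executes skill signals chosen by a high-level controller trained on the downstream task. The rollout loop below is a minimal sketch of that control structure under stated assumptions: the callables `high_level`, `motor_primitive`, and `step`, and the `skill_period` parameter (how many steps each skill signal is held), are hypothetical names, and the toy 1-D dynamics in the usage note are illustrative only, not the paper's environments.

```python
from typing import Callable, List

State = List[float]
Skill = List[float]
Action = List[float]


def rollout(
    high_level: Callable[[State], Skill],
    motor_primitive: Callable[[State, Skill], Action],
    step: Callable[[State, Action], State],
    s0: State,
    horizon: int,
    skill_period: int = 1,
) -> List[State]:
    """Run a two-level policy for `horizon` steps and return the visited states.

    high_level:      task-phase controller mapping state -> skill signal z
    motor_primitive: pre-trained low-level policy mapping (state, z) -> action,
                     kept frozen during task training
    step:            environment transition (hypothetical stand-in for a simulator)
    skill_period:    steps each skill signal is held (an assumption, not from the paper)
    """
    if skill_period < 1:
        raise ValueError("skill_period must be >= 1")
    states = [s0]
    s = s0
    z: Skill = []
    for t in range(horizon):
        if t % skill_period == 0:
            z = high_level(s)  # high-level controller re-selects the skill signal
        a = motor_primitive(s, z)  # frozen primitive turns the skill into an action
        s = step(s, a)
        states.append(s)
    return states
```

As a toy usage example, let the state be a position `[x]`, the skill signal a velocity (`high_level = lambda s: [1.0] if s[0] < 5 else [0.0]`), the primitive pass it through (`lambda s, z: [z[0]]`), and the dynamics integrate it (`lambda s, a: [s[0] + a[0]]`). Starting from `[0.0]` with `horizon=10`, `skill_period=1` stops exactly at `[5.0]`, while `skill_period=2` overshoots to `[6.0]`, which shows the usual trade-off between temporal abstraction and reactivity in hierarchical control.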