Fully Spiking Neural Network for Legged Robots

Xiaoyang Jiang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Jingtong Ma, Renjing Xu

TL;DR

This study presents a novel Spiking Neural Network (SNN) for legged robots that can be seamlessly integrated into other learning models, showing exceptional performance in various simulated terrains.

Abstract

Recent advancements in legged robots using deep reinforcement learning have led to significant progress. Quadruped robots can perform complex tasks in challenging environments, while bipedal and humanoid robots have also achieved breakthroughs. Current reinforcement learning methods leverage diverse robot bodies and historical information to perform actions, but previous research has not emphasized the speed and energy consumption of network inference or the biological significance of neural networks. Most networks are traditional artificial neural networks that utilize multilayer perceptrons (MLPs). This paper presents a novel Spiking Neural Network (SNN) for legged robots, showing exceptional performance in various simulated terrains. SNNs provide natural advantages in inference speed and energy consumption, and their pulse-form processing enhances biological interpretability. This study presents a highly efficient SNN for legged robots that can be seamlessly integrated into other learning models.

Paper Structure

This paper contains 17 sections, 7 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Whole-body control on various types of robots through our spike-based approach. This innovative methodology allows us to effectively regulate and coordinate the robots' movements, enhancing their overall performance and versatility. Left: A1. Middle: Cassie. Right: MIT Humanoid.
  • Figure 2: The encoder first encodes the observations as $n$ independent distributions that are uniformly distributed over the observation range. The input populations then process these distributions and generate spikes: the neurons in the input populations encode each observation dimension and drive a multi-layered, fully connected SNN. During the forward timesteps in PopSAN, the activity of each output population is decoded to determine the corresponding action dimension. In short, the network receives observations, processes them with the SNN, and decodes the resulting activity into the action for the current situation.
  • Figure 3: RMA consists of two subsystems, the base policy $\pi$ and the adaptation module $\phi$, and is trained in two phases. Training the Base Policy (Phase 1): In the initial phase, the base policy $\pi$ is trained using PopSAN. The system takes the current state $x_t$, the previous action $\alpha_{t-1}$, and the environmental factors $e_t$ as input. These environmental factors are encoded into a latent extrinsics vector $z_t$ using the environmental factor encoder $\mu$. Training the Adaptation Module (Phase 2): In the second phase, the adaptation module $\phi$ is trained to predict the extrinsics $\hat{z}_t$ from past states and actions. This training uses supervised learning with on-policy data. The adaptation module learns to capture the relationship between the state-action history and the corresponding extrinsics.
  • Figure 4: By leveraging Adversarial Motion Priors and employing PopSAN as a replacement for the policy network during training, the agent is able to generate behaviors that capture the essence of the motion capture dataset.
  • Figure 5: Four graphs illustrate the exceptional performance of the robot in the command-following task.
  • ...and 4 more figures
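
The encode, spike, and decode pipeline described in the Figure 2 caption can be sketched in NumPy as follows. This is only an illustrative sketch, not the paper's implementation. It assumes Gaussian receptive fields with evenly spaced centres, a soft-reset integrate-and-fire encoder, leaky integrate-and-fire layers with hard reset, and a linear decoder over firing rates. All function names, parameter values, and layer sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def encode_population(obs, obs_low, obs_high, pop_size, sigma):
    """Map each observation dimension to pop_size receptive-field activations.

    Assumption: Gaussian receptive fields whose centres are evenly spaced
    over the observation range. Returns shape (obs_dim, pop_size).
    """
    obs = np.asarray(obs, dtype=float)
    centres = np.linspace(obs_low, obs_high, pop_size, axis=-1)
    return np.exp(-0.5 * ((obs[:, None] - centres) / sigma) ** 2)

def activations_to_spikes(activation, timesteps, threshold=0.999):
    """Deterministic rate coding: integrate the activation at every timestep,
    emit a spike when the potential exceeds the threshold, then soft-reset."""
    v = np.zeros_like(activation)
    spikes = np.zeros((timesteps,) + activation.shape)
    for t in range(timesteps):
        v += activation
        fired = v > threshold
        spikes[t] = fired
        v[fired] -= threshold
    return spikes

def lif_layer(spikes_in, weights, decay=0.75, threshold=0.5):
    """One fully connected leaky integrate-and-fire layer (hard reset)."""
    timesteps, n_out = spikes_in.shape[0], weights.shape[1]
    v = np.zeros(n_out)
    out = np.zeros((timesteps, n_out))
    for t in range(timesteps):
        v = decay * v + spikes_in[t] @ weights
        fired = v > threshold
        out[t] = fired
        v[fired] = 0.0
    return out

def decode_actions(spikes_out, act_dim, pop_size, dec_weights, dec_bias):
    """Decode each output population's firing rates into one action dimension."""
    rates = spikes_out.mean(axis=0).reshape(act_dim, pop_size)
    return (rates * dec_weights).sum(axis=1) + dec_bias

# Hypothetical sizes: 3 observation dims, 2 action dims, 10 neurons per
# population, 5 simulation timesteps, 16 hidden neurons.
rng = np.random.default_rng(0)
obs_dim, act_dim, pop, T, hidden = 3, 2, 10, 5, 16
obs = np.array([0.1, -0.4, 0.8])

act = encode_population(obs, -np.ones(obs_dim), np.ones(obs_dim), pop, sigma=0.15)
in_spikes = activations_to_spikes(act, T).reshape(T, obs_dim * pop)
hid_spikes = lif_layer(in_spikes, rng.normal(0.0, 0.5, (obs_dim * pop, hidden)))
out_spikes = lif_layer(hid_spikes, rng.normal(0.0, 0.5, (hidden, act_dim * pop)))
action = decode_actions(out_spikes, act_dim, pop,
                        rng.normal(size=(act_dim, pop)), np.zeros(act_dim))
```

Only binary spikes pass between layers in this sketch. The continuous action value appears only at the decoder, where each output population's spike counts are averaged into firing rates.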