Tuning Synaptic Connections instead of Weights by Genetic Algorithm in Spiking Policy Network

Duzhen Zhang, Tielin Zhang, Shuncheng Jia, Qingyu Wang, Bo Xu

TL;DR

A spiking policy network (SPN) is optimized with a genetic algorithm as an energy-efficient alternative to deep reinforcement learning (DRL). Inspired by biological research showing that the brain forms memories by creating new synaptic connections and rewiring them based on new experiences, the method tunes the SPN's synaptic connections rather than its weights.

Abstract

Learning from interaction is the primary way that biological agents acquire knowledge about their environment and themselves. Modern deep reinforcement learning (DRL) explores a computational approach to learning from interaction and has made significant progress in solving various tasks. However, despite its power, DRL still falls short of biological agents in terms of energy efficiency. Although the underlying mechanisms are not fully understood, we believe that the integration of spiking communication between neurons and biologically-plausible synaptic plasticity plays a prominent role in achieving greater energy efficiency. Following this biological intuition, we optimized a spiking policy network (SPN) using a genetic algorithm as an energy-efficient alternative to DRL. Our SPN mimics the sensorimotor neuron pathway of insects and communicates through event-based spikes. Inspired by biological research showing that the brain forms memories by creating new synaptic connections and rewiring these connections based on new experiences, we tuned the synaptic connections instead of weights in the SPN to solve given tasks. Experimental results on several robotic control tasks demonstrate that our method can achieve the same level of performance as mainstream DRL methods while exhibiting significantly higher energy efficiency.
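
The core idea in the abstract, evolving which connections exist while leaving the weights untouched, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the policy is a plain (non-spiking) linear layer, the genetic algorithm uses simple truncation selection with bit-flip mutation, and the toy fitness function stands in for an episode return from a control task. The names `masked_forward`, `mutate`, `evolve_connections`, and `make_toy_fitness` are hypothetical.

```python
import random


def masked_forward(weights, mask, obs):
    """Linear policy that uses only the connections enabled in `mask`."""
    n_out = len(weights[0])
    return [
        sum(obs[i] * weights[i][j] * mask[i][j] for i in range(len(obs)))
        for j in range(n_out)
    ]


def mutate(mask, rate, rng):
    """Flip each connection bit independently with probability `rate`."""
    return [[(bit ^ 1) if rng.random() < rate else bit for bit in row]
            for row in mask]


def evolve_connections(fitness, weights, generations=30, pop_size=20,
                       n_elite=5, rate=0.05, seed=0):
    """Truncation-selection GA over binary masks; `weights` is never modified."""
    rng = random.Random(seed)
    n_in, n_out = len(weights), len(weights[0])
    population = [
        [[rng.randint(0, 1) for _ in range(n_out)] for _ in range(n_in)]
        for _ in range(pop_size)
    ]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elite]
        history.append(fitness(elites[0]))
        # Elites survive unchanged; the rest are mutated copies of elites.
        children = [mutate(rng.choice(elites), rate, rng)
                    for _ in range(pop_size - n_elite)]
        population = elites + children
    best = max(population, key=fitness)
    return best, history


def make_toy_fitness(weights, n_samples=16, seed=1):
    """Toy stand-in for episode return: negative squared error between the
    candidate mask's outputs and those of a hidden target mask."""
    rng = random.Random(seed)
    n_in, n_out = len(weights), len(weights[0])
    target = [[rng.randint(0, 1) for _ in range(n_out)] for _ in range(n_in)]
    samples = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_samples)]
    targets = [masked_forward(weights, target, s) for s in samples]

    def fitness(mask):
        error = 0.0
        for s, t in zip(samples, targets):
            out = masked_forward(weights, mask, s)
            error += sum((a - b) ** 2 for a, b in zip(out, t))
        return -error

    return fitness


# Usage: 4 inputs and 2 outputs, matching the CartPole-v1 dimensions.
fixed_weights = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0], [1.0, -0.5]]
toy_fitness = make_toy_fitness(fixed_weights)
best_mask, best_per_generation = evolve_connections(toy_fitness, fixed_weights)
print(best_mask, best_per_generation[-1])
```

Because the elites are copied into the next generation unchanged and the toy fitness is deterministic, the best fitness per generation never decreases. With a stochastic episode return, as in a real control task, that guarantee would no longer hold without re-evaluation or averaging.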

Paper Structure

This paper contains 28 sections, 9 equations, 11 figures, and 5 tables.

Figures (11)

  • Figure 1: The correspondence diagram between the sensorimotor neuron pathway and our SPN.
  • Figure 2: Overview of the evolution of our SPN sub-networks.
  • Figure 3: Four robotic control tasks. (a) CartPole-v1: observation dimension $n=4$, action dimension $m=2$ (discrete), goal: balance a pole on a cart; (b) HalfCheetah-v2: observation dimension $n=17$, action dimension $m=6$ (continuous), goal: make a 2D cheetah robot run as fast as possible; (c) Swimmer-v2: observation dimension $n=8$, action dimension $m=2$ (continuous), goal: make a 2D robot swim; (d) HumanoidStandup-v2: observation dimension $n=376$, action dimension $m=17$ (continuous), goal: make a 3D two-legged robot stand up.
  • Figure 4: Learning curves for the OpenAI Gym robotic control tasks. The purple areas represent the learning curves of SPN-Connections-GA over 100 generations of evolution; the solid curves show the mean and the shaded regions half a standard deviation over $10$ runs. The red horizontal line marks the performance level of DPN-Weights-PPO (Schulman et al., 2017).
  • Figure 5: Learning curves for the MuJoCo continuous control tasks. The purple and blue areas represent the learning curves of SPN-Connections-GA and SPN-Weights-GA, respectively, over 100 generations of evolution; the solid curves show the mean and the shaded regions half a standard deviation over $10$ runs. The red horizontal line marks the performance level of DPN-Weights-PPO (Schulman et al., 2017).
  • ...and 6 more figures