Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning
Duzhen Zhang, Qingyu Wang, Tielin Zhang, Bo Xu
TL;DR
This work addresses the limited expressivity of artificial actor networks in deep reinforcement learning by introducing the Biologically-Plausible Topology improved Spiking Actor Network (BPT-SAN), which combines spiking neurons that have rich spatial-temporal dynamics with biologically-plausible connectivity. The method encodes continuous states into spikes via population, Poisson, or deterministic coding and processes them through inter-layer nonlinear dendritic branches combined with intra-layer lateral connections. The network is trained with pseudo-backpropagation within hybrid learning frameworks built on TD3 and SAC. Empirical results on four continuous control tasks from the OpenAI Gym MuJoCo suite show that BPT-SAN outperforms both an artificial actor network and a regular spiking actor network, and ablations confirm the value of both the nonlinear dendrites and the lateral intra-layer interactions. The study highlights how brain-inspired topologies can enhance DRL performance and suggests avenues for energy-efficient and robust decision-making in real-world robotics.
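The state-encoding step can be illustrated with a minimal sketch. It assumes a population-coding scheme in which each state dimension drives a group of neurons with Gaussian receptive fields, and the resulting stimulation is turned into spike trains either deterministically (integrate and fire on a threshold) or stochastically (Poisson-style, one Bernoulli draw per timestep). The function names, population size, receptive-field range, width, threshold, and number of timesteps are all hypothetical choices for illustration, not values or code from the paper.

```python
import math
import random


def population_stimulation(state, pop_size=5, low=-3.0, high=3.0, sigma=0.5):
    """Gaussian receptive-field stimulation for every state dimension.

    Returns a flat list of len(state) * pop_size values in (0, 1].
    Means are evenly spaced over [low, high]; every parameter here is
    an illustrative assumption, not a value taken from the paper.
    """
    step = (high - low) / (pop_size - 1)
    means = [low + i * step for i in range(pop_size)]
    return [math.exp(-0.5 * ((s - mu) / sigma) ** 2)
            for s in state for mu in means]


def deterministic_spikes(stim, timesteps=5, threshold=0.999):
    """Deterministic coding: each neuron accumulates its stimulation and
    fires when the accumulator crosses the threshold (soft reset)."""
    acc = [0.0] * len(stim)
    trains = []
    for _ in range(timesteps):
        spikes = []
        for i, a in enumerate(stim):
            acc[i] += a
            if acc[i] >= threshold:
                spikes.append(1)
                acc[i] -= threshold  # keep the residual charge
            else:
                spikes.append(0)
        trains.append(spikes)
    return trains


def poisson_spikes(stim, timesteps=5, rng=None):
    """Poisson-style coding: at each timestep a neuron fires with
    probability equal to its stimulation."""
    rng = rng or random.Random(0)
    return [[1 if rng.random() < a else 0 for a in stim]
            for _ in range(timesteps)]
```

For example, `deterministic_spikes(population_stimulation([0.0, 1.2]))` yields five timesteps of 10 binary spikes each. The neuron whose receptive field is centred on 0.0 fires at every step, while neurons tuned far from the input stay silent. Deterministic coding gives reproducible spike trains for a given state, whereas the Poisson variant injects sampling noise.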
Abstract
The success of Deep Reinforcement Learning (DRL) is largely attributed to the use of Artificial Neural Networks (ANNs) as function approximators. Recent advances in neuroscience have revealed that the human brain achieves efficient reward-based learning, at least in part, by integrating spiking neurons with spatial-temporal dynamics and network topologies with biologically-plausible connectivity patterns. This integration allows spiking neurons to combine information efficiently across layers, via nonlinear dendritic trees, and within layers, via lateral interactions. Together, these two kinds of connectivity enhance the network's information-processing ability, which is crucial for grasping intricate perceptions and guiding decision-making. ANNs, however, differ significantly from brain networks: they lack neurons with intricate dynamics and feature only inter-layer connections, typically implemented as a direct linear weighted sum, with no intra-layer connections. These limitations constrain network expressivity. To address them, we propose a novel function approximator, the Biologically-Plausible Topology improved Spiking Actor Network (BPT-SAN), tailored for efficient decision-making in DRL. The BPT-SAN incorporates spiking neurons with intricate spatial-temporal dynamics and introduces intra-layer connections, enhancing spatial-temporal state representation and enabling more precise biological simulation. Instead of the conventional direct linear weighted sum, the BPT-SAN models the local nonlinearities of dendritic trees within the inter-layer connections. For the intra-layer connections, the BPT-SAN introduces lateral interactions between adjacent neurons and integrates them into the membrane potential formula to ensure accurate spike firing.
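To make the two connection types concrete, here is a minimal sketch of one timestep of a leaky integrate-and-fire (LIF) layer that combines them. It rests on several assumptions that are not taken from the paper: inter-layer input is grouped into dendritic branches, each applying `tanh` to its local weighted sum before the soma adds the branch outputs; the intra-layer lateral term uses neighbours' spikes from the previous timestep; and the membrane uses a hard reset with a fixed decay and threshold. The names `dendritic_input` and `lif_layer_step`, and all parameter values, are hypothetical. The actual BPT-SAN nonlinearity, branch layout, and lateral formulation may differ.

```python
import math


def dendritic_input(x, branches):
    """Inter-layer input to one neuron through nonlinear dendritic branches.

    branches: list of branches; each branch is a list of (input_index, weight).
    Each branch applies tanh (an assumed nonlinearity) to its local weighted
    sum, and the soma sums the branch outputs instead of one linear sum.
    """
    return sum(math.tanh(sum(w * x[j] for j, w in branch))
               for branch in branches)


def lif_layer_step(x, v, prev_spikes, dendrites, lateral,
                   decay=0.75, threshold=0.5):
    """One timestep of a hypothetical LIF layer with dendrites and lateral links.

    x: presynaptic spikes from the previous layer at this timestep
    v, prev_spikes: this layer's membrane potentials and spikes at t-1
    dendrites[i]: branch structure for neuron i (see dendritic_input)
    lateral[i][k]: weight from neuron k to neuron i within the same layer
    Returns (new membrane potentials, new spikes).
    """
    n = len(v)
    new_v, spikes = [], []
    for i in range(n):
        inter = dendritic_input(x, dendrites[i])
        # Intra-layer term: neighbours' previous spikes enter the membrane potential.
        intra = sum(lateral[i][k] * prev_spikes[k] for k in range(n) if k != i)
        # Leaky integration; the potential is hard-reset if the neuron fired at t-1.
        vi = decay * v[i] * (1 - prev_spikes[i]) + inter + intra
        new_v.append(vi)
        spikes.append(1 if vi > threshold else 0)
    return new_v, spikes
```

In a two-neuron example where neuron 1 receives no feed-forward drive but has a lateral weight of 0.6 from neuron 0, neuron 1 stays silent at the first step and fires at the second, once neuron 0 has spiked. This is the kind of within-layer influence that a purely inter-layer ANN cannot express. Using previous-step spikes for the lateral term avoids a circular dependency among neurons updated in the same timestep.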
