Synaptic bundle theory for spike-driven sensor-motor system: More than eight independent synaptic bundles collapse reward-STDP learning

Takeshi Kobayashi; Shogo Yonekura; Yasuo Kuniyoshi

TL;DR

We developed a system that can vary the number of independent synaptic bundles in sensor-to-motor connections, making it possible to study functions of spikes that were previously inaccessible due to the difficulty of learning.

Abstract

Neuronal spikes directly drive muscles and endow animals with agile movements, but applying spike-based control signals to actuators in artificial sensor-motor systems inevitably causes a collapse of learning. We developed a system that can vary \emph{the number of independent synaptic bundles} in sensor-to-motor connections. This paper demonstrates the following four findings: (i) Learning collapses once the number of motor neurons or the number of independent synaptic bundles exceeds a critical limit. (ii) A smaller number of motor neurons increases the probability of learning failure, while (iii) if learning succeeds, a smaller number of motor neurons leads to faster learning. (iv) The number of weight updates that move in the direction opposite to the optimal weight can quantitatively explain these results. The functions of spikes remain largely unknown. Identifying the parameter range in which spike-based learning systems can be constructed will make it possible to study functions of spikes that were previously inaccessible due to the difficulty of learning.

Paper Structure

This paper contains 6 equations, 4 figures.

Figures (4)

Figure 1: (a) Double-well potential task and network architecture, with $m=0.3$ and $\gamma=0.5$. The SNN controller consists of separate sensory and motor neuron pools. The position $x$ and the velocity $v$ are coded by two distinct sensory pools (only one is shown). Each sensory pool contains 30 neurons with tuning curves $f_i(s)=c \exp\!\bigl(k(\cos(s-s_i)-1)\bigr)$, where $c=40$, $k=12.5$, and $s_i$ spans $[-1.5,1.5]$ in 30 equal steps. Motor neurons are divided into pools that generate positive and negative thrust. These pools form a winner-take-all circuit via lateral inhibition. The inhibitory population contains $2 N_m / 4$ neurons. The synaptic weight from the positive or negative population to the inhibitory population is $W_{EI} = 75.0 / N_m$, and the weight from the inhibitory population to the positive or negative population is $W_{IE} = 150.0 / (2 N_m / 4)$. The recurrent synaptic weight within the positive or negative population is $W_{EE} = 8.25 / N_m$. The synaptic weights within the winner-take-all circuit are not subject to reward-STDP learning. The signed sum of the mean PSPs from the positive and negative pools yields the spike motor command $A(t)$ (Eq. \ref{eq:psp_readout}). Sensor-motor synapses are trained by reward-modulated STDP. (b) Learning performance is systematically examined by varying $N_b$, the number of independent synaptic bundles, defined as the number of distinct weight values that can be assigned to the $N_m$ synapses from a single sensory neuron to a motor pool. When $N_m = 6$ and $N_b = 1$, all six synapses share one weight. When $N_b = 2$, the synapses form two groups of three, each group sharing one weight. When $N_b = 6$, each synapse has an independent weight.
Figure 2: (a) The learning success rate for $N_m \le 10$. 25 training runs were performed for each value of $N_m$. A run was considered successful if its final epoch score exceeded 0.65. (b) The effect of $N_m$ on the amplitude (variance) of the spike motor command. For each $N_m$, 1000 simulations of 0.2 seconds were run, with an average firing probability of 0.15 per neuron. The outputs were generated via Eq. (\ref{eq:psp_readout}). The shaded band indicates the magnitude of the output variance, and the solid line represents a single randomly selected output trace. (c) Learning curves for $N_b = 1$ with $N_m \in \{6, 10, 15, 20, 25, 30\}$. 20 training runs were performed for each $N_m$. The solid line shows the mean and the shaded band shows the standard deviation. (d) Learning curves for $N_b \in \{6, 7, 8, 9, 10, 11\}$ with $N_m = N_b$. 20 training runs were performed for each $N_b$. The solid line shows the mean and the shaded band shows the standard deviation.
Figure 3: (a) Relationship between the incorrect transition count (vertical axis), i.e., the number of times a weight update moves in the direction opposite to the correct synaptic weight, and $N_m$ with $N_b = 1$. (b) The same metric plotted against $N_b$ for networks with $N_m=N_b$. 15 training runs were performed for each value of $N_m$ (or $N_b$). The solid line shows the mean, and the shaded band shows the standard deviation. The incorrect transition count represents the average number of incorrect transitions per synapse per second.
Figure 4: Scatter plot of learning score versus the incorrect transition count, i.e., the number of times a weight update moves in the direction opposite to the optimal synaptic weight. Learning runs were conducted for parameter combinations in which $N_b$ was varied while $5 \le N_m \le 25$. Each point represents a single training run. The color of each point corresponds to the value of $N_b$. For each parameter combination, 15 training runs were performed. The incorrect transition count represents the average number of incorrect transitions per synapse per second.
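
The quantities defined in the captions above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tuning curve follows the formula in the Figure 1 caption. Everything else is an assumption made for illustration: the function names, the even spacing of $s_i$ with both endpoints included, the contiguous equal-size grouping of synapses into bundles (with $N_b$ dividing $N_m$), and the reading of an "incorrect transition" as an update whose sign is opposite to the sign of (optimal weight minus current weight).

```python
import math


def tuning_rate(s, s_i, c=40.0, k=12.5):
    """Tuning curve f_i(s) = c * exp(k * (cos(s - s_i) - 1)) (Figure 1 caption)."""
    return c * math.exp(k * (math.cos(s - s_i) - 1.0))


def preferred_values(n=30, lo=-1.5, hi=1.5):
    """Preferred values s_i over [lo, hi]; including both endpoints is an assumption."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def expand_bundles(bundle_weights, n_m):
    """Map N_b bundle weights onto N_m synapses.

    Assumption: bundles are contiguous, equal-size groups and N_b divides N_m,
    as in the caption's examples (N_m = 6 with N_b = 1, 2, or 6).
    """
    n_b = len(bundle_weights)
    if n_b == 0 or n_m % n_b != 0:
        raise ValueError("this sketch assumes N_b >= 1 and N_b divides N_m")
    group_size = n_m // n_b
    return [w for w in bundle_weights for _ in range(group_size)]


def incorrect_transitions(trajectory, optimal):
    """Count updates that move a weight away from a (hypothetical) optimal value.

    An update from w_prev to w_new is counted when its sign is opposite
    to the sign of (optimal - w_prev).
    """
    count = 0
    for w_prev, w_new in zip(trajectory, trajectory[1:]):
        if (w_new - w_prev) * (optimal - w_prev) < 0:
            count += 1
    return count


def incorrect_transition_rate(trajectories, optimal, duration_s):
    """Average incorrect transitions per synapse per second over several synapses."""
    total = sum(incorrect_transitions(t, optimal) for t in trajectories)
    return total / (len(trajectories) * duration_s)


if __name__ == "__main__":
    # Peak firing rate occurs at the preferred stimulus value.
    print(tuning_rate(0.0, 0.0))
    # N_m = 6, N_b = 2: two groups of three synapses sharing a weight.
    print(expand_bundles([0.1, 0.2], 6))
    # Two of the four updates move away from the optimal weight 1.0.
    print(incorrect_transitions([0.0, 0.2, 0.1, 0.3, 0.25], 1.0))
```

In this sketch only the $N_b$ bundle weights would be learned, and `expand_bundles` produces the $N_m$ per-synapse weights from them, so $N_b$ sets the number of independently updated values without changing the number of motor neurons.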