Stimulus-to-Stimulus Learning in RNNs with Cortical Inductive Biases

Pantelis Vafidis; Antonio Rangel

TL;DR

A recurrent neural network model of stimulus substitution is proposed which leverages two forms of inductive bias pervasive in the cortex: representational inductive bias in the form of mixed stimulus representations, and architectural inductive bias in the form of two-compartment pyramidal neurons that have been shown to serve as a fundamental unit of cortical associative learning.

Abstract

Animals learn to predict external contingencies from experience through a process of conditioning. A natural mechanism for conditioning is stimulus substitution, whereby the neuronal response to a stimulus with no prior behavioral significance becomes increasingly similar to that generated by a behaviorally significant stimulus it reliably predicts. We propose a recurrent neural network model of stimulus substitution which leverages two forms of inductive bias pervasive in the cortex: representational inductive bias in the form of mixed stimulus representations, and architectural inductive bias in the form of two-compartment pyramidal neurons that have been shown to serve as a fundamental unit of cortical associative learning. The properties of these neurons allow for a biologically plausible learning rule that implements stimulus substitution, utilizing only information available locally at the synapses. We show that the model generates a wide array of conditioning phenomena, and can learn large numbers of associations with an amount of training commensurate with animal experiments, without relying on parameter fine-tuning for each individual experimental task. In contrast, we show that commonly used Hebbian rules fail to learn generic stimulus-stimulus associations with mixed selectivity, and require task-specific parameter fine-tuning. Our framework highlights the importance of multi-compartment neuronal processing in the cortex, and showcases how it might confer an evolutionary edge on cortical animals.

Paper Structure (3 sections, 31 equations, 13 figures, 2 tables)

This paper contains 3 sections, 31 equations, 13 figures, 2 tables.

Acknowledgements
Author contributions
Data and code availability

Figures (13)

Figure 1: Model. (A) Every trial has a duration of $t_\text{trial}$ seconds. Trials start with the presentation of a CS, which disappears after time $t_\text{cs-off}$. The associated US appears at time $t_\text{us-on}$ and stays until the end of the trial. The network has to learn $N_\text{stim}$ unique CS-US pairs. (B) Associative neurons are modeled as an abstraction of a layer-5 cortical pyramidal neuron. $V^\text{s}$ and $V^\text{d}$ denote the voltage in the somatic and dendritic compartments. The somatic compartment receives as input a Boolean vector $r_\text{us}$ representing the US. The dendritic compartment receives as inputs a vector $\hat{r}_\text{cs}$ with a short-term memory representation of the CS, as well as recurrent activity from all other neurons in the RNN. The matrices $W_\text{rnn}$, $W_\text{cs}$ and $W_\text{us}$ denote the synaptic weights for the inputs. $W_\text{us}$ is fixed throughout the experiment. $W_\text{rnn}$ and $W_\text{cs}$ are updated over trials with training. (C) Full outline of the model. The associative network is made of $N_\text{rnn}$ associative neurons. The US is presented directly to the associative neurons, whereas the CS is presented to a short-term memory circuit that produces the short-term memory representation $\hat{r}_\text{cs}$. Learning in the associative network is gated by a surprise signal which measures the extent to which the US, or its absence, was anticipated. The surprise signal is computed in three steps. First, throughout the trial a linear decoder is used to obtain an estimate $\hat{r}_\text{us}$ of the US from the population vector of the associative network, denoted by $r_\text{rnn}$. Second, an expectation $E^i$ is formed for each US based on the similarity between $r^i_\text{us}$ and $\hat{r}_\text{us}$. Third, these expectations determine the level of surprise $S$ associated with the arrival or absence of the US, which then gives rise to neuromodulator dynamics that gate learning in the associative network. (D) Performance of the short-term memory network in a single trial when CSs are presented for only 500 ms. We plot the output of the memory network for several seconds. Each color denotes a different element in $r_\text{cs}$.
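The expectation and surprise steps described in the Figure 1 caption can be illustrated with a minimal sketch. The caption does not give the exact formulas, so both functions below are assumptions: `expectation` uses a normalized projection as the similarity measure, and `surprise` takes the outcome (1 if the US arrives, 0 if it is omitted) minus the expectation. The vectors are hypothetical toy values, not data from the paper.

```python
from typing import Sequence


def expectation(r_us: Sequence[float], r_us_hat: Sequence[float]) -> float:
    """Assumed similarity measure: projection of the decoded US onto the
    true US pattern, normalized so that a perfect decode gives 1."""
    norm = sum(x * x for x in r_us)
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(r_us, r_us_hat)) / norm


def surprise(us_present: bool, e: float) -> float:
    """Assumed form: outcome (1 if the US arrives, 0 if omitted) minus
    the expectation. Positive when the US is under-predicted."""
    return (1.0 if us_present else 0.0) - e


r_us = [1.0, 0.0, 1.0, 1.0]          # hypothetical Boolean US vector
half_decoded = [0.5, 0.0, 0.5, 0.5]  # hypothetical decoder output mid-training

e = expectation(r_us, half_decoded)
print(e, surprise(True, e), surprise(False, e))
```

Under these assumptions, a half-learned association yields positive surprise when the US arrives and negative surprise when it is omitted, which matches the qualitative role the caption assigns to $S$.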
Figure 2: Delay conditioning and stimulus substitution. (A) Trial structure. The network is presented with $N_\text{stim}=16$ different CS-US pairs, randomly selected in each trial. (B) The network learns all of the CS-US pairs after 500 training trials ($\approx$ 32 per pair). $r_\text{us}$ denotes the individual components of the Boolean vectors encoding each of the USs. $\hat{r}_\text{us}$ denotes the individual components of the decoded USs, based only on the presentation of the associated CSs, and measured just before the US appears. (C) Evolution of population responses during learning. Colors denote trial number. Each point compares the firing rate of an associative neuron at that stage of learning for a specific CS-US pair when only the US, or only the associated CS, is presented. The colored lines are linear regression fits at each stage of learning. (D) Evolution of predicted US during learning. The green curve depicts the average expectation across USs after the network is presented only with the associated CS. The red curve depicts the distance between the true representation of the USs ($r_\text{us}$) and their decoded representation $\hat{r}_\text{us}$ when presented only with the associated CS. Individual pairs are shown as faint thin lines. (E) Number of trials required for the network to reach 80% performance for all pairs (defined as the first time at which the average expectation $E$ across pairs exceeds $0.8$) for different numbers of stimulus pairs. Performance is measured just before the US appears. Error bands denote $\pm$ SD computed across 5 different runs of the experiment. (F) Number of trials required to reach 80% performance for all pairs for different levels of similarity in the encoding of the CS and US input vectors. Error bands denote $\pm$ SD computed across 10 different runs of the experiment.
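The 80% criterion defined in Figure 2(E) can be written as a small helper. This is a sketch only: the function name `trials_to_criterion` and the nested-list layout of the expectations (one inner list of per-pair values per trial) are assumptions, and the toy curves are made up for illustration.

```python
from typing import Optional, Sequence


def trials_to_criterion(
    expectations: Sequence[Sequence[float]], threshold: float = 0.8
) -> Optional[int]:
    """Return the 1-based index of the first trial whose mean expectation
    across pairs exceeds `threshold`, or None if it is never reached."""
    for trial, per_pair in enumerate(expectations, start=1):
        if sum(per_pair) / len(per_pair) > threshold:
            return trial
    return None


# Made-up learning curves for two CS-US pairs (not data from the paper).
toy = [[0.1, 0.2], [0.5, 0.7], [0.8, 0.9], [0.95, 0.9]]
print(trials_to_criterion(toy))  # prints 3
```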
Figure 3: Trace conditioning. (A) Trial structure. The network is presented with $N_\text{stim}=16$ different CS-US pairs, randomly selected in each trial. (B) After 500 training trials ($\sim 32$ per pair), the network learns all of the CS-US pairs for short $t_\text{delay}$, but struggles for longer delays. $r_\text{us}$ denotes the individual components of the Boolean vectors encoding each of the USs. $\hat{r}_\text{us}$ denotes the individual components of the decoded USs, based only on the presentation of the associated CSs. For comparison purposes, we also show results for delay conditioning ($t_\text{delay} = -1$). (C) Evolution of predicted US during learning. Each curve depicts the expectation for each US after the network is presented only with the associated CS. The line is the mean across all stimulus pairs. Bands represent $\pm$ SD across stimulus pairs. (D) Network learning performance after 500 training trials for different CS-US delays. Bars denote $\pm$ SD across stimulus pairs.
Figure 4: Extinction and re-acquisition. (A) Trial structure. In trials where the US is not shown, surprise is computed at $t\approx 6$ seconds. (B) Learning and extinction path for the acquisition of a single CS-US pair. (C) Evolution of population responses during extinction. Colors denote extinction trial number. Each point compares the firing rate of an associative neuron at that stage of learning for a specific CS-US pair when only the US, or only the associated CS, is presented. (D) Learning, extinction and re-acquisition path. The blue line shows an experiment in which the same CS-US pair is used in training and re-acquisition. The red line shows an experiment in which a new US is used in the re-acquisition phase.
Figure 5: Blocking, overshadowing, saliency and overexpectation. (A) Model extension to allow for simultaneous presentation of two CSs. Associations for $\textit{CS}_1$ and $\textit{CS}_2$ are represented in separate populations of associative neurons. The activity of each population is used to separately decode the US and to construct expectations $E_\text{cs1}$ and $E_\text{cs2}$. The overall expectation generated by the two CSs is given by $E=E_\text{cs1}+E_\text{cs2}$. Experiments assume that a single association between the US and both CSs has to be learnt. $E_\text{cs1}$ is the prediction generated by $\textit{CS}_1$ alone, $E_\text{cs2}$ is the prediction generated by $\textit{CS}_2$ alone, and $E_\text{cs1} + E_\text{cs2}$ is the prediction generated by both cues together. Since the CSs are present throughout the trial, we omit the short-term memory networks from this exercise. (B) Blocking: $\textit{CS}_1$ is presented in isolation and fully learns to predict the US before $\textit{CS}_2$ is introduced. In this case, $\textit{CS}_2$ is blocked from learning to predict the US. (C) Overshadowing: Both CSs are presented from onset and neither of them reaches the same conditioning level as when it was presented alone; instead, the sum $E$ of their expectations learns the full association. (D) Saliency effects: similar to (C), but now the relative salience of $\textit{CS}_1$ has been increased by scaling up its input vector. As a result, the final conditioning level of $\textit{CS}_1$ is consistently higher than the one for $\textit{CS}_2$. (E) Overexpectation: $\textit{CS}_1$ and $\textit{CS}_2$ are conditioned separately. When presented together, $E$ exceeds $1$, which leads to a negative learning rate and unlearning.
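The qualitative logic of Figure 5, in which a shared error driven by the summed expectation $E=E_\text{cs1}+E_\text{cs2}$ gates learning for both cues, can be reproduced with a scalar toy model. The sketch below is a Rescorla-Wagner-style update swapped in for illustration; it is not the paper's RNN learning rule. The function `train`, the learning rates, and the trial counts are all assumptions, and salience is approximated here by a larger learning rate rather than by scaling an input vector.

```python
def train(e1, e2, trials, cs1_on=True, cs2_on=True, lr1=0.2, lr2=0.2):
    """Update two scalar expectations with a shared error (1 - E),
    where E sums the expectations of the cues that are present."""
    for _ in range(trials):
        total = (e1 if cs1_on else 0.0) + (e2 if cs2_on else 0.0)
        error = 1.0 - total  # negative when the summed expectation exceeds 1
        if cs1_on:
            e1 += lr1 * error
        if cs2_on:
            e2 += lr2 * error
    return e1, e2


# Blocking: CS1 is trained alone, then both cues are presented together.
b1, b2 = train(0.0, 0.0, 100, cs2_on=False)
b1, b2 = train(b1, b2, 100)

# Overshadowing: both cues are presented from the start.
o1, o2 = train(0.0, 0.0, 100)

# Saliency: CS1 learns faster (stand-in for a scaled-up input vector).
s1, s2 = train(0.0, 0.0, 100, lr1=0.3, lr2=0.1)

# Overexpectation: cues are conditioned separately, then presented together.
x1, _ = train(0.0, 0.0, 100, cs2_on=False)
_, x2 = train(0.0, 0.0, 100, cs1_on=False)
y1, y2 = train(x1, x2, 100)

print(f"blocking:        E_cs1={b1:.3f} E_cs2={b2:.3f}")
print(f"overshadowing:   E_cs1={o1:.3f} E_cs2={o2:.3f}")
print(f"saliency:        E_cs1={s1:.3f} E_cs2={s2:.3f}")
print(f"overexpectation: E_cs1={y1:.3f} E_cs2={y2:.3f}")
```

In this toy setting the blocked cue stays near zero, the overshadowed cues split the association, the more salient cue ends higher, and the separately conditioned cues both decrease once their sum exceeds 1.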
...and 8 more figures
