Modular Boundaries in Recurrent Neural Networks

Jacob Tanner, Sina Mansour L., Ludovico Coletta, Alessandro Gozzi, Richard F. Betzel

TL;DR

This work uses RNNs as a model system to study the character of modular neural populations, using a community detection method from network science known as modularity maximization to partition neurons into distinct modules.

Abstract

Recent theoretical and experimental work in neuroscience has focused on the representational and dynamical character of neural manifolds, that is, subspaces in neural activity space wherein many neurons coactivate. Importantly, neural populations studied under this "neural manifold hypothesis" are continuous and not cleanly divided into separate neural populations. This perspective clashes with the "modular hypothesis" of brain organization, wherein neural elements maintain an "all-or-nothing" affiliation with modules. In line with this modular hypothesis, recent research on recurrent neural networks suggests that multi-task networks become modular across training, such that different modules specialize for task-general dynamical motifs. If the modular hypothesis is true, then it would be important to use a dimensionality reduction technique that captures modular structure. Here, we investigate the features of such a method. We leverage RNNs as a model system to study the character of modular neural populations, using a community detection method from network science known as modularity maximization to partition neurons into distinct modules. These partitions allow us to ask the following question: do these modular boundaries matter to the system? ...
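
To make concrete what modularity maximization optimizes, the following is a minimal sketch of a partition quality score $Q$ for a correlation matrix. It is not the authors' implementation: the uniform null level `gamma`, the normalization by total absolute off-diagonal weight, and the synthetic two-module activity are all assumptions made for illustration, and the function name `modularity_q` is hypothetical. A maximization routine would search over label vectors for the one with the highest score; here the planted partition is only scored and compared with shuffled labels.

```python
import numpy as np

def modularity_q(corr, labels, gamma=0.0):
    """Score a partition of a correlation matrix (illustrative definition).

    Q sums (corr_ij - gamma) over all pairs i != j that share a module and
    divides by the total absolute off-diagonal weight. gamma is an assumed
    uniform null level, not a value taken from the paper.
    """
    corr = np.asarray(corr, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]      # True where i and j share a module
    off_diag = ~np.eye(len(labels), dtype=bool)    # ignore self-correlations
    b = (corr - gamma) * off_diag
    return b[same].sum() / np.abs(corr[off_diag]).sum()

rng = np.random.default_rng(0)
n_per, n_time = 20, 500

# Synthetic activity with two planted modules: neurons in the same module
# share a latent signal and carry independent noise.
latents = rng.standard_normal((2, n_time))
planted = np.repeat([0, 1], n_per)
activity = latents[planted] + 0.5 * rng.standard_normal((2 * n_per, n_time))
corr = np.corrcoef(activity)

# Score the planted partition and 200 label shuffles of it.
q_planted = modularity_q(corr, planted)
q_shuffled = np.array(
    [modularity_q(corr, rng.permutation(planted)) for _ in range(200)]
)
print(f"Q planted: {q_planted:.3f}, max Q shuffled: {q_shuffled.max():.3f}")
```

In this toy setting the planted labels score well above every shuffle, which is the sense in which a good partition places strongly correlated neurons in the same module.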

Paper Structure (3 sections, 15 equations, 20 figures)

Figures (20)

  • Figure 1: Representations/selectivity profiles reflect module boundaries. (a) Schematic describing the perceptual decision-making task and the architecture of the RNN trained to perform it. The input neurons are given stimulus information and a fixation input. The stimuli come from two distributions with different means. The RNN must determine which of the two stimuli comes from the distribution with the greater mean while the fixation input has a value of 1. When the fixation input is zero, the decision is made based on which of the output neurons has the greater activity. (b) Correlation matrix of recurrent neuronal activity during task trials, reordered according to modules. Modularity maximization found four modules, two of which are labeled 'mod. 1' and 'mod. 2'. The other two modules are associated with the fixation input (one activates at the beginning of the fixation period, and the other activates at the end of the fixation period). (c) Mean activity of each population for different fixation periods, as well as the cumulative $\Delta$ input between the stimuli. Notice how the mean activity in each module tracks this value. (d) Two plots showing the correlation between the activity of the modules and the activity of the output neurons. (e) Two plots showing that modules 1 and 2 track cases where the cumulative $\Delta$ input is greater than zero and less than zero, respectively. (f) Schematic showing the process of lesioning the outgoing connections from modules. (g/h) Two sets of boxplots showing the accuracy on different trials following lesions to module outputs. Accuracy is considered separately for trials where stimulus 1 had a higher mean than stimulus 2, and vice versa.
  • Figure 2: Quadrants/octants and sectors determine the boundaries between modules in small-N feed-forward neural networks. (a) Schematic clarifying that these analyses were all performed with Gaussian-distributed random input data with the same mean and variance. (b) Correlation matrix produced by running random activity through two input neurons with random connection weights to 100 output neurons. The correlation matrix is reordered so that neurons in the same quadrant are grouped together (color labels for modules correspond to the color of dots in the next panel). (c) Parameter space for the two input neurons' connection weights to the 100 output neurons. Each point represents the set of connection weights from each input neuron to a single output neuron, and the points are colored according to the quadrant they belong to. (d) Matrix of the mean correlation values within each population-by-population block of the correlation matrix in panel b. (e) The parameter space of a feed-forward network with 3 input neurons and 100 output neurons. For clarity, panels show all six sides of the cube representing the 3-dimensional parameter space. Dots correspond to all three input neurons' connection weights to a single output neuron. Colors correspond to the octant/sector that a point falls into. (f) Correlation matrix of output neuron activity after providing the input neurons with random activity. The matrix is reordered by the octant/sector that each output neuron belongs to, and the color labels for the modules correspond to the colored dots in the previous panel. (g) Matrix of the mean correlation values within each population-by-population block of the correlation matrix in panel f. (h) Plot showing the partition quality ($Q$) of the sector-based partitions of the output correlation matrix as the number of input neurons increases. Importantly, these partitions are directly inferred from the structural connection weights and evaluated on the correlation matrix. We compare this with the partition quality values for partitions that were optimized directly on the correlation matrix. (i) Plot showing the partition quality ($Q$) of the sector-based partitions of the output correlation matrix as the number of input neurons increases. Importantly, these partitions are optimized based on the Hamming similarity of the structural connection weights and evaluated on the correlation matrix. We compare this with the partition quality values for partitions that were optimized directly on the correlation matrix. (j) Plot showing the partition quality ($Q$) of the cosine similarity partitions of the output correlation matrix as the number of input neurons increases. Importantly, these partitions are optimized based on the cosine similarity of the structural connection weights and evaluated on the correlation matrix. We compare this with the partition quality values for partitions that were optimized directly on the correlation matrix.
  • Figure 3: Pairwise similarity of Jacobian matrix predicts modular structure. (a) Schematic showing a trajectory through state space (transient dynamics) and a fixed point. The colors of each correspond to the violin plots in panel b, where we used a Jacobian matrix calculated either from the transient dynamics across a task trajectory (green) or near fixed points (orange). (b) The violin plots show the correlation between the cosine similarity of rows of the Jacobian matrix and elements of the correlation matrix for recurrent neurons across 50 separately trained models on the perceptual decision-making task and the go versus no-go task. (c) Example average Jacobian matrix across 3 task trials on the perceptual decision-making task. This matrix was reordered according to the similarity of its rows. Lines mark the boundaries between the modules shown in panels d and e. (d) Matrix showing the pairwise cosine similarity of the Jacobian, reordered according to a modular partition that was optimized on this same matrix. (e) The correlation matrix of recurrent neurons, reordered using the modular partition that was optimized on the pairwise cosine similarity of the Jacobian. (f) Plot showing the induced partition quality ($Q$) value when applying this modular partition (boundary) to the correlation matrix of recurrent neurons, compared to the null distribution obtained when this partition was randomly permuted 1000 times. (g) Plot showing the Jacobian similarity values plotted against the correlation values between recurrent neurons' activity.
  • Figure 4: Input-based module boundaries are reflected in recurrent neurons of RNNs and brains. (a) Schematic of input connections onto the recurrent layer. (b) Input connection similarity matrix reordered using modularity maximization. (c) Correlation matrix of recurrent activity reordered by the partition of the input connection similarity matrix from panel b. (d) Boxplot showing the null distribution of partition quality ($Q$) values that we should expect by chance. This was produced by randomly permuting the partition labels from b and applying them to c. The real partition quality ($Q$) value is in blue. (e) Plot showing the relationship between the cosine similarity of input connection weights and the correlation matrix of recurrent neurons. Dot color and size indicate the number of points that fell in this bin. (f/k) Schematic showing the thalamus and the cortex in mice/humans. (g/l) Thalamocortical input connection weight similarity matrix reordered using modularity maximization (for mice/humans; human matrix down-sampled for plotting). (h/m) Functional connectivity/correlation matrix of the cortex reordered by the partition of the thalamocortical input connection similarity matrix (for mice/humans; human matrix down-sampled for plotting). (i/n) Boxplot showing the null distribution of partition quality ($Q$) values that we should expect by chance. This was produced by randomly permuting the partition labels from g/l in a way that maintains the spatial autocorrelation in fMRI data and applying them to h/m. The real partition quality ($Q$) value is in blue. (j/o) Plot showing the relationship between the cosine similarity of input connection weights and the functional connectivity/correlation matrix of recurrent neurons. Dot color and size indicate the number of points that fell in this bin (for mice/humans).
  • Figure 5: Lesioning recurrent connections within module boundaries has circumscribed effects on dynamics. (a) Schematic showing how we perturb different input neurons during our lesioning trials. (b) Plot of the recurrent activity of a trained RNN projected onto the first two principal components. In red, we plot the fixed/slow points approximated using a gradient-descent-based method (Sussillo and Barak, 2013). The activity trajectory is colored according to the correct decision for the trial. Note that these colors sometimes overlap, given that the cumulative mean can be artificially higher for the incorrect stimulus early in the trial due to sampling variability. (c) Same as b, but instead of plotting the fixed points, we plot the decision boundary for the network (this is a visual estimate; see main text for how the boundary was calculated). (d) Same as b, but instead of plotting the fixed points, we plot the trajectories of two perturbation trials. In the green trial we perturbed stimulus 1. In the purple trial we perturbed stimulus 2. The state of the system starts at the black star, and perturbations result in activity that stably rests at the colored stars. (e) Schematic showing how we lesioned the recurrent connection weights within module boundaries. (f) Lesions to population 1 cause the end points of stimulus 1 perturbations to move closer to the decision boundary, whereas the end points for the stimulus 2 perturbation in this example move further away from the decision boundary. (g) A similar but opposite effect to that in panel f when lesioning population 2. (h) These plots show results from our four lesioning conditions (as described in main text). When increasingly lesioning population 1, stimulus 1 perturbations move closer to the decision boundary, but stimulus 2 perturbations do not. The opposite is shown for increasingly lesioning population 2. These lines represent the average distance from the decision boundary across 100 trained RNNs.
  • ...and 15 more figures
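
The sector-based analysis described for Figure 2 and the permutation nulls described for Figures 3 and 4 can be illustrated with a small simulation. This is a hedged sketch under several assumptions, not a reproduction of the paper's pipeline: the output units are taken to be linear (the caption does not state an activation function), the statistic is the mean within-sector correlation rather than the partition quality $Q$, the permutation is a plain label shuffle, and the helper name `mean_pair_corr` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, n_samples = 2, 100, 5000

# Random feed-forward weights and Gaussian inputs with equal mean and variance.
# Assumption: linear output units.
weights = rng.standard_normal((n_out, n_in))
inputs = rng.standard_normal((n_in, n_samples))
outputs = weights @ inputs
corr = np.corrcoef(outputs)

# Sector label: the sign pattern of a neuron's incoming weights, read as a
# binary number (quadrants for 2 inputs, octants for 3, and so on).
signs = (weights > 0).astype(int)
sector = signs @ (2 ** np.arange(n_in))

def mean_pair_corr(corr, labels, same_module):
    """Mean correlation over distinct pairs that do (or do not) share a label."""
    mask = (labels[:, None] == labels[None, :]) == same_module
    mask &= ~np.eye(len(labels), dtype=bool)   # drop self-pairs
    return corr[mask].mean()

within = mean_pair_corr(corr, sector, True)
between = mean_pair_corr(corr, sector, False)

# Permutation null: shuffle the sector labels 1000 times and recompute the
# within-sector mean each time.
null_within = np.array(
    [mean_pair_corr(corr, rng.permutation(sector), True) for _ in range(1000)]
)
p_value = (1 + np.sum(null_within >= within)) / (1 + len(null_within))

print(f"within: {within:.2f}, between: {between:.2f}, p: {p_value:.4f}")
```

Under the linear assumption, two output neurons whose weight vectors fall in the same quadrant have a non-negative dot product, so their activity is positively correlated, which is why labels read directly from the connection weights already separate the correlation matrix into blocks. Setting `n_in = 3` gives the octant case.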