
Modular Growth of Hierarchical Networks: Efficient, General, and Robust Curriculum Learning

Mani Hamidi, Sina Khajehabdollahi, Emmanouil Giannakakis, Tim Schäfer, Anna Levina, Charley M. Wu

TL;DR

We explore the performance and functional dynamics of a modular network trained on a memory task via an iterative growth curriculum. We find that, for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network performs better across multiple metrics, including training time, generalizability, and robustness to some perturbations.

Abstract

Structural modularity is a pervasive feature of biological neural networks and has been linked to several functional and computational advantages. Yet, the use of modular architectures in artificial neural networks has been relatively limited despite early successes. Here, we explore the performance and functional dynamics of a modular network trained on a memory task via an iterative growth curriculum. We find that for a given classical, non-modular recurrent neural network (RNN), an equivalent modular network will perform better across multiple metrics, including training time, generalizability, and robustness to some perturbations. We further examine how different aspects of a modular network's connectivity contribute to its computational capability. We then demonstrate that the inductive bias introduced by the modular topology is strong enough for the network to perform well even when the connectivity within modules is fixed and only the connections between modules are trained. Our findings suggest that gradual modular growth of RNNs could provide advantages for learning increasingly complex tasks on evolutionary timescales, and help build more scalable and compressible artificial networks.

Paper Structure (20 sections, 5 equations, 6 figures)

Figures (6)

  • Figure 1: Network Structures. Non-modular networks consist of a fixed number of neurons ($M = 20, 54, 91, 128$), with connections that are retrained at each curriculum step. After a given accuracy is reached for a task of length $N$, a new readout head (consisting of 2 neurons) for $N + 1$ is added and the network is retrained for all previous $N$. In contrast, the modular network adds a much smaller RNN module ($M_m = 5, 10, 15, 20$) for every readout head that is added in the curriculum. Thus, each readout head is attached to a separate RNN module, rather than one large reservoir as in non-modular networks.
  • Figure 2: Performance comparison of modular and non-modular architectures with different numbers of neurons (color bars) allocated to their recurrent processing unit. For modular networks, the number of neurons is reported per module. For non-modular networks, the number indicates the total number of neurons in the single recurrent core. These numbers were chosen to allow for a comparable total number of learnable parameters for a given task difficulty $N$. (a) Learning curves show that modular architectures with sufficient ($> 5$) neurons can solve a new task at every epoch, while the non-modular networks plateau in their learning ability. (b) The Pareto frontier of performance after 60 training epochs shows that the modular architecture always achieves better performance for the same number of parameters. Note that the saturation of performance at $N_{\text{solved}}=60$ for the modular architecture is due to the choice of training epochs. (c) Generalization performance of networks trained on a task difficulty of $N$ and then tested on tasks of $N+K$. Accuracy is measured as the percentage of correct trials. Results are averages over 4 non-modular and 3 modular networks, and error bars indicate the standard deviation.
  • Figure 3: Weight perturbations degrade performance in both modular and non-modular networks of different sizes (colours). (a) Modular networks are more robust to perturbation of connections (modular networks: feedforward and recurrent weights). (b) Non-modular networks are more robust to the perturbation of single-neuron trained timescales.
  • Figure 4: Change in trained and effective timescales for different $N_{\text{solved}}$. (a) The trained timescale of the modular network stays the same across modules, while the timescales of the non-modular network converge to 1 as $N_{\text{solved}}$ increases. (b) The effective timescales of both networks increase steadily with $N_{\text{solved}}$. A modular network with a module size of 15 neurons and a non-modular network with an equal number of trainable parameters were used. Results are averages over 4 networks.
  • Figure 5: Feedforward connections are more sensitive than recurrent connections in modular networks. Here, we use $M_m=15$ but obtain qualitatively similar results for other network sizes. (a) Perturbing feedforward weights affects downstream modules more strongly than earlier modules. (b) The recurrent weights exhibit a similar qualitative pattern but are quantitatively more robust against the same levels of perturbation. (c) Variability of feedforward versus recurrent connection weights across modules, where variability is inversely related to the degree of conservation (i.e., the amount of shared weights from one module to the next). Accuracy is averaged over 5 networks, 10 perturbations, and 1000 continuous evaluations. Error bars show SEM.
  • ...and 1 more figure
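
The growth scheme in the Figure 1 caption can be illustrated with a small bookkeeping sketch in Python. It only tracks how the weight count changes as readout heads (and, for the modular network, modules) are added; it does not train anything. The class and function names (`NonModularNet`, `ModularNet`, `grow_to`) are hypothetical. The connectivity is also an assumption, not taken from the paper: dense recurrence inside a core or module, a dense feedforward block from each module to the next, no biases, and no input weights. The counts are therefore illustrative and are not expected to reproduce the paper's parameter totals.

```python
from dataclasses import dataclass

HEAD_SIZE = 2  # neurons per readout head, as stated in the Figure 1 caption


@dataclass
class NonModularNet:
    """Single recurrent core of fixed size; growth adds only a readout head."""
    core_size: int
    n_heads: int = 0

    def grow(self) -> None:
        self.n_heads += 1

    def n_weights(self) -> int:
        recurrent = self.core_size ** 2  # assumption: dense recurrent core
        readout = self.n_heads * HEAD_SIZE * self.core_size
        return recurrent + readout


@dataclass
class ModularNet:
    """Growth adds one small recurrent module together with its own head."""
    module_size: int
    n_modules: int = 0

    def grow(self) -> None:
        self.n_modules += 1

    def n_weights(self, train_within: bool = True) -> int:
        m, k = self.module_size, self.n_modules
        # assumption: dense recurrence inside each module; excluded from the
        # count when within-module weights are treated as fixed
        within = k * m * m if train_within else 0
        # assumption: module i feeds module i + 1 through a dense block
        between = max(k - 1, 0) * m * m
        readout = k * HEAD_SIZE * m
        return within + between + readout


def grow_to(net, n_steps: int) -> list[int]:
    """Grow once per curriculum step and record the weight count each time.

    Training to the accuracy threshold, which the curriculum runs between
    growth steps, is omitted here.
    """
    counts = []
    for _ in range(n_steps):
        net.grow()
        counts.append(net.n_weights())
    return counts


modular = ModularNet(module_size=15)
modular_counts = grow_to(modular, 3)

non_modular = NonModularNet(core_size=54)
non_modular_counts = grow_to(non_modular, 3)

# Weights left to train if within-module recurrence is held fixed
between_only = modular.n_weights(train_within=False)
```

Under these assumptions the non-modular count starts large and grows only by one head per step, while the modular count starts small and grows by one module per step. Calling `n_weights(train_within=False)` mirrors the setting from the abstract in which within-module connectivity is fixed and only connections between modules are trained; whether the readout heads are also trained in that setting is assumed here.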