Structurally Flexible Neural Networks: Evolving the Building Blocks for General Agents

Joachim Winther Pedersen, Erwan Plantec, Eleni Nisioti, Milton Montero, Sebastian Risi

TL;DR

Structural rigidity in RL networks limits cross-domain generalization. The authors propose Structurally Flexible Neural Networks (SFNNs), which combine sparse, parameterized neurons with GRU-based synaptic plasticity and multiple neuron/synapse types to enable a single parameter set to adapt across environments with different input/output shapes. Optimized via CMA-ES across lifetimes in three tasks, SFNNs demonstrate rapid, lifetime-based organization and generalization, outperforming ablations and symmetric baselines. This work suggests a path toward foundation-model-like RL agents capable of operating across diverse tasks without architecture re-engineering. With a network of $32$ neurons and diverse building blocks, SFNNs offer a scalable approach to flexible, environment-agnostic control.

Abstract

Artificial neural networks used for reinforcement learning are structurally rigid, meaning that each optimized parameter of the network is tied to its specific placement in the network structure. It also means that a network only works with pre-defined and fixed input and output sizes. This is a consequence of the number of optimized parameters depending directly on the structure of the network. Structural rigidity limits the ability to optimize parameters of policies across multiple environments that do not share input and output spaces. Here, we evolve a set of neurons and plastic synapses, each represented by a gated recurrent unit (GRU). During optimization, the parameters of these fundamental units of a neural network are optimized in different random structural configurations. Earlier work has shown that parameter sharing between units is important for making structurally flexible neurons. We show that it is possible to optimize a set of distinct neuron and synapse types, allowing for a mitigation of the symmetry dilemma. We demonstrate this by optimizing a single set of neurons and synapses to solve multiple reinforcement learning control tasks simultaneously.
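
To make the contrast in the abstract concrete, the following minimal Python sketch compares a conventional policy network, whose parameter count depends on the environment's input and output sizes, with a shared set of building blocks, whose evolved parameter count does not. The observation and action sizes are those of the Gymnasium versions of the three tasks used in the paper. The `BuildingBlocks` class, the hidden width of 32 for the rigid baseline, and all per-type parameter counts are hypothetical values chosen for illustration, not numbers from the paper.

```python
from dataclasses import dataclass

# (observation size, number of discrete actions) for the three tasks.
ENVS = {
    "Acrobot-v1": (6, 3),
    "MountainCar-v0": (2, 3),
    "CartPole-v1": (4, 2),
}


@dataclass(frozen=True)
class BuildingBlocks:
    """Hypothetical container for evolved parameters shared per unit type."""
    n_neuron_types: int
    n_synapse_types: int
    params_per_neuron_type: int
    params_per_synapse_type: int

    def evolved_parameter_count(self) -> int:
        # Depends only on the number of types, never on the network structure.
        return (self.n_neuron_types * self.params_per_neuron_type
                + self.n_synapse_types * self.params_per_synapse_type)


def rigid_mlp_parameter_count(n_in: int, n_hidden: int, n_out: int) -> int:
    """Weights plus biases of a conventional one-hidden-layer policy."""
    return (n_in * n_hidden + n_hidden) + (n_hidden * n_out + n_out)


# Illustrative numbers only (assumptions, not taken from the paper).
blocks = BuildingBlocks(n_neuron_types=3, n_synapse_types=3,
                        params_per_neuron_type=20, params_per_synapse_type=100)

rigid_counts = {name: rigid_mlp_parameter_count(n_in, 32, n_out)
                for name, (n_in, n_out) in ENVS.items()}

for name, count in rigid_counts.items():
    print(f"{name}: rigid MLP = {count} parameters, "
          f"shared blocks = {blocks.evolved_parameter_count()} parameters")
```

The rigid network needs a different parameter vector for each environment, so one optimized vector cannot be reused across tasks. The shared building blocks have the same count everywhere, which is what allows a single evolved parameter set to be placed in networks of different shapes.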

Paper Structure (14 sections, 1 equation, 6 figures, 1 table)

Figures (6)

  • Figure 1: (A) Building Blocks of Structurally Flexible Neural Networks. Depicted is a representation of the flow of activity from one neuron to another through a synapse. Neural activations are small vectors, rather than just scalars. Each neuron is represented by a small linear layer with hyperbolic-tangent non-linearities, similar to pedersen2023learning. The output of the pre-neuron is modulated by the synapse via element-wise multiplication with the current values of the synaptic weights, and the resulting signal arrives as input to the post-neuron (solid black arrows). The outputs of both the pre- and post-neurons, plus the global reward signal, are then used by the GRU to compute the weight updates (dotted arrows), which are then applied to the synaptic weights. (B) Structurally Flexible Neural Network. The input vector consists of the observation from the environment, which is sent through a reservoir layer with random connectivity. Neurons of a certain type are always pre-neurons to synapses of the same type. Each color corresponds to a distinct type (input, hidden, output) that shares parameters. Each neuron has a linear layer associated with it, the parameters of which are shared by all other neurons of the same color. Likewise, each colored arrow corresponds to a synapse type. Each synapse type has one set of evolved GRU gate parameters associated with it. Importantly, even though the evolved gate parameters are shared between synapses of the same type, there is a unique hidden state associated with each individual synapse, allowing for individual, history-dependent updates of each synapse.
  • Figure 2: Environments Used in Experiments. From left to right: Acrobot-v1, MountainCar-v0, CartPole-v1.
  • Figure 3: Number of evolved and plastic parameters: The approaches differ in their numbers of evolved and plastic parameters. Evolved parameters are the synaptic GRU parameters and the linear layers of the neurons. The plastic parameters are the synaptic weight values that are updated by the GRUs (LSTMs in the case of SymLA). The SFNN_single version has the same number of plastic parameters as SFNN, but fewer evolved ones, as only parameters for a single neuron and synapse type are evolved. The SFNN_fully version has the same number of evolved parameters as SFNN, but double the number of plastic parameters, as no synapses are excluded. The SymLA method has many more evolved parameters than plastic ones, as its synaptic units are much bigger than in SFNN and there are fewer of them, since the SymLA networks are shallow.
  • Figure 4: Training plots: Shown are the means and standard deviations of the population means over five runs for each model. Progress in each of the three environments is shown, as well as the product of the three scores when each score is scaled to lie between zero and one (top left). Of the four settings, only the full SFNN consistently makes progress on all three environments. The fully connected SFNN shows no improvement at all, while the SFNN with a single neuron and synapse type and the SymLA method display only modest progress.
  • Figure 5: Adapting Weight Matrices in SFNNs. An example of the same initial weight matrix used in the three environments, resulting in a different weight matrix after eight episodes in each task. The sum of the four elements that make up each synapse is depicted. Values are clipped to a maximum magnitude of 10 for readability. The only difference in the initial matrices between the environments is how many neurons are counted as input and output neurons, to fit the respective environment. The solution for the CartPole environment in particular stands out from the other two, with the weights coming from the input neurons being much larger in magnitude than the other weights.
  • ...and 1 more figure
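
The data flow described in the caption of Figure 1 (A) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The vector size of 4 follows the "four elements that make up the synapse" mentioned for Figure 5. The function names, the random initialization, the bias-free GRU formulation, the additive weight update, and the single-synapse setting (a post-neuron would normally receive input from many synapses) are all assumptions made for illustration.

```python
import math
import random

DIM = 4  # size of activation and synapse vectors (assumed from Figure 5)


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def neuron(params, x):
    """Small linear layer followed by tanh; params are shared per neuron type."""
    return [math.tanh(a) for a in matvec(params, x)]


def gru_step(gates, x, h):
    """One bias-free GRU step; gate matrices are shared per synapse type."""
    z = [sigmoid(a + b) for a, b in zip(matvec(gates["Wz"], x), matvec(gates["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(gates["Wr"], x), matvec(gates["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a + b) for a, b in zip(matvec(gates["Wn"], x), matvec(gates["Un"], rh))]
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def synapse_step(pre_params, post_params, gates, pre_in, weights, hidden, reward):
    pre_out = neuron(pre_params, pre_in)
    # Solid arrows: pre-neuron output is modulated element-wise by the synapse.
    signal = [p * w for p, w in zip(pre_out, weights)]
    post_out = neuron(post_params, signal)
    # Dotted arrows: the GRU sees both neuron outputs plus the global reward.
    gru_in = pre_out + post_out + [reward]
    new_hidden = gru_step(gates, gru_in, hidden)
    # Assumption: the GRU state is added to the current synaptic weights.
    new_weights = [w + dh for w, dh in zip(weights, new_hidden)]
    return post_out, new_weights, new_hidden


rng = random.Random(0)
pre_params = random_matrix(rng, DIM, DIM)
post_params = random_matrix(rng, DIM, DIM)
gates = {k: random_matrix(rng, DIM, 2 * DIM + 1) for k in ("Wz", "Wr", "Wn")}
gates.update({k: random_matrix(rng, DIM, DIM) for k in ("Uz", "Ur", "Un")})

# Two synapses of the same type: shared gates, separate weights and hidden state.
init_w, init_h = [1.0] * DIM, [0.0] * DIM
out_a, w_a, h_a = synapse_step(pre_params, post_params, gates,
                               [0.5, -0.2, 0.1, 0.9], init_w, init_h, reward=1.0)
out_b, w_b, h_b = synapse_step(pre_params, post_params, gates,
                               [-0.7, 0.3, 0.0, 0.2], init_w, init_h, reward=0.0)
print("synapse A weights:", w_a)
print("synapse B weights:", w_b)
```

Both synapses use the same `gates`, yet they end up with different weights and hidden states because each one sees its own history of activity and reward. This mirrors the caption's point that shared gate parameters still permit individual, history-dependent updates.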