Network bottlenecks and task structure control the evolution of interpretable learning rules in a foraging agent

Emmanouil Giannakakis, Sina Khajehabdollahi, Anna Levina

TL;DR

Unconstrained meta-learning produces diverse plasticity rules; regularization and bottlenecks in the model reduce this variability and yield interpretable rules.

Abstract

Developing reliable mechanisms for continuous local learning is a central challenge faced by biological and artificial systems. Yet, how environmental factors and structural constraints on the learning network influence the optimal plasticity mechanisms remains obscure even for simple settings. To elucidate these dependencies, we study meta-learning via evolutionary optimization of simple reward-modulated plasticity rules in embodied agents solving a foraging task. We show that unconstrained meta-learning leads to the emergence of diverse plasticity rules. However, regularization and bottlenecks in the model help reduce this variability, resulting in interpretable rules. Our findings indicate that the meta-learning of plasticity rules is very sensitive to various parameters, with this sensitivity possibly reflected in the learning rules found in biological networks. When included in models, these dependencies can be used to discover potential objective functions and details of biological learning via comparisons with experimental observations.

Paper Structure (21 sections, 11 equations, 7 figures)

Figures (7)

  • Figure 1: Structure of the neural network controlling the embodied foraging agent. a. A diagram of the network controlling the foraging agent. The sensor layer receives inputs at each time step (the ingredients of the nearest food). The output of that network is given as input to the motor network, along with the distance $d$ and angle $\alpha$ to the nearest food, the current velocity $v$, and energy $E$ of the agent. These signals are processed through two hidden layers to the final output of motor commands as the linear and angular acceleration of the agent. b. Details of the sensory network. The sensor layer receives inputs representing the quantity of each ingredient of the nearest food at each time step. The agent outputs its assessment of the food's value $y_t$, and when a food particle is consumed it receives the true value $R_t$ as feedback; it then uses this information to update the weight matrix according to the plasticity rule.
  • Figure 2: Diverse learning rules lead to high performance in the foraging task. a. The trajectory of an agent (grey line) in the 2D environment. A well-trained agent will approach and consume food with positive values (green dots) and avoid negative food (red dots). The $\times$ signs denote the locations of food particles consumed by the agent. b. The fitness of the best performing agent from 5 evolving populations (Eq. \ref{eq:fitness}) increases over generations of the evolutionary algorithm. The dotted gray line indicates the maximum fitness of the "Oracle" non-plastic agents, which are given the correct food values as input. c. The learned weights of the sensory network (blue dots) correlate strongly with the actual ingredient values (Pearson correlation coefficient of $0.92 \pm 0.09$). d, e. The evolved plasticity rules across 20 runs for agents with scalar/binary readout, respectively.
  • Figure 3: Agents with binary sensory readouts perform and generalize better than those with scalar readouts. a, b. The histogram of fitnesses of the top-ranking agent in each of the 20 runs over 100 independent environment realizations for scalar/binary networks, respectively. c. Schematic of swapping networks: Motor and sensor networks are swapped between the fittest agents from each run. d, e. The swapped networks' fitnesses plotted against the mean fitness of the original configurations for the scalar/binary networks, respectively. The mean fitnesses for the swapped agents were significantly different (paired two-sided t-test) for the networks with a scalar readout ($p_{\text{value}} = 0.00087$) but not for the ones with a binary readout ($p_{\text{value}} = 0.1457$).
  • Figure 4: Plasticity rules converge with regularization. a. The fitness distribution for different simulations of scalar (Sc) and binary (Bin) sensory readouts, regularization of the plasticity parameters (Reg.) and subtractive (-) vs divisive (÷) weight normalization. b. The difference in fitness between the original and swapped agents for different simulations. c. The regularized rules developing in networks with scalar sensory readout for subtractive normalization. d. Same for divisive normalization. e. The regularized rules developing in networks with binary sensory readout for subtractive normalization. f. Same for divisive normalization.
  • Figure 5: Trainable readout nonlinearity leads to different rules depending on weight normalization. a. The evolved plasticity rules across 20 runs for agents with trainable sigmoid sensory network outputs and subtractive weight normalization. b. The sigmoid nonlinearities evolve a range of different slopes for subtractive normalization. c. The evolved plasticity rules with divisive weight normalization are similar to the ones developed for the binary networks with fixed nonlinearity, Fig. \ref{fig:Rules_L1}. d. The evolved sigmoid nonlinearities are still broad but steeper in networks trained with divisive normalization. e. The fitness distribution for subtractive (-, light green) vs divisive (÷, dark green) weight normalization. f. The fitness difference between the original and swapped agents.
  • ...and 2 more figures
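
The sensory loop described in the Figure 1b caption (read out an estimate $y_t$, receive the true value $R_t$, update the weights, normalize) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exact parameterization of the plasticity rule is not given in this excerpt, so the rule below is a hypothetical form, $\Delta w_i = \eta\, x_i\,(\theta_1 R_t + \theta_2 y_t + \theta_3)$. The names `theta`, `lr`, and `run_lifetime`, the $\pm 1$ coding of the binary readout, and the two normalization formulas (mean subtraction and division by the Euclidean norm) are all assumptions made for illustration.

```python
import random


def sensory_readout(weights, ingredients, binary=False):
    """Weighted sum of ingredient quantities; optionally thresholded at zero."""
    y = sum(w * x for w, x in zip(weights, ingredients))
    if binary:
        # Assumed coding of the binary readout: +1 (approach) or -1 (avoid).
        return 1.0 if y > 0 else -1.0
    return y


def plasticity_update(weights, ingredients, y, reward, theta, lr):
    """Hypothetical reward-modulated rule (an assumption, not the paper's exact rule):
    dw_i = lr * x_i * (theta[0] * R + theta[1] * y + theta[2])."""
    factor = theta[0] * reward + theta[1] * y + theta[2]
    return [w + lr * factor * x for w, x in zip(weights, ingredients)]


def normalize(weights, mode):
    """Assumed forms of weight normalization; any other mode leaves weights unchanged."""
    if mode == "subtractive":
        mean = sum(weights) / len(weights)
        return [w - mean for w in weights]
    if mode == "divisive":
        norm = sum(w * w for w in weights) ** 0.5
        return [w / norm for w in weights] if norm > 0 else list(weights)
    return list(weights)


def run_lifetime(true_values, theta, mode=None, binary=False,
                 steps=2000, lr=0.1, seed=0):
    """Simulate one agent lifetime: every step a random food item is consumed."""
    rng = random.Random(seed)
    weights = [0.0] * len(true_values)
    for _ in range(steps):
        food = [rng.random() for _ in true_values]        # ingredient quantities
        y = sensory_readout(weights, food, binary)        # agent's estimate y_t
        reward = sum(v * x for v, x in zip(true_values, food))  # true value R_t
        weights = plasticity_update(weights, food, y, reward, theta, lr)
        weights = normalize(weights, mode)
    return weights


if __name__ == "__main__":
    values = [1.0, -0.5, 0.25]
    # theta = (1, -1, 0) reduces the hypothetical rule to a delta rule: dw = lr * x * (R - y).
    print(run_lifetime(values, theta=(1.0, -1.0, 0.0)))
    print(run_lifetime(values, theta=(1.0, -1.0, 0.0), mode="divisive"))
```

In this sketch an outer evolutionary loop would search over `theta`; with `theta = (1, -1, 0)`, a scalar readout, and no normalization, the rule is a plain delta rule and the weights approach the true ingredient values, which is one way learned weights could come to correlate with the ingredient values as in Figure 2c.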