Theories of synaptic memory consolidation and intelligent plasticity for continual learning

Friedemann Zenke; Axel Laborieux

TL;DR

This chapter examines the pivotal role that plasticity mechanisms with complex internal synaptic dynamics could play in enabling continual learning in neural networks, and underscores the significance of synaptic metaplasticity in sustaining this capability.

Abstract

Humans and animals learn throughout life. Such continual learning is crucial for intelligence. In this chapter, we examine the pivotal role plasticity mechanisms with complex internal synaptic dynamics could play in enabling this ability in neural networks. By surveying theoretical research, we highlight two fundamental enablers for continual learning. First, synaptic plasticity mechanisms must maintain and evolve an internal state over several behaviorally relevant timescales. Second, plasticity algorithms must leverage the internal state to intelligently regulate plasticity at individual synapses to facilitate the seamless integration of new memories while avoiding detrimental interference with existing ones. Our chapter covers successful applications of these principles to deep neural networks and underscores the significance of synaptic metaplasticity in sustaining continual learning capabilities. Finally, we outline avenues for further research to understand the brain's superb continual learning abilities and harness similar mechanisms for artificial intelligence systems.

Paper Structure (22 sections, 10 equations, 11 figures)

Introduction
Learning and forgetting in Hopfield networks
Memory capacity and forgetting
The stability-plasticity dilemma
Synaptic memory consolidation and synaptic complexity
Phenomenological models of metaplasticity
Phenomenological models of synaptic tagging and capture
Synaptic consolidation and fundamental limits on memory lifetimes
The Ideal Observer Framework
Limits on memory lifetimes in simple synapses
Memory lifetimes in complex synapses
Continual learning in artificial neural networks
Catastrophic forgetting
Learning from data as an optimization problem
Alleviating catastrophic forgetting through synaptic consolidation
...and 7 more sections

Figures (11)

Figure 1: (a) Snapshot of network connectivity and activity during associative learning and recall in a plastic spiking neural network model with synaptic consolidation zenke_diverse_2015. Subsections of the input and recurrent weight matrix corresponding to excitatory connections after reordering the neurons according to assembly membership (left). Assemblies are visible as blocks on the diagonal. Spike raster of neuronal activity of the corresponding neurons (right). Stimulus input is indicated by colored bars (top). Population firing rate of neurons with specific tuning to one of the three stimuli over time (bottom). After repeated exposure to a set of three external stimuli for 2400 s, neurons in the plastic spiking neural network simulation have formed stimulus-selective ensembles with strong recurrent connectivity, which allows them to maintain elevated firing rates during the delay interval after the stimulus has been removed. (b) Same as panel (a), but without synaptic consolidation. The afferent synaptic weights have largely lost their stimulus selectivity and the recurrent network dynamics decouple from the external world. Despite ongoing external stimulation (top), only the third assembly remains active. Figure reproduced from zenke_diverse_2015.
Figure 2: Power-law forgetting in the Cascade model. (a) Schematic of the Cascade model. Synapses are either in a weak (left column, white circles) or strong (right column, orange circles) efficacy state, but have an internal metaplastic state represented vertically. The transitions between states are governed by probabilities that decrease exponentially with the depth in the cascade. (b) Example forgetting curves, which show the signal-to-noise ratio (SNR) as a function of time, with storage rate $r$, for an exponential decay (blue), a Cascade model with cascade depth $n=15$ (orange; data from fusi_cascade_2005), and a power-law with exponent $-1/2$ (gray) aligned to the initial SNR of the exponential case. Experimentally observed forgetting curves typically follow a power-law with exponents smaller than one. (c) The benna_computational_2016 model for continuous-valued synapses. The full state of the synapse is a chain of variables $u_1$, $u_2$, ... interacting through differential equations reminiscent of connected beakers of areas $C_k$ and pipe widths $g_{k, k+1}$. The efficacy of the synapse is the first variable $u_1$. All panels were redrawn from fusi_cascade_2005 and benna_computational_2016, respectively.
Figure 3: Illustration of catastrophic forgetting in an ANN. The initially untrained network (top left) is first trained successfully to classify 0 and 1 digits, denoted Task A, which results in task-specific weight updates in purple (bottom left). Then, the network is trained on a second task to classify 2 and 3 digits, denoted Task B, which results in new task-specific weight updates in blue (top right). After training on Task B, the performance on Task A is degraded due to interference between the memory traces associated with both tasks (bottom right).
Figure 4: Catastrophic forgetting in an ANN trained on Split MNIST. The plots show the accuracy on each task of the Split MNIST problem as a function of the number of tasks the network has seen during training. Split MNIST consists of five consecutive tasks. Task 1 corresponds to classifying zeros and ones from MNIST, Task 2 to twos and threes, and so forth. Since each task is a binary classification problem, the network performs at chance level on unseen tasks (gray shaded region). After training on Task 1, its accuracy is close to 100%. Similarly, Task 2 accuracy rises close to 100% after training on Task 2. Meanwhile, however, Task 1 accuracy drops close to chance level. A similar pattern repeats for the other tasks. The error bars correspond to SEM ($n=10$). Figure adapted from zenke_continual_2017.
Figure 5: Schematic of an ANN optimization landscape. The optimization landscape is defined by the value of the scalar loss function and spanned by the network parameters, i.e., the synaptic weights. Due to the vast number of synapses in an ANN, the optimization landscape is high-dimensional. Learning in an ANN corresponds to finding minima in this high-dimensional landscape. Here we can only visualize two synapses. Typically this optimization is achieved by following the negative gradient (blue line).
...and 6 more figures
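
The captions of Figures 3 to 5 describe learning as gradient descent on a loss landscape and catastrophic forgetting as interference between sequentially learned tasks. The following is a minimal sketch of that idea, not the chapter's own model: two weights, two hypothetical quadratic task losses with minima at `target_a` and `target_b`, and an optional quadratic anchoring penalty (`anchor`, `strength`) used here as a simple stand-in for synaptic consolidation. All names, losses, and parameter values are illustrative assumptions.

```python
import numpy as np

def loss(w, target):
    """Quadratic toy loss with its minimum at `target` (an assumption, not the chapter's loss)."""
    return 0.5 * float(np.sum((w - target) ** 2))

def train(w, target, steps=200, lr=0.1, anchor=None, strength=0.0):
    """Plain gradient descent on the toy loss.

    If `anchor` is given, a quadratic penalty 0.5 * strength * ||w - anchor||^2
    is added, pulling the weights back toward a previous solution
    (a hypothetical stand-in for consolidation).
    """
    w = w.copy()
    for _ in range(steps):
        grad = w - target  # gradient of the task loss
        if anchor is not None:
            grad = grad + strength * (w - anchor)  # gradient of the penalty
        w -= lr * grad  # follow the negative gradient
    return w

# Two tasks whose optima lie at different points in the two-weight landscape.
target_a = np.array([1.0, 0.0])
target_b = np.array([0.0, 1.0])

w0 = np.zeros(2)
w_a = train(w0, target_a)  # learn Task A
w_b_plain = train(w_a, target_b)  # then Task B, no protection
w_b_anchored = train(w_a, target_b, anchor=w_a, strength=1.0)  # Task B with penalty

loss_a_after_a = loss(w_a, target_a)
loss_a_plain = loss(w_b_plain, target_a)
loss_a_anchored = loss(w_b_anchored, target_a)

print("Task A loss after Task A:", loss_a_after_a)
print("Task A loss after Task B (plain):", loss_a_plain)
print("Task A loss after Task B (anchored):", loss_a_anchored)
```

In this toy setting, plain sequential training moves the weights to the Task B optimum and the Task A loss rises accordingly, whereas the anchored run settles at a compromise between the two optima. The penalty treats both weights equally; regulating plasticity per synapse, as the abstract describes, would require a separate strength for each weight.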
