
Continual learning under domain transfer with sparse synaptic bursting

Shawn L. Beaulieu, Jeff Clune, Nick Cheney

TL;DR

We introduce a system that learns sequentially over previously unseen datasets with little forgetting over time. It does so by using top-down regulation, generated by a second feed-forward neural network, to control the activity of weights in a convolutional neural network on the basis of each input.

Abstract

Existing machines are functionally specific tools that were made for easy prediction and control. Tomorrow's machines may be closer to biological systems in their mutability, resilience, and autonomy. But first they must be capable of learning and retaining new information without being exposed to it arbitrarily often. Past efforts to engineer such systems have sought to build or regulate artificial neural networks using disjoint sets of weights that are uniquely sensitive to specific tasks or inputs. This has not yet enabled continual learning over long sequences of previously unseen data without corrupting existing knowledge: a problem known as catastrophic forgetting. In this paper, we introduce a system that can learn sequentially over previously unseen datasets (ImageNet, CIFAR-100) with little forgetting over time. This is done by controlling the activity of weights in a convolutional neural network on the basis of inputs using top-down regulation generated by a second feed-forward neural network. We find that our method learns continually under domain transfer with sparse bursts of activity in weights that are recycled across tasks, rather than by maintaining task-specific modules. Sparse synaptic bursting is found to balance activity and suppression such that new functions can be learned without corrupting extant knowledge, thus mirroring the balance of order and disorder in systems at the edge of chaos. This behavior emerges during a prior pre-training (or 'meta-learning') phase in which regulated synapses are selectively disinhibited, or grown, from an initial state of uniform suppression through prediction error minimization.
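The core mechanism is that a regulator produces input-dependent gates that scale a classifier's weights, starting from uniform suppression. The following is a minimal, hypothetical sketch of that idea, not the paper's TSAR implementation. It makes several simplifying assumptions: a single dense layer stands in for the convolutional classifier, a one-layer sigmoid regulator stands in for the second feed-forward network, and the names `GatedLinear`, `gates`, and `init_bias` are invented here. The bias of -8 is the Grow treatment's initial regulatory bias reported in Figure 2. Under this sigmoid assumption, the Sculpt bias of 0 gives gates near 0.5, and the paper's actual gating function may differ. Training, in which selected gates would be raised ("grown") to reduce prediction error, is not shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GatedLinear:
    """Hypothetical sketch: a dense layer whose weights are gated per input
    by a separate regulator (a stand-in for the paper's regulator network)."""

    def __init__(self, n_in: int, n_out: int, init_bias: float, seed: int = 0):
        rng = random.Random(seed)
        self.n_in, self.n_out = n_in, n_out
        # Classifier weights w[o][i].
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        # Regulator (assumed form): one linear unit per classifier weight, all
        # reading the same input. Small random weights; the shared bias sets
        # the initial gate level (very negative => uniformly suppressed).
        n_gates = n_out * n_in
        self.reg_w = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_gates)]
        self.reg_b = [init_bias] * n_gates

    def gates(self, x: list[float]) -> list[float]:
        # One gate in (0, 1) per classifier weight, computed from the input x.
        return [sigmoid(sum(a * b for a, b in zip(row, x)) + bias)
                for row, bias in zip(self.reg_w, self.reg_b)]

    def forward(self, x: list[float]) -> list[float]:
        g = self.gates(x)
        # Effective weight = classifier weight * input-dependent gate.
        return [sum(self.w[o][i] * g[o * self.n_in + i] * x[i] for i in range(self.n_in))
                for o in range(self.n_out)]


x = [0.2, 0.9, 0.0, 0.5]
grow = GatedLinear(4, 3, init_bias=-8.0)    # uniformly suppressed start
sculpt = GatedLinear(4, 3, init_bias=0.0)   # gates start near 0.5
print(max(grow.gates(x)))                   # close to sigmoid(-8), roughly 3e-4
print(sum(sculpt.gates(x)) / 12)            # close to 0.5
print(grow.forward(x))                      # near-zero outputs: almost no weight is active
```

Keeping the classifier weights and the gates as separate quantities is what allows a suppressed weight to retain its value. It is simply not expressed for inputs that do not recruit it.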

Paper Structure

This paper contains 2 figures.

Figures (2)

  • Figure 1: (A) Architecture for Tuning Synapses via Allostatic Regulation ("TSAR", Materials and Methods). (B) Task-specific modularity has traditionally been used to overcome catastrophic forgetting. Disjoint subgraphs, or "modules", prevent prediction interference and unwanted weight changes. (C) By initializing meta-learned regulation to be highly suppressant ("Grow", Materials and Methods), we establish a prior on regulation such that no input features are relevant to the recruitment of synapses in the classifier. For performant behavior to emerge, the regulator must learn to disinhibit or "grow" synapses that minimize prediction error. Conversely, (D) if regulation is initialized to be highly permissive ("Sculpt", Materials and Methods), then all input features are initially specified as relevant.
  • Figure 2: Regulation of layer C3 during meta-learning of Grow (A) and Prune (B). We report the $Q^{th}$ percentile of regulation across 250 randomly sampled meta-learning classes. Confidence intervals are computed across runs (lower bound = $20^{th}$ percentile; upper bound = $80^{th}$ percentile). (C) Randomly sampled window of synaptic activity (post-masking) under domain transfer for Grow. (D) Complementary cumulative distribution function (CCDF) for the percent of images in ImageNet that cause a given number of synapses to receive regulation above a threshold (T). We report individual runs (dot) and the mean across runs (dash-dot). Continual learning under domain transfer to ImageNet after meta-learning on 100% (A) or less than 3% of Omniglot classes (B). The Grow treatment (initial regulatory bias = -8) outperforms the Sculpt treatment (bias = 0) on the domain transfer task (C), and this difference is exacerbated with data-limited meta-learning.
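Figure 2 summarizes regulation in two ways. Panels A and B show the Q-th percentile across classes, with 20th/80th percentile bands across runs, and panel D shows a CCDF over images. The sketch below is one plausible way to compute those statistics from a table of per-image, per-synapse regulation values. Several details are assumptions and are not taken from the paper: the data layout, the function names (`percentile`, `synapses_above`, `ccdf_percent`), the strict "> T" test, and the "at least k" CCDF convention. The toy values are made up.

```python
import math


def percentile(values: list[float], q: float) -> float:
    """q-th percentile with linear interpolation between order statistics
    (the same convention as NumPy's default 'linear' method)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty sequence")
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def synapses_above(regulation: list[list[float]], threshold: float) -> list[int]:
    """regulation[i][s] is the regulation synapse s receives for image i.
    Returns, per image, how many synapses exceed the threshold T."""
    return [sum(1 for r in row if r > threshold) for row in regulation]


def ccdf_percent(counts: list[int]) -> dict[int, float]:
    """For each k, the percent of images with at least k synapses above T."""
    if not counts:
        raise ValueError("no images")
    n = len(counts)
    return {k: 100.0 * sum(1 for c in counts if c >= k) / n
            for k in range(max(counts) + 1)}


# Toy data: 4 images x 4 regulated synapses (values are made up).
regulation = [
    [0.90, 0.10, 0.70, 0.00],
    [0.20, 0.10, 0.00, 0.30],
    [0.60, 0.80, 0.90, 0.40],
    [0.55, 0.20, 0.10, 0.05],
]
counts = synapses_above(regulation, threshold=0.5)
print(counts)                # [2, 0, 3, 1]
print(ccdf_percent(counts))  # {0: 100.0, 1: 75.0, 2: 50.0, 3: 25.0}

# 20th/80th percentile band across hypothetical runs.
runs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile(runs, 20), percentile(runs, 80))
```

A CCDF that drops steeply means most images recruit only a few strongly regulated synapses. This is the kind of sparsity the "sparse synaptic bursting" description refers to.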