Sequential Learning in the Dense Associative Memory

Hayden McAlister; Anthony Robins; Lech Szymanski

TL;DR

This paper investigates sequential learning in the Dense Associative Memory (DAM), a modern Hopfield network defined by a set of memory vectors and an interaction vertex $n$. It benchmarks a range of sequential learning techniques, including naive rehearsal, pseudorehearsal, GEM, A-GEM, and several regularization-based methods, on five permuted MNIST tasks, and reveals DAM-specific transitions in behavior as $n$ varies. The findings show that rehearsal-based approaches are highly effective, that gradient-based methods are notably unstable at intermediate interaction vertices, and that the performance of regularization strategies depends on dataset size and memory regime. These results establish a foundation for understanding DAM behavior under sequential learning and suggest directions for extending the DAM to continuous domains and exploring its attractor dynamics under task sequences.
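For readers unfamiliar with the term, the role of the interaction vertex can be illustrated with the commonly used polynomial form of the DAM energy. This is a sketch of the standard formulation from the wider DAM literature, not necessarily the exact formalization used in this paper. The symbols $\xi^{\mu}$ (memory vectors), $\sigma$ (network state), $K$ (number of memories), and $F$ (interaction function) are assumptions introduced here for illustration.

```latex
E(\sigma) = -\sum_{\mu=1}^{K} F\left(\xi^{\mu} \cdot \sigma\right),
\qquad
F(x) = x^{n}
```

Under this assumed form, $n = 2$ recovers the quadratic energy of the classical Hopfield network, while larger $n$ sharpens the contribution of the best-matching memories.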

Abstract

Sequential learning involves learning tasks in a sequence, and proves challenging for most neural networks. Biological neural networks regularly conquer the sequential learning challenge and are even capable of transferring knowledge both forward and backward between tasks. Artificial neural networks often fail entirely to transfer performance between tasks, and regularly suffer from degraded performance or catastrophic forgetting on previous tasks. Models of associative memory, of which the Hopfield network is the most studied, have been used to investigate the discrepancy between biological and artificial neural networks because of their biological ties and inspirations. The Dense Associative Memory (DAM), or modern Hopfield network, generalizes the Hopfield network, allowing for greater capacities and prototype learning behaviors while still retaining the associative memory structure. We give a substantial review of the sequential learning space with particular respect to the Hopfield network and associative memories. We perform foundational benchmarks of sequential learning in the DAM using various sequential learning techniques, and analyze the results to demonstrate previously unseen transitions in the behavior of the DAM. This paper also discusses the departure from biological plausibility that may affect the utility of the DAM as a tool for studying biological neural networks. We present our findings, including the effectiveness of a range of state-of-the-art sequential learning methods when applied to the DAM, and use these methods to further the understanding of DAM properties and behaviors.

Paper Structure (29 sections, 24 equations, 20 figures, 2 tables, 1 algorithm)

Introduction
Literature Review
The Hopfield Network and Dense Associative Memory
Sequential Learning in the Hopfield Network
Sequential Learning Methods
Hyperparameter Tuning and Experiment Design
Dense Associative Memory Formalization and Hyperparameters
Sequential Learning Methods Hyperparameters
Experimental Design
Experiment Results
Sequential Learning Methods Hyperparameter Tuning
Rehearsal Methods
GEM and A-GEM
Regularization-Based Methods
Tuned Sequential Learning Methods
...and 14 more sections

Figures (20)

Figure 1: Hyperparameter search over rehearsal proportion, measuring the average accuracy on the test data split. A rehearsal proportion of $0.0$ corresponds to vanilla learning. For naive rehearsal, a proportion of $1.0$ corresponds to presenting all previous tasks alongside the new task. Different interaction vertices, $n$, are shown by color. A higher average accuracy reflects better performance on sequential learning tasks.
Figure 2: Pseudorehearsal hyperparameter search over rehearsal proportion, measuring the average accuracy on the test data split. Note that the legend in these figures differs from the others in this section. These figures explore finer granularity at low and high interaction vertices.
Figure 3: Gradient Episodic Memories hyperparameter search, measuring the average accuracy on the test data split. A memory proportion of $0.0$ corresponds to vanilla learning, while $1.0$ checks the gradient across all previous task items. A higher average accuracy reflects better performance on sequential learning tasks.
Figure 4: Hyperparameter searches for regularization-based sequential learning methods over the regularization hyperparameter $\lambda$, measuring the average accuracy on the test data split. Individual trials are shown as points, and a moving window of $20$ trials is used to calculate the average (solid lines) and standard deviation (error band). A higher average accuracy reflects better performance on sequential learning tasks.
Figure 5: Elastic Weight Consolidation hyperparameter search using $10000$ items per task. Compare this to the EWC small-data hyperparameter search figure, which uses $2000$ items per task.
...and 15 more figures
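
The rehearsal proportion described in the caption of Figure 1 can be illustrated with a minimal sketch. It assumes that the proportion is the fraction of stored previous-task items presented alongside the new task. The function name `naive_rehearsal_batch`, its parameters, and the uniform sampling are hypothetical choices made for illustration, not the paper's implementation.

```python
import random


def naive_rehearsal_batch(new_task, previous_tasks, proportion, seed=0):
    """Return the new-task items plus a sampled fraction of earlier-task items.

    Hypothetical helper: the sampling scheme is an assumption, not the
    procedure used in the paper.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    rng = random.Random(seed)
    # Pool every stored item from the tasks seen so far.
    pool = [item for task in previous_tasks for item in task]
    # proportion = 0.0 rehearses nothing (vanilla learning);
    # proportion = 1.0 rehearses the whole pool.
    k = round(proportion * len(pool))
    rehearsed = rng.sample(pool, k)
    return list(new_task) + rehearsed


# Toy usage with string placeholders standing in for training items.
task_a = ["a0", "a1", "a2", "a3"]
task_b = ["b0", "b1", "b2", "b3"]
task_c = ["c0", "c1"]
vanilla = naive_rehearsal_batch(task_c, [task_a, task_b], 0.0)
half = naive_rehearsal_batch(task_c, [task_a, task_b], 0.5)
full = naive_rehearsal_batch(task_c, [task_a, task_b], 1.0)
```

In this sketch the two endpoints match the caption: a proportion of $0.0$ yields only the new task, and $1.0$ yields the new task together with every previous-task item.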
