Predictive Attractor Models

Ramy Mounir, Sudeep Sarkar

TL;DR

PAM is a streaming model that learns a sequence in an online, continuous manner by observing each input only once and avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns.

Abstract

Sequential memory, the ability to form and accurately recall a sequence of events or stimuli in the correct order, is a fundamental prerequisite for biological and artificial intelligence, as it underpins numerous cognitive functions (e.g., language comprehension, planning, and episodic memory formation). However, existing methods of sequential memory suffer from catastrophic forgetting, limited capacity, slow iterative learning procedures, low-order Markov memory, and, most importantly, the inability to represent and generate multiple valid future possibilities stemming from the same context. Inspired by biologically plausible neuroscience theories of cognition, we propose Predictive Attractor Models (PAM), a novel sequence memory architecture with desirable generative properties. PAM is a streaming model that learns a sequence in an online, continuous manner by observing each input only once. Additionally, we find that PAM avoids catastrophic forgetting by uniquely representing past context through lateral inhibition in cortical minicolumns, which prevents new memories from overwriting previously learned knowledge. PAM generates future predictions by sampling from a union set of predicted possibilities; this generative ability is realized through an attractor model trained alongside the predictor. We show that PAM is trained with local computations through Hebbian plasticity rules in a biologically plausible framework. Other desirable traits (e.g., noise tolerance, CPU-based learning, capacity scaling) are discussed throughout the paper. Our findings suggest that PAM represents a significant step forward in the pursuit of biologically plausible and computationally efficient sequential memory models, with broad implications for cognitive science and artificial intelligence research.
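
To make the idea of a "union set of predicted possibilities" concrete, the following is a minimal toy sketch and not the PAM algorithm itself. It assumes sparse binary codes for three hypothetical symbols (A, B, C), a one-shot Hebbian-style rule that sets a connection whenever one unit follows another, and a simple containment check standing in for the attractor that selects a single possibility. All names and sizes (`N`, `K`, `code`, `predict`, `sample_possibility`) are illustrative assumptions.

```python
import numpy as np

N, K = 16, 4  # units per pattern and active units per symbol (illustrative values)

def code(start):
    """Sparse binary code with K consecutive active units (hypothetical encoding)."""
    v = np.zeros(N, dtype=bool)
    v[start:start + K] = True
    return v

# Non-overlapping codes for three hypothetical symbols.
codes = {"A": code(0), "B": code(4), "C": code(8)}

# One-shot Hebbian-style learning: W[i, j] is set when unit j is active
# right after unit i. Each transition is observed exactly once.
W = np.zeros((N, N), dtype=bool)
for prev, nxt in [("A", "B"), ("A", "C")]:
    W |= codes[prev][:, None] & codes[nxt][None, :]

def predict(x):
    """Union of all units that ever followed any active unit of x."""
    return W[x].any(axis=0)

def sample_possibility(union, rng):
    """Pick one stored code fully contained in the union.

    This lookup is only a stand-in for an attractor; it is an assumption
    of this sketch, not the mechanism described in the paper.
    """
    candidates = [s for s, c in sorted(codes.items()) if not np.any(c & ~union)]
    return str(rng.choice(candidates))

union = predict(codes["A"])  # contains the units of both B and C
choice = sample_possibility(union, np.random.default_rng(0))
```

Because the context A was followed by both B and C, the prediction is the union of their codes, and the selection step resolves it to exactly one of them rather than a blend.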

Paper Structure

This paper contains 58 sections, 2 theorems, 20 equations, 16 figures, 5 tables, and 2 algorithms.

Key Result

Theorem 1

Assume the likelihood $p(\boldsymbol{x}_t|\boldsymbol{z}_t)$ in Eq. (eqn:pam-vfe) represents multiple possibilities using a Gaussian Mixture Model (GMM) conditioned on the latent state $\boldsymbol{z}_t$, as shown in Eq. (eqn:pam-ssm). The maximization of such a log-likelihood function …
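
As a reminder of what a GMM likelihood looks like numerically, here is a minimal sketch of evaluating $\log p(\boldsymbol{x})$ for a mixture of Gaussians. It assumes isotropic components with a shared standard deviation `sigma` and explicit mixture `weights`; these choices, and the function name `gmm_log_likelihood`, are assumptions made for illustration and are not taken from the paper, whose exact parameterization is not shown in this excerpt.

```python
import numpy as np

def gmm_log_likelihood(x, means, weights, sigma=1.0):
    """log p(x) under an isotropic Gaussian mixture with shared std `sigma`.

    x:       (d,) observation
    means:   (k, d) component means (one per possibility)
    weights: (k,) mixture weights summing to 1
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    d = x.shape[0]
    # Log-density of x under each isotropic Gaussian component.
    sq_dist = np.sum((means - x) ** 2, axis=1)
    log_comp = -0.5 * sq_dist / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    # Log-sum-exp over components for numerical stability.
    a = np.log(weights) + log_comp
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

# Two equally weighted possibilities, observation halfway between them.
value = gmm_log_likelihood([0.0, 0.0], [[-1.0, 0.0], [1.0, 0.0]], [0.5, 0.5])
```

Each component mean plays the role of one valid possibility; the mixture assigns non-negligible likelihood to observations near any of them.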

Figures (16)

  • Figure 1: State Space Model. (Left): Dynamical system represented by a first-order Markov chain of latent states $\boldsymbol{z}$ with a transition function $f$ and an emission function $g$ that projects to the observation states $\boldsymbol{x}$. (Right): Gaussian form assumptions for the prior $\hat{\boldsymbol{z}}$ and posterior $\boldsymbol{z}$ latent states, and the Gaussian mixture model representing the conditional probability of multiple possibilities $p(\boldsymbol{x}|\boldsymbol{z})$.
  • Figure 2: Sequence Generation. (Left): Offline generation by sampling a single possibility (i.e., attractor point) from a union of predicted possibilities. (Right): Online generation by removing noise from an observation using the prior beliefs about the observed state. A Markov blanket separates the agent's latent variables from the world's observable states.
  • Figure 3: Quantitative results on (A-B) Offline sequence capacity, (C) Noise robustness, and (D) Time of sequence learning and recall. Qualitative results on highly correlated CIFAR sequence in (E) offline and (F) online settings. The mean and standard deviation of 10 trials are reported for all plots.
  • Figure 4: Qualitative results on (A) synthetic and (B) protein sequences backward transfer, and (C-D) multiple possibilities generation on text datasets. Qualitative results on (E) noise robustness on a CLEVRER sequence, and (F) catastrophic forgetting on the Moving MNIST dataset. The highlighted frame is the first frame with significant error. The mean and standard deviation of 10 trials are reported for all plots.
  • Figure 5: Empirical Validation of Theorem \ref{theorem:iou}
  • ...and 11 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2