Stochastic Engrams for Efficient Continual Learning with Binarized Neural Networks

Isabelle Aguilar, Luis Fernando Herbozo Contreras, Omid Kavehei

TL;DR

Catastrophic forgetting limits continual learning in artificial networks. The authors propose engramBNN, a binarized neural network augmented with stochastic engram gating blocks and a metaplasticity regulator, including the gating mechanism $f_{\rm meta}(m, W_h) = 1 - \tanh^2(m \cdot W_h)$. The method yields strong average performance on domain- and class-incremental tasks (e.g., Split-MNIST and CORe50-NI) while dramatically reducing GPU/RAM usage, making it suitable for edge-enabled lifelong learning. The work highlights the practicality of neuroscience-inspired memory traces for scalable, energy-efficient continual learning and points to future extensions with alternative backbones and real-world deployment.
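To make the formula above concrete, here is a minimal sketch that evaluates $f_{\rm meta}(m, W_h) = 1 - \tanh^2(m \cdot W_h)$ for a few hidden-weight values. The function name `f_meta`, the value `m = 1.0`, and the helper `scaled_update` are illustrative assumptions. The summary does not state how or when the factor is applied during training, so the uniform scaling shown here is only one plausible reading.

```python
import math


def f_meta(m: float, w_hidden: float) -> float:
    """Evaluate 1 - tanh^2(m * w_hidden), the expression given in the TL;DR."""
    return 1.0 - math.tanh(m * w_hidden) ** 2


def scaled_update(raw_update: float, m: float, w_hidden: float) -> float:
    """Hypothetical use: shrink a raw weight update by the factor f_meta.

    Assumption: the factor multiplies the update directly. The source does
    not spell out the exact update rule.
    """
    return f_meta(m, w_hidden) * raw_update


if __name__ == "__main__":
    m = 1.0  # assumed metaplasticity strength, chosen only for illustration
    for w_hidden in (0.0, 0.5, 1.0, 2.0, 4.0):
        factor = f_meta(m, w_hidden)
        print(f"W_h={w_hidden:4.1f}  f_meta={factor:.4f}  "
              f"scaled update={scaled_update(0.1, m, w_hidden):.5f}")
```

The factor equals 1 when the hidden weight is zero and decays toward 0 as $|m \cdot W_h|$ grows, so under this reading weights with large hidden magnitude become progressively harder to change.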

Abstract

The ability to learn continuously in artificial neural networks (ANNs) is often limited by catastrophic forgetting, a phenomenon in which new knowledge becomes dominant. By taking mechanisms of memory encoding in neuroscience (a.k.a. engrams) as inspiration, we propose a novel approach that integrates stochastically activated engrams as a gating mechanism for metaplastic binarized neural networks (mBNNs). This method leverages the computational efficiency of mBNNs combined with the robustness of probabilistic memory traces to mitigate forgetting and maintain the model's reliability. Previously validated metaplastic optimization techniques have been incorporated to further enhance synaptic stability. Compared to baseline binarized models and benchmark fully connected continual learning approaches, our method is the only strategy capable of reaching average accuracies over 20% in class-incremental scenarios and achieving domain-incremental results comparable to full-precision state-of-the-art methods. Furthermore, we achieve a significant reduction in peak GPU and RAM usage, under 5% and 20%, respectively. Our findings demonstrate (A) an improved stability vs. plasticity trade-off, (B) reduced memory intensiveness, and (C) enhanced performance in binarized architectures. By uniting principles of neuroscience and efficient computing, we offer new insights into the design of scalable and robust deep learning systems.

Paper Structure

This paper contains 16 sections, 6 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Continual learning mechanisms in biological and artificial networks. In the brain, experiences and stimuli are inputted into the system, forming engrams that encode and consolidate this information. As incoming memories flood the network, metaplasticity and synaptic plasticity allow the brain to adapt to additional knowledge and form corresponding engrams (green and blue) that interact or activate during memory recall. This process repeats throughout life, allowing continual learning capability in our brains.
  • Figure 2: Engram block architecture. The engram block comprises a two-layer encoder, a one-layer latent space, a single linear layer, and a stochastic activation that outputs a gating vector to induce engrams in the metaplastic BNN.
  • Figure 3: Test accuracies on Split-MNIST. In Split-MNIST, two additional and unseen classes are introduced to the model for every sequential task. (a) Final test accuracies across five tasks in the Split-MNIST setting. Averaged over six runs, the solid curves are the training results for 20 epochs per task. The shaded areas represent the standard deviation. (b) The average test accuracy over each task.
  • Figure 4: Test accuracies on CORe50-NI. (a) Final test accuracies across eight tasks in CORe50. Averaged over three runs, solid curves represent the test accuracy, where each task has been trained for 20 epochs. Shaded areas represent one standard deviation. (b) Test accuracy of the third task in the CORe50-NI experiment. The shaded gray area represents the learning phase of the models, which is cut off by a gray dotted line.
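
The engram block described in the Figure 2 caption (two-layer encoder, one-layer latent space, one linear layer, stochastic activation producing a gating vector) can be sketched in plain Python as follows. This is only an illustration of the data flow under stated assumptions, not the authors' implementation: the class name `EngramBlock`, the layer sizes, the ReLU nonlinearities, the sigmoid-plus-Bernoulli sampling used as the "stochastic activation", and the elementwise masking of BNN activations are all hypothetical choices that the captions do not specify.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_layer(rng, n_in, n_out):
    """Small uniform random weights and zero biases (assumed initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class EngramBlock:
    """Hypothetical engram block: encoder -> latent -> linear -> stochastic gate."""

    def __init__(self, n_in, n_hidden, n_latent, n_gate, seed=0):
        self.rng = random.Random(seed)
        self.enc1 = init_layer(self.rng, n_in, n_hidden)      # encoder layer 1
        self.enc2 = init_layer(self.rng, n_hidden, n_hidden)  # encoder layer 2
        self.latent = init_layer(self.rng, n_hidden, n_latent)
        self.out = init_layer(self.rng, n_latent, n_gate)     # linear layer

    def gate(self, x):
        h = relu(linear(x, *self.enc1))
        h = relu(linear(h, *self.enc2))
        z = relu(linear(h, *self.latent))
        logits = linear(z, *self.out)
        # Assumed stochastic activation: sample each gate unit from a
        # Bernoulli distribution whose probability is sigmoid(logit).
        return [1 if self.rng.random() < sigmoid(a) else 0 for a in logits]


if __name__ == "__main__":
    block = EngramBlock(n_in=8, n_hidden=16, n_latent=4, n_gate=6)
    gate = block.gate([0.5] * 8)
    # Assumed use: mask binarized (+1/-1) hidden activations of the BNN.
    bnn_hidden = [1, -1, 1, 1, -1, 1]
    gated = [g * a for g, a in zip(gate, bnn_hidden)]
    print("gate :", gate)
    print("gated:", gated)
```

Because the gate is sampled, two calls with the same input can activate different subsets of units. Fixing `seed` makes a run reproducible, which is convenient when comparing configurations.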