Probabilistic Metaplasticity for Continual Learning with Memristors

Fatima Tuz Zohora, Vedant Karia, Nicholas Soures, Dhireesha Kudithipudi

TL;DR

Probabilistic metaplasticity consolidates weights by modulating their update probability rather than their magnitude. This eliminates high-precision modification of weight magnitudes and, consequently, the need for auxiliary high-precision memory.

Abstract

Edge devices operating in dynamic environments critically need the ability to learn continually without catastrophic forgetting. The strict resource constraints of these devices make this a major challenge, as continual learning entails memory and computational overhead. Crossbar architectures using memristor devices offer energy efficiency through compute-in-memory and hold promise to address this issue. However, memristors often exhibit low precision and high variability in conductance modulation, rendering them unsuitable for continual learning solutions that require precise modulation of weight magnitude for consolidation. Current approaches do not address this challenge directly and instead rely on auxiliary high-precision memory, leading to frequent memory access, high memory overhead, and energy dissipation. In this research, we propose probabilistic metaplasticity, which consolidates weights by modulating their update probability rather than their magnitude. The proposed mechanism eliminates high-precision modification of weight magnitudes and, consequently, the need for auxiliary high-precision memory. We demonstrate the efficacy of the proposed mechanism by integrating probabilistic metaplasticity into a spiking network trained on an error threshold with low-precision memristor weights. Evaluations on continual learning benchmarks show that probabilistic metaplasticity achieves performance equivalent to state-of-the-art continual learning models with high-precision weights while consuming ~67% less memory for additional parameters and up to ~60x less energy during parameter updates compared to an auxiliary memory-based solution. The proposed model shows potential for energy-efficient continual learning with low-precision emerging devices.

Paper Structure (18 sections, 16 equations, 5 figures, 4 tables, 1 algorithm)

This paper contains 18 sections, 16 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Probabilistic metaplasticity with error threshold-based training. a. Spiking network trained on an error threshold, where the network weights are realized with a 1T1R memristor crossbar array. The dendritic compartments in the hidden and output layer neurons integrate the error. When the error reaches the threshold $U_{\text{th}}$, the memristor weights are updated to the next higher (negative error) or lower (positive error) conductance level with probability $p_{\text{update}}$. b. Mean and standard deviation of resistance levels versus programming compliance current for the 1T1R memristor device (shown in inset) adopted in this work [liehr2020impact]. c. Update probability of weights for different values of the metaplasticity coefficient $m$ and the weight $w$. The coefficient $m$ is positively associated with the activity level of adjacent neurons. Weights with highly active adjacent neurons (high $m$) and large magnitude (high $|w|$) have a low update probability.
  • Figure 2: Evolution of task accuracies with sequential training on the split-MNIST benchmark. The x-axis shows the latest task the network has learned. We evaluate the performance of task $n$ only after the network encounters it and denote the accuracies as 0 before that. a. With no probabilistic metaplasticity, the network learns the current task well but forgets the initial tasks after sequentially learning multiple tasks. b. Probabilistic metaplasticity with a low activity threshold leads to high rigidity in the network, so it remembers previous tasks but cannot learn the last task. c. A high activity threshold can lead to a loss in previous task accuracy while the network remains plastic enough to learn the new tasks. d. An optimized activity threshold balances plasticity and rigidity such that the network maintains high initial task accuracies while retaining the ability to learn new tasks.
  • Figure 3: Effect of the number of training samples on continual learning. The mean accuracy across tasks degrades as the average number of samples per class decreases.
  • Figure 4: Mixed-signal architecture and computational flow for the continual learning mechanisms. We consider a mixed-signal architecture where memristor crossbars carry out the forward pass of the spiking model, while the error computation, arithmetic, and logical operations are carried out in the digital domain. The figure shows the computational flow for continual learning with probabilistic metaplasticity (left) and activity-dependent metaplasticity with gradient accumulation (right).
  • Figure 5: Energy consumption and stability-plasticity tradeoff. a. Energy consumption per sample and its breakdown during the parameter update phase for activity-dependent metaplasticity with gradient accumulation and probabilistic metaplasticity with individual and shared metaplasticity coefficients. b. Distribution of the product of metaplasticity coefficient $m$ and weight magnitude $|w|$ in the hidden layer with probabilistic metaplasticity. As more weights share $m$, a lower fraction of weights assume low $|mw|$ values. This indicates reduced plasticity in the network. c. Mean accuracy across tasks vs. the energy consumption per sample for the split-MNIST task. We observe a tradeoff between continual learning performance and energy consumption as the metaplasticity coefficients are shared across weights.
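
The update rule described in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The text above states only that $p_{\text{update}}$ falls as $m$ and $|w|$ grow; the exponential form $p_{\text{update}} = e^{-|m w|}$ used here is an assumption, as are the linear level-to-weight map and the names `level_to_weight`, `update_probability`, and `probabilistic_step`. The sketch also ignores presynaptic activity and device variability.

```python
import numpy as np

def level_to_weight(levels, n_levels, w_max=1.0):
    # Hypothetical linear map from a conductance-level index to a signed weight
    # in [-w_max, w_max]; the real device mapping is not given in the text.
    return (2.0 * levels / (n_levels - 1) - 1.0) * w_max

def update_probability(m, w):
    # Assumed form: probability decays with the product |m * w|.
    return np.exp(-np.abs(m * w))

def probabilistic_step(levels, m, error, u_th, n_levels, rng):
    """One consolidation-aware update of discrete memristor levels.

    levels: (n_out, n_in) integer level indices
    m:      (n_out, n_in) metaplasticity coefficients
    error:  (n_out,) error integrated by each neuron's dendritic compartment
    """
    w = level_to_weight(levels, n_levels)
    p = update_probability(m, w)
    crossed = np.abs(error) >= u_th             # neurons whose error reached U_th
    direction = -np.sign(error).astype(int)     # negative error: step up; positive: step down
    accept = rng.random(levels.shape) < p       # Bernoulli draw per weight
    step = (accept & crossed[:, None]) * direction[:, None]
    # Move at most one level and stay inside the available conductance range.
    return np.clip(levels + step, 0, n_levels - 1)

rng = np.random.default_rng(0)
levels = np.full((3, 4), 2)
m = np.zeros((3, 4))                            # m = 0 gives p_update = 1
error = np.array([-1.0, 0.2, 1.0])
print(probabilistic_step(levels, m, error, u_th=0.5, n_levels=5, rng=rng))
```

Each weight only ever moves by one discrete level, so consolidation is expressed entirely through how often a weight is allowed to change. No high-precision accumulator is kept per weight.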