Variation-Resilient FeFET-Based In-Memory Computing Leveraging Probabilistic Deep Learning

Bibhas Manna, Arnob Saha, Zhouhang Jiang, Kai Ni, Abhronil Sengupta

TL;DR

This work tackles reliability challenges in FeFET-based in-memory computing caused by device-level variations. It introduces a variation-aware Bayesian training framework that integrates an experimentally derived conductance-variation model into a Bayesian Neural Network, enabling robust inference across device sizes and read voltages. The approach preserves near-ideal accuracy on MNIST with shallow networks (MLP5 and LeNet) and limits the accuracy decline to roughly 3.8-16.1% on CIFAR10 with deeper AlexNet models, demonstrating a practical hardware–software co-design for emerging non-volatile memories. The study highlights the value of device-specific priors in probabilistic learning for crossbar-based acceleration and points to future work on correlated noise and broader hardware integration.

Abstract

Reliability issues stemming from device-level non-idealities of emerging non-volatile technologies such as ferroelectric field-effect transistors (FeFETs), especially at scaled dimensions, cause substantial degradation in the accuracy of in-memory, crossbar-based AI systems. In this work, we present a variation-aware design technique that characterizes the device-level variations and mitigates their impact on hardware accuracy using a Bayesian Neural Network (BNN) approach. An effective conductance variation model is derived from experimental measurements of cycle-to-cycle (C2C) and device-to-device (D2D) variations performed on FeFET devices fabricated using 28 nm high-$k$ metal gate technology. The variations were found to be a function of the conductance state within the given programming range, in sharp contrast to earlier efforts in which a fixed variation dispersion was assumed for all conductance values. These variation characteristics, formulated for three different device sizes at different read voltages, were provided as prior variation information to the BNN to yield more accurate and reliable inference. Near-ideal accuracy for shallow networks (MLP5 and LeNet models) on the MNIST dataset, and a limited accuracy decline of $\sim$3.8-16.1% for deeper AlexNet models on the CIFAR10 dataset, under a wide range of variations corresponding to different device sizes and read voltages, demonstrate the efficacy of our proposed device-algorithm co-design technique.
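
The contrast between a state-dependent and a fixed variation dispersion can be illustrated with a minimal NumPy sketch. The linear form of `sigma_of_state`, its coefficients `a` and `b`, and all function names are hypothetical placeholders chosen for illustration; they are not the experimentally derived model from the paper, and the direction of the trend is an assumption.

```python
import numpy as np

def sigma_of_state(g_norm, a=0.02, b=0.08):
    """Hypothetical state-dependent dispersion for a normalized conductance
    state g_norm in [0, 1]. The linear form and the coefficients a, b are
    illustrative assumptions, not fitted values."""
    return a + b * (1.0 - g_norm)

def perturb_state_dependent(g_norm, rng):
    """Add zero-mean Gaussian noise whose std depends on each state."""
    noise = rng.normal(0.0, 1.0, size=g_norm.shape)
    return g_norm + noise * sigma_of_state(g_norm)

def perturb_fixed(g_norm, sigma_fixed, rng):
    """Baseline: the same dispersion for every conductance state."""
    return g_norm + rng.normal(0.0, sigma_fixed, size=g_norm.shape)

rng = np.random.default_rng(0)
states = np.linspace(0.0, 1.0, 5)              # five normalized conductance levels
repeated = np.tile(states, (20000, 1))         # 20000 simulated reads per level
empirical_sigma = perturb_state_dependent(repeated, rng).std(axis=0)
fixed_sigma = perturb_fixed(repeated, 0.05, rng).std(axis=0)
```

In this sketch `empirical_sigma` varies from level to level, while `fixed_sigma` is approximately flat, which is the distinction the abstract draws between the proposed model and fixed-dispersion assumptions.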

Paper Structure

This paper contains 9 sections, 5 equations, and 6 figures.

Figures (6)

  • Figure 1: (a) TEM cross-section and schematic representation of an FeFET fabricated on the 28 nm HKMG node with doped $\mathrm{HfO_2}$ serving as the ferroelectric. (b) Conductance versus programming voltage characteristics of the FeFET for three different device dimensions at a read voltage, $V_{Read}$, of 1.2 V.
  • Figure 2: Filled error plot showing the mean (solid line) and associated standard deviation (broadening) of conductance states for (a) cycle-to-cycle (C2C) variations measured over 50 consecutive programming cycles for each amplitude of $V_{PRG}$ on a single device of each device size and (b) device-to-device (D2D) variations recorded over 3 devices of the same size by applying a single programming pulse for each amplitude of $V_{PRG}$, at $V_{Read}$ of 1.2 V.
  • Figure 3: (a) Simulation results showing D2D variations computed over 200 devices for different numbers of domains in the FE layer. (b) The standard deviation plotted against the mean of the D2D variations after normalizing the conductance programming data to a maximum value of unity.
  • Figure 4: (a) The standard deviation, $\sigma_{com}$, as a function of the mean, $\mu_{com}$, of the variation in FeFET programming, combining both C2C and D2D measurement data. (b) The severity of variations, $\sigma_{com}/\mu_{com}$, plotted against different values of the mean, $\mu_{com}$, at $V_{Read}$ of 0.6 V and 1.2 V.
  • Figure 5: Bar-chart comparison of inference accuracy for different network models under variations corresponding to different device sizes (following Eqn. (5)) at $V_{Read}$ of 0.6 V, employing (a) the proposed Bayesian and (b) non-Bayesian frameworks. (c) Inference accuracy of network models trained under the Bayesian framework, with all network weights subjected to a fixed amount of variation, $\sigma_{F}$, irrespective of the programmed conductance state. (d)-(f) Inference performance results for different network models evaluated at $V_{Read}$ of 1.2 V, applying the same respective schemes as in (a)-(c). The inference outputs in the Bayesian frameworks have been derived by injecting noise (Eqn. (5)) into the trained mean weights and averaging over five runs.
  • ...and 1 more figure
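
The per-state mean, standard deviation, and severity ratio ($\sigma/\mu$) that the captions of Figures 2 to 4 refer to could be computed from repeated programming measurements along the following lines. This is only a sketch: the function name `variation_stats`, the array layout, the choice to normalize by the largest mean state, and the synthetic data are all assumptions made for illustration, and the paper's own procedure for combining C2C and D2D data is not reproduced here.

```python
import numpy as np

def variation_stats(conductance):
    """Per-level statistics from repeated programming measurements.

    conductance: array of shape (n_repeats, n_levels), one column per
    programming-voltage amplitude (assumed layout).
    Returns (normalized mean, normalized std, severity = std / mean).
    """
    g = np.asarray(conductance, dtype=float)
    mu = g.mean(axis=0)
    sigma = g.std(axis=0, ddof=1)   # sample standard deviation over repeats
    scale = mu.max()                # assumption: normalize so the largest mean is 1
    mu_n, sigma_n = mu / scale, sigma / scale
    return mu_n, sigma_n, sigma_n / mu_n

# Synthetic placeholder data (NOT measurements): 50 repeats x 4 levels.
rng = np.random.default_rng(1)
true_mean = np.array([1.0, 2.0, 4.0, 8.0])
data = true_mean + rng.normal(0.0, 0.1, size=(50, 4))
mu_n, sigma_n, severity = variation_stats(data)
```

Because the severity is a ratio, it does not depend on the normalization constant; only the reported mean and standard deviation are rescaled.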