IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference

Omar Ghazal, Wei Wang, Shahar Kvatinsky, Farhad Merchant, Alex Yakovlev, Rishad Shafik

TL;DR

The paper addresses data movement and energy bottlenecks in ML by proposing IMPACT, an in-memory computing architecture built on Y-Flash memristors to execute coalesced Tsetlin machine (CoTM) inference. It couples a clause crossbar (Boolean clause computation driven by Tsetlin automata, TA) with a class crossbar (analog weight-based classification), mapping TA actions to high and low conductance states (HCS/LCS) and weights to conductances to realize $V = W \cdot C$ with $Y_i = 1$ if $V_i > 0$. On MNIST, IMPACT achieves 96.3% accuracy with competitive energy efficiency (≈ 24.56 TOPS/W, ≈ 0.17 TOPS/mm$^2$) and low energy per datapoint (~67.99 pJ for the clause tile and ~16.22 pJ for the class tile), while remaining robust to device-to-device (D2D) and cycle-to-cycle (C2C) variability. The architecture demonstrates scalable, energy-aware in-memory inference using two-terminal Y-Flash devices that self-select to suppress sneak paths, which positions it for future expansion to larger datasets such as CIFAR-10 and ImageNet.
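The two-stage computation summarized above (Boolean clauses, then $V = W \cdot C$ with $Y_i = 1$ if $V_i > 0$) can be sketched in software as follows. This is a minimal illustration, not the authors' implementation: the function names, the literal ordering (features followed by their negations), and the convention that a clause with no included literal outputs 0 at inference are all assumptions made for the example, and the tiny include and weight matrices are invented values.

```python
def literals_from_features(x):
    """Build the literal vector [x_1..x_n, NOT x_1..NOT x_n] (assumed ordering)."""
    return list(x) + [1 - b for b in x]


def clause_outputs(include, literals):
    """Evaluate each clause as the AND of the literals its TAs include.

    include[j][k] == 1 means the TA for literal k in clause j stores 'include'.
    Assumption: a clause that includes no literal outputs 0 at inference.
    """
    outputs = []
    for row in include:
        included = [lit for inc, lit in zip(row, literals) if inc]
        outputs.append(1 if included and all(included) else 0)
    return outputs


def class_outputs(weights, clauses):
    """Compute V = W . C and threshold it: Y_i = 1 if V_i > 0, else 0."""
    v = [sum(w * c for w, c in zip(row, clauses)) for row in weights]
    y = [1 if vi > 0 else 0 for vi in v]
    return v, y


# Hypothetical toy problem: 2 Boolean features, 3 clauses, 2 classes.
x = [1, 0]
include = [
    [1, 0, 0, 1],  # clause 0: x_1 AND NOT x_2
    [0, 1, 0, 0],  # clause 1: x_2
    [0, 0, 0, 0],  # clause 2: includes nothing
]
weights = [
    [2, -1, 3],   # class 0 weights over the shared clauses
    [-1, 4, 1],   # class 1 weights over the shared clauses
]

c = clause_outputs(include, literals_from_features(x))
v, y = class_outputs(weights, c)
print(c, v, y)  # [1, 0, 0] [2, -1] [1, 0]
```

In the hardware described by the paper, the first stage corresponds to the clause crossbar operating on Boolean conductance states and the second to the class crossbar operating on analog conductances; the sketch only mirrors the arithmetic, not the device behavior.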

Abstract

The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architectures. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present IMPACT (InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference), built on a cutting-edge memory device, Y-Flash, fabricated in a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. IMPACT leverages the Y-Flash array to implement inference for a novel machine learning algorithm, the coalesced Tsetlin machine (CoTM), which is based on propositional logic. CoTM uses Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. IMPACT is organized into two computational crossbars, one storing the TA actions and one storing the weights. Validated on the MNIST dataset, IMPACT achieved 96.3% accuracy. It also improved energy efficiency by 2.23X over CNN-based ReRAM, 2.46X over neuromorphic NOR-Flash, and 2.06X over DNN-based PCM designs, making it well suited to modern ML inference applications.

Paper Structure

This paper contains 12 sections, 8 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: The coalesced Tsetlin machine algorithm data pipeline, consisting of (a) the learning element of CoTM, the Tsetlin automaton (TA), (b) the interaction of the TAs with the related Boolean literals to form the clauses, and (c) the clauses' weight matrix, which produces the class output vector (Y).
  • Figure 2: (a) The structure of a Y-Flash cell. (b) The symbol of the two-terminal Y-Flash device. (c) The Y-Flash characteristics under negative bias $(V_{DS} < 0\ V)$ and positive bias $(V_{DS} > 0\ V)$ during a reading cycle, where (e) and (f) are the two biasing configurations. (d) The simulated DC behavior of the $(I_D - V_D)$ and $(I_{SR} - V_D)$ characteristics of the Y-Flash device during a reading pulse $(V_R = 2\ V)$, showing the two distinct Boolean conductance states, HCS and LCS.
  • Figure 3: The simulated analog tunability of the Y-Flash device. (a) Programming from HCS to LCS. (b) Erasing from LCS to HCS. (c) Conductance values measured at $(V_R = 2\ V)$ for the programming cycles in (a). (d) Conductance values measured at $(V_R = 2\ V)$ for the erasing cycles in (b).
  • Figure 4: The IMPACT architecture, showing the two designed arrays. (a) The clause crossbar tile operates in Boolean conductance mode for clause computation. (b) The class crossbar tile operates in analog tunable conductance mode for class computation.
  • Figure 5: Generating the Boolean clause. (a) Current sense amplifier design. (b) Detecting the inclusion of literal "0" during the reading cycle, where $TA_1$ in the clause stores an include action and interacts with input literal "0", TAs $(2\ to\ 1024)$ interact with literal "1", and TAs $(1025\ to\ 2048)$ store exclude actions and all interact with input literal "0". (c) Testing the worst-case scenario during the reading cycle, where all the TAs in a clause store exclude actions; half of them receive input literal "0" while the other half receive input literal "1".
  • ...and 9 more figures