In-Memory Learning Automata Architecture using Y-Flash Cell
Omar Ghazal, Tian Lan, Shalman Ojukwu, Komal Krishnamurthy, Alex Yakovlev, Rishad Shafik
TL;DR
The paper tackles the data-movement bottleneck in machine learning by using in-memory computing with Y-Flash floating-gate memristors to implement the learning automata of the Tsetlin Machine (TM) directly in memory. It introduces a compact mapping in which each Tsetlin Automaton (TA) state is encoded as an analog conductance level of a single Y-Flash cell, enabled by the cell's two-transistor structure and CMOS compatibility, and validates this mapping through device-level characterization. The results show 40 discrete TA states per device (extendable to about 1000) and robust device-to-device (D2D) and cycle-to-cycle (C2C) behavior across 100 devices and 250 cycles, and the paper reports read/write energy and timing figures. An XOR training demonstration shows that TA dynamics can be realized with a reduced write-pulse budget, pointing to scalable in-memory learning suited to edge devices. Overall, the work suggests a viable path toward energy-efficient, scalable on-edge TM implementations built on memristors compatible with standard CMOS.
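To make the TA-to-conductance mapping concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-action Tsetlin Automaton with 40 states (lower half selects "exclude", upper half "include"), a linear mapping of states onto a conductance window whose endpoints `G_MIN_US` and `G_MAX_US` are hypothetical placeholder values, and that each one-state step corresponds to exactly one write pulse. The names `YFlashTA` and `state_to_conductance` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical device parameters: the conductance window endpoints are
# illustrative placeholders, not values measured in the paper.
G_MIN_US = 0.5   # lowest usable conductance, microsiemens (assumed)
G_MAX_US = 20.0  # highest usable conductance, microsiemens (assumed)
NUM_STATES = 40  # discrete TA states per cell, as reported in the TL;DR


def state_to_conductance(state: int) -> float:
    """Map TA state 1..NUM_STATES linearly onto [G_MIN_US, G_MAX_US] (assumed linear)."""
    if not 1 <= state <= NUM_STATES:
        raise ValueError(f"state {state} outside 1..{NUM_STATES}")
    step = (G_MAX_US - G_MIN_US) / (NUM_STATES - 1)
    return G_MIN_US + (state - 1) * step


@dataclass
class YFlashTA:
    """Two-action Tsetlin Automaton whose state is held as one cell's conductance.

    States 1..NUM_STATES//2 select 'exclude'; the upper half selects 'include'.
    Assumption: every one-state move costs exactly one write pulse.
    """
    state: int = NUM_STATES // 2
    pulses: int = 0

    @property
    def conductance(self) -> float:
        return state_to_conductance(self.state)

    def action(self) -> str:
        return "include" if self.state > NUM_STATES // 2 else "exclude"

    def _move(self, delta: int) -> None:
        new_state = min(max(self.state + delta, 1), NUM_STATES)
        if new_state != self.state:  # saturated at a boundary: no pulse issued
            self.state = new_state
            self.pulses += 1

    def reward(self) -> None:
        # Reinforce the current action: move deeper into its half of the range.
        self._move(+1 if self.action() == "include" else -1)

    def penalize(self) -> None:
        # Weaken the current action: move toward (and possibly across) the midpoint.
        self._move(-1 if self.action() == "include" else +1)
```

For example, a fresh `YFlashTA()` starts at state 20 ("exclude"); one `penalize()` moves it to state 21 ("include") at the cost of one pulse, and a following `reward()` moves it to state 22, which this assumed linear mapping places at 11.0 µS. Skipping pulses when the automaton is already saturated at state 1 or 40 is one simple way a write-pulse budget could be reduced; whether the paper uses this exact rule is not stated here.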
Abstract
The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising way to overcome this von Neumann bottleneck: data are processed inside the memory where they are stored. Here, we introduce a novel approach that uses floating-gate Y-Flash memristive devices fabricated in a standard 180 nm CMOS process. These devices offer attractive features, including analog tunability and moderate device-to-device variation, characteristics that are essential for reliable decision-making in ML applications. This paper applies a recent machine learning algorithm, the Tsetlin Machine (TM), to an in-memory processing architecture. The TM's learning element, the Tsetlin Automaton (TA), is mapped onto a single Y-Flash cell, with the automaton's state range mapped onto the cell's conductance range. Comprehensive simulations show that the proposed hardware implementation of learning automata, particularly for Tsetlin Machines, offers improved scalability and on-edge learning capability.
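Once each automaton lives in a cell's conductance, a TM decision can be read out by comparing conductances against a threshold. The sketch below illustrates this for one clause and is an assumption-laden illustration rather than the paper's circuit: it assumes a hypothetical conductance window (`G_MIN_US`, `G_MAX_US`), a midpoint read threshold `G_THRESHOLD_US` separating "exclude" from "include", and literals ordered as the inputs followed by their negations, one Y-Flash cell per literal. The function names are hypothetical.

```python
from typing import Sequence

# Assumed conductance window (microsiemens) and a midpoint read threshold that
# separates the 'exclude' half of the state range from the 'include' half.
G_MIN_US = 0.5
G_MAX_US = 20.0
G_THRESHOLD_US = (G_MIN_US + G_MAX_US) / 2  # 10.25 uS under these assumptions


def read_include(conductances: Sequence[float]) -> list[bool]:
    """Decide include/exclude per cell by comparing read conductance to the threshold."""
    return [g > G_THRESHOLD_US for g in conductances]


def clause_output(x: Sequence[int], conductances: Sequence[float]) -> int:
    """Evaluate one TM clause: AND over the literals whose TA cell reads 'include'.

    Literals are ordered [x_1..x_n, NOT x_1..NOT x_n], one Y-Flash cell each.
    A clause with no included literals returns 1 here (the TM training
    convention); TM inference usually forces such empty clauses to 0.
    """
    literals = list(x) + [1 - xi for xi in x]
    if len(conductances) != len(literals):
        raise ValueError("need one conductance per literal")
    included = read_include(conductances)
    return int(all(lit for lit, inc in zip(literals, included) if inc))
```

With cell conductances `[15.0, 2.0, 2.0, 15.0]` for the literals `[x1, x2, NOT x1, NOT x2]`, the clause includes `x1` and `NOT x2`, so it outputs 1 only for the input `[1, 0]`. In a two-input XOR task, a second clause with the mirrored include pattern would cover `[0, 1]`.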
