A 28 nm AI microcontroller with tightly coupled zero-standby power weight memory featuring standard logic compatible 4 Mb 4-bits/cell embedded flash technology

Daewung Kim, Seong Hwan Jeon, Young Hee Jeon, Kyung-Bae Kwon, Jigon Kim, Yeounghun Choi, Hyunseung Cha, Kitae Kwon, Daesik Park, Jongseuk Lee, Sihwan Kim, Seung-Hwan Song

TL;DR

The paper addresses the challenge of delivering low-power on-device AI inference for battery-powered edge devices by introducing a zero-standby-power weight memory built with 4-bits/cell embedded flash (EFLASH) and tightly integrated with a Near-Memory Computing Unit (NMCU). The architecture couples a 32-bit RISC-V core with a 4 Mb weight memory and an NMCU that uses two processing elements per macro, a ping-pong buffer, and an efficient dataflow to minimize data movement and support matrix-vector multiplies for TinyML. It also introduces a high-voltage generator and an overstress-free WL driver to enable reliable programming and read verification across 16 distinct memory states. Experimental results on a 28 nm prototype show robust operation, achieving 95.58% MNIST accuracy and 0.878 AUC for FC-Autoencoder after baking at 125 °C for 160 hours, demonstrating the viability of low-power edge AI with standard logic processes and tight memory-accelerator coupling.

Abstract

This study introduces a novel AI microcontroller optimized for cost-effective, battery-powered edge AI applications. Unlike traditional single-bit/cell memory configurations, the proposed microcontroller integrates zero-standby-power weight memory featuring standard-logic-compatible 4-bits/cell embedded flash technology tightly coupled to a Near-Memory Computing Unit. This architecture enables efficient, low-power AI acceleration. Advanced state mapping and an overstress-free word line (WL) driver circuit extend the verify levels, ensuring a robust 16-state cell margin. A ping-pong buffer reduces internal data movement while supporting simultaneous multi-bit processing. The fabricated microcontroller demonstrated high reliability, maintaining accuracy after 160 hours of unpowered baking at 125 °C.
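
The arithmetic behind "4-bits/cell" and "16 states" can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the paper's design: it assumes that 4 Mb means 4 × 1024 × 1024 bits, that each cell stores one signed 4-bit weight, and that weights are encoded in two's complement. The paper's actual state mapping (Figure 5a) is not reproduced in this summary, and the function names `to_cell_state`, `from_cell_state`, and `matvec_from_cells` are hypothetical.

```python
BITS_PER_CELL = 4
STATES_PER_CELL = 2 ** BITS_PER_CELL         # 4 bits -> 16 distinguishable states
MEMORY_BITS = 4 * 1024 * 1024                # assumption: "4 Mb" read as binary megabits
CELLS_NEEDED = MEMORY_BITS // BITS_PER_CELL  # cells required at 4 bits per cell


def to_cell_state(q):
    """Encode a signed 4-bit weight (-8..7) as a state index 0..15.

    Two's complement is an assumption made for illustration only.
    """
    if not -8 <= q <= 7:
        raise ValueError("weight does not fit in 4 bits")
    return q & 0xF


def from_cell_state(state):
    """Decode a state index 0..15 back to a signed 4-bit weight."""
    if not 0 <= state < STATES_PER_CELL:
        raise ValueError("state index out of range")
    return state - STATES_PER_CELL if state >= 8 else state


def matvec_from_cells(cell_rows, x):
    """Matrix-vector multiply where each matrix entry is read from one cell."""
    return [
        sum(from_cell_state(s) * xi for s, xi in zip(row, x))
        for row in cell_rows
    ]


# Tiny usage example: a 2x2 weight matrix stored as cell states.
weights = [[1, -2], [3, 4]]
cells = [[to_cell_state(q) for q in row] for row in weights]
result = matvec_from_cells(cells, [5, 6])
```

Under these assumptions, a 4 Mb array needs a quarter as many cells as a single-bit/cell memory of the same capacity, which is the density argument the abstract makes. The cost is that each cell must hold one of 16 separable levels, which is why the verify levels and cell margin matter.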

Paper Structure

This paper contains 8 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: AI microcontroller featuring 4-bits/cell EFLASH technology tightly coupled to a near-memory computing unit
  • Figure 2: Near-Memory Computing Unit for efficient AI acceleration
  • Figure 3: Standard logic compatible high voltage generator for embedded flash program/erase operations
  • Figure 4: Overstress-free WL driver circuit of 4-bits/cell EFLASH with PMOS charging path: (a) for program operation, (b) for a program-verify read operation, and (c) for read operation.
  • Figure 5: (a) 4-bits/cell EFLASH state mapping table, (b) 16 states program-verify sequences, (c) measured VPP1-4 levels from the logic compatible charge pump, and (d) WL driver output signals (PWL/WWL) for verify operations of 4-bits/cell EFLASH cells
  • ...and 3 more figures