
A Reconfigurable Time-Domain In-Memory Computing Macro using FeFET-Based CAM with Multilevel Delay Calibration in 28 nm CMOS

Jeries Mattar, Mor M. Dahan, Stefan Dunkel, Halid Mulaosmanovic, Gunda Beernink, Sven Beyer, Eilam Yalon, Nicolás Wainstein

TL;DR

This work tackles data-movement and energy-efficiency bottlenecks in neural-network accelerators by introducing a reconfigurable time-domain nonvolatile in-memory computing (TD-nvIMC) macro based on ferroelectric FETs (FeFETs). The architecture integrates a content-addressable memory (CAM) array, a cascaded delay-element (DE) chain, and a time-to-digital converter (TDC) in 28 nm CMOS. It supports XOR- and AND-based multiply-and-accumulate (MAC) operations as well as in-memory Boolean logic with sub-nanosecond delay steps. A key contribution is a bulk-assisted multilevel-state calibration that achieves fine delay tuning (about 100 ps resolution) and resilience to device variations, along with write-disturb prevention via isolated triple-well bulks. Measurements on a 3×3 FeFET CAM with a 3-stage DE chain show 222.2 MOPS per cell and 1887 TOPS/W at 0.85 V, indicating a practical path toward scalable, energy-efficient TD-nvIMC accelerators for edge AI.
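The XOR- and AND-based MAC modes reduce to familiar bit-level operations. The Python sketch below is illustrative only and does not model the FeFET circuit. It assumes single-bit weights and activations and, for the XOR mode, the common binary-neural-network convention that bits encode bipolar values (0 → -1, 1 → +1); the summary does not confirm this encoding, and all function names are hypothetical.

```python
from typing import Sequence


def xor_mismatch_count(x: Sequence[int], w: Sequence[int]) -> int:
    """Number of positions where the input bit and stored bit differ (XOR = 1)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi ^ wi for xi, wi in zip(x, w))


def and_match_count(x: Sequence[int], w: Sequence[int]) -> int:
    """Unsigned binary dot product: positions where both bits are 1 (AND = 1)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi & wi for xi, wi in zip(x, w))


def bipolar_dot_from_xor(x: Sequence[int], w: Sequence[int]) -> int:
    """Dot product of {-1,+1} vectors stored as bits (assumed 0 -> -1, 1 -> +1).

    Each match contributes +1 and each mismatch -1, so dot = N - 2 * mismatches.
    """
    return len(x) - 2 * xor_mismatch_count(x, w)


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    w = [1, 1, 0, 1]
    print(and_match_count(x, w))       # 2
    print(xor_mismatch_count(x, w))    # 2
    print(bipolar_dot_from_xor(x, w))  # 0
```

The XOR count is the quantity a CAM-style mismatch search naturally produces, and the identity dot = N - 2·mismatches is what lets an XOR-based match count stand in for a signed binary MAC under the assumed encoding.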

Abstract

Time-domain nonvolatile in-memory computing (TD-nvIMC) offers a promising pathway to reduce data movement and improve energy efficiency by encoding computation in delay rather than voltage or current. This work presents a fully integrated and reconfigurable TD-nvIMC macro, fabricated in 28 nm CMOS, that combines a ferroelectric FET (FeFET)-based content-addressable memory array, a cascaded delay element chain, and a time-to-digital converter. The architecture supports binary multiply-and-accumulate (MAC) operations using XOR- and AND-based matching, as well as in-memory Boolean logic and arithmetic functions. Sub-nanosecond MAC resolution is achieved through experimentally demonstrated 550 ps delay steps, representing a 2000$\times$ improvement over prior FeFET TD-nvIMC work, enabled by multilevel-state calibration with $\leq$ 100 ps resolution. Write-disturb resilience is ensured via isolated triple-well bulks. The proposed macro achieves a measured throughput of 222.2 MOPS/cell and energy efficiency of 1887 TOPS/W at 0.85 V, establishing a viable path toward scalable, energy-efficient TD-nvIMC accelerators.
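For a sense of scale, the reported figures convert to per-operation quantities as follows. This is plain arithmetic on the numbers quoted above, taken at face value and without assuming how the authors count operations (for example, whether one MAC counts as one or two operations). The last line applies only if the 2000× figure refers directly to the delay step.

```latex
\begin{aligned}
t_{\mathrm{op}} &= \frac{1}{222.2\times10^{6}\ \mathrm{op/s}} \approx 4.5\ \mathrm{ns\ per\ operation\ per\ cell} \\
E_{\mathrm{op}} &= \frac{1\ \mathrm{W}}{1887\times10^{12}\ \mathrm{op/s}} \approx 5.3\times10^{-16}\ \mathrm{J} \approx 0.53\ \mathrm{fJ\ per\ operation} \\
\Delta t_{\mathrm{prior}} &\approx 2000 \times 550\ \mathrm{ps} = 1.1\ \mu\mathrm{s}
\end{aligned}
```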

Paper Structure

This paper contains 20 sections, 8 equations, 17 figures, 1 table.

Figures (17)

  • Figure 1: Challenges in nvIMC. (a) Comparison between conventional NVM and the proposed nvIMC macro, showing reduced data movement and power. (b) Normalized margin comparison showing improved throughput and enhanced noise margins in the time domain via calibration. (c) Radar chart benchmarking digital, voltage-domain, and time-domain IMC approaches across key performance metrics.
  • Figure 2: Schematic diagram of TD-nvIMC. The accumulated delay at the output is the sum of the stage delays and is proportional to $\sum_i W_{i,j}X_i$ for the enabled (EN) row.
  • Figure 3: Representative DE topologies integrated with memory (MEM) elements, where delay encodes the weight–activation multiplication result. (a) Multiplexer-based path selection. (b) Capacitive load modulation. (c) Tail current gating via external memory readout. (d) Programmable tail device for direct delay control.
  • Figure 4: Proposed TD-nvIMC architecture composed of a DE chain with CSI whose tail is implemented by a CAM cell and a leaker driving an inverter, TDC with tunable reference delay line, and I/O for observability.
  • Figure 5: C-AND schematic and write scheme. (a) First, all cells in the selected column are programmed to LVT. (b) Erase is performed by selecting the target cell ($WL=-4~V$) and setting $BuL=0~V$. Write-disturb is prevented on adjacent cells in the column with $BuL=-2~V$.
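The delay-accumulation principle in the Figure 2 caption can be illustrated with a behavioral model. The Python sketch below is a simplification under stated assumptions, not the authors' circuit model. Each stage i adds a hypothetical baseline delay `T_BASE_PS`, plus one 550 ps step (the step size reported in the abstract) when its cell in the enabled row j "fires". The firing rule is assumed to be W[i][j] AND X[i] or W[i][j] XOR X[i], depending on the mode. The TDC is idealized as a noiseless quantizer referenced to the known baseline, whereas the real design uses a tunable reference delay line. All names, the baseline value, and the W[i][j] indexing are hypothetical.

```python
from typing import Sequence

T_STEP_PS = 550.0   # delay step per firing stage (value reported in the abstract)
T_BASE_PS = 1000.0  # hypothetical baseline delay of one DE stage (assumption)


def stage_fires(w_bit: int, x_bit: int, mode: str) -> int:
    """Return 1 if this stage adds a delay step (assumed AND/XOR firing rule)."""
    if mode == "and":
        return w_bit & x_bit
    if mode == "xor":
        return w_bit ^ x_bit
    raise ValueError(f"unknown mode: {mode!r}")


def chain_delay_ps(W: Sequence[Sequence[int]], x: Sequence[int], row: int, mode: str) -> float:
    """Total chain delay: sum over stages i of the delay set by W[i][row] and x[i]."""
    if len(W) != len(x):
        raise ValueError("need one input bit per stage")
    return sum(T_BASE_PS + T_STEP_PS * stage_fires(W[i][row], x[i], mode) for i in range(len(x)))


def ideal_tdc_code(delay_ps: float, n_stages: int) -> int:
    """Idealized TDC: subtract the known baseline, then round to the nearest step count."""
    return round((delay_ps - n_stages * T_BASE_PS) / T_STEP_PS)


if __name__ == "__main__":
    # Assumed mapping for a 3x3 CAM with a 3-stage DE chain: W[i][j] = stage i, row j
    W = [[1, 0, 1],
         [1, 1, 0],
         [0, 1, 1]]
    x = [1, 1, 0]
    for row in range(3):
        d = chain_delay_ps(W, x, row, "and")
        print(row, d, ideal_tdc_code(d, len(x)))
```

With these inputs in AND mode, enabling row 0 gives two firing stages, so the chain delay is 4100 ps and the TDC code is 2. Enabling row 1 or row 2 gives 3550 ps and code 1. Because each firing stage shifts the output by a fixed step, the TDC code directly equals the match count, which is why sub-nanosecond, well-calibrated steps matter for MAC resolution.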