FeReX: A Reconfigurable Design of Multi-bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search

Zhicheng Xu, Che-Kai Liu, Chao Li, Ruibin Mao, Jianyi Yang, Thomas Kämpfe, Mohsen Imani, Can Li, Cheng Zhuo, Xunzhao Yin

TL;DR

FeReX tackles the data-transfer bottleneck in AI by delivering a reconfigurable, FeFET-based compute-in-memory associative memory capable of executing multiple distance metrics. It introduces a CSP-driven encoding pipeline that maps target distance matrices onto a 1FeFET1R crossbar and uses an LTA circuit to perform nearest-neighbor search, enabling Hamming, Manhattan, and Euclidean metrics within a single AM. The approach yields up to 250× speedup and up to 10^4× energy savings over GPU implementations, validated through device-circuit simulations and benchmarks on KNN and hyperdimensional computing tasks. This work provides the first reconfigurable distance-search AM in NVM hardware, with strong implications for versatile, energy-efficient CIM accelerators in ML and AI inference.

Abstract

Rapid advancements in artificial intelligence have given rise to transformative models, profoundly impacting our lives. These models demand massive volumes of data to operate effectively, exacerbating the data-transfer bottleneck inherent in the conventional von Neumann architecture. Compute-in-memory (CIM), a novel computing paradigm, tackles these issues by seamlessly embedding in-memory search functions, thereby obviating the need for data transfers. However, existing non-volatile memory (NVM)-based accelerators are application-specific. During the similarity-based associative search operation, they support only a single, specific distance metric, such as Hamming, Manhattan, or Euclidean distance, for measuring the query against the stored data, calling for reconfigurable in-memory solutions adaptable to various applications. To overcome this limitation, we present FeReX, a reconfigurable associative memory (AM) that accommodates various distance metrics, including Hamming, Manhattan, and Euclidean distances. Leveraging multi-bit ferroelectric field-effect transistors (FeFETs) as the proxy and a hardware-software co-design approach, we introduce a constraint satisfaction problem (CSP)-based method to automate the AM search input voltage and stored voltage configurations for different distance-based search functions. Device-circuit co-simulations first validate the effectiveness of the proposed FeReX methodology for reconfigurable search distance functions. Then, we benchmark FeReX in the context of k-nearest neighbor (KNN) and hyperdimensional computing (HDC), which highlights the robustness of FeReX and demonstrates up to 250× speedup and 10^4× energy savings compared with a GPU.
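
As a purely functional reference for what the abstract describes, the following Python sketch performs a nearest-neighbor search over stored multi-bit vectors under a selectable distance metric. It is not the FeReX circuit or its encoding method: the function names, the `METRICS` table, and the use of squared Euclidean distance are assumptions made for illustration, and the final arg-min step only stands in for the role the source assigns to the LTA circuit.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[int]


def hamming(a: Vector, b: Vector) -> int:
    # Bitwise Hamming distance: count differing bits in each element pair.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def manhattan(a: Vector, b: Vector) -> int:
    # Sum of absolute element-wise differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def squared_euclidean(a: Vector, b: Vector) -> int:
    # Assumption: the squared form is used; it has the same arg-min as
    # the true Euclidean distance, so the search result is unchanged.
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical lookup table; "reconfiguring" here just means picking a key.
METRICS: Dict[str, Callable[[Vector, Vector], int]] = {
    "hamming": hamming,
    "manhattan": manhattan,
    "euclidean": squared_euclidean,
}


def nearest(query: Vector, stored: List[Vector], metric: str) -> int:
    """Return the index of the stored vector closest to the query."""
    dist = METRICS[metric]
    distances = [dist(query, row) for row in stored]
    # Software stand-in for selecting the row with the smallest distance.
    return min(range(len(stored)), key=distances.__getitem__)
```

The choice of metric can change the result on the same data. With stored rows `[[2], [3]]` and query `[1]`, `nearest` returns index 1 under `"hamming"` (binary 01 differs from 11 in one bit but from 10 in two) and index 0 under `"manhattan"` or `"euclidean"`, which is why a single-metric AM cannot serve every application.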

Paper Structure

This paper contains 11 sections, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: (a) 1FeFET1R structure. (b) Multi-level I-V curve of 1FeFET1R, where $V_{t0}$, $V_{t1}$, $V_{t2}$ represent different $V_{th}$ values stored in the FeFET, $V_{s0}$, $V_{s1}$, $V_{s2}$ represent different search voltages (i.e., $V_{gs}$) applied to the FeFET, and two different $V_{ds}$ values result in two levels of ON current.
  • Figure 2: (a) FeReX AM overview. (b) LTA and (c) Interface circuit.
  • Figure 3: Workflow of FeReX's encoding scheme.
  • Figure 4: (a) DM of 2-bit Hamming distance. (b) Encoding with the FeReX circuit. The stored encoding corresponds to programmed $V_{th}$ values, while the search encoding corresponds to the FeFET's $V_{ds}$ and $V_{gs}$ voltages. (c) DM element decomposition process based on the number of FeFETs in an AM cell. (d) and (e) Two constraint examples: (d) for the same search voltage, the current of an FeFET must be either identical or 0; (e) if $\text{FeFET}_{Search11,Store00,2}$ is ON and $\text{FeFET}_{Search11,Store01,2}$ is OFF, a conflict occurs if $\text{FeFET}_{Search00,Store00,2}$ is OFF and $\text{FeFET}_{Search00,Store01,2}$ is ON.
  • Figure 5: Encoding feasible region from Algorithm 1 (feasibility detection) to the store/search voltage configurations for a single FeFET device.
  • ...and 3 more figures
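
The distance matrix (DM) named in the Figure 4 caption can be illustrated with a short Python sketch that tabulates a per-element distance for every pair of multi-bit values. The helper names and the row/column orientation (rows as search values, columns as stored values) are assumptions for illustration; the source does not specify them here, and this is not the paper's encoding algorithm.

```python
from typing import Callable, List


def hamming_bits(s: int, t: int) -> int:
    # Number of bit positions in which the two values differ.
    return bin(s ^ t).count("1")


def distance_matrix(bits: int, dist: Callable[[int, int], int]) -> List[List[int]]:
    """Tabulate dist(search, stored) for all values representable in `bits` bits.

    Assumed orientation: entry [s][t] is the distance between
    search value s and stored value t.
    """
    levels = 1 << bits  # e.g. 4 levels (00, 01, 10, 11) for 2 bits
    return [[dist(s, t) for t in range(levels)] for s in range(levels)]


if __name__ == "__main__":
    for row in distance_matrix(2, hamming_bits):
        print(row)
```

For 2-bit Hamming distance this yields the rows `[0, 1, 1, 2]`, `[1, 0, 2, 1]`, `[1, 2, 0, 1]`, and `[2, 1, 1, 0]`. Passing a different function, such as `lambda s, t: abs(s - t)` for Manhattan distance, produces the target matrix for another metric with the same helper.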