FeReX: A Reconfigurable Design of Multi-bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search
Zhicheng Xu, Che-Kai Liu, Chao Li, Ruibin Mao, Jianyi Yang, Thomas Kämpfe, Mohsen Imani, Can Li, Cheng Zhuo, Xunzhao Yin
TL;DR
FeReX tackles the data-transfer bottleneck in AI by delivering a reconfigurable, FeFET-based compute-in-memory associative memory capable of executing multiple distance metrics. It introduces a CSP-driven encoding pipeline that maps target distance matrices onto a 1FeFET1R crossbar and uses an LTA circuit to perform nearest-neighbor search, enabling Hamming, Manhattan, and Euclidean metrics within a single AM. The approach yields up to 250× speedup and up to 10^4× energy savings over GPU implementations, validated through device-circuit simulations and benchmarks on KNN and hyperdimensional computing tasks. This work provides the first reconfigurable distance-search AM in NVM hardware, with strong implications for versatile, energy-efficient CIM accelerators in ML and AI inference.
Abstract
Rapid advancements in artificial intelligence have given rise to transformative models, profoundly impacting our lives. These models demand massive volumes of data to operate effectively, exacerbating the data-transfer bottleneck inherent in the conventional von-Neumann architecture. Compute-in-memory (CIM), a novel computing paradigm, tackles these issues by seamlessly embedding in-memory search functions, thereby obviating the need for data transfers. However, existing non-volatile memory (NVM)-based accelerators are application specific. During the similarity based associative search operation, they only support a single, specific distance metric, such as Hamming, Manhattan, or Euclidean distance in measuring the query against the stored data, calling for reconfigurable in-memory solutions adaptable to various applications. To overcome such a limitation, in this paper, we present FeReX, a reconfigurable associative memory (AM) that accommodates various distance metrics including Hamming, Manhattan, and Euclidean distances. Leveraging multi-bit ferroelectric field-effect transistors (FeFETs) as the proxy and a hardware-software co-design approach, we introduce a constrained satisfaction problem (CSP)-based method to automate AM search input voltage and stored voltage configurations for different distance based search functions. Device-circuit co-simulations first validate the effectiveness of the proposed FeReX methodology for reconfigurable search distance functions. Then, we benchmark FeReX in the context of k-nearest neighbor (KNN) and hyperdimensional computing (HDC), which highlights the robustness of FeReX and demonstrates up to 250x speedup and 10^4 energy savings compared with GPU.
