Analog fast Fourier transforms for scalable and efficient signal processing
T. Patrick Xiao, Ben Feinberg, David K. Richardson, Matthew Cannon, Calvin Madsen, Harsha Medu, Vineet Agrawal, Matthew J. Marinella, Sapan Agarwal, Christopher H. Bennett
TL;DR
The paper demonstrates that the fast Fourier transform can be mapped onto analog in-memory computing architectures to achieve scalable, energy-efficient processing of large DFTs. By combining Cooley–Tukey factorization with a SONOS charge-trapping memory crossbar, the authors realize large-scale analog FFTs as two-stage or multi-stage sequences of matrix-vector multiplications (MVMs), enabling DFTs of up to 65,536 points and 2D vector-radix FFTs for image processing. An experimental proof of concept on a 1024×1024 SONOS array, together with simulations of SAR and ASR workloads, shows favorable energy and area scaling ($O(N \log_K N)$) compared with direct MVMs and state-of-the-art digital FFTs, along with competitive accuracy (PSNR > 25 dB for images and ~1–2% RMS dot-product error). The work highlights a flexible, reconfigurable analog accelerator fabric capable of both FFTs and AI workloads, with potential impact on the efficiency of edge DSP and ML. Overall, the analog FFT approach decouples transform size from hardware size and suggests broad applicability to signal processing kernels beyond FFTs.
Abstract
Edge devices are being deployed in increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing, such as by artificial intelligence (AI) algorithms, and for transmission over communication networks. Analog in-memory computing has been shown to be a fast, energy-efficient, and scalable solution for processing edge AI workloads, but not for Fourier transforms. This is because of the existence of the fast Fourier transform (FFT) algorithm, which enormously reduces the complexity of the DFT but has so far belonged only to digital processors. Here, we show that the FFT can be mapped to analog in-memory computing systems, enabling them to efficiently scale to arbitrarily large Fourier transforms without requiring large sizes or large numbers of non-volatile memory arrays. We experimentally demonstrate analog FFTs on 1D audio and 2D image signals, performing analog computations on up to 524K charge-trapping memory devices simultaneously, where each device has precisely tunable, low-conductance analog states. The scalability of both the new analog FFT approach and the charge-trapping memory device is leveraged to compute a 65,536-point analog DFT, a scale that is otherwise inaccessible to analog systems and that is $>500\times$ larger than any previous analog DFT demonstration. Analog FFT cores can provide higher energy efficiency and performance per area than specialized digital FFT processors at all FFT sizes, while also functioning as efficient matrix multiplication engines for AI workloads.
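To make the core idea concrete, the following is a minimal sketch of how a Cooley–Tukey factorization $N = N_1 N_2$ turns one large DFT into two stages of small matrix-vector multiplications with a twiddle-factor scaling in between. It is an idealized, floating-point illustration only, not the authors' implementation: the function names (`dft_matrix`, `matvec`, `two_stage_fft`) are hypothetical, complex arithmetic is used directly (how complex weights would be stored on real-valued devices is not modeled), and no analog non-idealities are simulated.

```python
import cmath


def dft_matrix(n):
    """Dense n x n DFT matrix; stands in for weights stored in one small array."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def matvec(m, v):
    """One matrix-vector multiplication (MVM)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def direct_dft(x):
    """Reference: a single N x N MVM, costing N^2 multiplications."""
    return matvec(dft_matrix(len(x)), x)


def two_stage_fft(x, n1, n2):
    """N-point DFT (N = n1 * n2) using only n1-point and n2-point MVMs."""
    n = n1 * n2
    assert len(x) == n
    f1, f2 = dft_matrix(n1), dft_matrix(n2)

    # Stage 1: for each offset j, an n1-point DFT over the strided samples
    # x[j], x[j + n2], x[j + 2*n2], ...  Result is indexed as stage1[j][k1].
    stage1 = [matvec(f1, x[j::n2]) for j in range(n2)]

    # Twiddle factors: element-wise scaling by exp(-2*pi*i * j * k1 / N).
    for j in range(n2):
        for k1 in range(n1):
            stage1[j][k1] *= cmath.exp(-2j * cmath.pi * j * k1 / n)

    # Stage 2: for each k1, an n2-point DFT across the offsets j.
    out = [0j] * n
    for k1 in range(n1):
        col = matvec(f2, [stage1[j][k1] for j in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]  # output index k = k1 + n1 * k2
    return out


if __name__ == "__main__":
    x = [complex(i % 5, -(i % 3)) for i in range(12)]
    ref = direct_dft(x)
    got = two_stage_fft(x, 3, 4)
    print(max(abs(a - b) for a, b in zip(ref, got)))  # small rounding error
```

In this sketch the largest matrix ever needed is $\max(N_1, N_2)$ on a side rather than $N$, which illustrates how the transform size can be decoupled from the array size. Applying the same split recursively to each stage gives the multi-stage case.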
