Analog fast Fourier transforms for scalable and efficient signal processing

T. Patrick Xiao, Ben Feinberg, David K. Richardson, Matthew Cannon, Calvin Madsen, Harsha Medu, Vineet Agrawal, Matthew J. Marinella, Sapan Agarwal, Christopher H. Bennett

TL;DR

The paper demonstrates that the fast Fourier transform (FFT) can be mapped onto analog in-memory computing architectures to process large discrete Fourier transforms (DFTs) scalably and energy-efficiently. Using Cooley–Tukey factorization and a SONOS charge-trapping memory crossbar, the authors realize large analog FFTs as two-stage or multi-stage sequences of matrix-vector multiplications (MVMs). This enables DFTs of up to 65,536 points and 2D vector-radix FFTs for image processing. The evidence comes from an experimental proof of concept on a 1024×1024 SONOS array, together with simulations of synthetic aperture radar (SAR) and automatic speech recognition (ASR) workloads. These show favorable energy and area scaling ($O(N \log_K N)$, where $K$ is the size of the elementary DFT mapped to one array) compared with direct MVMs and state-of-the-art digital FFT processors. They also show competitive accuracy (PSNR > 25 dB for images and ~1–2% RMS dot-product error). The work highlights a flexible, reconfigurable analog accelerator fabric that can run both FFTs and AI workloads, with potential impact on the efficiency of edge DSP and machine learning. Overall, this analog FFT (AFFT) approach decouples transform size from hardware size and suggests broad applicability to signal processing kernels beyond FFTs.
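The two-stage Cooley–Tukey factorization described above can be sketched in NumPy as follows. The sketch assumes the standard index map $n = N_2 a + b$, $k = c + N_1 d$. Each `dft_matrix(...)` product stands in for one analog MVM on a programmed array. The twiddle-factor multiplication between stages is simply done in floating point here; this is a simplification, not a description of the paper's hardware. `dft_matrix` and `two_stage_fft` are illustrative names, not the authors' code.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Dense n-point DFT matrix F[k, m] = exp(-2*pi*i*k*m/n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def two_stage_fft(x: np.ndarray, n1: int, n2: int) -> np.ndarray:
    """N-point DFT (N = n1*n2) from n1-point and n2-point DFT MVMs plus twiddles.

    Cooley-Tukey index map: n = n2*a + b, k = c + n1*d,
    with a, c in [0, n1) and b, d in [0, n2).
    """
    N = n1 * n2
    if x.shape != (N,):
        raise ValueError(f"expected a vector of length {N}")
    grid = x.reshape(n1, n2)                  # grid[a, b] = x[n2*a + b]
    # Stage 1: n2 independent n1-point DFTs (one MVM per column b)
    stage1 = dft_matrix(n1) @ grid            # stage1[c, b]
    # Twiddle factors exp(-2*pi*i*b*c/N), applied digitally in this sketch
    c = np.arange(n1)[:, None]
    b = np.arange(n2)[None, :]
    stage1 = stage1 * np.exp(-2j * np.pi * b * c / N)
    # Stage 2: n1 independent n2-point DFTs (one MVM per row c)
    stage2 = stage1 @ dft_matrix(n2).T        # stage2[c, d]
    # Output index k = c + n1*d, so read the result out column by column
    return stage2.T.reshape(N)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(65536) + 1j * rng.standard_normal(65536)
    print(np.allclose(two_stage_fft(x, 256, 256), np.fft.fft(x)))  # True
```

For instance, `two_stage_fft(x, 256, 256)` computes a 65,536-point DFT using only 256-point DFT matrices. `two_stage_fft(x, 32, 16)` gives the 512-point split used for the spectrograms in Figure 3.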

Abstract

Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing, such as by artificial intelligence (AI) algorithms, and for transmission over communication networks. Analog in-memory computing has been shown to be a fast, energy-efficient, and scalable solution for processing edge AI workloads, but not for Fourier transforms. This is because the fast Fourier transform (FFT) algorithm, which enormously reduces the complexity of the DFT, has so far been available only on digital processors. Here, we show that the FFT can be mapped to analog in-memory computing systems, enabling them to efficiently scale to arbitrarily large Fourier transforms without requiring large sizes or large numbers of non-volatile memory arrays. We experimentally demonstrate analog FFTs on 1D audio and 2D image signals, performing analog computations on up to 524K charge-trapping memory devices simultaneously, where each device has precisely tunable, low-conductance analog states. The scalability of both the new analog FFT approach and the charge-trapping memory device is leveraged to compute a 65,536-point analog DFT, a scale that is otherwise inaccessible by analog systems and which is $>500\times$ larger than any previous analog DFT demonstration. Analog FFT cores can provide higher energy efficiency and performance per area than specialized digital FFT processors at all FFT sizes, while also functioning as efficient matrix multiplication engines for AI workloads.
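Scaling to arbitrarily large transforms amounts to applying the Cooley–Tukey split recursively until every remaining DFT fits in one array. Below is a minimal NumPy sketch. It assumes a hypothetical maximum single-array DFT size `k_max` and a simple greedy rule of our own: take the largest divisor of N not exceeding `k_max` as the next elementary DFT. The paper's factorization trees (Figure 1(c)) may be chosen differently. All function names are illustrative.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    """Dense n-point DFT matrix; stands in for one programmed array."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def largest_divisor_at_most(n: int, k_max: int) -> int:
    """Hypothetical greedy leaf choice: largest divisor of n in [2, k_max]."""
    for d in range(min(n, k_max), 1, -1):
        if n % d == 0:
            return d
    raise ValueError(f"{n} has no divisor in [2, {k_max}]")


def leaf_sizes(n: int, k_max: int) -> list[int]:
    """Sizes of the elementary DFTs (one per stage) chosen by the greedy rule."""
    if n <= k_max:
        return [n]
    n1 = largest_divisor_at_most(n, k_max)
    return [n1] + leaf_sizes(n // n1, k_max)


def multistage_fft(x: np.ndarray, k_max: int) -> np.ndarray:
    """DFT along the last axis using only DFT matrices of size <= k_max."""
    N = x.shape[-1]
    if N <= k_max:
        return x @ dft_matrix(N).T                     # leaf: one DFT MVM per row
    n1 = largest_divisor_at_most(N, k_max)
    n2 = N // n1
    grid = x.reshape(*x.shape[:-1], n1, n2)            # grid[..., a, b] = x[..., n2*a + b]
    stage1 = dft_matrix(n1) @ grid                     # n1-point DFTs over a
    c = np.arange(n1)[:, None]
    b = np.arange(n2)[None, :]
    stage1 = stage1 * np.exp(-2j * np.pi * b * c / N)  # twiddle factors
    stage2 = multistage_fft(stage1, k_max)             # n2-point DFTs over b, recursively
    # Output index k = c + n1*d
    return np.swapaxes(stage2, -1, -2).reshape(*x.shape[:-1], N)


if __name__ == "__main__":
    print(leaf_sizes(65536, 256))  # [256, 256]
    print(leaf_sizes(65536, 16))   # [16, 16, 16, 16]
    rng = np.random.default_rng(0)
    x = rng.standard_normal(65536) + 1j * rng.standard_normal(65536)
    print(np.allclose(multistage_fft(x, 16), np.fft.fft(x)))  # True
```

The recursion depth, and therefore the number of sequential MVM stages, grows only logarithmically with N. The array size stays fixed at `k_max`.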

Paper Structure (37 sections, 20 equations, 31 figures, 4 tables)

Figures (31)

  • Figure 1: Processing large Fourier transforms using analog in-memory computing. (a) The direct MVM approach requires a large DFT matrix to be split across many arrays. (b) The analog Cooley-Tukey FFT factorizes the $N$-point DFT into smaller DFTs of size $N_1$ and $N_2$. Only the real parts of the temporal signal and the frequency spectrum are shown for simplicity. (c) Two of many ways to factorize a 65,536-point DFT using the analog FFT. The leaves of the trees are elementary DFTs mapped to analog MVMs, and the branches are Cooley-Tukey factorizations. (d) Comparison of how the number of ADC conversions scales with DFT size for the analog direct MVM and the analog FFT. We consider analog in-memory computing (IMC) systems with a maximum single-array DFT size of either 16 points ($32 \times 64$ array size) or 256 points ($512 \times 1024$ array size). (e) A mesh fabric of analog IMC cores can accelerate a diverse range of workloads. The same cores used for processing FFTs can be reprogrammed to execute DNN layers and other kernels.
  • Figure 2: DFT mapping onto a SONOS charge-trapping memory array. (a) Electrical schematic (top) and transmission electron microscope image (bottom) of the two-transistor SONOS memory cell. (b) Mapping a DFT operation with complex-valued weights and inputs to a resistive memory crossbar. The SONOS cell with the input-output connections in (a) is simplified in this schematic to a resistor. (c) Measured SONOS conductance profile for a DFT-16 matrix. (d) Measured SONOS conductance profile for a DFT-256 matrix. Inset shows a $32\times 32$ region of the programmed array. (e) Constellation of the complex-valued DFT weight values stored in the programmed SONOS devices, for DFT-16 and DFT-256. The ideal DFT weights lie along the unit circle (black dashed circle). (f) Accuracy of individual current-mode analog MVMs for 16-point and 256-point DFTs, showing the dominant sources of error in different current regimes. Data on more than $3.7\times10^7$ analog current sums are collected from processing the audio signal in Fig. 3.
  • Figure 3: Audio processing with analog FFTs. (a) Speech audio waveform with 65,536 samples. (b) Spectrogram of the audio waveform generated by FP32 STFTs, using a window size of 512 samples, a hop length of 128 samples, and a Hamming window function. The frequency resolution is 31.25 Hz. (c) Spectrogram generated experimentally using STFTs that are implemented with 512-point analog FFTs. The FFTs are factored into 32-point and 16-point analog DFTs that are executed on the SONOS array. (d) Power spectrum of the full audio waveform, computed by a 65,536-point analog FFT using the SONOS array (teal), compared to an FP32 digital FFT (black). The analog FFT was factored into 256-point analog DFTs that are executed on the SONOS array. Insets zoom in on two parts of the spectrum. The frequency resolution is 0.244 Hz. dBFS: decibels relative to full-scale.
  • Figure 4: Analog vector-radix FFT for 2D image processing. (a) Diagram of the analog 2D ($M \times N$) VR-FFT, which is composed of several analog DFT stages of smaller size. (b) Comparison of (left) a $256 \times 256$ input image with (right) the analog reconstruction of the same image. The input is a satellite overhead image of Rotterdam from the SpaceNet-6 dataset [Shermeyer2020]. The analog reconstruction is obtained by experimentally computing the analog VR-FFT of the image using the SONOS array (with $P=Q=R=S=16$), followed by an ideal digital IFFT. (c) 2D magnitude spectrum of the image in (b), computed using the SONOS array, showing one of three color channels. The right side shows the color-averaged error of the magnitude spectrum relative to that calculated by a 2D FFT at FP32 precision. (d–e) Original vs. experimental analog reconstruction for two other $256 \times 256$ images. ("Orchid" photograph was taken by the author. "Sandia" photograph from Dorothy Harris, Wikimedia Commons, CC-BY-2.0 license.)
  • Figure 5: Simulated accuracy, energy, and performance scaling of analog Fourier transforms. (a) Formation of a 2048$\times$2048 range-azimuth image from raw SAR phase history data, using an analog VR-FFT as part of the polar format algorithm. The SSIM of the simulated image using analog FFTs (right) is computed relative to the image formed using a 2D FFT at FP32 precision (left). The mean and standard deviation of the SSIM are reported from ten Monte Carlo accuracy simulations of an optimized 40-nm SONOS core with 8-bit ADCs. (b) Automatic speech recognition using 512-point analog FFTs for spectrogram generation and analog MVMs to accelerate the RNNT speech-to-text neural network. The simulated word error rate (WER) on the LibriSpeech "test-clean" dataset (2620 audio samples) is reported for two cases: SONOS-based analog MVMs used for the STFT only, and used for both the STFT and the RNNT. The ADC resolution for the analog STFT is varied from 8 to 10 bits, while for all RNNT layers it is fixed at 8 bits. (c) Comparison of the FFT energy vs 1D FFT size for a SONOS-based analog FFT, commercial chips that can process DSP workloads [mckeown2010fft, versal-fft], and various state-of-the-art (SOTA) digital FFT processors from the literature that support a flexible FFT size [Guo2023, Liu2025, Shih2018, Shih2018_2, Chen2018, Yang2012, Liu2019]. For the analog FFT, the ADC resolution is 8 bits and the maximum array size is 1024$\times$1024 (one array can process up to a 256-point DFT). (d) Comparison of the FFT compute density, quantified as the throughput normalized by chip area, between the SONOS-based analog FFT core and specialized digital FFT processors. The analog FFT was evaluated for a 4096-point FFT. The other processors were evaluated at their individual maximum supported FFT size that can fit onto one chip.
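The trend in Figure 1(d) can be reproduced with a back-of-the-envelope count. The sketch below uses a hypothetical counting model of our own, not taken from the paper. One array holds at most a $K$-point complex DFT, and every complex output is digitized as two real values. The direct approach tiles the $N \times N$ DFT matrix into $\lceil N/K \rceil$ blocks along its input dimension, and each block produces partial sums for all outputs. The analog FFT digitizes $2N$ values in each of $\lceil \log_K N \rceil$ stages. This gives $2N\lceil N/K\rceil \approx 2N^2/K$ conversions versus $2N\lceil \log_K N\rceil$.

```python
def adc_conversions(n: int, k: int) -> tuple[int, int]:
    """Return (direct-MVM, analog-FFT) ADC conversion counts for an n-point DFT.

    Hypothetical counting model (not from the paper): n and k are powers of
    two, one array holds at most a k-point complex DFT, and every complex
    output is digitized as two real values (Re and Im).
    """
    p, q = n.bit_length() - 1, k.bit_length() - 1
    if n != 1 << p or k != 1 << q or k < 2:
        raise ValueError("n and k must be powers of two with k >= 2")
    # Direct MVM: ceil(n/k) input-dimension tiles, each producing partial sums
    # for all 2n real outputs.
    direct = 2 * n * -(-n // k)
    # Analog FFT: ceil(log_k n) stages (at least one); each digitizes 2n values.
    stages = max(1, -(-p // q))
    return direct, 2 * n * stages


if __name__ == "__main__":
    for n in (16, 256, 4096, 65536):
        for k in (16, 256):
            direct, fft = adc_conversions(n, k)
            print(f"N={n:>6} K={k:>3}  direct={direct:>12,}  fft={fft:>9,}")
```

Under this model, a 65,536-point DFT on 256-point arrays needs 33,554,432 conversions with direct MVMs but only 262,144 with the analog FFT.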
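The mapping in Figure 2(b), from complex weights and inputs onto a real-valued crossbar, can be illustrated with a common scheme that we assume here. Write $F = A + iB$ and $x = u + iv$. Then $[\mathrm{Re}\,y;\ \mathrm{Im}\,y] = [[A, -B], [B, A]]\,[u; v]$, a $2N \times 2N$ real matrix. Each signed weight is stored as a differential pair of non-negative conductances. With one positive and one negative column per output, the array is $2N \times 4N$. This matches the $32 \times 64$ and $512 \times 1024$ sizes quoted in the Figure 1 caption, but the paper's actual cell-level mapping (using the two-transistor SONOS cell) may differ. `g_max` and all function names are hypothetical.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def dft_to_conductances(n: int, g_max: float = 1.0):
    """Map an n-point complex DFT to non-negative differential conductances.

    Assumed scheme: [Re y; Im y] = [[A, -B], [B, A]] @ [Re x; Im x] with
    F = A + iB, and each signed real weight w = (g_pos - g_neg) / g_max.
    """
    F = dft_matrix(n)
    A, B = F.real, F.imag
    W = np.block([[A, -B], [B, A]])          # (2n outputs) x (2n inputs), real
    g_pos = g_max * np.clip(W, 0.0, None)    # positive part of each weight
    g_neg = g_max * np.clip(-W, 0.0, None)   # magnitude of the negative part
    return g_pos, g_neg


def crossbar_mvm(x: np.ndarray, g_pos, g_neg, g_max: float = 1.0) -> np.ndarray:
    """Ideal crossbar read: inputs [Re x; Im x], differential output currents."""
    n = x.shape[0]
    v = np.concatenate([x.real, x.imag])     # 2n input voltages
    i_out = (g_pos - g_neg) @ v / g_max      # 2n differential currents
    return i_out[:n] + 1j * i_out[n:]        # reassemble the complex output


def array_shape(g_pos, g_neg) -> tuple[int, int]:
    """Rows = inputs; columns = one positive and one negative column per output."""
    return np.hstack([g_pos.T, g_neg.T]).shape


if __name__ == "__main__":
    gp, gn = dft_to_conductances(16)
    print(array_shape(gp, gn))  # (32, 64)
```

Splitting each weight into two non-negative devices is what allows a physical conductance, which cannot be negative, to represent the signed real and imaginary parts of the DFT weights.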
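The Figure 3 caption's numbers are mutually consistent. A 512-point window with 31.25 Hz resolution implies a 16 kHz sample rate (31.25 × 512 = 16,000; this rate is inferred, not stated). The 65,536-point transform then has a resolution of 16,000 / 65,536 ≈ 0.244 Hz. The sketch below builds an STFT with the caption's window and hop length, computing each 512-point frame via 32-point and 16-point DFT MVMs. It assumes no padding or centering and NumPy's symmetric `np.hamming` window. A synthetic two-tone signal stands in for the speech waveform.

```python
import numpy as np

WIN, HOP = 512, 128           # window and hop length from the Fig. 3 caption
FS = 31.25 * WIN              # implied sample rate: 16000.0 Hz
RES_65536 = FS / 65536        # 0.244140625 Hz, the "0.244 Hz" of Fig. 3(d)


def dft_matrix(n: int) -> np.ndarray:
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def fft512_as_32x16(frames: np.ndarray) -> np.ndarray:
    """512-point DFT of each row from 32-point and 16-point DFT MVMs."""
    n1, n2, n = 32, 16, 512
    grid = frames.reshape(-1, n1, n2)                # grid[f, a, b] = frame[16a + b]
    s1 = dft_matrix(n1) @ grid                       # 32-point DFTs: s1[f, c, b]
    c = np.arange(n1)[:, None]
    b = np.arange(n2)[None, :]
    s1 = s1 * np.exp(-2j * np.pi * b * c / n)        # twiddle factors
    s2 = s1 @ dft_matrix(n2).T                       # 16-point DFTs: s2[f, c, d]
    return np.swapaxes(s2, -1, -2).reshape(-1, n)    # bin k = c + 32*d


def frame_signal(x: np.ndarray) -> np.ndarray:
    """Hamming-windowed frames without padding (an assumption; Fig. 3 may pad)."""
    n_frames = 1 + (len(x) - WIN) // HOP
    idx = HOP * np.arange(n_frames)[:, None] + np.arange(WIN)[None, :]
    return x[idx] * np.hamming(WIN)


def stft_32x16(x: np.ndarray) -> np.ndarray:
    return fft512_as_32x16(frame_signal(x))


if __name__ == "__main__":
    t = np.arange(65536) / FS                        # 65,536 samples, as in Fig. 3(a)
    x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
    S = stft_32x16(x)
    print(S.shape)                                   # (509, 512)
    peak_bin = int(np.argmax(np.abs(S[0, : WIN // 2])))
    print(peak_bin * FS / WIN)                       # 437.5 (bin nearest 440 Hz)
```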
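A vector-radix FFT, as in Figure 4(a), applies the Cooley–Tukey split to both image dimensions at once. Below is a minimal two-stage sketch. It assumes an index map of our own choosing: rows $n = Q n_1 + n_2$ with $M = PQ$, and columns $m = S m_1 + m_2$ with $N = RS$. Each small 2D DFT is treated as one MVM with a Kronecker-product matrix. With $P = R = 16$ that matrix is $256 \times 256$ complex, the same size as a 256-point DFT array. The stage ordering and twiddle placement in the paper may differ. `vr_fft2` is a hypothetical name.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def vr_fft2(x: np.ndarray, P: int, Q: int, R: int, S: int) -> np.ndarray:
    """2D DFT of an (M x N) array, M = P*Q and N = R*S, by a two-stage vector radix.

    Assumed index maps: row n = Q*n1 + n2, column m = S*m1 + m2;
    output row k = k1 + P*k2, output column l = l1 + R*l2.
    """
    M, N = x.shape
    if M != P * Q or N != R * S:
        raise ValueError("image shape must be (P*Q, R*S)")
    xr = x.reshape(P, Q, R, S)                          # axes: n1, n2, m1, m2
    # Stage 1: for each (n2, m2), a P x R 2D DFT over (n1, m1), done as one
    # MVM with the (P*R) x (P*R) matrix kron(F_P, F_R).
    blocks = xr.transpose(1, 3, 0, 2).reshape(Q, S, P * R)
    y = (blocks @ np.kron(dft_matrix(P), dft_matrix(R)).T).reshape(Q, S, P, R)
    # Twiddles W_M^(n2*k1) * W_N^(m2*l1); y axes are n2, m2, k1, l1
    n2 = np.arange(Q)[:, None, None, None]
    m2 = np.arange(S)[None, :, None, None]
    k1 = np.arange(P)[None, None, :, None]
    l1 = np.arange(R)[None, None, None, :]
    y = y * np.exp(-2j * np.pi * (n2 * k1 / M + m2 * l1 / N))
    # Stage 2: for each (k1, l1), a Q x S 2D DFT over (n2, m2), again one MVM.
    blocks = y.transpose(2, 3, 0, 1).reshape(P, R, Q * S)
    z = (blocks @ np.kron(dft_matrix(Q), dft_matrix(S)).T).reshape(P, R, Q, S)
    # z axes: k1, l1, k2, l2  ->  X[k1 + P*k2, l1 + R*l2]
    return z.transpose(2, 0, 3, 1).reshape(M, N)


if __name__ == "__main__":
    img = np.random.default_rng(0).random((256, 256))
    X = vr_fft2(img, 16, 16, 16, 16)
    recon = np.fft.ifft2(X).real        # ideal digital IFFT, as in Fig. 4(b)
    print(np.allclose(recon, img))      # True
```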
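Figure 5(b) varies the STFT's ADC resolution from 8 to 10 bits. A rough illustration of why that matters is to digitize each stage's MVM outputs in a two-stage FFT and measure the error against an exact FFT. This is not the paper's accuracy simulator, which models an optimized 40-nm SONOS core including device errors. The sketch isolates only a hypothetical ideal uniform ADC, whose full-scale range is set per stage to the largest output magnitude. Twiddles are applied exactly, and a random complex input is used.

```python
import numpy as np


def dft_matrix(n: int) -> np.ndarray:
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def adc(y: np.ndarray, bits: int) -> np.ndarray:
    """Hypothetical ideal ADC: uniform quantizer on Re and Im, full scale = max |part|."""
    full_scale = max(np.abs(y.real).max(), np.abs(y.imag).max())
    if full_scale == 0:
        return y
    levels = 2 ** (bits - 1) - 1                  # symmetric signed codes
    step = full_scale / levels

    def quantize(r: np.ndarray) -> np.ndarray:
        return np.clip(np.round(r / step), -levels, levels) * step

    return quantize(y.real) + 1j * quantize(y.imag)


def two_stage_fft_adc(x: np.ndarray, n1: int, n2: int, bits: int) -> np.ndarray:
    """Two-stage Cooley-Tukey FFT whose per-stage MVM outputs are digitized."""
    n = n1 * n2
    grid = x.reshape(n1, n2)
    s1 = adc(dft_matrix(n1) @ grid, bits)         # stage-1 MVMs + ADC
    c = np.arange(n1)[:, None]
    b = np.arange(n2)[None, :]
    s1 = s1 * np.exp(-2j * np.pi * b * c / n)     # exact digital twiddles (assumed)
    s2 = adc(s1 @ dft_matrix(n2).T, bits)         # stage-2 MVMs + ADC
    return s2.T.reshape(n)


def relative_error(bits: int, n1: int = 64, n2: int = 64, seed: int = 0) -> float:
    """Relative L2 error of the quantized FFT vs. np.fft.fft on random input."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n1 * n2) + 1j * rng.standard_normal(n1 * n2)
    ref = np.fft.fft(x)
    err = np.linalg.norm(two_stage_fft_adc(x, n1, n2, bits) - ref)
    return float(err / np.linalg.norm(ref))


if __name__ == "__main__":
    for bits in (8, 9, 10):
        print(bits, f"{relative_error(bits):.3%}")
```

Each extra ADC bit halves the quantization step. Because every stage adds its own quantization noise, the total error also grows with the number of FFT stages.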