Adaptive Hybrid FFT: A Novel Pipeline and Memory-Based Architecture for Radix-$2^k$ FFT in Large Size Processing
Fangyu Zhao, Chunhua Xiao, Zhiguo Wang, Xiaohua Du, Bo Dong
TL;DR
The paper tackles the challenge of high-throughput, large-size FFT processing in hardware-constrained environments. It introduces an adaptive hybrid FFT that combines pipelined and memory-based architectures using radix-$2^k$ multi-path delay commutator (MDC) units and a conflict-free memory access scheme, augmented by bit-dimension permutations for data reordering. Key contributions include a scalable MDC design supporting radices up to $2^5$, extended data permutation methods, and a memory access strategy that supports in-place and interleaved data flows, enabling near-continuous operation. An FPGA implementation demonstrates feasibility for FFT sizes up to $512K$, reaching up to $196.8$ MHz and requiring significantly fewer compute cycles than prior work, which indicates strong potential for real-time, large-scale DSP applications.
Abstract
In digital signal processing, the fast Fourier transform (FFT) is a fundamental algorithm. FFT processors are typically implemented using either a pipelined architecture, which is well suited to high-throughput applications but has low hardware utilization, or a memory-based architecture, which is designed for area-constrained scenarios but fails to meet stringent throughput requirements. In this paper, we propose an adaptive hybrid FFT processor that combines the advantages of both architectures and has the following features. First, a set of radix-$2^k$ multi-path delay commutator (MDC) units is developed to support high-performance large-size processing. Second, a conflict-free memory access scheme is formulated to ensure a continuous data flow without data contention. Third, we demonstrate the existence of a series of bit-dimension permutations for reordering input data that satisfy the generalized constraints of variable length, high radix, and any level of parallelism, giving wide adaptivity. Furthermore, the proposed FFT processor has been implemented on a field-programmable gate array (FPGA). The results show that the proposed design requires fewer computation cycles than conventional memory-based FFT processors and achieves higher hardware utilization than pipelined FFT architectures, making it suitable for highly demanding applications.
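
As a rough illustration of what a bit-dimension permutation is, the Python sketch below reorders data indices by permuting the bits of each index. It is not the paper's reordering scheme: the helper names `permute_bits` and `bit_reversal`, and the choice of the classic radix-2 bit reversal as the example permutation, are assumptions made only for illustration.

```python
def permute_bits(index, sigma, n_bits):
    """Return the index whose bit j equals bit sigma[j] of `index`.

    `sigma` is a permutation of range(n_bits); names are hypothetical.
    """
    assert sorted(sigma) == list(range(n_bits))
    out = 0
    for j, src in enumerate(sigma):
        out |= ((index >> src) & 1) << j
    return out


def bit_reversal(n_bits):
    """Bit-dimension permutation that reverses the order of the index bits."""
    return [n_bits - 1 - j for j in range(n_bits)]


# Reorder the indices of an 8-point transform (3 index bits).
n_bits = 3
sigma = bit_reversal(n_bits)
order = [permute_bits(i, sigma, n_bits) for i in range(1 << n_bits)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Any other permutation of the bit positions can be passed as `sigma` in the same way. Because each distinct index maps to a distinct output, the result is always a reordering of the full index range.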
