A Low-Latency FFT-IFFT Cascade Architecture

Keshab K. Parhi

TL;DR

The paper tackles the latency and area penalties of partly-parallel FFT-IFFT cascades. For a given folded FFT architecture, it schedules the IFFT as soon as possible (ASAP), which yields a unique IFFT folding set and a cascade with no intermediate buffer. The method extends to interleaved multi-channel processing, achieving full resource utilization without extra reorder or interleaving hardware. Compared to a design with identical folding sets, the single-channel cascade saves about $N/2$ memory elements and $N/4$ clock cycles of latency, and the two-channel cascade saves about $N/2$ memory elements and $N/2$ clock cycles, with throughput fixed at 2 samples per clock. The approach provides a scalable, hardware-efficient solution for real-time FFT-based processing in communications, imaging, and ML feature extraction.

Abstract

This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to design the IFFT architecture that does not require an intermediate buffer. Such a folding set is designed by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time-series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets.
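To make the quoted savings concrete, the short sketch below tabulates them for a given transform size. It only restates the approximate figures from the abstract (savings relative to a cascade with identical folding sets). The function name `cascade_savings`, the dictionary keys, and the restriction to power-of-two N are assumptions made for illustration and are not taken from the paper.

```python
def cascade_savings(n, channels=1):
    """Approximate savings quoted in the abstract, relative to a cascade
    that uses identical folding sets for the FFT and the IFFT.

    Hypothetical helper: the name, keys, and power-of-two check are
    illustrative assumptions, not part of the paper.
    """
    # Assumption: N is a power of two (as in the 16-point examples).
    if n < 4 or n & (n - 1) != 0:
        raise ValueError("n must be a power of two and at least 4")
    if channels == 1:
        # Single channel: about N/2 memory elements, N/4 clock cycles.
        return {"memory_elements": n // 2, "latency_cycles": n // 4}
    if channels == 2:
        # 2-interleaved: about N/2 memory units, N/2 clock cycles.
        return {"memory_elements": n // 2, "latency_cycles": n // 2}
    raise ValueError("the abstract only quotes figures for 1 or 2 channels")


if __name__ == "__main__":
    for n in (16, 64, 1024):
        print(n, cascade_savings(n, 1), cascade_savings(n, 2))
```

For the 16-point cascades shown in the figures, this gives about 8 memory elements and 4 clock cycles saved in the single-channel case, and about 8 memory units and 8 clock cycles in the 2-interleaved case.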

Paper Structure (8 sections, 6 equations, 5 figures, 1 table)

Introduction
Cascaded FFT-IFFT Architecture Design
Traditional FFT/IFFT Cascade Architecture
FFT-IFFT cascade using ASAP Scheduling
Interleaved FFT-IFFT Cascade Architecture
Comparison and Performance Analysis
Conclusion
Acknowledgment

Figures (5)

Figure 1: Cascaded FFT-IFFT architecture with and without intermediate buffer.
Figure 2: Data-flow graphs for FFT and IFFT with scheduling. Clock cycles are marked in red.
Figure 3: Cascaded 16-point FFT-IFFT architectures. The top-middle cascade represents a traditional design. The top-bottom cascade represents the proposed design.
Figure 4: Data-flow graphs and schedules for interleaved FFT and IFFT.
Figure 5: Cascaded interleaved 16-point FFT-IFFT architectures. The top-middle cascade represents a traditional design. The top-bottom cascade represents the proposed design.
