Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Mohamed Amine Bergach

Abstract

The fused multiply-add (FMA) instruction enables the radix-2 FFT butterfly to be computed in 6 FMA operations, the proven minimum. The classical factorization by Linzer and Feig \cite{linzer1993} precomputes the ratio $\cot\theta = \cos\theta/\sin\theta$, which is singular when the twiddle factor is $W^0 = 1$ (i.e., $\sin\theta = 0$). Standard practice clamps $\sin\theta$ to a small epsilon, degrading numerical precision. We observe that an alternative factorization using $\cos\theta$ as the outer multiplier (precomputing $\tan\theta$) avoids this particular singularity but introduces a new one at $W^{N/4}$, where $\cos\theta = 0$. We then propose a dual-select strategy that chooses, per twiddle factor, whichever factorization yields $|\text{ratio}| \leq 1$. This eliminates all singularities, requires no epsilon clamping, and bounds the magnitude of the precomputed ratio by unity for all twiddle factors. For $N = 1024$, the worst-case ratio drops from 163 (Linzer-Feig) to exactly 1.0 (dual-select), yielding a $235\times$ tighter error bound in FP16 arithmetic over 10 FFT passes. The strategy adds zero computational overhead: only the precomputed twiddle table changes.
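The two factorizations described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: the names `fma`, `make_twiddle`, `butterfly`, and `reference_butterfly` are hypothetical, `fma` is emulated with an ordinary multiply and add (so it rounds twice, unlike a hardware FMA), and the per-twiddle flag with a branch is an assumption about how the selection might be encoded. With $W^k = e^{-j2\pi k/N}$ the stored ratios equal $\tan\theta$ or $\cot\theta$ up to sign.

```python
import cmath
import math


def fma(a, b, c):
    """Stand-in for a hardware FMA (assumption: not actually fused here)."""
    return a * b + c


def make_twiddle(k, n):
    """Return (outer, ratio, use_cos) for W^k = exp(-2j*pi*k/n)."""
    theta = 2.0 * math.pi * k / n
    wr, wi = math.cos(theta), -math.sin(theta)  # real and imaginary parts of W^k
    if abs(wr) >= abs(wi):
        return wr, wi / wr, True   # cos-outer form, ratio is tan(theta) up to sign
    return wi, wr / wi, False      # sin-outer form, ratio is cot(theta) up to sign


def butterfly(a, b, tw):
    """Compute (a + W*b, a - W*b) using six multiply-add operations."""
    outer, t, use_cos = tw
    if use_cos:
        p = fma(-t, b.imag, b.real)  # br - t*bi
        q = fma(t, b.real, b.imag)   # bi + t*br
    else:
        p = fma(t, b.real, -b.imag)  # t*br - bi
        q = fma(t, b.imag, b.real)   # t*bi + br
    # W*b = outer * (p + j*q); the remaining four operations add it to and subtract it from a.
    xr = fma(outer, p, a.real)
    xi = fma(outer, q, a.imag)
    yr = fma(-outer, p, a.real)
    yi = fma(-outer, q, a.imag)
    return complex(xr, xi), complex(yr, yi)


def reference_butterfly(a, b, k, n):
    """Direct complex-arithmetic butterfly, used only for comparison."""
    w = cmath.exp(-2j * cmath.pi * k / n)
    return a + w * b, a - w * b
```

Both branches perform the same six operations, so the choice of factorization changes only which values are stored per twiddle factor. Neither $k = 0$ nor $k = N/4$ requires a division by zero, because the divisor is always the larger of $|\cos\theta|$ and $|\sin\theta|$.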

Paper Structure

This paper contains 13 sections, 1 theorem, 6 equations, 1 figure, 2 tables.

Key Result

Theorem 1

For all twiddle factors $W^k = e^{-j2\pi k/N}$ with $0 \leq k < N/2$, the dual-select strategy produces a precomputed ratio $t$ satisfying $|t| \leq 1$.
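The bound in Theorem 1 can be checked numerically for a given $N$. The sketch below is a sanity check under stated assumptions, not a proof and not the paper's code: the function names are hypothetical, the ratios are computed in double precision, and the sign convention of $W^k$ is ignored because only magnitudes matter here. The underlying reason the bound holds is that the divisor is always the larger of $|\cos\theta|$ and $|\sin\theta|$, which is at least $1/\sqrt{2}$ and therefore never zero.

```python
import math


def dual_select_ratio(k, n):
    """Ratio stored by dual-select: the smaller of cos/sin divided by the larger."""
    theta = 2.0 * math.pi * k / n
    c, s = math.cos(theta), math.sin(theta)
    return s / c if abs(c) >= abs(s) else c / s


def linzer_feig_ratio(k, n):
    """cot(theta) as in the classical factorization; undefined at k = 0."""
    theta = 2.0 * math.pi * k / n
    return math.cos(theta) / math.sin(theta)


n = 1024
# Dual-select covers every k in [0, N/2), including k = 0 and k = N/4.
worst_dual = max(abs(dual_select_ratio(k, n)) for k in range(n // 2))
# The cot-only ratio must skip k = 0, where sin(theta) = 0.
worst_lf = max(abs(linzer_feig_ratio(k, n)) for k in range(1, n // 2))
```

For $N = 1024$ the cot-only worst case occurs next to the singularity (at $k = 1$ and $k = N/2 - 1$) and is roughly 163, consistent with the figure quoted in the abstract, while `worst_dual` does not exceed 1.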
