Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Mohamed Amine Bergach

Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Mohamed Amine Bergach

Abstract

The fused multiply-add (FMA) instruction enables the radix-2 FFT butterfly to be computed in 6~FMA operations -- the proven minimum. The classical factorization by Linzer and Feig~\cite{linzer1993} precomputes the ratio $\cotθ= \cosθ/\sinθ$, which is singular when the twiddle factor is $W^0 = 1$ (i.e., $\sinθ= 0$). Standard practice clamps $\sinθ$ to a small epsilon, degrading numerical precision. We observe that an alternative factorization using $\cosθ$ as the outer multiplier (precomputing $\tanθ$) avoids this particular singularity but introduces a new one at $W^{N/4}$. We then propose a \emph{dual-select} strategy that chooses, per twiddle factor, whichever factorization yields $|\text{ratio}| \leq 1$. This eliminates all singularities, requires no epsilon clamping, and bounds the precomputed ratio to unity for all twiddle factors. For $N = 1024$, the worst-case ratio drops from 163 (Linzer-Feig) to exactly~1.0 (dual-select), yielding a $235\times$ tighter error bound in FP16 arithmetic over 10~FFT passes. The strategy adds zero computational overhead -- only the precomputed twiddle table changes.

Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Abstract

, which is singular when the twiddle factor is

(i.e.,

). Standard practice clamps

to a small epsilon, degrading numerical precision. We observe that an alternative factorization using

as the outer multiplier (precomputing

) avoids this particular singularity but introduces a new one at

. We then propose a \emph{dual-select} strategy that chooses, per twiddle factor, whichever factorization yields

. This eliminates all singularities, requires no epsilon clamping, and bounds the precomputed ratio to unity for all twiddle factors. For

, the worst-case ratio drops from 163 (Linzer-Feig) to exactly~1.0 (dual-select), yielding a

tighter error bound in FP16 arithmetic over 10~FFT passes. The strategy adds zero computational overhead -- only the precomputed twiddle table changes.

Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Abstract

Dual-Select FMA Butterfly for FFT: Eliminating Twiddle Factor Singularities with Bounded Precomputed Ratios

Abstract

Paper Structure

Table of Contents

Key Result

Theorems & Definitions (1)