Why some audio signal short-time Fourier transform coefficients have nonuniform phase distributions
Stephen D. Voran
TL;DR
Although the overall distribution of STFT phases $\phi_k$ is commonly assumed to be uniform, the paper shows that per-bin phase distributions can be far from uniform because of tonal content and window sidelobes. It derives the tone-to-bin phase mapping, whose nonlinear relationships create intrinsic phase peaks in each coefficient, and demonstrates how window shape modulates these effects. The key contributions are closed-form relationships for the tonal impact on $\phi_k$, the identification of four intrinsic phase peaks per coefficient, and empirical evidence across diverse audio that windows with stronger sidelobe suppression yield more uniform phase distributions. The findings imply that per-bin phase priors, rather than a global uniform prior, can improve phase-aware tasks such as reconstruction and source separation.
Abstract
The short-time Fourier transform (STFT) represents a window of audio samples as a set of complex coefficients. These coefficients are advantageously viewed as magnitudes and phases, and the overall distribution of phases is very often assumed to be uniform. We show that when audio signal STFT phase distributions are analyzed per frequency or per magnitude range, they can be far from uniform. That is, the uniform phase distribution assumption obscures significant details. We explain the significance of the nonuniform phase distributions and how they might be exploited, derive their source, and explain why the choice of the STFT window shape influences the nonuniformity of the resulting phase distributions.
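A minimal numerical sketch may help make the per-frequency claim concrete. It is not the paper's experiment: the single real tone, the independent uniform start phase per frame, the tone and probe bin positions, the histogram resolution, and the helper name `phase_tv_from_uniform` are all assumptions chosen for illustration. The sketch measures how far the phase histogram of one STFT bin is from uniform (as a total-variation distance) for a rectangular window and for a periodic Hann window.

```python
import numpy as np

def phase_tv_from_uniform(window, tone_bin=40.5, probe_bin=60,
                          n_frames=8000, n_hist=24, seed=0):
    """Total-variation distance between one bin's phase histogram and uniform.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    n = np.arange(window.size)
    rng = np.random.default_rng(seed)
    # Assumption: each frame holds the same real tone with an independent,
    # uniformly distributed start phase.
    theta = rng.uniform(-np.pi, np.pi, size=(n_frames, 1))
    frames = np.cos(2 * np.pi * tone_bin * n / window.size + theta)
    # Windowed DFT of every frame; keep only the probe bin's coefficient.
    coeffs = np.fft.rfft(frames * window, axis=1)[:, probe_bin]
    counts, _ = np.histogram(np.angle(coeffs), bins=n_hist,
                             range=(-np.pi, np.pi))
    # 0 means the histogram is exactly uniform; larger means less uniform.
    return 0.5 * np.abs(counts / n_frames - 1.0 / n_hist).sum()

N = 256
rect = np.ones(N)                                        # rectangular window
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann window

tv_rect = phase_tv_from_uniform(rect)
tv_hann = phase_tv_from_uniform(hann)
print(f"rectangular: {tv_rect:.3f}  Hann: {tv_hann:.3f}")
```

In this setup the probe bin receives leakage from both the positive- and negative-frequency components of the real tone. With the rectangular window the two contributions are comparable in size, so the bin's phase is a nonlinear function of the tone's phase and its histogram departs visibly from uniform. With the Hann window, whose sidelobes fall off faster, the nearer component dominates and the histogram should sit much closer to uniform.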
