On a time-frequency blurring operator with applications in data augmentation
Simon Halvdansson
TL;DR
This work introduces a time-frequency blurring operator that convolves the short-time Fourier transform (STFT) of a signal with a kernel and then inverts the result to produce a blurred waveform, enabling phase-aware data augmentation in the time-frequency (TF) domain. It provides a rigorous TF-analysis framework, establishing mapping properties, the weak action, positivity, and non-compactness of the operator, and discusses generalizations such as position-dependent kernels and multi-window variants. The method is evaluated empirically on SpeechCommands V2 with convolutional neural networks (CNNs) and vision transformers (ViTs). Both STFT-blur and spectrogram blurring improve performance, especially in data-scarce settings, and combining augmentations yields the largest gains. The results highlight the practical potential of phase-space augmentation as an efficient, in-distribution augmentation technique that preserves essential signal structure while perturbing local TF content.
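A minimal Python sketch of the two augmentations named above might look as follows. It assumes SciPy's `scipy.signal.stft`/`istft` for analysis and synthesis and a separable Gaussian kernel applied with `scipy.ndimage.gaussian_filter`. The function names `stft_blur` and `spectrogram_blur`, the Gaussian kernel, and the width parameters `sigma_t`, `sigma_f` are illustrative assumptions, not the paper's implementation. `stft_blur` blurs the complex STFT, so phase is blurred too, and then inverts. `spectrogram_blur` blurs only the squared magnitude and stays in the TF domain.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import gaussian_filter


def stft_blur(x, fs=16000, nperseg=512, sigma_t=1.0, sigma_f=1.0):
    """Hypothetical STFT-blur sketch: Gaussian-blur the complex STFT, then invert."""
    # Analysis: complex STFT with shape (n_freq_bins, n_frames).
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    # Widths are in array units (frequency bins, time frames); the array edges
    # are handled by reflection, an assumption with no counterpart on the
    # unbounded continuous TF plane.
    sigma = (sigma_f, sigma_t)
    # Convolving a complex array with a real kernel = blurring Re and Im separately.
    Zb = gaussian_filter(Z.real, sigma) + 1j * gaussian_filter(Z.imag, sigma)
    # Synthesis: overlap-add inverse STFT back to a real waveform.
    _, y = istft(Zb, fs=fs, nperseg=nperseg)
    # The STFT pads the signal, so trim back to the input length.
    return y[: len(x)]


def spectrogram_blur(x, fs=16000, nperseg=512, sigma_t=1.0, sigma_f=1.0):
    """Hypothetical spectrogram-blur sketch: blur |STFT|^2 only, no inversion."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    S = np.abs(Z) ** 2  # phase is discarded here, unlike in stft_blur
    return gaussian_filter(S, (sigma_f, sigma_t))


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
    y = stft_blur(x, sigma_t=2.0, sigma_f=2.0)
    S = spectrogram_blur(x, sigma_t=2.0, sigma_f=2.0)
    print(y.shape, S.shape)
```

With `sigma_t = sigma_f = 0` the filter is skipped and `stft_blur` reduces to an STFT round trip, which reconstructs the input up to floating-point error. Larger widths spread each coefficient over neighboring time frames and frequency bins.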
Abstract
Inspired by the success of recent data augmentation methods for signals that act on time-frequency representations, we introduce an operator which convolves the short-time Fourier transform of a signal with a specified kernel. Analytical properties including boundedness, compactness and positivity are investigated from the perspective of time-frequency analysis. A convolutional neural network and a vision transformer are trained to classify audio signals using spectrograms with different augmentation setups, including the above-mentioned time-frequency blurring operator, with results indicating that the operator can significantly improve test performance, especially in the data-starved regime.
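For orientation, the boundedness and positivity questions can be sketched under one assumption. Suppose the operator has the form $B_\Phi f = V_g^*(\Phi * V_g f)$, where $V_g$ is the STFT with a window $g$ normalized so that $\|g\|_2 = 1$, and $\Phi \in L^1(\mathbb{R}^{2d})$ is the kernel. The symbols $B_\Phi$, $V_g$ and $\Phi$ are our notation and may differ from the paper's. Because $V_g$ is then an isometry from $L^2(\mathbb{R}^d)$ into $L^2(\mathbb{R}^{2d})$, its adjoint has norm one, and Young's inequality gives

$$
\langle B_\Phi f, h \rangle = \langle \Phi * V_g f,\, V_g h \rangle_{L^2(\mathbb{R}^{2d})},
\qquad
\|B_\Phi f\|_2 \le \|\Phi * V_g f\|_2 \le \|\Phi\|_1 \, \|V_g f\|_2 = \|\Phi\|_1 \, \|f\|_2 .
$$

With the Fourier transform $\hat F(\zeta) = \int F(z) e^{-2\pi i z \cdot \zeta}\, dz$ on $\mathbb{R}^{2d}$, the convolution theorem and Plancherel's identity give a sufficient (not necessary) condition for positivity:

$$
\langle B_\Phi f, f \rangle = \int_{\mathbb{R}^{2d}} \hat\Phi(\zeta)\, \big|\widehat{V_g f}(\zeta)\big|^2 \, d\zeta \;\ge\; 0
\quad \text{whenever } \hat\Phi \ge 0 .
$$

A Gaussian kernel satisfies $\hat\Phi \ge 0$, so Gaussian blurring yields a positive operator under these assumptions.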
