On a time-frequency blurring operator with applications in data augmentation

Simon Halvdansson

TL;DR

This work introduces a time-frequency blurring operator that convolves the STFT of a signal with a kernel and then inverts the result to produce a blurred waveform, enabling phase-aware data augmentation in the time-frequency domain. It develops a rigorous time-frequency analysis framework for the operator, establishing mapping properties, weak action, positivity, and non-compactness, and discusses generalizations such as position-dependent kernels and multiple windows. The method is evaluated empirically on SpeechCommands V2 with CNNs and ViTs: both STFT blurring and spectrogram blurring improve test performance, especially in data-scarce settings, and combining augmentations yields the strongest gains. The results point to phase-space augmentation as an efficient, in-distribution technique that preserves essential signal structure while perturbing local time-frequency content.

Abstract

Inspired by the success of recent data augmentation methods for signals which act on time-frequency representations, we introduce an operator which convolves the short-time Fourier transform of a signal with a specified kernel. Analytical properties including boundedness, compactness and positivity are investigated from the perspective of time-frequency analysis. A convolutional neural network and a vision transformer are trained to classify audio signals using spectrograms with different augmentation setups, including the above mentioned time-frequency blurring operator, with results indicating that the operator can significantly improve test performance, especially in the data-starved regime.
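
The construction described above (transform, convolve with a kernel, invert) can be sketched in a few lines. This is a minimal discrete illustration, not the paper's implementation: it assumes SciPy's `stft`/`istft` with their default Hann window, uses a Gaussian kernel via `scipy.ndimage.gaussian_filter`, and measures the kernel width `sigma` in STFT bins. The function names `stft_blur` and `spectrogram_blur` and all parameter defaults are hypothetical choices made for this example.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import istft, stft


def stft_blur(x, fs=16000, nperseg=512, sigma=(1.0, 1.0)):
    """Blur the complex STFT of x with a Gaussian kernel and invert.

    Returns a waveform of the same length as x. `sigma` is the kernel
    width in (frequency bins, time frames); an assumption of this sketch.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    # The kernel is real, so filtering the real and imaginary parts
    # separately equals convolving the complex-valued STFT. Phase is kept.
    Zb = gaussian_filter(Z.real, sigma=sigma) + 1j * gaussian_filter(Z.imag, sigma=sigma)
    _, xb = istft(Zb, fs=fs, nperseg=nperseg)
    # istft may return a few extra padded samples; trim to the input length.
    return xb[: len(x)]


def spectrogram_blur(x, fs=16000, nperseg=512, sigma=(1.0, 1.0)):
    """Blur the spectrogram |STFT|^2 directly (phase is discarded).

    Returns a non-negative 2-D array, not a waveform.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    return gaussian_filter(np.abs(Z) ** 2, sigma=sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)      # one second of noise at 16 kHz
    y = stft_blur(x)                    # blurred waveform, same length as x
    S = spectrogram_blur(x)             # blurred spectrogram
    print(y.shape, S.shape)
```

The two functions differ in what they return: `stft_blur` yields a waveform that can be passed through any downstream feature pipeline (for example log-mel spectrograms), whereas `spectrogram_blur` acts on the image-like spectrogram only and has no corresponding waveform in general.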

Paper Structure (24 sections, 15 theorems, 57 equations, 4 figures, 2 tables)

Introduction and motivation
Time-frequency preliminaries
Short-time Fourier transform
Gabor spaces
Localization operators
Modulation spaces
Quadratic time-frequency distributions
Related objects and adaptations
Spectrogram blurring
Position-dependent kernel
Window generalizations
Analytical properties
Mapping properties
Weak action and positivity
Non-compactness
...and 9 more sections

Key Result

Lemma 2.1

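Let $1 \leq p_1, p_2 \leq \infty$ and $p_\theta$ be defined by the relation $\frac{1}{p_\theta} = \frac{1-\theta}{p_1} + \frac{\theta}{p_2}$ for $\theta \in (0, 1)$. Then we can interpolate between $M^{p_1}(\mathbb{R}^d)$ and $M^{p_2}(\mathbb{R}^d)$ as $[M^{p_1}(\mathbb{R}^d), M^{p_2}(\mathbb{R}^d)]_\theta = M^{p_\theta}(\mathbb{R}^d)$.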

Figures (4)

Figure 1: Action of a time-frequency blurring operator with Gaussian kernel on an audio recording, illustrated using spectrograms.
Figure 2: Spectrograms and log-mel spectrograms of an audio recording and the same recording with a time-frequency blurring operator with Gaussian kernel applied to it.
Figure 3: Spectrograms and log-mel spectrograms of an audio recording and the same spectrograms blurred with a Gaussian kernel.
Figure 4: Log-mel spectrograms of an audio recording from the SpeechCommands V2 dataset with different augmentation techniques applied to it.

Theorems & Definitions (26)

Lemma 2.1
Lemma 2.2
Proposition 4.1
proof
Proposition 4.2
proof
Proposition 4.3
proof
Proposition 4.4
proof
...and 16 more
