On a time-frequency blurring operator with applications in data augmentation

Simon Halvdansson

TL;DR

This work introduces a time-frequency blurring operator that convolves the STFT of a signal with a kernel and then inverts the result to produce a blurred waveform, enabling phase-aware data augmentation in the time-frequency domain. It provides a rigorous TF-analysis framework, establishing mapping properties, weak action, positivity, and non-compactness, and discusses generalizations such as position-dependent kernels and multiple windows. The method is empirically evaluated on SpeechCommands V2 with CNNs and ViTs, demonstrating that STFT-blur and spectrogram blurring improve performance, especially in data-scarce settings, and that combining augmentations yields the strongest gains. The results highlight the practical potential of phase-space augmentation as an efficient, in-distribution augmentation technique that preserves essential signal structure while perturbing local TF content.

Abstract

Inspired by the success of recent data augmentation methods for signals that act on time-frequency representations, we introduce an operator which convolves the short-time Fourier transform of a signal with a specified kernel. Analytical properties including boundedness, compactness and positivity are investigated from the perspective of time-frequency analysis. A convolutional neural network and a vision transformer are trained to classify audio signals using spectrograms with different augmentation setups, including the above-mentioned time-frequency blurring operator, with results indicating that the operator can significantly improve test performance, especially in the data-starved regime.
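
To make the operator concrete, the following is a minimal NumPy sketch of the pipeline described above: take the STFT, convolve the complex coefficients with a kernel, and invert back to a waveform. It is an illustration under stated assumptions, not the paper's implementation: the function names (`tf_blur`, `stft`, `istft`, `blur`), the periodic Hann window, the separable Gaussian kernel, the zero padding, and all parameter defaults are hypothetical choices, and the inverse used here is a weighted overlap-add rather than necessarily the exact adjoint used in the analysis.

```python
import numpy as np


def stft(x, win, hop):
    """Windowed frames of x -> complex array of shape (n_frames, n_bins)."""
    n = len(win)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop:i * hop + n] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(S, win, hop, length):
    """Weighted overlap-add inverse of `stft` (assumed inversion scheme)."""
    n = len(win)
    frames = np.fft.irfft(S, n=n, axis=1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n] += frame * win
        norm[i * hop:i * hop + n] += win ** 2
    # Guard against division by zero where no window mass lands.
    return y / np.maximum(norm, 1e-12)


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian truncated at roughly three standard deviations."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()


def blur(S, sigma_t, sigma_f):
    """Convolve the complex STFT with a separable Gaussian (time, then frequency)."""
    kt, kf = gaussian_kernel(sigma_t), gaussian_kernel(sigma_f)
    out = np.apply_along_axis(lambda c: np.convolve(c, kt, mode="same"), 0, S)
    return np.apply_along_axis(lambda r: np.convolve(r, kf, mode="same"), 1, out)


def tf_blur(x, n_fft=256, hop=64, sigma_t=1.5, sigma_f=1.5):
    """STFT -> kernel convolution on complex coefficients -> inverse STFT."""
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    padded = np.pad(x, n_fft)  # zero padding so every kept sample is fully covered
    S = stft(padded, win, hop)
    y = istft(blur(S, sigma_t, sigma_f), win, hop, len(padded))
    return y[n_fft:n_fft + len(x)]
```

Because the convolution acts on the complex STFT rather than on the spectrogram magnitude, phase information takes part in the blurring, which is the distinction the summary draws between STFT-blur and spectrogram blurring. For augmentation one would typically call `tf_blur` on each training waveform before computing the (log-mel) spectrogram fed to the classifier, possibly with randomized kernel widths.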

Paper Structure

This paper contains 24 sections, 15 theorems, 57 equations, 4 figures, 2 tables.

Key Result

Lemma 2.1

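Let $1 \leq p_1, p_2 \leq \infty$ and $p_\theta$ be defined by the relation $\frac{1}{p_\theta} = \frac{1-\theta}{p_1} + \frac{\theta}{p_2}$ with $\theta \in (0, 1)$. Then we can interpolate between $M^{p_1}(\mathbb{R}^d)$ and $M^{p_2}(\mathbb{R}^d)$ as $[M^{p_1}(\mathbb{R}^d), M^{p_2}(\mathbb{R}^d)]_\theta = M^{p_\theta}(\mathbb{R}^d)$.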

Figures (4)

  • Figure 1: Action of a time-frequency blurring operator with Gaussian kernel on an audio recording, illustrated using spectrograms.
  • Figure 2: Spectrograms and log-mel spectrograms of an audio recording and the same recording with a time-frequency blurring operator with Gaussian kernel applied to it.
  • Figure 3: Spectrograms and log-mel spectrograms of an audio recording and the same spectrograms blurred with a Gaussian kernel.
  • Figure 4: Log-mel spectrograms of an audio recording from the SpeechCommands V2 dataset with different augmentation techniques applied to it.

Theorems & Definitions (26)

  • Lemma 2.1
  • Lemma 2.2
  • Proposition 4.1
  • proof
  • Proposition 4.2
  • proof
  • Proposition 4.3
  • proof
  • Proposition 4.4
  • proof
  • ...and 16 more