Spectral Wavelet Dropout: Regularization in the Wavelet Domain

Rinor Cakaj, Jens Mehnert, Bin Yang

TL;DR

Convolutional neural networks tend to overfit when features co-adapt, which motivates regularization. The paper introduces Spectral Wavelet Dropout (SWD) in two variants, 1D-SWD and 2D-SWD. Both regularize in the wavelet domain by randomly dropping detailed frequency bands while preserving the low-frequency approximation, and both use a single dropout hyperparameter $p$ with energy scaling $(1-p)^{-1}$. SWD is compared against Spectral Fourier Dropout (SFD) and its 1D variant and shows competitive or superior performance on CIFAR-10/100, ImageNet, and Pascal VOC, often with substantially lower training overhead for 1D-SWD. The results indicate that SWD, particularly 1D-SWD, provides efficient and effective regularization across vision tasks. Future work includes exploring different wavelets, adaptive schemes, and refined band-energy-based regularization.

Abstract

Regularization techniques help prevent overfitting and therefore improve the ability of convolutional neural networks (CNNs) to generalize. One reason for overfitting is the complex co-adaptations among different parts of the network, which make the CNN dependent on their joint response rather than encouraging each part to learn a useful feature representation independently. Frequency domain manipulation is a powerful strategy for modifying data that has temporal and spatial coherence by utilizing frequency decomposition. This work introduces Spectral Wavelet Dropout (SWD), a novel regularization method that includes two variants: 1D-SWD and 2D-SWD. These variants improve CNN generalization by randomly dropping detailed frequency bands in the discrete wavelet decomposition of feature maps. Our approach distinguishes itself from the pre-existing Spectral "Fourier" Dropout (2D-SFD), which eliminates coefficients in the Fourier domain. Notably, SWD requires only a single hyperparameter, unlike the two required by SFD. We also extend the literature by implementing a one-dimensional version of Spectral "Fourier" Dropout (1D-SFD), setting the stage for a comprehensive comparison. Our evaluation shows that both 1D and 2D SWD variants have competitive performance on CIFAR-10/100 benchmarks relative to both 1D-SFD and 2D-SFD. Specifically, 1D-SWD has a significantly lower computational complexity compared to 1D/2D-SFD. In the Pascal VOC Object Detection benchmark, SWD variants surpass 1D-SFD and 2D-SFD in performance and demonstrate lower computational complexity during training.
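
As a rough illustration of the mechanism described above, the following NumPy sketch applies a one-level 2D wavelet dropout to a feature map. It is not the authors' implementation. The Haar wavelet, the function names (`haar_dwt2`, `haar_idwt2`, `swd_2d`), the choice of one random draw per detail band for the whole tensor, and the rescaling of surviving detail bands by $(1-p)^{-1}$ are all assumptions made for the example; the LH/HL naming follows one common convention.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT over the last two axes (H and W must be even)."""
    top, bottom = x[..., 0::2, :], x[..., 1::2, :]
    lo = (top + bottom) / SQRT2          # low-pass along the height axis
    hi = (top - bottom) / SQRT2          # high-pass along the height axis

    def split_cols(t):
        left, right = t[..., :, 0::2], t[..., :, 1::2]
        return (left + right) / SQRT2, (left - right) / SQRT2

    ll, lh = split_cols(lo)
    hl, hh = split_cols(hi)
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2."""
    lh, hl, hh = details

    def merge_cols(s, d):
        out = np.empty(s.shape[:-1] + (2 * s.shape[-1],), dtype=s.dtype)
        out[..., 0::2] = (s + d) / SQRT2
        out[..., 1::2] = (s - d) / SQRT2
        return out

    lo, hi = merge_cols(ll, lh), merge_cols(hl, hh)
    out = np.empty(lo.shape[:-2] + (2 * lo.shape[-2], lo.shape[-1]), dtype=lo.dtype)
    out[..., 0::2, :] = (lo + hi) / SQRT2
    out[..., 1::2, :] = (lo - hi) / SQRT2
    return out

def swd_2d(x, p, rng, training=True):
    """Hypothetical 2D wavelet dropout: LL is always kept, each detail band is dropped with prob. p."""
    if not training or p == 0.0:
        return x
    ll, details = haar_dwt2(x)
    new_details = []
    for band in details:
        if rng.random() < p:
            new_details.append(np.zeros_like(band))   # drop this detail band
        else:
            new_details.append(band / (1.0 - p))      # assumed (1-p)^{-1} scaling of kept bands
    return haar_idwt2(ll, tuple(new_details))

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 16, 16))        # C x H x W
regularized = swd_2d(feature_map, p=0.3, rng=rng)
print(regularized.shape)
```

With `p = 1.0` every detail band is dropped, and the Haar reconstruction reduces to replacing each 2x2 block by its mean, which makes the low-frequency role of the LL band easy to see.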

Paper Structure

This paper contains 37 sections, 1 equation, 6 figures, 12 tables, 2 algorithms.

Figures (6)

  • Figure 1: Illustration of the 2D Spectral Wavelet Dropout process applied to an input feature map. A one-level 2D wavelet decomposition divides the map into four sub-bands: LL (low-frequency), LH (horizontal high-frequency), HL (vertical high-frequency), and HH (diagonal high-frequency), presented sequentially from top to bottom in the figure. The high-frequency bands are randomly dropped (HL is shown as dropped) according to a dropout probability $p$, while the LL band is consistently maintained. The feature map is then transformed back into the spatial domain using an inverse 2D wavelet transform (2D-IDWT).
  • Figure 2: Detailed schematic of the three-level wavelet filter bank used in our proposed 1D-SWD method. The filter bank decomposes an input signal $x[n]$ into hierarchical sets of approximation and detailed coefficients using sequential high-pass $h[n]$ and low-pass $g[n]$ filters, each followed by downsampling, which allows the signal to be analyzed across multiple resolutions. By randomly dropping high-frequency details within this structure, SWD promotes model generalization and prevents overfitting.
  • Figure 3: Schematic of the 1D-SWD applied to a feature map $X \in \mathbb{R}^{C \times H \times W}$. The process begins by flattening the spatial dimensions of each channel of $X$, resulting in $\hat{X} \in \mathbb{R}^{C \times (H \cdot W)}$. A three-level 1D-DWT is then applied to $\hat{X}$, yielding approximation and detailed coefficients across three levels. For regularization, detailed coefficients are dropped using a dropout probability $p$; in the example, the Level 2 coefficients are dropped. The regularized feature map is transformed back using an inverse 1D-DWT.
  • Figure 4: The Discrete Wavelet Transform divides a signal into distinct frequency bands. This multi-resolution analysis is important for our Spectral Wavelet Dropout method because it allows information to be dropped randomly across different scales.
  • Figure 5: Hyperparameter search results for ResNet50 with 2D-SFD on CIFAR-10, illustrating the impact of various configurations on model performance.
  • ...and 1 more figure
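
The 1D pipeline in the Figure 3 caption (flatten, three-level 1D-DWT, drop detail levels, inverse transform) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: it uses the Haar wavelet, draws one random decision per decomposition level for the whole tensor, rescales surviving detail levels by $(1-p)^{-1}$, and requires $H \cdot W$ to be divisible by $2^{\text{levels}}$. The names `haar_dwt1`, `haar_idwt1`, and `swd_1d` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt1(x):
    """One-level orthonormal 1D Haar DWT along the last axis (even length)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2   # approximation, detail

def haar_idwt1(approx, detail):
    """Inverse of haar_dwt1."""
    out = np.empty(approx.shape[:-1] + (2 * approx.shape[-1],), dtype=approx.dtype)
    out[..., 0::2] = (approx + detail) / SQRT2
    out[..., 1::2] = (approx - detail) / SQRT2
    return out

def swd_1d(x, p, rng, levels=3, training=True):
    """Hypothetical 1D wavelet dropout on a C x H x W feature map."""
    if not training or p == 0.0:
        return x
    c, h, w = x.shape
    assert (h * w) % (2 ** levels) == 0, "H*W must be divisible by 2**levels"

    approx = x.reshape(c, h * w)             # flatten each channel: C x (H*W)
    details = []
    for _ in range(levels):                  # multi-level analysis filter bank
        approx, d = haar_dwt1(approx)
        details.append(d)

    for i, d in enumerate(details):          # the approximation is never dropped
        if rng.random() < p:
            details[i] = np.zeros_like(d)    # drop this detail level
        else:
            details[i] = d / (1.0 - p)       # assumed (1-p)^{-1} scaling of kept levels

    for d in reversed(details):              # synthesis, coarsest level first
        approx = haar_idwt1(approx, d)
    return approx.reshape(c, h, w)

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 8, 8))
regularized = swd_1d(feature_map, p=0.3, rng=rng)
print(regularized.shape)
```

Because each level halves the signal length, the per-level work shrinks geometrically, which is one plausible reason a 1D transform on flattened channels could be cheaper than a 2D Fourier transform; the paper's own complexity comparison should be consulted for actual numbers.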