Spectral Wavelet Dropout: Regularization in the Wavelet Domain
Rinor Cakaj, Jens Mehnert, Bin Yang
TL;DR
Convolutional neural networks (CNNs) are prone to overfitting caused by feature co-adaptation, which motivates regularization strategies. The paper introduces Spectral Wavelet Dropout (SWD) in two variants, 1D-SWD and 2D-SWD. Both regularize in the wavelet domain by randomly dropping detail frequency bands while preserving the low-frequency approximation. SWD uses a single dropout hyperparameter $p$ together with an energy scaling of $(1-p)^{-1}$. SWD is compared against Spectral Fourier Dropout (2D-SFD) and its newly implemented 1D variant (1D-SFD). It shows competitive or superior performance on CIFAR-10/100, ImageNet, and Pascal VOC, often with substantially lower training overhead for 1D-SWD. The results indicate that SWD, particularly 1D-SWD, provides efficient and effective regularization across vision tasks. Future work includes exploring other wavelets, adaptive schemes, and refined regularization based on band energy.
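The mechanism summarized above (drop whole detail bands, keep the approximation, rescale by $(1-p)^{-1}$) can be illustrated with a minimal sketch. It assumes an orthonormal multi-level Haar wavelet applied to a single 1D signal, such as one row of a feature map. It also assumes one Bernoulli draw per detail band and rescaling of only the surviving detail bands. The function names `haar_dwt`, `haar_idwt`, and `wavelet_dropout_1d` are hypothetical. The paper's implementation works on batched CNN feature maps and may differ in wavelet choice, decomposition depth, and the exact axis transformed by 1D-SWD.

```python
import math
import random
from typing import List, Optional, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_dwt: rebuilds each pair of samples from (a, d)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def wavelet_dropout_1d(
    x: List[float],
    p: float,
    levels: int,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical sketch of 1D wavelet-band dropout on one signal.

    Assumption: each detail band is zeroed with probability p and surviving
    detail bands are rescaled by 1 / (1 - p); the coarsest approximation
    band is never dropped or rescaled.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # identity at inference time, as in standard dropout
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    rng = rng or random.Random()

    # Multi-level decomposition: details[0] is the finest detail band.
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)

    # One Bernoulli draw per band (not per coefficient).
    scale = 1.0 / (1.0 - p)
    details = [
        [0.0] * len(d) if rng.random() < p else [scale * c for c in d]
        for d in details
    ]

    # Reconstruct from the coarsest level back to the input resolution.
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


signal = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
print(wavelet_dropout_1d(signal, p=0.5, levels=3, rng=random.Random(0)))
```

Each Haar synthesis step maps an `(a, d)` pair to two samples whose sum is $\sqrt{2}\,a$. Dropping or rescaling detail bands therefore changes local structure but leaves the signal's sum (and mean) unchanged. This is one concrete way to see that the low-frequency content survives the dropout.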
Abstract
Regularization techniques help prevent overfitting and therefore improve the ability of convolutional neural networks (CNNs) to generalize. One cause of overfitting is complex co-adaptation among different parts of the network, which makes the CNN depend on their joint response rather than encouraging each part to learn a useful feature representation independently. Manipulating data in the frequency domain, through a frequency decomposition, is a powerful strategy for modifying data that exhibit temporal and spatial coherence. This work introduces Spectral Wavelet Dropout (SWD), a novel regularization method with two variants, 1D-SWD and 2D-SWD. Both variants improve CNN generalization by randomly dropping detail frequency bands in the discrete wavelet decomposition of feature maps. This distinguishes our approach from the pre-existing Spectral "Fourier" Dropout (2D-SFD), which eliminates coefficients in the Fourier domain. Notably, SWD requires only a single hyperparameter, whereas SFD requires two. We also extend the literature by implementing a one-dimensional version of Spectral "Fourier" Dropout (1D-SFD), which enables a comprehensive comparison. Our evaluation shows that both 1D-SWD and 2D-SWD perform competitively with 1D-SFD and 2D-SFD on the CIFAR-10/100 benchmarks. Moreover, 1D-SWD has significantly lower computational complexity than 1D-SFD and 2D-SFD. On the Pascal VOC object detection benchmark, the SWD variants outperform 1D-SFD and 2D-SFD and have lower computational complexity during training.
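The 2D variant described in the abstract drops detail bands of a two-dimensional wavelet decomposition of a feature map. A minimal sketch of this idea assumes a single-level orthonormal 2D Haar transform on one even-sized channel. That transform yields one approximation band (`LL`) and three detail bands. Each detail band is zeroed with probability $p$ or rescaled by $(1-p)^{-1}$. The band keys `H`, `V`, `D` and the functions `haar_dwt2`, `haar_idwt2`, and `wavelet_dropout_2d` are hypothetical illustrations, not the paper's code. The actual 2D-SWD may use other wavelets, more decomposition levels, and per-channel or per-sample draws.

```python
import random
from typing import Dict, List, Optional

Matrix = List[List[float]]
# Hypothetical keys: column-difference, row-difference, and diagonal details.
DETAIL_BANDS = ("H", "V", "D")


def haar_dwt2(x: Matrix) -> Dict[str, Matrix]:
    """Single-level orthonormal 2D Haar DWT (height and width must be even)."""
    rows, cols = len(x) // 2, len(x[0]) // 2
    bands = {k: [[0.0] * cols for _ in range(rows)] for k in ("LL",) + DETAIL_BANDS}
    for i in range(rows):
        for j in range(cols):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            bands["LL"][i][j] = (a + b + c + d) / 2
            bands["H"][i][j] = (a - b + c - d) / 2
            bands["V"][i][j] = (a + b - c - d) / 2
            bands["D"][i][j] = (a - b - c + d) / 2
    return bands


def haar_idwt2(bands: Dict[str, Matrix]) -> Matrix:
    """Inverse of haar_dwt2: rebuilds each 2x2 block from its four coefficients."""
    rows, cols = len(bands["LL"]), len(bands["LL"][0])
    x = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s = bands["LL"][i][j]
            h, v, d = (bands[k][i][j] for k in DETAIL_BANDS)
            x[2 * i][2 * j] = (s + h + v + d) / 2
            x[2 * i][2 * j + 1] = (s - h + v - d) / 2
            x[2 * i + 1][2 * j] = (s + h - v - d) / 2
            x[2 * i + 1][2 * j + 1] = (s - h - v + d) / 2
    return x


def wavelet_dropout_2d(
    x: Matrix,
    p: float,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Matrix:
    """Hypothetical sketch: drop whole 2D detail bands, keep LL untouched."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if not training or p == 0.0:
        return [row[:] for row in x]  # identity at inference time
    if len(x) % 2 or len(x[0]) % 2:
        raise ValueError("height and width must be even")
    rng = rng or random.Random()
    bands = haar_dwt2(x)
    scale = 1.0 / (1.0 - p)
    for key in DETAIL_BANDS:
        keep = rng.random() >= p  # one Bernoulli draw per detail band
        bands[key] = [[scale * c if keep else 0.0 for c in row] for row in bands[key]]
    return haar_idwt2(bands)  # the LL band passes through unchanged


feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
print(wavelet_dropout_2d(feature_map, p=0.5, rng=random.Random(0)))
```

If all three detail bands are dropped, every 2x2 block collapses to its mean value, which is exactly the information carried by `LL`. This design keeps the single-hyperparameter property: only `p` controls how aggressively high-frequency structure is removed.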
