Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features
Aku Kammonen, Lisi Liang, Anamika Pandey, Raúl Tempone
TL;DR
This paper investigates how the choice of training algorithm affects spectral bias and robustness in a two-layer neural network. It contrasts SGD with adaptive random Fourier features (ARFF), which adaptively samples Fourier frequencies, and shows that ARFF can bring the spectral bias closer to zero. Spectral bias is quantified as $SB = (\mathcal{E}_{\mathrm{high}}-\mathcal{E}_{\mathrm{low}})/(\mathcal{E}_{\mathrm{high}}+\mathcal{E}_{\mathrm{low}})$, where the subscripts refer to the high- and low-frequency error contributions, so $SB = 0$ means neither range dominates the error. In function-reconstruction experiments, ARFF reaches a spectral bias close to zero, while SGD remains spectrally biased. In MNIST (and CIFAR-10) experiments, ARFF-based models are more robust to sparse additive perturbations, particularly when early stopping is driven by noisy validation data, which points to frequency-aware training as a practical path to improved reliability.
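To make the $SB$ formula concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional signal sampled on a uniform periodic grid, and it assumes that $\mathcal{E}_{\mathrm{low}}$ and $\mathcal{E}_{\mathrm{high}}$ are the unnormalized spectral energies of the residual at integer frequencies at or below, and above, a user-chosen `cutoff`. The paper's exact definition of the two error terms (for example, any normalization or the way the frequency split is chosen) may differ. The names `dft`, `spectral_bias`, and `cutoff` are hypothetical.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform; adequate for short signals."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def spectral_bias(target, prediction, cutoff):
    """Return SB = (E_high - E_low) / (E_high + E_low) for the residual.

    Assumption (not from the paper): E_low and E_high are the residual's
    spectral energies at frequencies |k| <= cutoff and |k| > cutoff.
    """
    n = len(target)
    residual = [t - p for t, p in zip(target, prediction)]
    e_low = 0.0
    e_high = 0.0
    for k, coeff in enumerate(dft(residual)):
        freq = min(k, n - k)  # fold the DFT index to an absolute frequency
        energy = abs(coeff) ** 2
        if freq <= cutoff:
            e_low += energy
        else:
            e_high += energy
    total = e_low + e_high
    if total == 0.0:
        return 0.0  # zero residual: no bias in either direction
    return (e_high - e_low) / total


# Toy target with one low-frequency (1) and one high-frequency (10) component.
n = 64
xs = [j / n for j in range(n)]
target = [math.sin(2 * math.pi * x) + math.sin(2 * math.pi * 10 * x) for x in xs]

# A model that captures only the low frequency leaves a high-frequency residual.
low_only = [math.sin(2 * math.pi * x) for x in xs]
sb_low_only = spectral_bias(target, low_only, cutoff=5)

# A model that captures only the high frequency leaves a low-frequency residual.
high_only = [math.sin(2 * math.pi * 10 * x) for x in xs]
sb_high_only = spectral_bias(target, high_only, cutoff=5)

# A model whose error is spread equally over both components.
balanced = [0.5 * t for t in target]
sb_balanced = spectral_bias(target, balanced, cutoff=5)

print(sb_low_only, sb_high_only, sb_balanced)
```

Under these assumptions, a fit that misses only the high-frequency component gives $SB$ near $+1$, a fit that misses only the low-frequency component gives $SB$ near $-1$, and an error split evenly across both gives $SB$ near $0$, which is the sense in which a value close to zero indicates an absence of spectral bias.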
Abstract
We present experimental results highlighting two key differences resulting from the choice of training algorithm for two-layer neural networks. The spectral bias of neural networks is well known, but its dependence on the choice of training algorithm is less studied. Our experiments demonstrate that an adaptive random Fourier features algorithm (ARFF) can yield a spectral bias closer to zero than the stochastic gradient descent optimizer (SGD). Additionally, we train two identically structured classifiers, one with SGD and one with ARFF, to the same accuracy levels and empirically assess their robustness against adversarial noise attacks.
