Robust Fourier Neural Networks

Halyun Jeong, Jihun Han

TL;DR

Adding a simple diagonal layer after the Fourier embedding layer makes the network more robust to measurement noise by prompting it to learn sparse Fourier features.

Abstract

Fourier embedding has shown great promise in removing spectral bias during neural network training. However, it can still suffer from high generalization errors, especially when the labels or measurements are noisy. We demonstrate that introducing a simple diagonal layer after the Fourier embedding layer makes the network more robust to measurement noise, effectively prompting it to learn sparse Fourier features. We provide theoretical justifications for this Fourier feature learning, leveraging recent developments in diagonal networks and implicit regularization in neural networks. Under certain conditions, our proposed approach can also learn functions that are noisy mixtures of nonlinear functions of Fourier features. Numerical experiments validate the effectiveness of our proposed architecture, supporting our theory.
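
The architecture described in the abstract can be illustrated with a minimal sketch. It assumes the diagonal layer is an elementwise scaling of the Fourier features (multiplication by a diagonal matrix) followed by a linear readout; the paper's exact parametrization, nonlinearities, and training procedure are not given here. The function names, the integer frequency grid, and the example weights are hypothetical and chosen only for illustration.

```python
import math


def fourier_embedding(x, frequencies):
    """Map a scalar x to [cos(2*pi*k*x), sin(2*pi*k*x)] for each frequency k."""
    feats = []
    for k in frequencies:
        feats.append(math.cos(2 * math.pi * k * x))
        feats.append(math.sin(2 * math.pi * k * x))
    return feats


def diagonal_layer(features, d):
    """Scale each feature by its own weight, i.e. multiply by diag(d)."""
    return [d_i * f_i for d_i, f_i in zip(d, features)]


def forward(x, frequencies, d, w):
    """Fourier embedding -> diagonal layer -> linear readout (assumed layout)."""
    z = diagonal_layer(fourier_embedding(x, frequencies), d)
    return sum(w_i * z_i for w_i, z_i in zip(w, z))


# Hypothetical example: a sparse diagonal keeps only the sin feature at k = 2,
# so the network reduces to x -> sin(4*pi*x).
frequencies = [1, 2, 3]
d = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]   # one weight per (cos, sin) feature
w = [1.0] * 6                        # readout weights
y = forward(0.125, frequencies, d, w)
```

In this sketch, a sparse `d` switches off most Fourier features, which is the kind of sparsity the abstract attributes to the diagonal layer. Here `d` is set by hand rather than learned.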

Paper Structure

This paper contains 18 sections, 11 theorems, 70 equations, 13 figures, 1 algorithm.

Key Result

Lemma 1

Suppose Assumption \ref{assumption:recovery_condition2} holds. Then, we have or

Figures (13)

  • Figure 1: Fourier feature-embedded neural networks
  • Figure 2: Regression on noisy data generated by \ref{eq:linearFourier} using the neural networks $\ast u_0^{\text{diag}}$, $u_0^{\text{diag}}$, $u_1^{\text{diag}}$, and $u_1^{\text{standard}}$. (a) A noisy regression target. (b) Regression results for the part of (a) enclosed by the bounding box. (c) Learning progress over training iterations.
  • Figure 3: Weight distributions of the neural networks $\ast u_0^{\text{diag}}$, $u_0^{\text{diag}}$, and $u_1^{\text{standard}}$ in the regression of noisy data generated by \ref{eq:linearFourier}. (a1) shows the final state of $\ast u_0^{\text{diag}}$ after training, and (b1) shows its evolution during training. (a2) and (b2) show the same for $u_0^{\text{diag}}$. (c) shows the weight distribution of the dense layer that follows the Fourier embedding layer in $u_1^{\text{standard}}$.
  • Figure 4: Regression on noisy data generated by \ref{eq:linearFourierPhaseShift} using the neural networks $\ast u_0^{\text{diag}}$, $u_0^{\text{diag}}$, $u_1^{\text{diag}}$, and $u_1^{\text{standard}}$. (a) A noisy regression target. (b) Regression results for the part of (a) enclosed by the bounding box. (c) Learning progress over training iterations.
  • Figure 5: Weight distributions of the neural networks $\ast u_0^{\text{diag}}$, $u_0^{\text{diag}}$, and $u_1^{\text{standard}}$ in the regression of noisy data generated by \ref{eq:linearFourierPhaseShift}. (a1) shows the final state of $\ast u_0^{\text{diag}}$ after training, and (b1) shows its evolution during training. (a2) and (b2) show the same for $u_0^{\text{diag}}$. (c) shows the weight distribution of the dense layer that follows the Fourier embedding layer in $u_1^{\text{standard}}$.
  • ...and 8 more figures

Theorems & Definitions (20)

  • Remark 1: Polynomial link functions
  • Remark 2
  • Lemma 1
  • proof
  • Theorem 2
  • Corollary 3
  • Corollary 4
  • Corollary 5
  • Proof of Theorem \ref{thm:feature_learning}
  • Lemma 6
  • ...and 10 more