Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks

Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma

TL;DR

This paper reports a Frequency Principle (F-Principle) in gradient-based training of deep neural networks: models tend to fit the low-frequency components of the training targets first and the high-frequency components later. It introduces two frequency-based examination methods, projection and filtering, to demonstrate the principle on high-dimensional real data (MNIST/CIFAR10) and across architectures (including VGG16). A simple theoretical argument links the regularity of the activation function to the observed frequency bias. The work also draws out implications for generalization and for hybrid numerical schemes in scientific computing, such as initializing the Jacobi method with a partially trained DNN when solving Poisson's equation. Overall, the F-Principle offers one explanation of why DNNs generalize well on most real datasets and poorly on targets such as the parity function or randomized data.

Abstract

We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective. We demonstrate a universal Frequency Principle (F-Principle), namely that DNNs often fit target functions from low to high frequencies, on high-dimensional benchmark datasets such as MNIST/CIFAR10 and deep neural networks such as VGG16. This F-Principle of DNNs is opposite to the behavior of most conventional iterative numerical schemes (e.g., the Jacobi method), which converge faster at higher frequencies in various scientific computing problems. With a simple theory, we illustrate that the F-Principle results from the regularity of commonly used activation functions. The F-Principle implies an implicit bias: DNNs tend to fit training data with a low-frequency function. This understanding explains the good generalization of DNNs on most real datasets and their poor generalization on the parity function or randomized datasets.
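To make the contrast with conventional iterative schemes concrete, the following sketch tracks two sine modes of the error under a Jacobi-type iteration for the 1D Poisson problem with zero boundary values. It is an illustration under stated assumptions, not the paper's experiment: it uses damped Jacobi with weight $\omega = 2/3$ (an assumption, since plain Jacobi with $\omega = 1$ also damps the very highest grid frequencies slowly), and the grid size, mode numbers, and function name are hypothetical choices.

```python
import numpy as np

def damped_jacobi_error(n=63, k_low=1, k_high=48, omega=2 / 3, steps=20):
    """Apply damped Jacobi sweeps to the error of a 1D Poisson problem.

    The error obeys the homogeneous iteration, so each sweep replaces it by a
    weighted mix of itself and the average of its two neighbours.
    Returns the remaining amplitudes of the low and high sine modes.
    """
    x = np.arange(1, n + 1) / (n + 1)           # interior grid points on (0, 1)
    mode_low = np.sin(k_low * np.pi * x)
    mode_high = np.sin(k_high * np.pi * x)
    e = mode_low + mode_high                    # both modes start with amplitude 1

    for _ in range(steps):
        padded = np.concatenate(([0.0], e, [0.0]))      # zero Dirichlet boundaries
        neighbour_avg = 0.5 * (padded[:-2] + padded[2:])
        e = (1 - omega) * e + omega * neighbour_avg

    # Discrete sine modes are orthogonal with squared norm (n + 1) / 2,
    # so a scaled dot product recovers each mode's amplitude.
    amp_low = 2 / (n + 1) * abs(e @ mode_low)
    amp_high = 2 / (n + 1) * abs(e @ mode_high)
    return amp_low, amp_high

amp_low, amp_high = damped_jacobi_error()
print(f"low-mode amplitude:  {amp_low:.4f}")
print(f"high-mode amplitude: {amp_high:.2e}")
```

With these defaults the per-sweep damping factor is about 0.999 for the low mode and about -0.14 for the high mode, so after 20 sweeps the low-frequency error is almost untouched while the high-frequency error is negligible. This is the opposite ordering to the one the F-Principle describes for DNN training.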

Paper Structure

This paper contains 24 sections, 4 theorems, 62 equations, 11 figures.

Key Result

Theorem 1

Considering a DNN of one hidden layer with activation function $\sigma(x)=\tanh(x)$, for any frequencies $k_{1}$ and $k_{2}$ such that $|\hat{f}(k_{1})|>0$, $|\hat{f}(k_{2})|>0$, and $|k_{2}|>|k_{1}|>0$, there exist positive constants $c$ and $C$ such that for sufficiently small $\delta$, we have where $B_{\delta}\subset\mathbb{R}^{m}$ is a ball with radius $\delta$ centered at the origin and $\m

Figures (11)

  • Figure 1: Projection method. (a, b) are for MNIST, (c, d) for CIFAR10. (a, c) Amplitude $|\hat{y}_{k}|$ vs. frequency. Selected frequencies are marked by black squares. (b, d) $\Delta_{F}(k)$ vs. training epochs for the selected frequencies.
  • Figure 2: F-Principle in real datasets. $e_{\mathrm{low}}$ and $e_{\mathrm{high}}$ indicated by color against training epoch.
  • Figure 3: Poisson's equation. (a) $u_{\mathrm{ref}}(x)$. Inset: $|\hat{u}_{\mathrm{ref}}(k)|$ as a function of frequency. Frequency peaks are marked with black dots. (b, c) $\Delta_{F}(k)$ computed on the inputs of the training data at different epochs for the selected frequencies, for the DNN (b) and the Jacobi method (c). (d) $|h-u_{\mathrm{ref}}|_{\infty}$ at different running times. Green stars indicate $|h-u_{\mathrm{ref}}|_{\infty}$ using the DNN alone. The dashed lines indicate $|h-u_{\mathrm{ref}}|_{\infty}$ for the Jacobi method, with colors indicating initialization from the DNN at different stages of its training.
  • Figure 4: Fourier analysis for different generalization ability. Each plot shows the amplitude of the Fourier coefficient against frequency $k$. The red dots are for the training dataset, the green line is for the whole dataset, and the blue dashed line is for the output of a well-trained DNN on the inputs of the whole dataset. For (c), $d=10$. The training data consist of $200$ randomly selected points.
  • Figure 5: 1d input. (a) $f(x)$. Inset: $|\hat{f}(k)|$. (b) $\Delta_{F}(k)$ of three important frequencies (indicated by black dots in the inset of (a)) against training epochs.
  • ...and 6 more figures
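
The quantities plotted in the figures above can be sketched in a few lines of NumPy. The version below assumes that the projection method evaluates a non-uniform Fourier sum along the first principal direction of the inputs, that $\Delta_{F}(k)$ is the relative difference between the spectra of the network output $h$ and the target $y$, and that the filtering method splits each signal into low and high parts with a row-normalized Gaussian kernel of variance $\delta$. The function names, the kernel normalization, and the toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def projected_spectrum(x, y, freqs):
    """Fourier sum of y along the first principal direction of the inputs x.

    x: (n, d) inputs; y: (n,) one scalar output per input; freqs: (m,) frequencies.
    """
    centered = x - x.mean(axis=0)
    # The first right-singular vector of the centered data is the first
    # principal direction p1 (its sign is arbitrary).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    t = x @ vt[0]                                      # projections p1 . x_j
    phase = np.exp(-2j * np.pi * np.outer(freqs, t))   # shape (m, n)
    return phase @ y / len(y)

def delta_f(x, y, h, freqs):
    """Relative spectral error of h against y; freqs must have |y_hat| > 0."""
    y_hat = projected_spectrum(x, y, freqs)
    h_hat = projected_spectrum(x, h, freqs)
    return np.abs(h_hat - y_hat) / np.abs(y_hat)

def low_high_errors(x, y, h, delta):
    """Relative errors of h against y on the low- and high-frequency parts.

    Builds an (n, n) Gaussian kernel over the inputs, so it is only meant
    for small sample sizes.
    """
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    g = np.exp(-sq_dist / (2 * delta))
    g /= g.sum(axis=1, keepdims=True)        # each row sums to one
    y_low, h_low = g @ y, g @ h              # smoothed (low-frequency) parts
    y_high, h_high = y - y_low, h - h_low    # residual (high-frequency) parts
    e_low = np.sqrt(np.sum((y_low - h_low) ** 2) / np.sum(y_low ** 2))
    e_high = np.sqrt(np.sum((y_high - h_high) ** 2) / np.sum(y_high ** 2))
    return e_low, e_high

# Toy data: the target has a low and a high frequency; h stands in for a
# network that has so far captured only the low one.
x = (np.arange(200) / 200.0)[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + 0.5 * np.sin(2 * np.pi * 20 * x[:, 0])
h = np.sin(2 * np.pi * x[:, 0])

rel_err = delta_f(x, y, h, np.array([1.0, 20.0]))
e_low, e_high = low_high_errors(x, y, h, delta=4e-4)
print("Delta_F at k = 1, 20:", rel_err)
print("e_low, e_high:", e_low, e_high)
```

In this toy case the relative error is close to 0 at frequency 1 and close to 1 at frequency 20, and `e_low` is much smaller than `e_high`. That is the pattern the F-Principle predicts early in training. With real data, `h` would be the network output recorded at successive epochs.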

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem
  • proof
  • Theorem
  • proof