Deep Learning without Global Optimization by Random Fourier Neural Networks

Owen Davis, Gianluca Geraci, Mohammad Motamed

TL;DR

This paper proposes a global-optimization-free training algorithm for deep residual networks with random Fourier (complex exponential) activations, termed random Fourier neural networks (rFNNs). It introduces a block-by-block training scheme where each block learns a residual correction using optimally derived frequency distributions and convex amplitude fitting, aided by adaptive Metropolis within Gibbs sampling to update frequencies. The approach achieves or surpasses the known theoretical approximation rates for rFNNs, learns high-frequency and multiscale features efficiently, and provides interpretable frequency decompositions, while avoiding Gibbs phenomena in discontinuous targets. Empirical results on discontinuous and multidimensional functions illustrate faster convergence and superior approximation relative to global optimization baselines, with potential extensions to uncertainty quantification and vector-valued tasks.

Abstract

We introduce a new training algorithm for deep neural networks that utilize random complex exponential activation functions. Our approach employs a Markov Chain Monte Carlo sampling procedure to iteratively train network layers, avoiding global and gradient-based optimization while maintaining error control. It consistently attains the theoretical approximation rate for residual networks with complex exponential activation functions, determined by network complexity. Additionally, it enables efficient learning of multiscale and high-frequency features, producing interpretable parameter distributions. Despite using sinusoidal basis functions, we do not observe Gibbs phenomena in approximating discontinuous target functions.
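The block-by-block idea described above can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: frequencies are drawn once from a fixed Gaussian instead of being updated by adaptive Metropolis-within-Gibbs sampling, each block sees only the input $x$ (not the previous block's output), and the amplitude fit is a small ridge-regularized least-squares problem. All names (`fit_block`, `train_blockwise`, `freq_scale`, `ridge`) are hypothetical and chosen for illustration only.

```python
import cmath
import random


def solve_linear(A, b):
    """Solve a small complex square system A x = b by Gaussian
    elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_block(xs, residual, width, freq_scale, rng, ridge=1e-6):
    """Fit one block: random frequencies, then a convex (ridge
    least-squares) fit of complex amplitudes to the current residual."""
    n = len(xs)
    # Assumption: fixed Gaussian frequency proposal, no MCMC update.
    omegas = [rng.gauss(0.0, freq_scale) for _ in range(width)]
    feats = [[cmath.exp(1j * w * x) for w in omegas] for x in xs]
    # Normal equations: (F^H F / n + ridge * I) a = F^H r / n
    gram = [
        [
            sum(feats[k][i].conjugate() * feats[k][j] for k in range(n)) / n
            + (ridge if i == j else 0.0)
            for j in range(width)
        ]
        for i in range(width)
    ]
    rhs = [
        sum(feats[k][i].conjugate() * residual[k] for k in range(n)) / n
        for i in range(width)
    ]
    amps = solve_linear(gram, rhs)
    return [sum(a * f for a, f in zip(amps, row)) for row in feats]


def train_blockwise(xs, ys, n_blocks=10, width=8, freq_scale=20.0, seed=0):
    """Train blocks one at a time; each block corrects the residual
    left by the blocks before it. Returns the MSE after each block."""
    rng = random.Random(seed)
    residual = [complex(y) for y in ys]
    mse = [sum(abs(r) ** 2 for r in residual) / len(residual)]
    for _ in range(n_blocks):
        correction = fit_block(xs, residual, width, freq_scale, rng)
        residual = [r - c for r, c in zip(residual, correction)]
        mse.append(sum(abs(r) ** 2 for r in residual) / len(residual))
    return mse


# Toy discontinuous target: a single step on [0, 1].
xs = [i / 199 for i in range(200)]
ys = [0.0 if x < 0.5 else 1.0 for x in xs]
mse_history = train_blockwise(xs, ys)
```

Because each block solves a convex problem whose objective at zero amplitudes equals the current residual error, the training error in `mse_history` cannot increase from one block to the next. No global or gradient-based optimization is involved, which is the property this sketch is meant to show.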

Paper Structure (14 sections, 3 theorems, 55 equations, 7 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

Let $Q$ be a target function in $S$, as defined in target_function_space, excluding the identically zero function. Let $Q_{\Phi}$ be a random Fourier neural network eqn:rFNN with depth $L\geq 2$, width $W\geq 1$, and parameters $\{\bm{\omega}, \bm{\omega}', \bm{b}, \bm{b}'\}$. Then there exists posi where $\hat{Q}$ is the Fourier transform of $Q$. Furthermore, for sufficiently large $WL$, with $W

Figures (7)

  • Figure 1: An rFNN with one input and $(W,L) = (2,3)$.
  • Figure 2: Mean squared error in the network predictions (red diamonds) and predicted convergence rate (black circles) as a function of network complexity $WL$ over the first 10 blocks of training.
  • Figure 3: A Fourier neural network $Q^{global}_{\Phi}$ with architecture $(W,L)=(6,3)$ trained with global Adam optimization for $30000$ epochs (left) and the loss $\mathcal{L}(\Phi)$ on the training set as a function of epoch number (right).
  • Figure 4: A discontinuous target function $Q(\theta)$.
  • Figure 5: Mean squared error for networks approximating the stairstep function pictured in Figure \ref{fig:stairstep}, trained with Method 1 (red diamonds) and Method 2 (blue squares), after blocks $1$ through $10$ as a function of $WL$.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Theorem 1
  • Proof
  • Corollary 1.1
  • Proof
  • Remark 3.1
  • Theorem 2: Minimizing probability densities
  • Proof