Nonuniform random feature models using derivative information

Konstantin Pieper, Zezhong Zhang, Guannan Zhang

TL;DR

We derive parameter densities that concentrate in regions of the parameter space where neurons are well suited to model the local derivatives of the unknown function. Simplified versions of these densities allow random feature models to perform close to optimal networks in several scenarios.

Abstract

We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated. These parameter distributions are developed in the context of non-parametric regression models based on shallow neural networks, and compare favorably to well-established uniform random feature models based on conventional weight initialization. We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus), and use recent results on the harmonic analysis and sparse representation of neural networks resulting from fully trained optimal networks. Extending analytic results that give exact representation, we obtain densities that concentrate in regions of the parameter space corresponding to neurons that are well suited to model the local derivatives of the unknown function. Based on these results, we suggest simplifications of these exact densities based on approximate derivative data in the input points that allow for very efficient sampling and lead to performance of random feature models close to optimal networks in several scenarios.
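
As a rough illustration of the idea described in the abstract, the following one-dimensional sketch compares uniform sampling of sigmoid inflection points with sampling proportional to the magnitude of a finite-difference derivative estimate. It is not the paper's density. The function names (`sample_inflection_points`, `fit_random_features`), the test function, and the choice of probabilities proportional to $|f'(x_k)|$ are assumptions made for illustration. They are motivated by the identity $f(x) = f(x_0) + \int_{x_0}^{x_1} H(x - b)\, f'(b)\, db$ for $x \in [x_0, x_1]$, which writes $f$ as a superposition of Heaviside steps weighted by $f'$.

```python
import numpy as np

def sigmoid(t, delta=1.0 / 80):
    # Scaled sigmoid sigma(t) = 1 / (1 + exp(-t / delta)); clipping avoids overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(t / delta, -50.0, 50.0)))

def sample_inflection_points(x, y, n_features, rng, nonuniform=True):
    """Sample inflection points x_n = -b_n / a_n for sorted 1D inputs x."""
    if not nonuniform:
        return rng.uniform(x.min(), x.max(), size=n_features)
    # Assumption: derivative data is approximated by finite differences.
    slope = np.abs(np.gradient(y, x))
    if slope.sum() == 0.0:
        # Constant data carries no derivative information; fall back to uniform.
        return rng.uniform(x.min(), x.max(), size=n_features)
    prob = slope / slope.sum()
    # Data points with large |f'| are chosen more often.
    return rng.choice(x, size=n_features, p=prob)

def fit_random_features(x, y, centers, signs):
    """Least-squares fit of the outer weights for features sigma(a_n x + b_n)."""
    a = signs
    b = -a * centers  # so that the inflection point is -b_n / a_n = centers[n]
    Phi = np.column_stack([np.ones_like(x), sigmoid(np.outer(x, a) + b)])
    coef = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return coef, Phi @ coef

# Hypothetical test problem: a steep transition near x = 0.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.tanh(20.0 * x)
signs = rng.choice([-1.0, 1.0], size=30)
for flag in (False, True):
    centers = sample_inflection_points(x, y, 30, rng, nonuniform=flag)
    _, fit = fit_random_features(x, y, centers, signs)
    rmse = np.sqrt(np.mean((fit - y) ** 2))
    print("nonuniform" if flag else "uniform", "RMSE:", rmse)
```

Only the outer weights are fitted here; the inner parameters $(a_n, b_n)$ stay fixed after sampling, which is what makes this a random feature model rather than a fully trained network.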

Paper Structure (38 sections, 6 theorems, 102 equations, 11 figures)

Key Result

Theorem 3.1

For sufficiently smooth $f \in \mathcal{V}$ (e.g., $f \in C_c^{d+s}(\mathbb{R}^d)$) we define: which is an even function for $s$ even, and an odd function for $s$ odd. Together with a polynomial $p_{f}$ of maximal degree $s-1$ the function $f$ has the unique decomposition where the polynomial and the coefficient function are uniquely determined (in the space of even or odd measures).

Figures (11)

  • Figure 1: Comparison of uniform (left, $N=30$) and nonuniform (middle, $N=30$) random feature regression and sparse infinite feature regression (right, $N=11$) for a scaled sigmoidal activation function $\sigma(t) = 1/(1 + \exp(-t/\delta))$ with $\delta = 1/80$. The noisy data values are dashed blue, the model function is the black line, and the inflection points $x_n = -b_n/a_n$ of the sigmoids are orange crosses. The nonuniform random and sparse feature models are able to allocate more inflection points to regions of space with high variability of $f$.
  • Figure 2: Comparison of uniform (left, $N=30$) and nonuniform (middle, $N=30$) random feature regression and sparse infinite feature regression (right, $N=11$) for a scaled sigmoidal activation function $\sigma(t) = 1/(1 + \exp(-t/\delta))$ with $\delta = 1/40$. The data values are black circles and the model function is the surface (bottom plot); the inflection hyperplanes are visualized in the top plot. The nonuniform random and sparse feature models are able to align inflection hyperplanes with directions of variability of $f$.
  • Figure 3: Random feature models with a sigmoidal activation function: uniform weight sampling, gradient-based weight sampling (each with $N=75$), and sparse feature learning with $N=31$ (from left to right). The data values are black circles and the model function is the surface plot (bottom); the inflection hyperplanes are visualized as lines (top). The nonuniform random and sparse feature models are able to orient inflection hyperplanes with respect to directions of variability of $f$ and to place them in regions of high variability.
  • Figure 4: Convergence plot for the Gaussian bump, left: $s=1$, right: $s=2$.
  • Figure 5: Convergence plots for anisotropic examples from the introduction; left: globally anisotropic planar wave, right: locally anisotropic Gaussian "checkmark" function.
  • ...and 6 more figures
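
The captions of Figures 2 and 3 describe inflection hyperplanes that align with directions of variability of $f$. A minimal sketch of one way to obtain such an alignment is given below, assuming that gradient estimates at the data points are available. The function name `sample_hyperplanes`, the rule of sampling data points with probability proportional to the gradient norm, and the planar-wave test function are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def sample_hyperplanes(X, grads, n_features, rng):
    """Sample (a_n, b_n) so that {x : a_n . x + b_n = 0} passes through a data point.

    Assumes at least one row of grads is nonzero.
    """
    norms = np.linalg.norm(grads, axis=1)
    prob = norms / norms.sum()
    # Points with larger gradient norm are selected more often;
    # points with zero gradient have probability zero and are never selected.
    idx = rng.choice(len(X), size=n_features, p=prob)
    A = grads[idx] / norms[idx, None]   # unit normals along the local gradient
    b = -np.sum(A * X[idx], axis=1)     # hyperplane n contains the point X[idx[n]]
    return A, b

# Hypothetical planar wave f(x) = tanh(5 w . x) with a fixed unit direction w.
rng = np.random.default_rng(0)
w = np.array([0.6, 0.8])
X = rng.uniform(-1.0, 1.0, size=(400, 2))
z = 5.0 * (X @ w)
grads = (5.0 / np.cosh(z) ** 2)[:, None] * w  # exact gradient of tanh(5 w . x)
A, b = sample_hyperplanes(X, grads, 75, rng)
print("all normals aligned with w:", np.allclose(A, w))
```

For this planar wave every gradient is a positive multiple of $w$, so all sampled normals coincide with $w$ and the offsets $b_n$ cluster where the wave is steepest.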

Theorems & Definitions (9)

  • Theorem 3.1 (ongie2019function; parhi2021neural; RKBS:2023)
  • Remark 3.2
  • Corollary 3.3
  • Proposition 3.4
  • Proposition 3.5
  • Proposition 4.1
  • proof
  • Remark 4.2
  • Proposition 4.3