
Adaptive Random Fourier Features Training Stabilized By Resampling With Applications in Image Regression

Aku Kammonen, Anamika Pandey, Erik von Schwerin, Raúl Tempone

TL;DR

The paper addresses instability and hyperparameter sensitivity in Adaptive Random Fourier Features (ARFF) by introducing a particle-filter–style resampling mechanism. This resampling yields a stabilized training dynamic and allows Metropolis-free operation, enabling both standalone training and pretraining for gradient-based optimization. The authors demonstrate the approach on function regression and image regression by adaptively sampling RFF frequencies for the RFF layer in coordinate-based MLPs, achieving faster early convergence and improved robustness. Collectively, the work offers a practical method to automate RFF frequency selection in scalable shallow networks and RFF-enabled MLPs used for high-frequency image representation.

Abstract

This paper presents an enhanced adaptive random Fourier features (ARFF) training algorithm for shallow neural networks, building upon the work introduced in "Adaptive Random Fourier Features with Metropolis Sampling" (Kammonen et al., Foundations of Data Science, 2(3):309-332, 2020). The improved method uses a particle-filter-type resampling technique to stabilize the training process and reduce the sensitivity to parameter choices. The Metropolis test can also be omitted when resampling is used, which removes one hyperparameter and reduces the computational cost per iteration compared to the ARFF method. We present comprehensive numerical experiments demonstrating the efficacy of the proposed algorithm in function regression tasks, both as a stand-alone method and as a pretraining step before gradient-based optimization with the Adam optimizer. Furthermore, we apply the proposed algorithm to a simple image regression problem, illustrating its utility in sampling frequencies for the random Fourier features (RFF) layer of coordinate-based multilayer perceptrons. In this context, we use the proposed algorithm to sample the parameters of the RFF layer in an automated manner.
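
The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions not stated on this page: resampling weights proportional to the amplitude moduli $|\beta_k|$, multinomial resampling, a Gaussian random-walk perturbation with a hypothetical step size `delta`, the regularization scaling `lam * N`, and the standard particle-filter effective sample size $1/\sum_k p_k^2$. No Metropolis test is used, mirroring the Metropolis-free variant the abstract mentions.

```python
import numpy as np

def fit_amplitudes(x, y, omega, lam):
    """Regularized least squares for beta in sum_k beta_k * exp(i * omega_k . x)."""
    S = np.exp(1j * x @ omega.T)                      # (N, K) feature matrix
    K = omega.shape[0]
    A = S.conj().T @ S + lam * x.shape[0] * np.eye(K)  # assumed scaling lam * N
    return np.linalg.solve(A, S.conj().T @ y)

def effective_sample_size(p):
    """Standard particle-filter ESS for normalized weights p (assumed definition)."""
    return 1.0 / np.sum(p ** 2)

def resample_step(omega, beta, delta, rng):
    """Resample frequencies by amplitude weight, then apply a random-walk move."""
    w = np.abs(beta)                                  # assumed weights: |beta_k|
    p = w / w.sum()
    idx = rng.choice(len(p), size=len(p), p=p)        # multinomial resampling
    moved = omega[idx] + delta * rng.standard_normal(omega.shape)
    return moved, effective_sample_size(p)

# Toy 1D regression problem (illustrative data, not from the paper).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(6.0 * x[:, 0])
omega = rng.standard_normal((16, 1))                  # K = 16 initial frequencies

errors = []
for _ in range(20):
    beta = fit_amplitudes(x, y, omega, lam=1e-3)
    pred = np.exp(1j * x @ omega.T) @ beta
    errors.append(float(np.mean(np.abs(pred - y) ** 2)))
    omega, ess = resample_step(omega, beta, delta=0.5, rng=rng)
```

Here `errors` records the training error before each resampling step and `ess` is the effective sample size of the last weight vector; dividing `ess` by the number of frequencies gives a normalized quantity in the spirit of the $K_\mathrm{ESS}/K$ curves described in the figure captions.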

Paper Structure

This paper contains 16 sections, 17 equations, 20 figures, 9 tables, 1 algorithm.

Figures (20)

  • Figure 1: Test \ref{test:statistics} (i.e., \ref{eq:reg_disc_data_set} with $B$ in \ref{eq:rot_mat} and parameters in Table \ref{tab:Sigint_statistics}) based on 100 independent realizations of the stochastic algorithms for each $K$. Top row: Convergence of the minimal training and testing errors w.r.t. the number of nodes, $K$. Sample means are shown with error bars indicating a confidence interval of $\pm 2$ sample standard deviations. The $\lambda\to 0$ limit of the error estimate \ref{eq:error_bound} is included for reference. Bottom row: Errors for $K=512$ as a function of the number of iterations, shown as sample means and sample means $\pm 2$ sample standard deviations.
  • Figure 2: Test \ref{test:all_data} (i.e., \ref{eq:reg_disc_data_set} with $B$ in \ref{eq:rot_mat} and parameters in Table \ref{tab:Sigint_full}) with one realization of the stochastic algorithms for each $K$. Top row: Convergence of the minimal training and testing errors w.r.t. the number of nodes, $K$. The $\lambda\to 0$ limit of the error estimate \ref{eq:error_bound} is included for reference. Middle row: Errors for $K=1024$ as a function of the number of iterations. Bottom row: Normalized effective sample size, $K_\mathrm{ESS}/K$, with $K_\mathrm{ESS}$ defined in \ref{eq:ESS}, for $K=1024$ as a function of the number of iterations.
  • Figure 3: Test \ref{test:effect_gamma} (i.e., \ref{eq:reg_disc_data_set} with $B$ in \ref{eq:rot_mat}) illustrating sensitivity w.r.t. $\gamma$. One realization of the stochastic algorithms is included. Left column: The case $\gamma=1$, with parameters in Table \ref{tab:Sigint_gamma_1}. Right column: The case $\gamma=10$, with parameters in Table \ref{tab:Sigint_full} for $K=256$.
  • Figure 4: Test \ref{test:effect_batch} (i.e., \ref{eq:reg_disc_data_set} with $B$ in \ref{eq:rot_mat}) illustrating sensitivity w.r.t. $M_B$. One realization of the stochastic algorithms is included. Parameters are the same as for $K=512$ in Table \ref{tab:Sigint_full}, except for $M_B$, which is $M_B=10^3$ (top), $M_B=10^4$ (middle), and $M_B=M$ (bottom).
  • Figure 5: Test \ref{test:effect_batch} (i.e., \ref{eq:reg_disc_data_set} with $B$ in \ref{eq:rot_mat}) illustrating the effective sample sizes corresponding to Figure \ref{fig:reduced_batch_size}. One realization of the stochastic algorithms is included. Parameters are the same as for $K=512$ in Table \ref{tab:Sigint_full}, except for $M_B$, which is $M_B=10^3$ (top), $M_B=10^4$ (middle), and $M_B=M$ (bottom).
  • ...and 15 more figures