Hyperparameter Optimization for Randomized Algorithms: A Case Study on Random Features

Oliver R. A. Dunbar; Nicholas H. Nelsen; Maya Mutic

TL;DR

The paper tackles hyperparameter optimization for random feature (RF) methods used to emulate complex functions in scientific settings. It develops a black-box empirical Bayes objective for learning the RF sampling distribution and solves the resulting optimization problem with ensemble Kalman inversion (EKI), enabling derivative-free calibration in high dimensions. By constructing practical RF feature distribution structures (including nonseparable and separable low-rank variants) and recasting the tuning as a stochastic inverse problem, the authors demonstrate performance competitive with Gaussian processes across global sensitivity analysis, chaotic Lorenz dynamics, and an atmospheric uncertainty quantification (UQ) task, while offering improved robustness and scalability. This work provides a principled, automated framework for hyperparameter learning in randomized algorithms and opens avenues for applying EKI-based tuning to other randomized learning tools with large input/output spaces.

Abstract

Randomized algorithms exploit stochasticity to reduce computational complexity. One important example is random feature regression (RFR), which accelerates Gaussian process regression (GPR). RFR approximates an unknown function with a random neural network whose hidden weights and biases are sampled from a probability distribution. Only the final output layer is fit to data. In randomized algorithms like RFR, the hyperparameters that characterize the sampling distribution greatly impact performance, yet are not directly accessible from samples. This makes standard (gradient-based) optimization tools inapplicable to hyperparameter optimization. Inspired by Bayesian ideas from GPR, this paper introduces a random objective function that is tailored for hyperparameter tuning of vector-valued random features. The objective is minimized with ensemble Kalman inversion (EKI). EKI is a gradient-free particle-based optimizer that is scalable to high dimensions and robust to randomness in objective functions. A numerical study showcases the new black-box methodology to learn hyperparameter distributions in several problems that are sensitive to the hyperparameter selection: two global sensitivity analyses, integrating a chaotic dynamical system, and solving a Bayesian inverse problem from atmospheric dynamics. The success of the proposed EKI-based algorithm for RFR suggests its potential for automated optimization of hyperparameters arising in other randomized algorithms.
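
To make the gradient-free, particle-based update mentioned in the abstract concrete, the following is a minimal sketch of a basic perturbed-observation EKI iteration. It is not the authors' implementation and does not use the paper's objective: the toy linear forward map `A`, the noise levels, the ensemble size, and the function names `eki_step` and `run_eki_demo` are all assumptions chosen for illustration. Random noise is added to each forward evaluation to mimic a randomized objective.

```python
import numpy as np


def eki_step(U, G_vals, y, Gamma, rng):
    """One perturbed-observation EKI update (illustrative sketch).

    U:      (J, p) ensemble of parameter vectors
    G_vals: (J, m) forward-map evaluations, one row per ensemble member
    y:      (m,)   data
    Gamma:  (m, m) observational noise covariance
    """
    J, m = G_vals.shape
    dU = U - U.mean(axis=0)            # parameter deviations from ensemble mean
    dG = G_vals - G_vals.mean(axis=0)  # output deviations from ensemble mean
    C_ug = dU.T @ dG / J               # (p, m) empirical cross-covariance
    C_gg = dG.T @ dG / J               # (m, m) empirical output covariance
    # Perturb the data independently for each member with noise ~ N(0, Gamma).
    noise = rng.standard_normal((J, m)) @ np.linalg.cholesky(Gamma).T
    residual = (y + noise - G_vals).T  # (m, J)
    increment = C_ug @ np.linalg.solve(C_gg + Gamma, residual)  # (p, J)
    return U + increment.T


def run_eki_demo(seed=0, n_ens=50, n_iter=20):
    """Recover u_true from y = A u_true using only noisy forward evaluations."""
    rng = np.random.default_rng(seed)
    # Hypothetical toy forward map; any black-box map could be used instead.
    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
    u_true = np.array([1.0, -2.0])
    Gamma = 0.01 * np.eye(A.shape[0])
    y = A @ u_true

    def noisy_forward(U):
        # Each evaluation is corrupted by fresh noise; no derivatives are used.
        return U @ A.T + 0.05 * rng.standard_normal((U.shape[0], A.shape[0]))

    U = 2.0 * rng.standard_normal((n_ens, u_true.size))  # initial ensemble
    for _ in range(n_iter):
        U = eki_step(U, noisy_forward(U), y, Gamma, rng)
    return U.mean(axis=0), u_true


if __name__ == "__main__":
    u_est, u_true = run_eki_demo()
    print("ensemble mean:", u_est, "target:", u_true)
```

The update only needs forward evaluations at the ensemble members, which is why this style of optimizer suits objectives whose gradients are unavailable. In the paper's setting the parameters would be the hyperparameters of the feature sampling distribution rather than the two-dimensional vector used here.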

Paper Structure (33 sections, 50 equations, 12 figures, 6 tables)

Introduction
Contributions
Outline
Bayesian Regression with Random Features
Scalar-Valued Learning
Gaussian Process Regression
Random Feature Regression
Vector-Valued Learning
Vector-Valued Gaussian Process Regression
Vector-Valued Random Feature Regression
Hyperparameter Learning for Random Feature Regression
Empirical Bayes Motivation
Ensemble Kalman Inversion
Recasting Optimization as Inversion
A Practical Feature Distribution Structure
...and 18 more sections

Figures (12)

Figure 1: Learning the $\mathbb{R}^3 \to \mathbb{R}$ Ishigami function from $300$ noisy samples. In orange, Sobol samples of the true function; in red, $300$ noisy observations; in blue, Sobol samples of the emulated functions. Row 2 is sampled from an RF emulator without hyperparameter tuning, Row 3 is sampled from a tuned GP emulator, and Row 4 is sampled from the RF approximation (using a nonseparable feature distribution with full rank covariance) using our hyperparameter optimizer.
Figure 2: First order global sensitivity analysis of the Sobol G function with input dimensions $d=3,6,10,20$. The vertical axis represents the dimensionless values and the horizontal axis represents the index. Crosses ($\times$) denote the analytic Sobol indices, circles ($\circ$) denote the empirical indices calculated at $1600\cdot d$ points, error bars denote the $(0.05,0.95)$ percentile range of $30$ random feature trials with $250 \cdot d$ training data points.
Figure 3: Views of the Sobol G function plotted against the first three variables for $d=3$, $6$, $10$, and $20$ (from top to bottom row). The red points denote the noisy observed data, and the blue represents the tuned RF emulator prediction at all other Sobol points.
Figure 4: Learning a Lorenz 63 integrator from noisy data: emulated (blue) vs. truth (orange). The first column displays the time evolution of the three state variables $x(t)$, $y(t)$, and $z(t)$ (vertical axis) as a function of time $t$ (horizontal axis) on initial conditions not seen during training. The second column visualizes the marginal probability density functions for each state variable. Row 1 shows integration with the untuned RF (no hyperparameter learning). Row 2 shows integration with the GP emulator with 12 total hyperparameters learned, and Row 3 shows integration with RF using a rank-$4$ nonseparable feature distribution with 31 total hyperparameters learned. The $500$ data pairs were subjected to observational noise with covariance $\Sigma = 10^{-4}I$.
Figure 5: Comparison of true marginal empirical CDF (orange) with the CDF of the RF emulator corresponding to $30$ re-tunings of the hyperparameters (blue) when performing the experiment displayed in Figure 4. The noise covariance is $\Sigma=10^{-4}I$.
...and 7 more figures

Theorems & Definitions (1)

Example 1: Random Fourier Features for RBF Kernel
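
As a rough illustration of the construction named in Example 1, the sketch below uses the standard random Fourier feature recipe for the RBF kernel $k(x,x') = \exp(-\|x-x'\|^2/(2\ell^2))$: frequencies are drawn from $N(0, \ell^{-2}I)$, phases from $U[0, 2\pi]$, and only the output-layer coefficients are fit by ridge regression. Notation and scaling may differ from the paper's Example 1. The function names, the lengthscale value, the feature counts, and the regularization `reg` are assumptions made for this example.

```python
import numpy as np


def sample_features(d, n_features, lengthscale, rng):
    """Draw hidden weights and biases; lengthscale is the sampling hyperparameter."""
    W = rng.standard_normal((n_features, d)) / lengthscale  # frequencies ~ N(0, I / l^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)      # phases ~ U[0, 2*pi]
    return W, b


def feature_map(X, W, b):
    """Random Fourier features, scaled so that Phi @ Phi.T approximates the kernel."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)


def rbf_kernel(X, Z, lengthscale):
    """Exact RBF kernel matrix, used only to check the approximation."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * lengthscale ** 2))


def fit_output_layer(Phi, y, reg):
    """Ridge regression for the final layer; hidden weights stay fixed."""
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


def run_rff_demo(seed=0):
    rng = np.random.default_rng(seed)
    lengthscale = 1.0  # assumed value; this is what a tuner would learn

    # 1) Kernel approximation check on a few points in R^3.
    X = rng.uniform(-1.0, 1.0, size=(20, 3))
    W, b = sample_features(3, 10000, lengthscale, rng)
    Phi = feature_map(X, W, b)
    kernel_err = np.abs(Phi @ Phi.T - rbf_kernel(X, X, lengthscale)).max()

    # 2) Random feature regression of sin(x) from noisy samples.
    X_train = rng.uniform(-3.0, 3.0, size=(200, 1))
    y_train = np.sin(X_train[:, 0]) + 0.05 * rng.standard_normal(200)
    W, b = sample_features(1, 300, lengthscale, rng)
    coef = fit_output_layer(feature_map(X_train, W, b), y_train, reg=1e-3)
    X_test = np.linspace(-3.0, 3.0, 100)[:, None]
    pred = feature_map(X_test, W, b) @ coef
    rmse = np.sqrt(np.mean((pred - np.sin(X_test[:, 0])) ** 2))
    return kernel_err, rmse


if __name__ == "__main__":
    kernel_err, rmse = run_rff_demo()
    print("max kernel error:", kernel_err, "test RMSE:", rmse)
```

Changing `lengthscale` changes the distribution the features are drawn from, and a value poorly matched to the data typically degrades the fit. Choosing such sampling hyperparameters automatically is the problem the paper addresses.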
