Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization

Sam Reifenstein, Timothee Leleu, Yoshihisa Yamamoto

TL;DR

Derivative-free optimization under noisy evaluations is improved by dynamic anisotropic smoothing (DAS), which adapts the sampling kernel to heterogeneous curvature so that its shape converges toward the local Hessian $\nabla^2 f$ near optima. The framework comprises dynamic isotropic smoothing (DIS) and DAS; DAS uses a matrix $L$ to shape the sampling window and stochastic dynamics for $x$ and $L$, yielding Hessian-aligned gradient estimates. The authors show that the gradient-estimation error is minimized when the kernel aligns with the Hessian eigenbasis, and demonstrate superior performance over derivative-free and Bayesian tuners on artificial benchmarks and NP-hard combinatorial solvers (e.g., SAT, Ising) under noise. These results suggest robust, curvature-aware optimization in high-noise settings and broader applicability to hyperparameter tuning and neural-network training with heterogeneous sensitivity across directions.

Abstract

We propose a novel algorithm that extends the methods of ball smoothing and Gaussian smoothing for noisy derivative-free optimization by accounting for the heterogeneous curvature of the objective function. The algorithm dynamically adapts the shape of the smoothing kernel to approximate the Hessian of the objective function around a local optimum. This approach significantly reduces the error in estimating the gradient from noisy evaluations through sampling. We demonstrate the efficacy of our method through numerical experiments on artificial problems. Additionally, we show improved performance when tuning NP-hard combinatorial optimization solvers compared to existing state-of-the-art heuristic derivative-free and Bayesian optimization methods.
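
As a rough illustration of the idea in the abstract, the sketch below estimates the gradient of a Gaussian-smoothed objective from noisy evaluations, with the sampling kernel shaped by a matrix $L$. It is not the authors' algorithm: the function name `anisotropic_gradient_estimate`, the antithetic sample pairing, the quadratic test objective, and the fixed value of $L$ are all assumptions made for illustration, and the method described above additionally adapts $L$ over time. The estimator relies on the standard identity $\nabla_x \mathbb{E}_z[f(x + Lz)] = \mathbb{E}_z[f(x + Lz)\, L^{-\top} z]$ for $z \sim \mathcal{N}(0, I)$.

```python
import numpy as np

def anisotropic_gradient_estimate(f, x, L, n_pairs, rng):
    """Estimate the gradient of E_z[f(x + L z)], z ~ N(0, I), from noisy calls to f.

    Hypothetical helper for illustration; L is held fixed here.
    """
    D = x.shape[0]
    z = rng.standard_normal((n_pairs, D))   # isotropic draws, one per row
    u = z @ L.T                              # each row is the perturbation L z
    # Antithetic pairs (x + u, x - u) cancel the even part of f around x.
    df = np.array([f(x + ui) - f(x - ui) for ui in u]) / 2.0
    # Rows of w are L^{-T} z, obtained by solving L^T w = z.
    w = np.linalg.solve(L.T, z.T).T
    return (df[:, None] * w).mean(axis=0)

# Assumed test problem: concave quadratic with heterogeneous curvature plus noise.
rng = np.random.default_rng(0)
H = np.diag([1.0, 4.0])

def noisy_f(y):
    return -0.5 * y @ H @ y + 0.1 * rng.standard_normal()

x = np.array([1.0, 1.0])
L = np.diag([0.5, 0.25])   # narrower window along the high-curvature direction
grad = anisotropic_gradient_estimate(noisy_f, x, L, 20000, rng)
```

For this quadratic the smoothed objective differs from $f$ only by a constant, so `grad` should be close to the exact gradient $-Hx$, up to sampling error.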

Paper Structure (39 sections, 70 equations, 12 figures, 4 tables, 1 algorithm)

Figures (12)

  • Figure 1: Left: Cartoon depiction of the derivative-free optimization methods discussed in this work while optimizing a 2-D objective function. Red represents the distribution of sampled points, and the blue dot is the optimum. Middle: Sampling window of DAS while optimizing the 2-dimensional modified Rosenbrock function (see section \ref{sec: rosenbrock} for details). Red ellipses represent one standard deviation of the Gaussian sampling distribution. As the algorithm progresses, the window adapts to the shape of the fitness function, helping convergence. Right: Example of gradient ascent on $h(w,x)$ for a one-dimensional Gaussian fitness function in an ideal noiseless setting. The optimum is obtained by updating both window size $w$ and position $x$ according to the gradient of $h$ until the window size shrinks to 0 and $x$ converges to the optimum of $f$. This is the principle on which DIS (dynamic isotropic smoothing) and DAS (dynamic anisotropic smoothing) are based.
  • Figure 2: Fitness as a function of $n_s$ on the modified Rosenbrock function in different dimensions for four different algorithms. Left: $D = 2, \beta = 0.5$, Middle: $D = 4, \beta = 0.5$, Right: $D = 8, \beta = 0.2$. Traces are averages over 5 runs, and shaded regions represent one standard deviation of the data. Except for BOHB, each algorithm is initialized with an initial condition in $[0,1]^D$; BOHB uses $[0,1]^D$ as the sampling window.
  • Figure 3: Left: Average success probability obtained by different tuning methods on random 3-SAT with $N = 150, \alpha = 4.0, T= 148$. Averages are over 5 realizations of the tuning dynamics starting at different randomized positions. The shaded region represents one standard deviation of the data. To evaluate the fitness of each parameter configuration, 20 random SAT instances are generated and 50 trajectories are evaluated for each. Right: A similar plot for tuning an Ising solver on problem size $N=150$, in which the performance improvement provided by DAS is clearer. This result is not discussed in the main text but is included in appendix \ref{sec: ising}.
  • Figure 4: Error $E = \sum_i \text{Var}[(\eta_x)_i]$ in estimating the gradient with respect to $x$, with $\text{Var}[(\eta_x)_i] = \text{Var}[X_i] + \text{Var}[Y_i] - 2\,\text{Cov}[X_i,Y_i]$, for a Gaussian objective function $f(x) = \kappa(M(x-\bar{x}))$ centered at $\bar{x}$, whose Hessian $H$ has eigenvalues $\lambda'_1 = 1$ and $\lambda'_2 = 4$ and is rotated by a reference angle $\theta_0$. The anisotropic smoothing kernel is also centered at $\bar{x}$ and has curvature $(L L^{\top})^{-1}$ with eigenvalues $\lambda_1 = 0.01$ and $\lambda_2 = 0.04$ and rotation $\theta$. The gradient estimation error $E$ is minimized for $\theta = \theta_0$. The dashed lines show the approximations of eq. (\ref{eq: approx_eta_x}). $D=2$.
  • Figure 5: Ratio $\frac{\lambda_1}{\lambda_2}$ of the eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian of the smoothing kernel $(LL^{\top})^{-1}$ vs. the number of samples, for various ratios $\frac{\lambda'_1}{\lambda'_2}$ of the eigenvalues of the Hessian of $f$ when $f$ is a Gaussian function. The dynamics converge to $\frac{\lambda_1}{\lambda_2} = \frac{\lambda'_1}{\lambda'_2}$. $D=2$.
  • ...and 7 more figures
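
The noiseless picture in the right panel of Figure 1 can be reproduced with a short sketch, under one stated assumption: that $h(w,x)$ is the Gaussian-smoothed fitness $h(w,x) = \mathbb{E}_{z \sim \mathcal{N}(0,1)}[f(x + wz)]$. The captions do not give the definition of $h$, so this choice, the fitness width `s`, the step size, and the starting point are all hypothetical. For $f(x) = \exp(-x^2 / (2s^2))$ this smoothing has the closed form $h(w,x) = \frac{s}{\sqrt{s^2 + w^2}} \exp\!\left(-\frac{x^2}{2(s^2 + w^2)}\right)$, whose only maximum is at $w = 0$, $x = 0$.

```python
import math

def smoothed_fitness_grad(w, x, s=1.0):
    """Return h(w, x) and its partial derivatives for a 1-D Gaussian fitness.

    Assumes h is the Gaussian smoothing of f(x) = exp(-x^2 / (2 s^2))
    with window size w (an illustrative choice, not taken from the source).
    """
    v = s * s + w * w                                  # combined variance
    h = s / math.sqrt(v) * math.exp(-x * x / (2.0 * v))
    dh_dx = -x / v * h
    dh_dw = h * w * (x * x - v) / (v * v)
    return h, dh_dw, dh_dx

def ascend(w, x, lr=0.2, steps=2000):
    """Plain gradient ascent on h, updating window size and position together."""
    for _ in range(steps):
        _, gw, gx = smoothed_fitness_grad(w, x)
        w += lr * gw
        x += lr * gx
    return w, x

w_final, x_final = ascend(w=0.5, x=2.0)
h_final, _, _ = smoothed_fitness_grad(w_final, x_final)
```

Starting far from the optimum, the window first widens (while $x^2 > s^2 + w^2$) and then shrinks toward 0 as $x$ approaches the maximum of $f$, which matches the behavior the caption describes.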