Soft quasi-Newton: Guaranteed positive definiteness by relaxing the secant constraint

Erik Berglund, Jiaojiao Zhang, Mikael Johansson

TL;DR

The paper proposes soft quasi-Newton (soft QN), an algorithm that, for strongly convex objectives, converges linearly to a neighborhood of the optimal solution even when gradient and function evaluations are subject to bounded perturbations. In numerical experiments, it outperforms state-of-the-art methods across a range of scenarios.

Abstract

We propose a novel algorithm, termed soft quasi-Newton (soft QN), for optimization in the presence of bounded noise. Traditional quasi-Newton algorithms are vulnerable to such perturbations. To develop a more robust quasi-Newton method, we replace the secant condition in the matrix optimization problem for the Hessian update with a penalty term in its objective and derive a closed-form update formula. A key feature of our approach is its ability to maintain positive definiteness of the Hessian inverse approximation. Furthermore, we establish the following properties of soft QN: it recovers the BFGS method under specific limits, it treats positive and negative curvature equally, and it is scale invariant. Collectively, these features enhance the efficacy of soft QN in noisy environments. For strongly convex objective functions and Hessian approximations obtained using soft QN, we develop an algorithm that exhibits linear convergence toward a neighborhood of the optimal solution, even if gradient and function evaluations are subject to bounded perturbations. Through numerical experiments, we demonstrate superior performance of soft QN compared to state-of-the-art methods in various scenarios.
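
To make the vulnerability mentioned in the abstract concrete, the following minimal sketch applies the standard BFGS inverse-Hessian update (not the soft QN update, whose formula is not reproduced on this page) to one clean and one noisy gradient difference. The quadratic objective, the step `s`, and the perturbation `noise` are hypothetical values chosen for illustration. With the noisy data the curvature product s^T y becomes negative and the updated matrix is no longer positive definite, which is the failure mode the penalty-based relaxation is designed to avoid.

```python
# Illustration only: standard BFGS inverse update, NOT the soft QN update.
# All numerical values below are assumptions chosen for the example.

def matvec(A, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def bfgs_inverse_update(H, s, y):
    """H_new = (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1/(y^T s).

    H is assumed symmetric, so y^T H equals (H y)^T.
    """
    n = len(s)
    rho = 1.0 / dot(y, s)
    Hy = matvec(H, y)
    yHy = dot(y, Hy)
    return [[H[i][j]
             - rho * (s[i] * Hy[j] + Hy[i] * s[j])
             + (rho * rho * yHy + rho) * s[i] * s[j]
             for j in range(n)] for i in range(n)]

def is_positive_definite_2x2(H):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return H[0][0] > 0 and det > 0

# Hypothetical strongly convex quadratic f(x) = 0.5 x^T A x with A = diag(1, 10).
A = [[1.0, 0.0], [0.0, 10.0]]
H0 = [[1.0, 0.0], [0.0, 1.0]]       # initial inverse Hessian approximation
s = [0.1, 0.0]                      # step x_{k+1} - x_k
y_clean = matvec(A, s)              # exact gradient difference
noise = [-0.2, 0.0]                 # bounded gradient perturbation (assumed)
y_noisy = [a + b for a, b in zip(y_clean, noise)]

H_clean = bfgs_inverse_update(H0, s, y_clean)
H_noisy = bfgs_inverse_update(H0, s, y_noisy)

print("s^T y (clean):", dot(s, y_clean), "PD:", is_positive_definite_2x2(H_clean))
print("s^T y (noisy):", dot(s, y_noisy), "PD:", is_positive_definite_2x2(H_noisy))
```

In the clean case the update satisfies the secant condition H y = s and stays positive definite. In the noisy case the secant condition is still enforced exactly, but on corrupted data, so the updated approximation acquires a negative eigenvalue.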

Paper Structure

This paper contains 18 sections, 6 theorems, 40 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Theorem 3.1

For every $\alpha_k>0$ and every $H_k \succ 0$, there exists a unique positive definite solution $B^{\star}$ to eqn:soft_QN_problem with the function $\upsilon$ defined in eqn:penalty_term. Letting $H_{k+1}=(B^{\star})^{-1}$ leads to the recursive update where

Figures (4)

  • Figure 1: The soft QN method detects a diagonal direction of low curvature and takes a large step along it. The saddle-free Newton method is unable to perceive the low curvature in that direction and moves further toward the saddle point before changing its course.
  • Figure 2: Logistic regression experiments. The base-10 logarithm of the gradient norm is plotted against iteration number. Solid lines indicate the mean over all instances in the Monte Carlo simulation, while shaded areas represent three-standard-deviation confidence intervals.
  • Figure 3: Experiments on quadratic problems. The base-10 logarithm of the normalized suboptimality is plotted against iteration number. Solid lines show the mean over all instances in the Monte Carlo simulation, while shaded areas represent three-standard-deviation confidence intervals.
  • Figure 4: Comparison of soft QN and SP-BFGS on the CUTEst problem DIXMAANA. The range of suboptimality across test runs is plotted on a logarithmic scale against the number of function evaluations.

Theorems & Definitions (7)

  • Theorem 3.1
  • Theorem 3.2
  • Proposition 3.3
  • Lemma 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Remark 1