A Stochastic Newton-type Method for Non-smooth Optimization

Titus Pinta

TL;DR

This work develops a stochastic, non-smooth Newton-type framework in which randomness enters solely through Hessian approximations. By leveraging stochastic process tools and a backtracking step, it derives finite-iteration and tail bounds for achieving approximate first-order optimality without requiring unbiased Hessian estimators or finite variance, and demonstrates practical effectiveness in XFEL tomography and image denoising via both random-noise and sketching approaches. The results show that stochastic Quasi-Newton methods can outperform traditional first-order methods in large-scale or physics-driven settings, with robust convergence guarantees under realistic regularity assumptions. Overall, the paper broadens the applicability of Newton-type methods to stochastic, non-smooth optimization and large-scale problems, offering rigorous performance guarantees and versatile algorithmic templates.

Abstract

We introduce a new framework for analyzing (Quasi-)Newton-type methods applied to non-smooth optimization problems. The source of randomness comes from the evaluation of the (approximation of the) Hessian. We derive, using a variant of Chernoff bounds for stopping times, expectation and probability bounds for the random variable representing the number of iterations of the algorithm until approximate first-order optimality conditions are validated. In contrast to previous results in the literature, we do not require that the estimator is unbiased or that it has finite variance. We then showcase our theoretical results in a stochastic Quasi-Newton method for X-ray free electron laser orbital tomography and in a sketched Newton method for image denoising.
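The setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the test objective (least squares plus a Huber penalty, whose gradient is non-smooth), the Cauchy-distributed Hessian noise, the constant bias, the gradient fallback, and all names (`make_problem`, `stochastic_newton`, `noise`, `bias`) are assumptions chosen for illustration. The gradient is exact; only the Hessian is perturbed, by noise that is biased and has no finite variance, and an Armijo backtracking step ensures the objective never increases.

```python
import numpy as np

def huber(x, delta):
    # Huber function: quadratic near zero, linear in the tails (C^1 but not C^2).
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * x**2 / delta, a - 0.5 * delta)

def make_problem(seed=0, m=20, n=5, lam=0.5, delta=0.1):
    # Hypothetical test problem: 0.5*||Ax - b||^2 + lam * sum(huber(x)).
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    def f(x):
        r = A @ x - b
        return 0.5 * r @ r + lam * huber(x, delta).sum()

    def grad(x):
        return A.T @ (A @ x - b) + lam * np.clip(x / delta, -1.0, 1.0)

    def gen_hess(x):
        # One element of the generalized derivative of the gradient.
        active = (np.abs(x) <= delta).astype(float)
        return A.T @ A + (lam / delta) * np.diag(active)

    return f, grad, gen_hess, n

def stochastic_newton(f, grad, gen_hess, x0, eps=1e-6, noise=0.05,
                      bias=0.01, max_iter=500, seed=1):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    history = [f(x)]
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:      # approximate first-order optimality
            return x, k, history
        n = x.size
        E = noise * rng.standard_cauchy((n, n))   # heavy tails: no finite variance
        H = gen_hess(x) + 0.5 * (E + E.T) + bias * np.eye(n)  # biased estimate
        try:
            d = -np.linalg.solve(H, g)
        except np.linalg.LinAlgError:
            d = -g
        if not np.all(np.isfinite(d)) or not (g @ d < 0):
            d = -g                        # fall back to steepest descent
        t, fx, slope = 1.0, history[-1], g @ d
        for _ in range(50):               # Armijo backtracking
            f_new = f(x + t * d)
            if f_new <= fx + 1e-4 * t * slope:
                x = x + t * d
                break
            t *= 0.5
        history.append(f(x))              # unchanged if no step was accepted
    return x, max_iter, history

f, grad, gen_hess, n = make_problem()
x_final, iterations, history = stochastic_newton(f, grad, gen_hess, np.zeros(n))
print(iterations, np.linalg.norm(grad(x_final)))
```

The returned iteration count plays the role of the random variable studied in the paper: rerunning with different `seed` values gives different counts, and the paper's bounds concern the expectation and tails of such a count under its own assumptions.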

Paper Structure

This paper contains 10 sections, 13 theorems, 80 equations, 3 figures, 1 algorithm.

Key Result

Proposition 1.2

Let $\{T^k\}_{k \in \mathbb{N}}$ be a sequence of independent random variables taking values in $[0, \infty)$. Let $\varepsilon > 0$ and assume that $\varepsilon < \mathbb{E}(T^k) < \infty$ for all $k$. Consider the stochastic process $\{S^k\}_{k \in \mathbb{N}}$ defined by ... Then $K$, defined by ..., is a stopping time for $S^k$ and for any $\alpha \ge 0$. Furthermore, $\mathbb{E}(\ldots)$
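The defining equations of $S^k$ and $K$ are missing from the statement above, so the following simulation rests on an assumed reading: $S^k$ is taken to be the partial sum $T^1 + \dots + T^k$ and $K$ the first index at which $S^k \ge \alpha$, which is what the proposition's title "Hitting Times are Stopping Times" suggests. The uniform distribution on $[0.5, 1.5]$ and the names `hitting_time` and `simulate` are likewise illustrative choices, not taken from the paper.

```python
import numpy as np

def hitting_time(rng, alpha, low=0.5, high=1.5, max_steps=10_000):
    # Assumed reading: K = min{k : T^1 + ... + T^k >= alpha}.
    s = 0.0
    for k in range(1, max_steps + 1):
        s += rng.uniform(low, high)   # T^k >= 0 with mean 1 > epsilon
        if s >= alpha:
            return k
    raise RuntimeError("threshold not reached within max_steps")

def simulate(alpha=10.0, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    return np.array([hitting_time(rng, alpha) for _ in range(trials)])

samples = simulate()
print(samples.mean(), samples.min(), samples.max())
```

Because every increment lies in $[0.5, 1.5]$, each sampled $K$ for $\alpha = 10$ must lie between 7 and 20. Deciding whether $K \le k$ needs only $T^1, \dots, T^k$, which is the stopping-time property in this assumed setting.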

Figures (3)

  • Figure 1: Mean and variance over $k$ of the steps $x_{k+1} - x_k$, and the objective value compared between Stochastic Gradient Descent and Stochastic Quasi-Newton with $M = 10$ inner iterations
  • Figure 2: Mean and variance over $k$ of the steps $x_{k+1} - x_k$, and the objective value compared between Stochastic Gradient Descent and Stochastic Quasi-Newton with $M = 100$ inner iterations
  • Figure 3: The step size of the sketched Newton algorithm with the dimension of the embedding space given as a percentage of the full space compared to a standard Newton method

Theorems & Definitions (31)

  • Definition 1.1
  • Proposition 1.2: Hitting Times are Stopping Times
  • Theorem 1.3: Doob's Optional Stopping Theorem
  • Proposition 1.4: Expectation of Sums
  • Proposition 1.5: Chernoff Bound for Stopping Times
  • proof
  • Corollary 1.6
  • Proposition 1.7: Chernoff Bound
  • Definition 2.1: Weak Uniform Newton differentiability
  • Definition 2.2: Single-Valued Adaptation
  • ...and 21 more