Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms

Antoine Godichon-Baggioni, Wei Lu, Bruno Portier

TL;DR

The paper tackles online second-order stochastic optimization by directly estimating the inverse Hessian $H^{-1}$ via a Robbins-Monro recursion, avoiding explicit Hessian inversion and achieving $\mathcal{O}(d^2)$ per-iteration complexity with a randomized update using $Z_n$. It introduces the Universal Stochastic Newton Algorithm (USNA) and its weighted averaged variant UWASNA, proving consistency, convergence rates, and asymptotic efficiency for the parameter $\theta$, as well as rates for the Hessian inverse estimates. Through extensive simulations and real-data experiments, the methods demonstrate competitive performance against Riccati-based stochastic Newton algorithms and provide viable options when Riccati updates are infeasible (e.g., spherical distributions, $p$-means). The results highlight the practical impact of a Riccati-free, online second-order approach for diverse stochastic optimization problems, including logistic regression, geometric median, and higher-order statistical functionals. The framework is supported by rigorous proofs detailing convergence, rate results, and stability properties under clearly stated assumptions.

Abstract

This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces computational complexity. Above all, it makes it possible to develop universal stochastic Newton methods and to investigate the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.
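
To make the idea of a Robbins-Monro recursion for the inverse Hessian concrete, here is a minimal sketch in pure Python. It is an illustration under stated assumptions, not the paper's exact algorithm. It relies on the fact that, for a symmetric positive definite $H$, the matrix $H^{-1}$ is the unique zero of $A \mapsto AH + HA - 2I$, and it replaces the matrix products by rank-one terms built from a random vector $Z$ with $\mathbb{E}[ZZ^T] = I$, so that each step costs $\mathcal{O}(d^2)$. The assumptions are: $H$ is fixed and known (in the online setting it would be replaced by noisy Hessian estimates evaluated at the current parameter estimate), $Z$ is drawn as $\sqrt{d}\, e_i$ with $i$ uniform, and the step size is $c\, n^{-\beta}$. The function name `estimate_inverse` and the constants `c` and `beta` are hypothetical choices made for this example.

```python
import random


def estimate_inverse(H, n_iter=100_000, c=0.03, beta=0.66, seed=0):
    """Illustrative Robbins-Monro sketch: A_n -> H^{-1} with O(d^2) work per step.

    Assumes H is a symmetric positive definite matrix given as a list of lists.
    The constants c and beta are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    d = len(H)
    # Start from the identity matrix (symmetric, so A stays symmetric).
    A = [[1.0 if r == k else 0.0 for k in range(d)] for r in range(d)]
    for n in range(1, n_iter + 1):
        step = c / n ** beta
        # Z = sqrt(d) * e_i with i uniform, so that E[Z Z^T] = I.
        i = rng.randrange(d)
        a_col = [A[r][i] for r in range(d)]  # A e_i, copied before updating A
        h_col = [H[r][i] for r in range(d)]  # H e_i
        for r in range(d):
            for k in range(d):
                # Entry (r, k) of A Z Z^T H + H Z Z^T A - 2 I.
                drift = d * (a_col[r] * h_col[k] + h_col[r] * a_col[k])
                if r == k:
                    drift -= 2.0
                A[r][k] -= step * drift
    return A
```

Calling `estimate_inverse([[2.0, 0.5, 0.0], [0.5, 1.5, 0.3], [0.0, 0.3, 1.0]])` returns a symmetric matrix whose product with the input is close to the identity. No matrix is ever inverted or multiplied by another matrix, which is the source of the $\mathcal{O}(d^2)$ per-iteration cost.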

Paper Structure (32 sections, 9 theorems, 235 equations, 5 figures, 2 tables)

Key Result

Theorem 3.1

Assume that Assumptions (A2) to (A4) hold, that $\frac{1-\gamma}{q-1}<\beta<\gamma - \frac{1}{2}$, and that there is an estimate $\hat{\theta}_n$ satisfying for all $\delta >0$ with $a >0$. Then $A_n$ and $A_{n,\tau}$ defined by An and Antau satisfy

Figures (5)

  • Figure 1: Evolution of the mean squared error with respect to the sample size for logistic regression.
  • Figure 2: Evolution of the mean squared error with respect to the sample size for geometric median estimation.
  • Figure 3: Evolution of the mean squared error with respect to the sample size for parameter estimation in a spherical Gaussian distribution.
  • Figure 4: Frobenius norm of the difference between the estimated matrix $A_n$ and the true matrix $H^{-1}$.
  • Figure 5: Evolution of the mean squared error with respect to the sample size for $p$-means estimation.

Theorems & Definitions (14)

  • Theorem 3.1
  • Theorem 4.1
  • Theorem 4.2
  • Remark 4.1
  • Proposition 6.1
  • Proof of Proposition 6.1
  • Proposition 6.2
  • Lemma B.1
  • Lemma B.2
  • Proof of Lemma B.2
  • ...and 4 more