Online estimation of the inverse of the Hessian for stochastic optimization with application to universal stochastic Newton algorithms

Antoine Godichon-Baggioni; Wei Lu; Bruno Portier

TL;DR

The paper tackles online second-order stochastic optimization by directly estimating the inverse Hessian $H^{-1}$ via a Robbins-Monro recursion, avoiding explicit Hessian inversion and achieving $\mathcal{O}(d^2)$ per-iteration complexity with a randomized update using $Z_n$. It introduces the Universal Stochastic Newton Algorithm (USNA) and its weighted averaged variant UWASNA, proving consistency, convergence rates, and asymptotic efficiency for the parameter $\theta$, as well as rates for the Hessian inverse estimates. Through extensive simulations and real-data experiments, the methods demonstrate competitive performance against Riccati-based stochastic Newton algorithms and provide viable options when Riccati updates are infeasible (e.g., spherical distributions, $p$-means). The results highlight the practical impact of a Riccati-free, online second-order approach for diverse stochastic optimization problems, including logistic regression, geometric median, and higher-order statistical functionals. The framework is supported by rigorous proofs detailing convergence, rate results, and stability properties under clearly stated assumptions.

Abstract

This paper addresses second-order stochastic optimization for estimating the minimizer of a convex function written as an expectation. A direct recursive estimation technique for the inverse Hessian matrix using a Robbins-Monro procedure is introduced. This approach drastically reduces computational complexity. Above all, it makes it possible to develop universal stochastic Newton methods and to investigate the asymptotic efficiency of the proposed approach. This work thus expands the application scope of second-order algorithms in stochastic optimization.
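To make the idea of a direct Robbins-Monro recursion for the inverse Hessian concrete, here is a minimal sketch. It is not taken from the paper: the exact recursion, step sizes, and any truncation used by the authors are not reproduced in this summary. The sketch relies only on the fact that, for a symmetric positive definite $H$, the matrix $A = H^{-1}$ is the unique solution of $AH + HA = 2I_d$, and on a random vector $Z_n$ with $\mathbb{E}[Z_n Z_n^T] = I_d$ so that each update needs only matrix-vector and outer products, i.e. $\mathcal{O}(d^2)$ operations. The names `estimate_inverse_hessian`, `hessian_sample`, `c`, `n0`, and `step_exponent` are hypothetical.

```python
import numpy as np


def estimate_inverse_hessian(hessian_sample, d, n_iter,
                             c=0.1, n0=10, step_exponent=0.75, seed=0):
    """Robbins-Monro sketch for A ~ H^{-1} (illustrative, not the paper's exact scheme).

    hessian_sample(rng) must return a d x d symmetric matrix whose expectation is H.
    The step sizes c / (n + n0) ** step_exponent are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    identity = np.eye(d)
    A = np.eye(d)  # symmetric starting point
    for n in range(1, n_iter + 1):
        step = c / (n + n0) ** step_exponent
        Z = rng.standard_normal(d)   # E[Z Z^T] = I_d
        H_n = hessian_sample(rng)    # noisy Hessian observation
        P = A @ Z                    # O(d^2) matrix-vector product
        Q = H_n @ Z                  # O(d^2) matrix-vector product
        # E[P Q^T + Q P^T | A] = A H + H A, so the mean field vanishes at A = H^{-1}.
        A = A - step * (np.outer(P, Q) + np.outer(Q, P) - 2.0 * identity)
    return A


def make_noisy_hessian(H, noise=0.1):
    """Return a sampler producing H plus small symmetric Gaussian noise (toy model)."""
    d = H.shape[0]

    def sample(rng):
        G = rng.standard_normal((d, d))
        return H + noise * (G + G.T) / 2.0

    return sample
```

Calling `estimate_inverse_hessian(make_noisy_hessian(H), d=H.shape[0], n_iter=50000)` on a small well-conditioned matrix `H` should return a symmetric matrix close to `np.linalg.inv(H)`. No matrix is ever inverted, and no matrix-matrix product is formed inside the loop.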

Paper Structure (32 sections, 9 theorems, 235 equations, 5 figures, 2 tables)

Introduction
Framework
Estimation of the Hessian inverse
Universal Weighted Averaged Stochastic Newton Algorithm
Applications
Choice of the hyperparameters
Comparison with Riccati Newton
Logistic regression
Geometric Median
Cases where the Riccati formula cannot be used
Spherical Distribution
p-means
Application to real data
Proofs
Notations and preliminary definitions
...and 17 more sections

Key Result

Theorem 3.1

Assume that Assumptions (A2) to (A4) hold, that $\frac{1-\gamma}{q-1}<\beta<\gamma - \frac{1}{2}$, and that there is an estimate $\hat{\theta}_n$ satisfying for all $\delta >0$ with $a >0$. Then $A_n$ and $A_{n,\tau}$ defined by An and Antau satisfy

Figures (5)

Figure 1: Evolution of the mean squared error with respect to the sample size for logistic regression.
Figure 2: Evolution of the mean squared error with respect to the sample size for geometric median estimation.
Figure 3: Evolution of the mean squared error with respect to the sample size for parameters estimation in a spherical Gaussian distribution.
Figure 4: Frobenius norm of the difference between the estimated matrix $A_n$ and the true matrix $H^{-1}$.
Figure 5: Evolution of the mean squared error with respect to the sample size for p-means estimation.

Theorems & Definitions (14)

Theorem 3.1
Theorem 4.1
Theorem 4.2
Remark 4.1
Proposition 6.1
Proof of Proposition \ref{grandvp}
Proposition 6.2
Lemma B.1
Lemma B.2
Proof of Lemma \ref{corRS}
...and 4 more
