
Online Covariance Matrix Estimation in Sketched Newton Methods

Wei Kuang, Mihai Anitescu, Sen Na

TL;DR

This work develops a fully online covariance estimator for sketched Newton methods in stochastic optimization, enabling online inference from second-order updates without costly Hessian inversions. Using a weighted, batch-free covariance estimator constructed from the Newton iterates, the authors establish consistency and convergence rates, and show that online confidence intervals can be built directly from the iterates. The method remains efficient under sketching and adaptive step sizes, and extends naturally to constrained problems and sketched SQP. Empirical results on linear and logistic regression, as well as CUTEst benchmarks, demonstrate accurate covariance estimation and reliable confidence intervals, highlighting its practical value for streaming, large-scale, second-order online learning.
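To make "fully online" and "batch-free" concrete: a weighted sample covariance of the iterates can be maintained with a constant-size state (running weight sum, weighted mean, and a $d\times d$ accumulator) and an $O(d^2)$ update per iterate, with no matrix factorization. The Python sketch below is the generic weighted incremental (West-style) update. The class name `WeightedOnlineCovariance` is hypothetical, and the paper's specific weights, centering, and scaling that turn such a statistic into an estimator of the limiting covariance are not reproduced here.

```python
import numpy as np


class WeightedOnlineCovariance:
    """Running weighted mean and covariance of a stream of vectors.

    Generic West-style incremental update; the weights passed to `update`
    are placeholders, not the paper's weighting scheme.
    """

    def __init__(self, dim: int):
        self.w_sum = 0.0                  # total weight seen so far
        self.mean = np.zeros(dim)         # weighted running mean
        self.m2 = np.zeros((dim, dim))    # weighted sum of centered outer products

    def update(self, x: np.ndarray, w: float) -> None:
        """Absorb one iterate x with weight w > 0 in O(d^2) time."""
        self.w_sum += w
        delta = x - self.mean             # deviation from the old mean
        self.mean += (w / self.w_sum) * delta
        # Outer product of the old-mean and new-mean deviations (symmetric).
        self.m2 += w * np.outer(delta, x - self.mean)

    def covariance(self) -> np.ndarray:
        """Weighted covariance, normalized by the total weight."""
        return self.m2 / self.w_sum
```

In use, one would call `update(x_t, w_t)` once per Newton iterate, with `w_t` set by whatever weighting the estimator prescribes; the memory footprint does not grow with $t$.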

Abstract

Given the ubiquity of streaming data, online algorithms have been widely used for parameter estimation, with second-order methods particularly standing out for their efficiency and robustness. In this paper, we study an online sketched Newton method that leverages a randomized sketching technique to perform an approximate Newton step in each iteration, thereby eliminating the computational bottleneck of second-order methods. While existing studies have established the asymptotic normality of sketched Newton methods, a consistent estimator of the limiting covariance matrix remains an open problem. We propose a fully online covariance matrix estimator that is constructed entirely from the Newton iterates and requires no matrix factorization. Compared to covariance estimators for first-order online methods, our estimator for second-order methods is batch-free. We establish the consistency and convergence rate of our estimator, and coupled with asymptotic normality results, we can then perform online statistical inference for the model parameters based on sketched Newton methods. We also discuss the extension of our estimator to constrained problems, and demonstrate its superior performance on regression problems as well as benchmark problems in the CUTEst set.
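The abstract describes each iteration as an approximate Newton step computed with randomized sketching. The Python sketch below illustrates that idea on streaming least squares, assuming a row-sampling sketch (randomized Kaczmarz, a special case of sketch-and-project) run for `tau` steps from a zero start to approximately solve $B_t p = -g_t$, where $B_t$ is a running average of stochastic Hessians. The identity-regularized Hessian average, the stepsize `c_beta / (t + 1) ** beta`, and the function names `sketched_newton_direction` and `online_sketched_newton` are hypothetical choices for illustration, not the paper's algorithm.

```python
import numpy as np


def sketched_newton_direction(B, g, tau, rng):
    """Approximately solve B p = -g with tau randomized Kaczmarz steps.

    Each step projects p onto the hyperplane {p : B[i] @ p = -g[i]} for a
    random row i, sampled with probability proportional to ||B[i]||^2.
    """
    row_sq = np.einsum("ij,ij->i", B, B)           # squared row norms
    rows = rng.choice(B.shape[0], size=tau, p=row_sq / row_sq.sum())
    p = np.zeros_like(g)                           # start the inner solver at zero
    for i in rows:
        residual = B[i] @ p + g[i]                 # violation of row equation i
        p -= (residual / row_sq[i]) * B[i]
    return p


def online_sketched_newton(stream, d, tau=5, c_beta=1.0, beta=1.0, seed=0):
    """Streaming least squares with an averaged Hessian and sketched Newton steps."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    H_sum = np.eye(d)                              # hypothetical regularizer: keeps B_t invertible early
    for t, (a, b) in enumerate(stream):
        g = a * (a @ x - b)                        # stochastic gradient of 0.5 * (a^T x - b)^2
        H_sum += np.outer(a, a)                    # stochastic Hessian sample a a^T
        B = H_sum / (t + 2)                        # averaged Hessian estimate B_t
        p = sketched_newton_direction(B, g, tau, rng)
        x = x + c_beta / (t + 1) ** beta * p       # decaying stepsize (hypothetical schedule)
    return x
```

For example, feeding `zip(A, b)` from a synthetic linear model with $d=5$ drives the iterate toward $x^\star$. Each step costs $O(d^2)$ for the Hessian average and $O(\tau d)$ for the sketched solve, with no matrix factorization or inversion.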

Paper Structure

This paper contains 36 sections, 19 theorems, 176 equations, 2 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 3.5

Consider the iteration scheme nequ:7. Suppose Assumptions ass:1–ass:4 hold, the number of sketches satisfies $\tau\geq \log(\gamma_H/(4\Upsilon_H))/\log \rho$ with $\rho=1-\gamma_S$, and the stepsize parameters satisfy $\beta\in(0.5,1]$, $\chi>0.5(\beta+1)$, and $c_{\beta}, c_{\chi}>0$. Then, we ha…
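The sketch-count condition in Theorem 3.5 can be evaluated directly. Assuming $\gamma_S\in(0,1)$, we have $\log\rho<0$, so the condition is equivalent to $\rho^{\tau}\le \gamma_H/(4\Upsilon_H)$: enough sketches must be used that the contraction factor $\rho^\tau$ falls below that threshold. The helpers below (hypothetical names) compute the smallest admissible $\tau$ and check the stated stepsize constraints. The constants $\gamma_H$, $\Upsilon_H$, $\gamma_S$ are defined in the paper's assumptions, and the numeric values in the usage note are made up.

```python
import math


def min_num_sketches(gamma_H: float, upsilon_H: float, gamma_S: float) -> int:
    """Smallest integer tau >= log(gamma_H / (4 Upsilon_H)) / log(rho), rho = 1 - gamma_S."""
    if not 0.0 < gamma_S < 1.0:
        raise ValueError("assumes gamma_S in (0, 1) so that log(rho) < 0")
    rho = 1.0 - gamma_S
    bound = math.log(gamma_H / (4.0 * upsilon_H)) / math.log(rho)
    return max(1, math.ceil(bound))  # use at least one sketch in any case


def stepsize_params_ok(beta: float, chi: float, c_beta: float, c_chi: float) -> bool:
    """Check beta in (0.5, 1], chi > 0.5 * (beta + 1), and c_beta, c_chi > 0."""
    return 0.5 < beta <= 1.0 and chi > 0.5 * (beta + 1.0) and c_beta > 0 and c_chi > 0
```

With the illustrative values $\gamma_H=1$, $\Upsilon_H=10$, $\gamma_S=0.2$, the bound is $\log(1/40)/\log 0.8\approx 16.53$, so $\tau=17$ (indeed $0.8^{17}\approx 0.0225\le 0.025 < 0.8^{16}$). Note that $\beta=1$ forces $\chi>1$.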

Figures (2)

  • Figure 1: The averaged trajectories for linear regression problems with $d=5$ and Equi-correlation $\Sigma_a \;(r=0.3)$. From left to right, the columns correspond to SGD, the exact Newton method, and the sketched Newton method $(\tau = 2)$. For averaged SGD, the limiting covariance $\Xi^\star$ is estimated using the batch-means estimator $\bar{\Xi}_t$. For exact and sketched Newton methods, $\Xi^\star$ is estimated using both the plug-in estimator $\widetilde{\Xi}_t$ and the proposed sample covariance $\widehat{\Xi}_t$. The first row shows the log relative covariance estimation error (e.g., $\log(\|\widehat{\Xi}_t-\Xi^\star\|/\|\Xi^\star\|)$) vs. $\log t$. The second row shows the coverage rate of the 95% confidence intervals for $\sum_{i=1}^d{\boldsymbol{x}}^{\star}_i/d$ constructed using the corresponding estimators of $\Xi^\star$. The third row shows the coverage rate of the oracle 95% confidence intervals, which are constructed using the true covariance $\Xi^\star$. The figures demonstrate the consistency of $\widehat{\Xi}_t$ and its superior performance in statistical inference.
  • Figure 2: The averaged trajectories for logistic regression problems with $d=5$ and Toeplitz $\Sigma_a\;(r=0.6)$. See Figure 1 for interpretation.
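The two quantities plotted in Figure 1 can be computed as follows. This Python sketch assumes the spectral norm for $\|\cdot\|$ (the caption does not say which norm) and treats $x_t - x^\star$ as approximately $\mathcal{N}(0, s_t\,\Xi)$, where the scale `scale` $= s_t$ is a placeholder for the normalization in Theorem 3.6, which is not reproduced here. The interval targets $\sum_{i=1}^d x^\star_i/d$, and the coverage rate is the fraction of independent runs whose interval contains it; passing the true $\Xi^\star$ instead of an estimate gives the oracle intervals. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def log_relative_error(Xi_hat, Xi_star):
    """log(||Xi_hat - Xi_star|| / ||Xi_star||), using the spectral norm (assumed)."""
    return float(np.log(np.linalg.norm(Xi_hat - Xi_star, 2) / np.linalg.norm(Xi_star, 2)))


def mean_ci(x_t, Xi_hat, scale, level=0.95):
    """CI for w^T x* with w = (1/d, ..., 1/d), assuming x_t - x* ~ N(0, scale * Xi_hat)."""
    d = x_t.shape[0]
    w = np.full(d, 1.0 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% interval
    half = z * np.sqrt(scale * (w @ Xi_hat @ w))
    center = w @ x_t
    return center - half, center + half


def coverage_rate(intervals, target):
    """Fraction of (lo, hi) intervals that contain the target value."""
    return float(np.mean([lo <= target <= hi for lo, hi in intervals]))
```

When the Gaussian approximation and the covariance are exact, the coverage rate should hover near the nominal 95%; systematic under-coverage signals an underestimated covariance.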

Theorems & Definitions (20)

  • Theorem 3.5: Almost sure convergence
  • Theorem 3.6: Asymptotic normality
  • Remark 4.1
  • Lemma 4.2: Error bounds of ${\boldsymbol{x}}_t$ and $B_t$
  • Theorem 4.3
  • Corollary 4.4
  • Lemma B.1: Na2022Statistical, Lemma B.1
  • Lemma B.2: Na2022Statistical, Lemma B.3(a)
  • Lemma B.3
  • Lemma B.4
  • ...and 10 more