Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming

Sen Na; Michael W. Mahoney

TL;DR

This work develops AI-StoSQP, an online, projection-free approach to constrained stochastic nonlinear optimization that uses a sketching-based inexact Newton step together with an adaptive stepsize rule. The authors prove global almost-sure convergence of the method and establish a local asymptotic normality result for the joint primal-dual iterates, showing that, after proper rescaling, the error converges to a mean-zero Gaussian with a covariance that depends on the sketching distribution. They introduce a practical plug-in covariance estimator for online uncertainty quantification and demonstrate the method on benchmark nonlinear problems (CUTEst) and constrained regression tasks, highlighting the trade-offs between exact and inexact Newton solves. The results show that online inference with StoSQP is feasible and informative for constrained estimation under streaming data, with asymptotic covariance approaching minimax-optimal limits when the Newton system is solved exactly, and a controlled inflation when sketched Newton steps are used. Overall, the paper provides a principled framework for uncertainty quantification in online constrained optimization and offers actionable guidance on the use of sketching in second-order online methods for scalable inference.
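As a rough illustration of the kind of iteration summarized above, the following minimal sketch applies an inexact Newton step to the KKT system of a toy problem: a streaming least-squares objective with the single constraint sum(x) = 1. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy problem, the function names `kaczmarz_solve` and `run_sketched_stosqp`, the use of randomized Kaczmarz as the sketching solver, the running-average Hessian estimate, and the deterministic decaying stepsize that stands in for the adaptive rule.

```python
import numpy as np


def kaczmarz_solve(K, r, num_steps, rng):
    """Approximately solve K z = r with a fixed number of randomized Kaczmarz steps."""
    row_norms_sq = np.sum(K * K, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    rows = rng.choice(K.shape[0], size=num_steps, p=probs)
    z = np.zeros(K.shape[0])
    for i in rows:
        # Project z onto the hyperplane {z : K[i] @ z = r[i]}.
        z += (r[i] - K[i] @ z) / row_norms_sq[i] * K[i]
    return z


def run_sketched_stosqp(num_iters=5000, num_sketch_steps=30, seed=0):
    """Toy problem (assumption): min E[0.5 * (a @ x - b)**2]  s.t.  sum(x) = 1."""
    rng = np.random.default_rng(seed)
    x_true = np.array([0.6, 0.4, 0.2, 0.1, -0.1])
    d = x_true.size
    G = np.ones((1, d))               # Jacobian of c(x) = sum(x) - 1
    x, lam = np.zeros(d), np.zeros(1)
    B = np.eye(d)                     # running Hessian estimate, started at I
    for t in range(num_iters):
        # Draw one streaming sample and form the stochastic gradient.
        a = rng.standard_normal(d)
        b = a @ x_true + 0.1 * rng.standard_normal()
        grad = a * (a @ x - b)
        B += (np.outer(a, a) - B) / (t + 2)   # average of I and sample Hessians
        c = np.array([x.sum() - 1.0])
        # Newton system on the KKT conditions: K [dx; dlam] = -[grad_x L; c].
        K = np.block([[B, G.T], [G, np.zeros((1, 1))]])
        rhs = -np.concatenate([grad + G.T @ lam, c])
        # Inexact solve: the number of sketching steps stays fixed over time.
        z = kaczmarz_solve(K, rhs, num_sketch_steps, rng)
        alpha = 1.0 / (t + 10) ** 0.6         # stand-in for the adaptive stepsize
        x = x + alpha * z[:d]
        lam = lam + alpha * z[d:]
    # For this toy problem the solution is the projection of x_true onto sum(x) = 1.
    x_star = x_true - (x_true.sum() - 1.0) / d
    return x, lam, x_star
```

Calling `run_sketched_stosqp()` returns the final primal iterate, the final multiplier, and the known solution of the toy problem, so the distance between the first and third outputs can be inspected directly. The inner solver runs a fixed number of steps in every iteration, which mirrors the point that the approximation error of the linear solve is not driven to zero.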

Abstract

We consider online statistical inference of constrained stochastic nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems, which can be regarded as applying second-order Newton's method to the Karush-Kuhn-Tucker (KKT) conditions. In each iteration, the StoSQP method computes the Newton direction by solving a quadratic program, and then selects a proper adaptive stepsize $\bar{\alpha}_t$ to update the primal-dual iterate. To reduce the dominant computational cost of the method, we inexactly solve the quadratic program in each iteration by employing an iterative sketching solver. Notably, the approximation error of the sketching solver need not vanish as iterations proceed, meaning that the per-iteration computational cost does not blow up. For the above StoSQP method, we show that under mild assumptions, the rescaled primal-dual sequence $1/\sqrt{\bar{\alpha}_t}\cdot (x_t - x^\star, \lambda_t - \lambda^\star)$ converges to a mean-zero Gaussian distribution with a nontrivial covariance matrix depending on the underlying sketching distribution. To perform inference in practice, we also analyze a plug-in covariance matrix estimator. We illustrate the asymptotic normality result of the method both on benchmark nonlinear problems in the CUTEst test set and on linearly/nonlinearly constrained regression problems.
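To make the inference step concrete: if the error rescaled by one over the square root of the stepsize is approximately Gaussian with some covariance matrix, then the iterate itself is approximately Gaussian around the solution with covariance equal to the stepsize times that matrix. The sketch below turns this into coordinate-wise Wald-type intervals. It is an illustration only: the function name `wald_intervals` and its arguments are hypothetical, and the covariance estimate is assumed to be supplied by some plug-in estimator rather than computed here.

```python
from statistics import NormalDist

import numpy as np


def wald_intervals(estimate, stepsize, cov_estimate, level=0.95):
    """Coordinate-wise intervals: estimate +/- z * sqrt(stepsize * diag(cov_estimate)).

    estimate:     current primal-dual iterate, stacked as one vector
    stepsize:     current stepsize used for the rescaling (assumed known)
    cov_estimate: estimate of the limiting covariance (assumed given)
    """
    estimate = np.asarray(estimate, dtype=float)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided normal quantile
    half_width = z * np.sqrt(stepsize * np.diag(cov_estimate))
    return estimate - half_width, estimate + half_width
```

For example, `wald_intervals(iterate, alpha_t, cov_hat)` gives lower and upper endpoints for every coordinate of the stacked primal-dual vector. The intervals shrink at the rate of the square root of the stepsize, which is the scaling that appears in the rescaled sequence above.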

Paper Structure (47 sections, 24 theorems, 212 equations, 2 figures, 11 tables, 1 algorithm)

Introduction
Applications and Literature Review
Motivating examples
Related literature and contribution
Adaptive Inexact StoSQP Method
Global Almost Sure Convergence
Assumptions and preliminary results
Almost sure convergence
Statistical Inference via StoSQP
Iteration recursion
Asymptotic rate and normality
An estimator of the covariance matrix
Numerical Experiments
Benchmark constrained problems
Constrained regression problems
...and 32 more sections

Key Result

Lemma 4.4

Under Assumption ass:3, for all $t\geq 0$:

Figures (2)

Figure 1: Convergence plots of CUTEst problems. Each row corresponds to one problem and has three figures in the $\log$ scale. From left to right, they show $\|\nabla\mathcal{L}_t\|$ vs. $t$, $\|({\boldsymbol{x}}_t-{\boldsymbol{x}}^\star, {\boldsymbol{\lambda}}_t-{\boldsymbol{\lambda}}^\star)\|$ vs. $t$, and $\|K_t-K^\star\|$ vs. $t$. Each figure has five lines; four lines correspond to four setups of $\sigma^2$, and the red line corresponds to $\sqrt{\beta_t\log(1/\beta_t)}$ vs. $t$, which is the theoretical asymptotic rate.
Figure 2: Convergence plots of CUTEst problems. See Figure 1 for the interpretation.
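
The red reference line $\sqrt{\beta_t\log(1/\beta_t)}$ described in the captions can be evaluated numerically once a sequence $\beta_t$ is fixed. The sketch below assumes a polynomially decaying sequence $\beta_t = \text{scale}/(t+2)^{\text{power}}$; this form, the default values, and the function name `reference_rate` are assumptions made for illustration and may differ from the sequence used in the paper's experiments.

```python
import numpy as np


def reference_rate(t, scale=1.0, power=0.6):
    """Evaluate sqrt(beta_t * log(1 / beta_t)) for beta_t = scale / (t + 2)**power.

    The shift by 2 keeps beta_t below 1 when scale <= 1, so log(1 / beta_t) > 0.
    """
    t = np.asarray(t, dtype=float)
    beta = scale / (t + 2.0) ** power
    return np.sqrt(beta * np.log(1.0 / beta))
```

Plotting `reference_rate(np.arange(T))` on a log scale next to the error norms gives a curve to compare slopes against, which is how the reference line in the captions is meant to be read.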

Theorems & Definitions (32)

Example 2.1: Constrained regression problems
Example 2.2: Physics-informed machine learning
Remark 3.1
Remark 3.2
Lemma 4.4: Guarantees of sketching solvers
Remark 4.5
Lemma 4.6
Lemma 4.7
Theorem 4.8: Global convergence
Corollary 4.9
...and 22 more
