Retrospective Approximation Sequential Quadratic Programming for Stochastic Optimization with General Deterministic Nonlinear Constraints

Albert S. Berahas, Raghu Bollapragada, Shagun Gupta

TL;DR

This work introduces a Retrospective Approximation (RA) framework for stochastic optimization with general nonlinear deterministic constraints. The framework solves a sequence of increasingly accurate subsampled problems, which allows deterministic solvers to be used directly. It presents two core instantiations: (i) an equality-constrained RA framework with deterministic SQP that achieves optimal complexity in gradient evaluations and linear system solves, and (ii) an RA-based method for general constraints that uses an SQP method with robust subproblems, with adaptive sampling and convergence guarantees. The paper analyzes convergence and complexity, and reports strong empirical performance on regularized multi-class logistic regression and CUTEst benchmark problems, including variants with quasi-Newton Hessians and inexact inner solves. The framework connects stochastic optimization with mature deterministic methods, retaining fast convergence and robustness to sampling on large-scale constrained problems.

Abstract

In this paper, we propose a framework based on the Retrospective Approximation (RA) paradigm to solve optimization problems with a stochastic objective function and general nonlinear deterministic constraints. This framework sequentially constructs increasingly accurate approximations of the true problems which are solved to a specified accuracy via a deterministic solver, thereby decoupling the uncertainty from the optimization. Such frameworks retain the advantages of deterministic optimization methods, such as fast convergence, while achieving the optimal performance of stochastic methods without the need to redesign algorithmic components. For problems with general nonlinear equality constraints, we present a framework that can employ any deterministic solver and analyze its theoretical work complexity. We then present an instance of the framework that employs a deterministic Sequential Quadratic Programming (SQP) method and that achieves optimal complexity in terms of gradient evaluations and linear system solves for this class of problems. For problems with general nonlinear constraints, we present an RA-based algorithm that employs an SQP method with robust subproblems. Finally, we demonstrate the empirical performance of the proposed framework on multi-class logistic regression problems and benchmark instances from the CUTEst test set, comparing its results to established methods from the literature.
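
The outer/inner structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy problem (minimize $\mathbb{E}[\tfrac{1}{2}\|x - \xi\|^2]$ subject to $\|x\|^2 = 1$), the function names `sqp_inner` and `ra_sqp`, the identity Hessian model, the $\ell_1$ merit line search, the geometric sample growth, and the halving of the inner tolerance are all assumptions chosen only to keep the example short and runnable.

```python
import numpy as np

def constraint(x):
    """Deterministic nonlinear equality constraint c(x) = ||x||^2 - 1."""
    return np.array([x @ x - 1.0])

def jacobian(x):
    """Jacobian of c(x), shape (1, n)."""
    return 2.0 * x[None, :]

def sqp_inner(xi, x, tol, max_iter=200):
    """Deterministic SQP on the subsampled problem defined by the fixed sample xi."""
    xbar = xi.mean(axis=0)
    f = lambda z: 0.5 * np.mean(np.sum((z - xi) ** 2, axis=1))
    n, tau = x.size, 1.0
    for _ in range(max_iter):
        g, c, J = x - xbar, constraint(x), jacobian(x)
        # KKT system of the SQP subproblem (assumed Hessian model H = I).
        K = np.block([[np.eye(n), J.T], [J, np.zeros((1, 1))]])
        sol = np.linalg.solve(K, -np.concatenate([g, c]))
        d, lam = sol[:n], sol[n:]
        if np.linalg.norm(d) <= tol:  # assumed step-norm termination test
            break
        # Backtracking line search on the l1 merit function f + tau * |c|.
        tau = max(tau, 2.0 * np.abs(lam).max())
        merit = lambda z: f(z) + tau * np.abs(constraint(z)).sum()
        slope = g @ d - tau * np.abs(c).sum()
        alpha = 1.0
        while merit(x + alpha * d) > merit(x) + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        x = x + alpha * d
    return x

def ra_sqp(mu, x0, sigma=0.5, n0=16, outer_iters=8, seed=0):
    """Outer RA loop: larger samples, tighter inner tolerances, warm starts."""
    rng = np.random.default_rng(seed)
    x, n_samples, tol = x0.copy(), n0, 1e-1
    for _ in range(outer_iters):
        # Draw a sample; the inner solver then sees a deterministic problem.
        xi = mu + sigma * rng.standard_normal((n_samples, mu.size))
        x = sqp_inner(xi, x, tol)  # warm start from the previous outer iterate
        n_samples *= 2             # assumed geometric sample growth
        tol *= 0.5                 # assumed tolerance schedule
    return x

mu = np.array([0.9, 0.8])
x_hat = ra_sqp(mu, x0=np.array([1.0, 0.0]))
print("estimate:", x_hat, "reference:", mu / np.linalg.norm(mu))
```

For this toy problem the solution of the expectation problem is $\mu / \|\mu\|$, so the printed estimate can be compared against it. Each call to `sqp_inner` works with a fixed sample, which is the sense in which the uncertainty is decoupled from the optimization; only the outer loop touches the random number generator.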

Paper Structure

This paper contains 26 sections, 19 theorems, 99 equations, 9 figures, 2 algorithms.

Key Result

Lemma 2.2

Suppose Assumptions ass:EQ_base assumption and ass:well_posed hold. Then, for all $k \geq 0$, the outer iterates generated by alg:Equality_Constrained_RA satisfy [...]. For the expectation problem eq:intro_stoch_error_obj, for all $k \geq 0$, [...]

Figures (9)

  • Figure 1: Constraint violation ($\|c(x)\|_{\infty}$) and Lagrangian gradient norm ($\|\nabla_x {\cal L}(x,\lambda^*)\|_{\infty}$) with optimized dual variable $\lambda^*$, with respect to number of gradient evaluations and number of MINRES iterations for stochastic SQP ("S-SQP" berahas2021sequential), adaptive sampling SQP ("AS-SQP" berahas2022adaptive), deterministic SQP ("SQP" nocedal2006numerical, berahas2021sequential) and our proposed algorithms "RA-SQP $\|d\|$", "RA-SQP $\Delta l$", and "RA-SQP $\Delta l$ Inexact" over the multi-class logistic regression problem \ref{eq:logreg_obj} with equality regularization constraints for the covtype dataset ($n_f = 55$, $|{\cal K}| = 7$, $|{\cal S}| = 581,012$, CC01a).
  • Figure 2: Constraint violation ($\|c(x)\|_{\infty}$) and Lagrangian gradient norm ($\|\nabla_x {\cal L}(x,\lambda^*)\|_{\infty}$) with optimized dual variable $\lambda^*$, with respect to number of gradient evaluations and number of MINRES iterations for stochastic SQP ("S-SQP" berahas2021sequential), adaptive sampling SQP ("AS-SQP" berahas2022adaptive), deterministic SQP ("SQP" nocedal2006numerical, berahas2021sequential) and our proposed algorithms "RA-SQP $\|d\|$", "RA-SQP $\Delta l$", and "RA-SQP $\Delta l$ Inexact" over the multi-class logistic regression problem \ref{eq:logreg_obj} with equality regularization constraints for the mnist dataset ($n_f = 781$, $|{\cal K}| = 10$, $|{\cal S}| = 60,000$, CC01a).
  • Figure 3: Constraint violation ($\|c(x)\|_{\infty}$) and Lagrangian gradient norm ($\|\nabla_x {\cal L}(x,\lambda^*)\|_{\infty}$) with optimized dual variable $\lambda^*$, with respect to number of gradient evaluations and number of MINRES iterations for "RA-SQP $\Delta l$" with and without L-BFGS approximations and exact and inexact SQP linear system solutions over the multi-class logistic regression problem \ref{eq:logreg_obj} with equality regularization constraints for the mnist dataset ($n_f = 781$, $|{\cal K}| = 10$, $|{\cal S}| = 60,000$, CC01a).
  • Figure 4: Number of inner iterations ($N_k$) and batch size ($|S_k|$) with respect to outer iterations and number of gradient evaluations for "RA-SQP $\Delta l$" with and without L-BFGS approximations and exact and inexact SQP linear system solutions over the multi-class logistic regression problem \ref{eq:logreg_obj} with equality regularization constraints for the mnist dataset ($n_f = 781$, $|{\cal K}| = 10$, $|{\cal S}| = 60,000$, CC01a).
  • Figure 5: Performance profiles for feasibility and stationarity errors with respect to number of gradient evaluations and number of MINRES iterations for stochastic SQP ("S-SQP" berahas2021sequential), adaptive sampling SQP ("AS-SQP" berahas2022adaptive) and our proposed algorithm "RA-SQP $\Delta l$" with and without L-BFGS Hessian approximations and exact and inexact SQP linear system solutions over the CUTEst problem set gratton2024s2mpj, at accuracy levels $\epsilon_{tol} \in \{10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}\}$ that decrease going down the rows.
  • ...and 4 more figures

Theorems & Definitions (46)

  • Remark 1.1
  • Remark 2.1
  • Lemma 2.2
  • proof
  • Theorem 2.3
  • proof
  • Remark 2.4
  • Theorem 2.5
  • proof
  • Corollary 2.6
  • ...and 36 more