Learning Acceleration Algorithms for Fast Parametric Convex Optimization with Certified Robustness

Rajiv Sambharya, Jinho Bok, Nikolai Matni, George Pappas

TL;DR

This work addresses the fast, robust solution of parametric convex optimization problems by learning acceleration hyperparameters for momentum-based first-order methods under a finite iteration budget. It combines gradient-based training with a robustness-oriented objective derived from the Performance Estimation Problem (PEP) framework, yielding worst-case guarantees over a parameter set while maintaining empirical performance on samples from the problem distribution. The method supports accelerated gradient, proximal gradient, and ADMM-based solvers (including OSQP and SCS), with time-varying iteration weights and time-invariant ADMM parameters. Empirical results on logistic regression, sparse coding, nonnegative least squares, MPC for a quadcopter, and robust Kalman filtering show substantial speedups and strong finite-iteration robustness, even when trained on as few as ten instances. Because it needs little data and its guarantees come from semidefinite programming, the approach is a practical route to fast, robust parametric optimization in control, signal processing, and statistics.

Abstract

We develop a machine-learning framework to learn hyperparameter sequences for accelerated first-order methods (e.g., the step size and momentum sequences in accelerated gradient descent) to quickly solve parametric convex optimization problems with certified robustness. We obtain a strong form of robustness guarantee -- certification of worst-case performance over all parameters within a set after a given number of iterations -- through regularization-based training. The regularization term is derived from the performance estimation problem (PEP) framework based on semidefinite programming, in which the hyperparameters appear as problem data. We show how to use gradient-based training to learn the hyperparameters for several first-order methods: accelerated versions of gradient descent, proximal gradient descent, and alternating direction method of multipliers. Through various numerical examples from signal processing, control, and statistics, we demonstrate that the quality of the solution can be dramatically improved within a budget of iterations, while also maintaining strong robustness guarantees. Notably, our approach is highly data-efficient in that we only use ten training instances in all of the numerical examples.
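
To make concrete what "hyperparameter sequences" means here, the following is a minimal NumPy sketch of accelerated gradient descent in which every iteration has its own step size and momentum value. It is an illustration under stated assumptions, not the authors' implementation: the parametric least-squares problem, the function names (`accelerated_gradient`, `mean_suboptimality`), the Nesterov-style baseline schedule, and the problem sizes are all hypothetical choices. The sketch only evaluates the average suboptimality over ten parameter instances for given sequences; the gradient-based training and the PEP-based regularizer described in the abstract are not reproduced.

```python
import numpy as np

def accelerated_gradient(grad, w0, step_sizes, momenta):
    """Run len(step_sizes) iterations with per-iteration hyperparameters."""
    y = np.array(w0, dtype=float)
    z = y.copy()
    for alpha, beta in zip(step_sizes, momenta):
        y_next = z - alpha * grad(z)        # gradient step from the extrapolated point
        z = y_next + beta * (y_next - y)    # momentum extrapolation
        y = y_next
    return y

def mean_suboptimality(A, params, step_sizes, momenta):
    """Average f(w_K) - f(w*) over parameter instances x (hypothetical training loss)."""
    total = 0.0
    for x in params:
        f = lambda w: 0.5 * np.sum((A @ w - x) ** 2)
        grad = lambda w: A.T @ (A @ w - x)
        w_star = np.linalg.lstsq(A, x, rcond=None)[0]
        w_K = accelerated_gradient(grad, np.zeros(A.shape[1]), step_sizes, momenta)
        total += f(w_K) - f(w_star)
    return total / len(params)

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))           # fixed problem data (assumed)
params = rng.standard_normal((10, 30))      # ten parameter instances x
L = np.linalg.eigvalsh(A.T @ A).max()       # smoothness constant of the objective
K = 20                                      # iteration budget

steps = np.full(K, 1.0 / L)                 # baseline: constant step size 1/L
momenta = np.array([k / (k + 3) for k in range(K)])  # Nesterov-style momentum

loss_initial = mean_suboptimality(A, params, [], [])
loss_gd = mean_suboptimality(A, params, steps, np.zeros(K))
loss_nesterov = mean_suboptimality(A, params, steps, momenta)
print(loss_initial, loss_gd, loss_nesterov)
```

In a learned scheme, the arrays `steps` and `momenta` would be the trainable quantities, and a loss of this kind would be differentiated with respect to them.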

Paper Structure

This paper contains 68 sections, 4 theorems, 22 equations, 9 figures, 7 tables.

Key Result

Proposition 1

Let $x$ be the problem parameter and let $\{y^k(x),z^k(x)\}_{k=0,1,\dots}$ denote the iterates of an ADMM-based algorithm from the paper's table of algorithms (label: table:fp_algorithms) with weights $\theta$. Let the time-invariant parameters $\theta^{\rm inv}$ be positive. Then there exists a matrix $R(\theta^{\rm inv}) \in$ …

Figures (9)

  • Figure 1: Logistic regression results. The LAH Accel method with no robustness outperforms the LAH method by about $6$ orders of magnitude. When trained with robustness, the LAH Accel still outperforms other methods by wide margins.
  • Figure 2: Sparse coding in-distribution results. The non-robust LAH Accel scheme results in $\gamma=\infty$ and achieves a benefit of about $3$ orders of magnitude after $30$ iterations. Here, we pay a relatively small price for robustness since the $\gamma$ value is about $5$ times that of Nesterov's method, but it is still $2$ orders of magnitude better over the parametric family.
  • Figure 3: Sparse coding out-of-distribution results. Both the LAH method (without safeguarding) and the non-robust LAH Accel method diverge. LAH Accel trained with robustness achieves a suboptimality level $6$ times better than Nesterov's method after $30$ iterations.
  • Figure 4: Sparse coding step sizes. The y-scale of the step sizes row is logarithmic. The LAH Accel column trained with robustness has step sizes all below $2/L$ and all but two momentum sizes below $1$. The non-robust LAH Accel method generally learns larger values. Nearly all of the step sizes from LAH are above $2/L$, which explains why $\gamma=\infty$ for that method.
  • Figure 5: Non-negative least squares results. The non-robust LAH Accel method performs best. The robust LAH Accel method performs similarly to LAH, but LAH is not robust.
  • ...and 4 more figures
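
The $2/L$ threshold mentioned in the Figure 4 caption can be illustrated with a textbook one-dimensional case. The sketch below is not the paper's PEP analysis, which certifies a worst case over a function class with time-varying step sizes; it only assumes the quadratic $f(w) = \tfrac{L}{2} w^2$ and a constant step size, for which each gradient step multiplies the iterate by $1 - \alpha L$. The function name and the value of `L` are hypothetical.

```python
L = 4.0  # smoothness constant of f(w) = (L / 2) * w**2 (assumed value)

def gd_quadratic(alpha, steps, w0=1.0):
    """Gradient descent on f(w) = (L / 2) * w**2 with a constant step size."""
    w = w0
    for _ in range(steps):
        w -= alpha * L * w   # equivalent to w *= (1 - alpha * L)
    return w

below = abs(gd_quadratic(1.9 / L, 30))  # contraction factor |1 - 1.9| = 0.9 per step
above = abs(gd_quadratic(2.1 / L, 30))  # expansion factor |1 - 2.1| = 1.1 per step
print(below, above)
```

With a step size just below $2/L$ the iterate shrinks toward the minimizer, while just above $2/L$ it grows at every step.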

Theorems & Definitions (15)

  • Proposition 1
  • Definition 4.1: $\mathcal{G}$-parameterized pairs
  • Example 2
  • Theorem 4
  • proof
  • Definition 4.2: $R$-nonexpansive pairs
  • Theorem 5
  • proof
  • Definition B.1: Nonexpansive operator
  • Definition B.2: $\alpha$-averaged operator
  • ...and 5 more
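
For reference, the two operator notions named in Definitions B.1 and B.2 are usually stated as follows. These are the standard textbook forms in a Euclidean norm; the paper's own statements are not shown above and may use different notation or a weighted norm.

```latex
% Nonexpansive operator T (standard form):
\|T(u) - T(v)\| \le \|u - v\| \quad \text{for all } u, v.

% \alpha-averaged operator T, with \alpha \in (0, 1) (standard form):
T = (1 - \alpha) I + \alpha S \quad \text{for some nonexpansive operator } S.
```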