
Data-Driven Performance Guarantees for Classical and Learned Optimizers

Rajiv Sambharya, Bartolomeo Stellato

TL;DR

The paper develops a data-driven framework for performance guarantees of optimization algorithms applied to parametric families of problems drawn from a distribution. It provides probabilistic bounds for classical fixed-point optimizers via a sample convergence bound, and PAC-Bayes generalization bounds for learned optimizers, which are trained with a gradient-based algorithm that directly minimizes the PAC-Bayes bound. The approach is validated on image deblurring, robust Kalman filtering, sparse coding (LISTA variants), warm starts, and MAML. For classical optimizers, the probabilistic guarantees are much tighter than worst-case ones; for learned optimizers, the bounds improve on the empirical performance of their non-learned counterparts. This offers a practical route to reliable, data-informed guarantees for both classical and learned optimizers under a fixed iteration budget, which can inform algorithm selection and deployment in signal processing, control, and meta-learning.

Abstract

We introduce a data-driven approach to analyze the performance of continuous optimization algorithms using generalization guarantees from statistical learning theory. We study classical and learned optimizers to solve families of parametric optimization problems. We build generalization guarantees for classical optimizers, using a sample convergence bound, and for learned optimizers, using the Probably Approximately Correct (PAC)-Bayes framework. To train learned optimizers, we use a gradient-based algorithm to directly minimize the PAC-Bayes upper bound. Numerical experiments in signal processing, control, and meta-learning showcase the ability of our framework to provide strong generalization guarantees for both classical and learned optimizers given a fixed budget of iterations. For classical optimizers, our bounds are much tighter than those provided by worst-case guarantees. For learned optimizers, our bounds are better than the empirical outcomes observed for their non-learned counterparts.

Paper Structure

This paper contains 85 sections, 6 theorems, 87 equations, 12 figures, 6 tables, and 2 algorithms.

Key Result

Theorem 1

Consider a set of $N$ i.i.d. samples $S$. Let the prior mean $w_0 \in \mathbf{R}^p$ and the prior variance hyperparameters $\lambda^{\rm max} \in \mathbf{R}_+$ and $b \in \mathbf{R}_+$ be independent of the samples. Then for any $\delta \in (0, 1)$, posterior distribution $\mathcal{N}_{w,s}$, … Here, $\lambda = \lambda^{\rm max} \exp(-a / b)$ and the regularization term is …
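Theorem 1 restricts the prior variance to the grid $\lambda = \lambda^{\rm max}\exp(-a/b)$. Solving for $a$ gives $a = b\ln(\lambda^{\rm max}/\lambda)$, so a continuous variance found during training can be snapped onto the grid by rounding $a$, which is one plausible reading of the prior-rounding step mentioned in Figure 2. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, and treating $a$ as a nonnegative integer rounded to the nearest value is an assumption, since the exact rounding rule and range of $a$ are not given in this excerpt.

```python
import math


def prior_variance(lam_max: float, a: int, b: float) -> float:
    """Grid value lambda = lambda_max * exp(-a / b), as in Theorem 1."""
    return lam_max * math.exp(-a / b)


def round_prior(lam: float, lam_max: float, b: float) -> tuple[int, float]:
    """Snap a continuous prior variance onto the grid (hypothetical helper).

    Inverts lambda = lambda_max * exp(-a / b) to a = b * ln(lambda_max / lambda),
    then rounds to the nearest nonnegative integer (assumed rounding rule).
    """
    a_cont = b * math.log(lam_max / lam)
    a = max(round(a_cont), 0)  # assumption: a ranges over 0, 1, 2, ...
    return a, prior_variance(lam_max, a, b)


# Example: a trained variance of 0.3 with lambda_max = 1 and b = 10.
a, lam = round_prior(0.3, lam_max=1.0, b=10.0)  # a == 12, lam ≈ 0.301
```

Larger $b$ makes the grid finer, so the snapped variance stays closer to the trained one.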

Figures (12)

  • Figure 1: The procedure to generate probabilistic guarantees for classical optimizers. Given $N$ parameter samples, in step $1$ we approximately solve each parametric problem by running $k$ fixed-point steps. In step $2$, given an error function $e(x)$ with an underlying metric $\phi$, number of algorithm steps $k$, and tolerance $\epsilon$, we evaluate the empirical risk $\hat{r}_S$. Lastly, in step $3$, we apply the sample convergence bound to bound the risk $r_\mathcal{X}$ with high probability.
  • Figure 2: The two-phase procedure to generate generalization guarantees for learned optimizers for a metric $\phi$, number of algorithm steps $k$, and tolerance $\epsilon$. The first phase is training. If the loss function is the regression-based loss, we solve each parametric problem in step $0$, since the solutions are needed for training. In step $1$, we train the architecture to minimize the PAC-Bayes bound over $M$ epochs using Algorithm \ref{alg:learning_algo}, and we round the prior according to Equation \ref{eq:round_prior}. The second phase is calibration. In step $2$, we sample weights $\{\theta_j\}_{j=1}^H$ from the distribution $\mathcal{N}_{w^\star, s^\star}$ and run $k$ algorithm steps for each training problem and each weight sample $\theta_j$. In step $3$, we compute the Monte Carlo approximation $\hat{R}_S(\hat{P})$ of the empirical expected risk. In step $4$, we bound the expected risk $R_\mathcal{X}(P)$ by applying the sample convergence bound from Equation \ref{eq:e_bar} and then Theorem \ref{thm:gen_thm}, with regularization term $B^\star = B(w^\star,s^\star,\lambda^\star)$.
  • Figure 3: Probabilistic lower bounds of the success rate for image deblurring. The top row shows results for the fixed-point residual (fp. res.) and the bottom row shows bounds for the quantile. For both metrics, the lower bounds on the success rate are tight for $N=1000$ samples.
  • Figure 4: Probabilistic guarantees for OSQP to solve the image deblurring problem. The top row shows results for the fixed-point residual (fp. res.) and the bottom row shows results for the NMSE. The quantile bounds for both quantities improve as the number of samples increases.
  • Figure 5: Probabilistic lower bounds of the success rate for robust Kalman filtering. Top: fixed-point residual. Bottom: maximum Euclidean distance from Equation \ref{eq:max_Euclidean}. Note that the x-axes differ between the top and bottom rows. The bounds get tighter as the number of samples increases.
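The calibration steps in Figures 1 and 2 both come down to inverting the binary KL divergence. The following Python sketch assumes that the sample convergence bound has the Langford and Caruana form $r \le \mathrm{kl}^{-1}(\hat r \mid \log(2/\delta)/n)$ and that Theorem 1 gives $R_\mathcal{X}(P) \le \mathrm{kl}^{-1}(\bar R \mid B/N)$, with the regularization term $B$ already including its confidence parameter. Neither form is stated in full in this excerpt, and all function names are hypothetical. Errors are passed in as precomputed values of $e(x)$ after $k$ steps rather than produced by running an optimizer.

```python
import math


def kl_bernoulli(q: float, p: float) -> float:
    """Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
    val = 0.0
    if q > 0:
        val += q * math.log(q / p)
    if q < 1:
        val += (1 - q) * math.log((1 - q) / (1 - p))
    return val


def kl_inverse(q: float, c: float, tol: float = 1e-10) -> float:
    """Largest p in [q, 1] with kl(q || p) <= c, via bisection.

    kl(q || p) is increasing in p on [q, 1], so bisection applies; returning
    the upper end of the bracket keeps the bound conservative.
    """
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return hi


def empirical_risk(errors: list[float], eps: float) -> float:
    """Fraction of problems whose error e(x) after k steps exceeds eps."""
    return sum(e > eps for e in errors) / len(errors)


def classical_risk_bound(errors: list[float], eps: float, delta: float) -> float:
    """Figure 1, step 3: bound on the risk r_X, valid w.p. at least 1 - delta."""
    r_hat = empirical_risk(errors, eps)
    return kl_inverse(r_hat, math.log(2 / delta) / len(errors))


def learned_risk_bound(errors_by_weight: list[list[float]], eps: float,
                       delta_mc: float, reg_term: float) -> float:
    """Figure 2, steps 3-4. errors_by_weight[j][i] is the error on training
    problem i after k steps with weight sample theta_j."""
    H, N = len(errors_by_weight), len(errors_by_weight[0])
    # Step 3: Monte Carlo estimate of the empirical expected risk.
    r_hat_mc = sum(empirical_risk(row, eps) for row in errors_by_weight) / H
    # Step 4a: sample convergence bound over the H weight samples.
    r_bar = kl_inverse(r_hat_mc, math.log(2 / delta_mc) / H)
    # Step 4b: assumed PAC-Bayes-kl form of Theorem 1 with regularizer B.
    return kl_inverse(r_bar, reg_term / N)
```

For example, with 100 problems of which 10 exceed the tolerance and $\delta = 0.05$, `classical_risk_bound` returns roughly 0.2. A lower bound on the success rate, as plotted in Figures 3 and 5, is one minus this risk bound, here about 0.8.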

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • Theorem 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • Definition D.1: $\beta$-contractive operator