Learning-to-Optimize with PAC-Bayesian Guarantees: Theoretical Considerations and Practical Implementation
Michael Sucker, Jalal Fadili, Peter Ochs
TL;DR
This paper introduces a principled framework for learning optimization algorithms with PAC-Bayesian generalization guarantees. It moves beyond worst-case analyses by using data-dependent priors and posteriors from exponential families. The core idea is to bound the true risk of a learned optimizer by its empirical performance plus a KL-divergence term, while sublevel probabilities give a controlled trade-off between convergence speed and convergence guarantees. The authors develop a practical learning procedure consisting of imitation-based initialization, a probabilistically constrained sampling scheme, and a Gibbs-posterior update to select hyperparameters. They validate the approach on quadratic problems, image processing, Lasso, and neural-network training. Under the same iteration budgets, the learned optimizers outperform standard baselines by a wide margin (the abstract reports orders of magnitude), while also providing an interpretable probabilistic guarantee on performance. The work also discusses limitations: the guarantees concern the objective value after a fixed number of iterations, and the offline training can be computationally intensive. Both points suggest avenues for future refinement.
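The trade-off between empirical performance and a KL-divergence term, together with the Gibbs-posterior step, can be illustrated on a toy problem. The following Python sketch is not the authors' procedure. It assumes a finite set of candidate step sizes for plain gradient descent as the hyperparameters, a uniform prior over them, random diagonal quadratics as the problem distribution, and a fixed inverse temperature `lam`. All function names are hypothetical. The sketch checks a standard fact: the Gibbs posterior rho(h) ∝ prior(h) · exp(-lam · r(h)), with r the empirical risk, minimizes the average of r under rho plus KL(rho ‖ prior) / lam. The minimum equals -(1/lam) · log Σ_h prior(h) · exp(-lam · r(h)).

```python
import numpy as np

def gd_loss(step, eigs, x0, iters=10):
    """f(x_K) after `iters` gradient steps on f(x) = 0.5 * sum(eigs * x**2)."""
    x = x0.copy()
    for _ in range(iters):
        x = x - step * eigs * x  # grad f(x) = eigs * x for a diagonal quadratic
    return 0.5 * float(np.sum(eigs * x**2))

def log_partition(prior, emp_risk, lam):
    """log sum_h prior(h) * exp(-lam * emp_risk(h)), computed stably."""
    a = np.log(prior) - lam * emp_risk
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def gibbs_posterior(prior, emp_risk, lam):
    """rho(h) proportional to prior(h) * exp(-lam * emp_risk(h)), normalized."""
    return np.exp(np.log(prior) - lam * emp_risk - log_partition(prior, emp_risk, lam))

def kl(rho, prior):
    """KL(rho || prior) for discrete distributions; terms with rho(h) = 0 vanish."""
    mask = rho > 0
    return float(np.sum(rho[mask] * np.log(rho[mask] / prior[mask])))

def pac_bayes_objective(rho, prior, emp_risk, lam):
    """Empirical risk averaged under rho plus the penalty KL(rho || prior) / lam."""
    return float(rho @ emp_risk) + kl(rho, prior) / lam

# Hypothetical training set: 50 random diagonal quadratics in 5 dimensions.
rng = np.random.default_rng(0)
problems = [(rng.uniform(0.1, 1.0, size=5), rng.standard_normal(5)) for _ in range(50)]

steps = np.linspace(0.05, 1.0, 20)             # candidate step sizes (the "hyperparameters")
prior = np.full(len(steps), 1.0 / len(steps))  # uniform prior over the candidates
emp_risk = np.array([np.mean([gd_loss(s, e, x0) for e, x0 in problems]) for s in steps])

lam = 10.0
rho = gibbs_posterior(prior, emp_risk, lam)
best = np.zeros_like(prior)
best[np.argmin(emp_risk)] = 1.0  # point mass on the empirically best step size

print("posterior mode step:", steps[np.argmax(rho)])
print("objective (Gibbs):  ", pac_bayes_objective(rho, prior, emp_risk, lam))
print("objective (prior):  ", pac_bayes_objective(prior, prior, emp_risk, lam))
print("objective (argmin): ", pac_bayes_objective(best, prior, emp_risk, lam))
```

A larger `lam` concentrates rho on the empirically best step size, at the price of a larger KL term. A smaller `lam` keeps rho close to the prior.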
Abstract
We apply PAC-Bayesian theory to learning-to-optimize. To the best of our knowledge, we present the first framework to learn optimization algorithms with provable generalization guarantees (PAC-Bayesian bounds) and an explicit trade-off between convergence guarantees and convergence speed, in contrast to the typical worst-case analysis. Our learned optimization algorithms provably outperform related ones derived from a (deterministic) worst-case analysis. The results rely on PAC-Bayesian bounds for general, possibly unbounded loss functions based on exponential families. We then reformulate the learning procedure as a one-dimensional minimization problem and study the possibility of finding a global minimum. Furthermore, we provide a concrete algorithmic realization of the framework together with new methodologies for learning-to-optimize, and we conduct four practically relevant experiments to support our theory. With this, we show that the proposed learning framework yields optimization algorithms that provably outperform the state of the art by orders of magnitude.
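The reduction of the learning procedure to a one-dimensional problem can be pictured with a textbook bound in place of the authors' exponential-family bound for unbounded losses. This sketch makes several assumptions. Losses lie in [0, 1], there are n i.i.d. training problems, and the prior is fixed before seeing the data. Under these assumptions, Catoni's bound (in the form given in Alquier's PAC-Bayes survey) states that, with probability at least 1 - δ, every posterior rho satisfies E_rho[R] ≤ E_rho[r] + lam/(8n) + (KL(rho ‖ prior) + log(1/δ))/lam. Plugging in the Gibbs posterior turns the right-hand side into a function of `lam` alone. A union bound over a finite grid of `lam` values (δ replaced by δ/|grid|) keeps the guarantee valid after the best `lam` is picked. The empirical risks below are hypothetical numbers standing in for normalized losses of candidate optimizers, and the function names are illustrative.

```python
import numpy as np

def log_partition(prior, emp_risk, lam):
    """log sum_h prior(h) * exp(-lam * emp_risk(h)), computed stably."""
    a = np.log(prior) - lam * emp_risk
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def catoni_bound(prior, emp_risk, lam, n, delta):
    """Bounded-loss PAC-Bayes bound evaluated at the Gibbs posterior.

    For the Gibbs posterior, E_rho[r] + KL(rho || prior) / lam equals
    -log_partition / lam, so only lam remains free. Assumes losses in [0, 1].
    """
    return -log_partition(prior, emp_risk, lam) / lam + lam / (8 * n) + np.log(1 / delta) / lam

# Hypothetical empirical risks (losses normalized to [0, 1]) of 100 candidate optimizers.
rng = np.random.default_rng(1)
emp_risk = rng.uniform(0.05, 0.9, size=100)
prior = np.full(emp_risk.size, 1.0 / emp_risk.size)

n, delta = 1000, 0.05
grid = np.geomspace(1.0, 1e4, 200)  # finite grid of inverse temperatures lam
delta_each = delta / grid.size      # union bound so the chosen lam stays covered
bounds = np.array([catoni_bound(prior, emp_risk, lam, n, delta_each) for lam in grid])

lam_star = grid[np.argmin(bounds)]
print(f"best lam = {lam_star:.1f}, bound on expected risk = {bounds.min():.3f}")
print(f"smallest empirical risk = {emp_risk.min():.3f}")
```

Small values of `lam` make the KL and confidence terms dominate, while large values make the lam/(8n) term dominate. The minimizing `lam` therefore balances fit to the training problems against the complexity penalty.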
