Learning Algorithm Hyperparameters for Fast Parametric Convex Optimization

Rajiv Sambharya; Bartolomeo Stellato

TL;DR

This work accelerates parametric convex optimization by learning a single sequence of hyperparameters for first-order methods, shared across all problem instances. The Learning Algorithm Hyperparameters (LAH) framework has two phases: a step-varying phase followed by a steady-state phase. The authors derive closed-form lookahead solutions for gradient descent and unconstrained quadratic minimization, develop a progressive training scheme, and provide generalization guarantees via validation-based risk bounds, all while preserving convergence guarantees. LAH is demonstrated with gradient descent, proximal gradient descent, OSQP, and SCS on tasks in control, signal processing, and machine learning, and it is highly data-efficient: every example trains on only 10 problem instances. The approach yields substantial speedups over baselines and quantifiable probabilistic guarantees on unseen data, which makes it practical for fast, reliable parametric optimization in time-constrained systems.

Abstract

We introduce a machine-learning framework to learn the hyperparameter sequence of first-order methods (e.g., the step sizes in gradient descent) to quickly solve parametric convex optimization problems. Our computational architecture amounts to running fixed-point iterations in which the hyperparameters are the same across all parametric instances, and it consists of two phases. In the first, step-varying phase, the hyperparameters vary across iterations; in the second, steady-state phase, they are constant across iterations. Our learned optimizer is flexible in that it can be run for any number of iterations, and it is guaranteed to converge to an optimal solution. To train, we minimize the mean squared error with respect to a ground-truth solution. In the case of gradient descent, the one-step optimal step size is the solution to a least squares problem, and in the case of unconstrained quadratic minimization, we can compute the two- and three-step optimal solutions in closed form. In other cases, we backpropagate through the algorithm steps to minimize the training objective after a given number of steps. We show how to learn hyperparameters for several popular algorithms: gradient descent, proximal gradient descent, and two ADMM-based solvers, OSQP and SCS. We use a sample convergence bound to obtain generalization guarantees for the performance of our learned algorithm on unseen data, providing both lower and upper bounds. We showcase the effectiveness of our method with many examples, including ones from control, signal processing, and machine learning. Remarkably, our approach is highly data-efficient: we use only $10$ problem instances to train the hyperparameters in all of our examples.
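To make the two-phase architecture and the one-step least-squares claim concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes parametric quadratics f_θ(z) = ½ zᵀPz − θᵀz with a fixed diagonal P (so the ground truth is z⋆ = P⁻¹θ), and every function name (`grad`, `solution`, `one_step_optimal_alpha`, `run_two_phase_gd`, `mse`) is hypothetical. Minimizing Σᵢ‖z⁰ − α gᵢ − zᵢ⋆‖² over the scalar α is a least squares problem whose solution is α = Σᵢ⟨gᵢ, z⁰ − zᵢ⋆⟩ / Σᵢ‖gᵢ‖².

```python
# Hypothetical sketch, not the paper's code.
# Assumed problem family: f_theta(z) = 0.5 * z^T P z - theta^T z with diagonal P,
# so the ground-truth solution is z*(theta) = theta / diag(P).

def grad(p_diag, theta, z):
    """Gradient P z - theta of f_theta at z (diagonal P)."""
    return [p * zi - t for p, zi, t in zip(p_diag, z, theta)]

def solution(p_diag, theta):
    """Ground-truth minimizer z* = P^{-1} theta."""
    return [t / p for t, p in zip(theta, p_diag)]

def one_step_optimal_alpha(p_diag, thetas, z0):
    """Scalar least squares: minimize sum_i ||z0 - alpha * g_i - z*_i||^2,
    whose minimizer is sum_i <g_i, z0 - z*_i> / sum_i ||g_i||^2."""
    num = den = 0.0
    for theta in thetas:
        g = grad(p_diag, theta, z0)
        r = [a - b for a, b in zip(z0, solution(p_diag, theta))]
        num += sum(gi * ri for gi, ri in zip(g, r))
        den += sum(gi * gi for gi in g)
    return num / den

def run_two_phase_gd(p_diag, theta, z0, alphas, alpha_ss, num_iters):
    """Step-varying phase: step alphas[k] for k < len(alphas).
    Steady-state phase: constant step alpha_ss for all later iterations."""
    z = list(z0)
    for k in range(num_iters):
        alpha = alphas[k] if k < len(alphas) else alpha_ss
        g = grad(p_diag, theta, z)
        z = [zi - alpha * gi for zi, gi in zip(z, g)]
    return z

def mse(p_diag, thetas, z0, alphas, alpha_ss, num_iters):
    """Training loss: mean squared distance to the ground-truth solutions."""
    total = 0.0
    for theta in thetas:
        z = run_two_phase_gd(p_diag, theta, z0, alphas, alpha_ss, num_iters)
        total += sum((a - b) ** 2 for a, b in zip(z, solution(p_diag, theta)))
    return total / len(thetas)

# Usage with made-up data: L = max(diag(P)) = 10, so alpha_ss = 1/L = 0.1.
p_diag = [1.0, 10.0]
thetas = [[1.0, 2.0], [3.0, -1.0], [-2.0, 5.0]]
z0 = [0.0, 0.0]
alpha1 = one_step_optimal_alpha(p_diag, thetas, z0)
print(mse(p_diag, thetas, z0, [alpha1], 0.1, 1) <= mse(p_diag, thetas, z0, [0.1], 0.1, 1))  # True
```

The steady-state step 1/L is an assumed safeguard: any constant step in (0, 2/L) makes gradient descent converge on a strongly convex quadratic, so a finite step-varying prefix cannot break convergence. The learned first step may exceed 2/L, as it does here, because only the steady-state phase has to be contractive.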
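One way to see why a closed form exists for two steps of gradient descent on unconstrained quadratics is the derivation below. It is a sketch in our own notation, not necessarily the authors' derivation: instance i is assumed to be f_i(z) = ½ zᵀP_i z − θ_iᵀz with P_i positive definite, and all instances start from the same z⁰.

```latex
\begin{aligned}
z_i^{k+1} &= z_i^k - \alpha_k \bigl(P_i z_i^k - \theta_i\bigr),
  \qquad z_i^\star = P_i^{-1}\theta_i, \\
e_i^{k+1} &= \bigl(I - \alpha_k P_i\bigr)\, e_i^k,
  \qquad e_i^k = z_i^k - z_i^\star, \\
e_i^{2} &= \bigl(I - \alpha_1 P_i\bigr)\bigl(I - \alpha_0 P_i\bigr)\, e_i^0
  = e_i^0 - s\, P_i e_i^0 + p\, P_i^2 e_i^0,
  \qquad s = \alpha_0 + \alpha_1,\quad p = \alpha_0 \alpha_1, \\
\min_{s,\,p}\;& \frac{1}{N}\sum_{i=1}^{N}
  \bigl\| e_i^0 - s\, P_i e_i^0 + p\, P_i^2 e_i^0 \bigr\|_2^2 .
\end{aligned}
```

Because the two-step error is affine in (s, p), the training loss is a convex quadratic in (s, p) and is minimized by solving 2×2 normal equations. If the minimizer satisfies s² ≥ 4p, the optimal step sizes are the real roots of t² − s t + p = 0. Under the same assumptions, three steps lead analogously to a least-squares problem in the elementary symmetric polynomials of (α₀, α₁, α₂), followed by a cubic.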
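The lower and upper generalization bounds can be illustrated with a common two-sided form of the sample convergence bound. The sketch below assumes the performance metric is a 0/1 failure indicator, for example whether the error after k iterations exceeds a tolerance ε on a validation instance; the exact bound and metric used in the paper may differ. With probability at least 1 − δ over N i.i.d. validation instances, KL(p̂ ‖ p) ≤ log(2/δ)/N, where p̂ is the empirical failure rate and p the true one. Inverting the Bernoulli KL on each side of p̂ gives the two bounds. The function names are hypothetical.

```python
import math

def kl_bernoulli(q, p):
    """KL(Bernoulli(q) || Bernoulli(p)), with p clipped away from 0 and 1."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    kl = 0.0
    if q > 0.0:
        kl += q * math.log(q / p)
    if q < 1.0:
        kl += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return kl

def kl_inverse(q, c, upper=True, iters=100):
    """Largest (upper=True) or smallest (upper=False) p with KL(q || p) <= c.
    KL(q || .) is increasing on [q, 1] and decreasing on [0, q],
    so bisection on the chosen side is valid."""
    feasible, infeasible = q, (1.0 if upper else 0.0)
    for _ in range(iters):
        mid = 0.5 * (feasible + infeasible)
        if kl_bernoulli(q, mid) <= c:
            feasible = mid
        else:
            infeasible = mid
    return feasible

def sample_convergence_bounds(num_failures, num_samples, delta):
    """Return (lower, upper) bounds on the true failure probability that hold
    with probability >= 1 - delta, from KL(p_hat || p) <= log(2/delta) / N."""
    p_hat = num_failures / num_samples
    c = math.log(2.0 / delta) / num_samples
    return kl_inverse(p_hat, c, upper=False), kl_inverse(p_hat, c, upper=True)

# Usage with made-up counts: 37 failures among 1000 validation instances.
lower, upper = sample_convergence_bounds(37, 1000, delta=0.01)
```

The KL form is tighter than a Hoeffding-style bound when the empirical failure rate is close to 0, which is the regime of interest for a well-trained optimizer.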

Figures (11)

Theorems & Definitions (8)