Learning to optimize with convergence guarantees using nonlinear system theory

Andrea Martin, Luca Furieri

TL;DR

This work addresses the lack of convergence guarantees in learning to optimize (L2O) for non-convex problems by framing optimization algorithms as controlled dynamical systems that converge by design. It introduces an unconstrained parametrization that separates a gradient-descent step from a learnable enhancement $\mathbf{V}$: with $0<\eta<\beta^{-1}$, the update rule satisfies $\bm{\pi}(f,\mathbf{x})\in\Gamma(f)$, so the search can be carried out over a finite-dimensional parameter $\theta$ with automatic differentiation. The authors extend the theory to settings with gradient errors, showing asymptotic convergence under batch or partial-gradient updates by constructing $\mathbf{V}$ from learnable components $\Omega$ and $Z$. Experiments on MNIST indicate that ConvergentL2O can reach higher early test accuracy and generalize across activation functions while retaining its convergence guarantees. Overall, the work connects nonlinear system theory and L2O to obtain convergent, data-driven optimizers, with potential extensions to online and constrained settings.
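
For intuition on the step-size condition, here is a short worked bound. It is a sketch under assumptions that the summary above does not state explicitly: $\beta$ is taken to be the Lipschitz constant of $\nabla f$, $f$ is bounded below by $f^\star$, and the update is taken to have the additive form $x_{t+1} = x_t - \eta \nabla f(x_t) + v_t$ with $\mathbf{v} \in \ell_2$. It may differ from the argument used in the paper.

$$
\begin{aligned}
f(x_{t+1}) &\le f(x_t) + \nabla f(x_t)^\top (x_{t+1} - x_t) + \tfrac{\beta}{2}\,\lVert x_{t+1} - x_t \rVert^2 \\
&= f(x_t) - \eta\left(1 - \tfrac{\beta\eta}{2}\right)\lVert \nabla f(x_t) \rVert^2 + (1 - \beta\eta)\,\nabla f(x_t)^\top v_t + \tfrac{\beta}{2}\,\lVert v_t \rVert^2 \\
&\le f(x_t) - \tfrac{\eta}{2}\,\lVert \nabla f(x_t) \rVert^2 + \tfrac{1}{2\eta}\,\lVert v_t \rVert^2 .
\end{aligned}
$$

The last step uses $1 - \beta\eta > 0$ together with Young's inequality $\nabla f(x_t)^\top v_t \le \tfrac{\eta}{2}\lVert \nabla f(x_t) \rVert^2 + \tfrac{1}{2\eta}\lVert v_t \rVert^2$. Summing over $t$ gives $\tfrac{\eta}{2}\sum_t \lVert \nabla f(x_t) \rVert^2 \le f(x_0) - f^\star + \tfrac{1}{2\eta}\sum_t \lVert v_t \rVert^2 < \infty$, so $\nabla f(x_t) \to 0$ under these assumptions, whatever square-summable $\mathbf{v}$ is chosen.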

Abstract

The increasing reliance on numerical methods for controlling dynamical systems and training machine learning models underscores the need to devise algorithms that dependably and efficiently navigate complex optimization landscapes. Classical gradient descent methods offer strong theoretical guarantees for convex problems; however, they demand meticulous hyperparameter tuning for non-convex ones. The emerging paradigm of learning to optimize (L2O) automates the discovery of high-performing algorithms by leveraging learning models and data, yet it lacks a theoretical framework to analyze the convergence of the learned algorithms. In this paper, we fill this gap by harnessing nonlinear system theory. Specifically, we propose an unconstrained parametrization of all convergent algorithms for smooth non-convex objective functions. Notably, our framework is directly compatible with automatic differentiation tools, ensuring convergence by design while learning to optimize.

Paper Structure (7 sections, 4 theorems, 25 equations, 1 figure, 1 table)

This paper contains 7 sections, 4 theorems, 25 equations, 1 figure, 1 table.

Key Result

Lemma 1

Consider the recursion (eq:algorithm_dynamics_operator_form). The update rule given by (eq:separation) with $0 < \eta < \beta^{-1}$ satisfies (eq:constraint_convergence) for every choice of $\mathbf{v} \in \ell_2$.
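
The statement of Lemma 1 can be illustrated numerically with a minimal sketch. Since the referenced equations are not reproduced here, the code assumes an additive update $x_{t+1} = x_t - \eta \nabla f(x_t) + v_t$ and takes the convergence property to mean that the gradient vanishes along the iterates. The test function, the `enhancement` stand-in for a learned term, and the parameter tuple `theta` are all hypothetical and chosen only so that $\mathbf{v}$ is square-summable for any parameter values.

```python
import math

BETA = 3.0  # smoothness constant of f below: |f''(x)| = |1 - 2*cos(x)| <= 3
ETA = 0.3   # step size chosen inside the interval (0, 1/BETA)


def f(x):
    # Smooth non-convex test objective (hypothetical, not from the paper).
    return 0.5 * x * x + 2.0 * math.cos(x)


def grad_f(x):
    return x - 2.0 * math.sin(x)


def enhancement(t, grad, theta):
    """Hypothetical stand-in for the learned term v_t.

    theta = (amplitude, decay_logit, gain) is unconstrained. The sigmoid
    maps decay_logit into (0, 1), so |v_t| <= |amplitude| * decay**t and
    the sequence v is square-summable for every theta.
    """
    amplitude, decay_logit, gain = theta
    decay = 1.0 / (1.0 + math.exp(-decay_logit))
    return amplitude * decay ** t * math.tanh(gain * grad)


def run(x0, theta, steps=300):
    """Iterate the assumed update x <- x - ETA * grad_f(x) + v_t."""
    x = x0
    grad_norms = []
    v_energy = 0.0  # running sum of v_t**2
    for t in range(steps):
        g = grad_f(x)
        v = enhancement(t, g, theta)
        grad_norms.append(abs(g))
        v_energy += v * v
        x = x - ETA * g + v
    return x, grad_norms, v_energy


if __name__ == "__main__":
    x_final, grad_norms, v_energy = run(4.0, (1.0, 0.0, 1.0))
    print("final iterate:", x_final)
    print("final gradient magnitude:", grad_norms[-1])
    print("sum of v_t**2:", v_energy)
```

In an actual L2O setting the parameters would be tuned by automatic differentiation against a performance loss. The point of the sketch is only that the enhancement term can alter the transient without preventing the gradient from vanishing, provided it stays square-summable.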

Figures (1)

  • Figure 1: Training curves of learned and hand-crafted optimizers; shaded areas and solid lines denote standard deviations and mean values, respectively.

Theorems & Definitions (11)

  • Definition 1
  • Remark 1: The value of convergence
  • Lemma 1
  • Definition 2
  • Lemma 2
  • Theorem 1
  • Theorem 2
  • Proof of Lemma \ref{le:sufficiency}
  • Proof of Lemma \ref{le:necessity}
  • Proof of Theorem \ref{th:reformulation}
  • ...and 1 more