Accelerating Optimization via Differentiable Stopping Time

Zhonglin Xie, Yiman Fong, Haoran Yuan, Zaiwen Wen

TL;DR

The paper tackles the practical goal of minimizing the time needed for iterative optimization to reach a target accuracy by introducing a differentiable stopping time. It builds a bridge between continuous-time dynamics and discrete optimization via $T_J(\theta, x_0, \varepsilon)$ and a tractable discrete surrogate $N_J(\theta, x_0, \varepsilon)$, together with a memory-efficient discrete adjoint method to compute sensitivities. The authors formalize conditions for differentiability, provide explicit gradient expressions, and demonstrate applications to learning-to-optimize and online hyperparameter adaptation, supported by experiments on high-dimensional problems showing accurate gradient estimates with lower forward-pass costs. This framework enables direct gradient-based optimization of convergence speed, offering a principled route to faster learned optimizers and adaptive parameter tuning in practical settings, with potential for broad impact in automated algorithm design.

Abstract

Optimization is an important module of modern machine learning applications. Tremendous efforts have been made to accelerate optimization algorithms. A common formulation is achieving a lower loss at a given time. This enables a differentiable framework with respect to the algorithm hyperparameters. In contrast, its dual, minimizing the time to reach a target loss, is believed to be non-differentiable, as the time is not differentiable. As a result, it usually serves as a conceptual framework or is optimized using zeroth-order methods. To address this limitation, we propose a differentiable stopping time and theoretically justify it based on differential equations. An efficient algorithm is designed to backpropagate through it. As a result, the proposed differentiable stopping time enables a new differentiable formulation for accelerating algorithms. We further discuss its applications, such as online hyperparameter tuning and learning to optimize. Our proposed methods show superior performance in comprehensive experiments across various problems, which confirms their effectiveness.

Paper Structure

This paper contains 15 sections, 4 theorems, 77 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

Let $T = T_J(\theta, x_0, \varepsilon)$ be the continuous stopping time such that $J(x(T)) = \varepsilon$. We assume that the function $\mathcal{A}(\theta, x, t)$ is continuously differentiable with respect to $\theta$, $x$, and $t$. Additionally, the stopping criterion function $J(x)$ is assumed to ... Then, the solution $x(t; \theta, x_0)$ of the ODE system is continuously differentiable with respect ...
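
The theorem statement above is cut off, so the following is only a sketch of the standard implicit-function-theorem calculation that a result of this kind typically rests on, not a reproduction of the paper's formula. It assumes that $J$ is continuously differentiable, that $\theta \in \mathbb{R}^p$ and $x \in \mathbb{R}^n$, and that the trajectory crosses the level set transversally, i.e. $\nabla J(x(T))^\top \dot{x}(T) \neq 0$; the symbols $p$, $n$, and $\dot{x}$ are notation introduced here for illustration. Differentiating the defining identity $J(x(T; \theta, x_0)) = \varepsilon$ with respect to $\theta$ gives

$$
\nabla J(x(T))^\top \left( \frac{\partial x}{\partial \theta}(T; \theta, x_0) + \dot{x}(T)\, \nabla_\theta T^\top \right) = 0
\quad\Longrightarrow\quad
\nabla_\theta T = - \frac{\left( \frac{\partial x}{\partial \theta}(T; \theta, x_0) \right)^{\top} \nabla J(x(T))}{\nabla J(x(T))^\top \dot{x}(T)} .
$$

Under these assumptions, the sensitivity of the stopping time reduces to the sensitivity of the state at time $T$, scaled by the rate at which $J$ changes along the trajectory.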

Figures (7)

  • Figure 1: Illustration of the differentiable stopping time on $f(x_1,x_2)=0.5x_1^2+2x_2^2$ with $\mathcal{A}(\theta,x,t)=\operatorname{diag}(1,\theta)\nabla f(x)$. The plot shows the effect of $\theta$ on the continuous and discrete stopping times $T_J$ and $N_J$ for different values of $\varepsilon$.
  • Figure 2: Experimental results comparing the discrete and continuous stopping time gradients across varying problem dimensions, stopping thresholds, and time step sizes. (a) shows the relative error of the discrete gradient approximation. (b) shows the computational cost ratio.
  • Figure 3: Test results of different optimizers on logistic regression: Function value versus iteration.
  • Figure 4: Comparison of different optimizers on smooth SVM: Function value versus iteration. Here, $f_{\min}$ denotes the minimum function value achieved across all iterations for each optimizer.
  • Figure 5: NFEs of different solvers.
  • ...and 2 more figures
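
The setup in the Figure 1 caption can be explored numerically with a minimal sketch like the one below. Several choices here are assumptions rather than details taken from the paper: the dynamics are taken to be $\dot{x} = -\operatorname{diag}(1,\theta)\nabla f(x)$, the stopping criterion is $J(x) = f(x)$, the discretization is forward Euler with a hypothetical step size `h`, and the function names are invented for illustration.

```python
import math

def f(x):
    # Quadratic from the Figure 1 caption: f(x1, x2) = 0.5*x1^2 + 2*x2^2.
    return 0.5 * x[0] ** 2 + 2.0 * x[1] ** 2

def direction(x, theta):
    # diag(1, theta) * grad f(x), where grad f(x) = (x1, 4*x2).
    return (x[0], theta * 4.0 * x[1])

def discrete_stopping_time(theta, x0, eps, h=1e-3, max_steps=10**6):
    # Assumption: forward Euler on x' = -diag(1, theta) * grad f(x),
    # stopping at the first index k with J(x_k) = f(x_k) <= eps.
    x = tuple(x0)
    for k in range(max_steps):
        if f(x) <= eps:
            return k
        d = direction(x, theta)
        x = (x[0] - h * d[0], x[1] - h * d[1])
    raise RuntimeError("stopping criterion not reached within max_steps")

def continuous_stopping_time(theta, x0, eps):
    # For this quadratic the assumed ODE has the closed-form solution
    # x1(t) = x1(0) * exp(-t), x2(t) = x2(0) * exp(-4*theta*t).
    # J(x(t)) is decreasing in t for theta > 0, so bisection finds T.
    def J(t):
        return f((x0[0] * math.exp(-t), x0[1] * math.exp(-4.0 * theta * t)))
    if J(0.0) <= eps:
        return 0.0
    lo, hi = 0.0, 1.0
    while J(hi) > eps:  # grow the bracket until the criterion is met
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if J(mid) > eps:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    x0, eps, h = (1.0, 1.0), 1e-2, 1e-3
    for theta in (0.1, 0.5, 1.0):
        n = discrete_stopping_time(theta, x0, eps, h)
        t = continuous_stopping_time(theta, x0, eps)
        print(f"theta={theta}: h*N_J={h * n:.4f}, T_J={t:.4f}")
```

With a small step size, `h * N_J` should track `T_J` closely, and sweeping `theta` gives a rough picture of how both stopping times respond to the parameter. The discrete count is piecewise constant in `theta`, which is one way to see why a separate notion of sensitivity for the discrete stopping time is needed.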

Theorems & Definitions (11)

  • Definition 1: Continuous Stopping Time
  • Theorem 1: Differentiability of Continuous Stopping Time
  • Definition 2: Discrete Stopping Time
  • Definition 3: Sensitivity of the Discrete Stopping Time
  • Theorem 2: Approximation Error for Gradient of Stopping Time
  • Proposition 1: Discrete Adjoint Method
  • proof
  • Proposition 2: Error analysis of the forward Euler method
  • proof
  • proof : Proof of the Theorem
  • ...and 1 more