Accelerating Optimization via Differentiable Stopping Time
Zhonglin Xie, Yiman Fong, Haoran Yuan, Zaiwen Wen
TL;DR
The paper addresses the practical goal of minimizing the time an iterative optimization algorithm needs to reach a target accuracy by introducing a differentiable stopping time. It connects continuous-time dynamics and discrete optimization through a continuous stopping time $T_J(\theta, x_0, \varepsilon)$ and a tractable discrete surrogate $N_J(\theta, x_0, \varepsilon)$, where $\theta$ denotes the algorithm parameters, $x_0$ the initial point, and $\varepsilon$ the target accuracy, together with a memory-efficient discrete adjoint method for computing sensitivities. The authors formalize conditions for differentiability, provide explicit gradient expressions, and demonstrate applications to learning to optimize and online hyperparameter adaptation. Experiments on high-dimensional problems report accurate gradient estimates at lower forward-pass cost. The framework enables direct gradient-based optimization of convergence speed, offering a principled route to faster learned optimizers and adaptive parameter tuning, with potential applications in automated algorithm design.
Abstract
Optimization is a core component of modern machine learning applications, and considerable effort has gone into accelerating optimization algorithms. A common formulation is to achieve a lower loss within a given time budget, which yields an objective that is differentiable with respect to the algorithm hyperparameters. In contrast, the dual formulation, minimizing the time needed to reach a target loss, is commonly regarded as non-differentiable, because the stopping time is not considered a differentiable function of the hyperparameters. As a result, it usually serves only as a conceptual framework or is optimized with zeroth-order methods. To address this limitation, we propose a differentiable stopping time and justify it theoretically using differential equations. We design an efficient algorithm to backpropagate through it, which gives a new differentiable formulation for accelerating algorithms. We further discuss applications such as online hyperparameter tuning and learning to optimize. In experiments across a range of problems, the proposed methods outperform the baselines, which supports their effectiveness.
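To make the idea of differentiating a stopping time concrete, here is a minimal sketch on a toy problem. It is not the authors' algorithm. It assumes the scalar flow $\dot{x} = -\theta a x$ with criterion $J(x) = \tfrac{1}{2} a x^2$, for which the stopping time has a closed form, and applies the standard implicit function theorem to $J(x(T;\theta)) = \varepsilon$. The function names `stopping_time` and `stopping_time_grad` and the curvature parameter `a` are hypothetical and chosen only for illustration.

```python
import math


def stopping_time(theta, x0, eps, a=1.0):
    """First time T at which J(x(T)) = eps for the toy flow dx/dt = -theta*a*x.

    Assumes J(x) = 0.5*a*x**2 and J(x0) > eps, so that T > 0.
    The flow has the solution x(t) = x0 * exp(-theta*a*t).
    """
    return math.log(0.5 * a * x0 ** 2 / eps) / (2.0 * theta * a)


def stopping_time_grad(theta, x0, eps, a=1.0):
    """dT/dtheta from the implicit function theorem on J(x(T; theta)) = eps.

    Differentiating the constraint gives
        J'(x(T)) * (dx/dtheta at fixed T + xdot(T) * dT/dtheta) = 0,
    which is solved for dT/dtheta below.
    """
    T = stopping_time(theta, x0, eps, a)
    x_T = x0 * math.exp(-theta * a * T)  # state at the stopping time
    dJ_dx = a * x_T                      # J'(x(T))
    dx_dtheta = -a * T * x_T             # sensitivity of x(T) to theta at fixed T
    x_dot = -theta * a * x_T             # velocity of the flow at time T
    return -(dJ_dx * dx_dtheta) / (dJ_dx * x_dot)


def finite_difference(theta, x0, eps, a=1.0, h=1e-6):
    """Central finite-difference check of dT/dtheta."""
    t_plus = stopping_time(theta + h, x0, eps, a)
    t_minus = stopping_time(theta - h, x0, eps, a)
    return (t_plus - t_minus) / (2.0 * h)


if __name__ == "__main__":
    theta, x0, eps = 0.5, 2.0, 1e-3
    print("T         :", stopping_time(theta, x0, eps))
    print("dT/dtheta :", stopping_time_grad(theta, x0, eps))
    print("FD check  :", finite_difference(theta, x0, eps))
```

In this toy case the gradient reduces to $-T/\theta$: a larger $\theta$ speeds up the flow and shortens the stopping time, which is the quantity a gradient-based method would then minimize. For general dynamics the state sensitivity is not available in closed form and would have to be obtained numerically, for example by an adjoint computation of the kind the abstract refers to.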
