Dual Optimistic Ascent (PI Control) is the Augmented Lagrangian Method in Disguise
Juan Ramirez, Simon Lacoste-Julien
TL;DR
The paper tackles nonconvex constrained optimization in deep learning by showing that dual optimistic ascent (PI control) on the Lagrangian $\mathcal{L}$ is equivalent to gradient descent-ascent on the Augmented Lagrangian in the single-step, first-order regime. For equality constraints, the primal iterates of the two methods coincide when the optimism coefficient $\omega$ equals the penalty coefficient $c$ of the Augmented Lagrangian method (ALM), so ALM’s convergence guarantees transfer directly to dual optimistic methods; for general inequality constraints, the equivalence holds in the weaker sense that both methods share the same locally stable stationary points. The authors establish local linear convergence and spectral-radius relationships, give principled guidance for tuning $\omega$ (analogous to penalty scheduling in ALM), and delineate the limits of the approach beyond single-step, first-order dynamics. Empirically, they validate the equality-constrained equivalence on a nonconvex 1D problem and show that ALM-inspired scheduling improves the stability of dual optimistic ascent. The result gives PI control a rigorous foundation in constrained deep learning and clarifies when explicit ALM is preferable in more complex optimization regimes.
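A short algebraic sketch may help show why matching $\omega = c$ is natural in the equality-constrained case. It assumes one particular update order (dual step first, then primal step), which may differ from the paper's exact convention; the symbols $f$ (objective), $h$ (equality constraints), $J_h$ (Jacobian of $h$), $\mu_t$ (ALM multiplier), and the step sizes $\eta_x, \eta_\lambda$ are notation introduced here for illustration. Gradient descent-ascent on the Augmented Lagrangian $\mathcal{L}_c(x, \mu) = f(x) + \mu^\top h(x) + \tfrac{c}{2}\|h(x)\|^2$ then reads

$$
\mu_t = \mu_{t-1} + \eta_\lambda\, h(x_t), \qquad
x_{t+1} = x_t - \eta_x \left[ \nabla f(x_t) + J_h(x_t)^\top \big( \mu_t + c\, h(x_t) \big) \right].
$$

Defining the effective multiplier $\lambda_t = \mu_t + c\, h(x_t)$, the primal step becomes a gradient step on the standard Lagrangian $\mathcal{L}(x, \lambda_t) = f(x) + \lambda_t^\top h(x)$, and subtracting consecutive multipliers gives

$$
\lambda_t = \lambda_{t-1} + \eta_\lambda\, h(x_t) + c \big( h(x_t) - h(x_{t-1}) \big),
$$

which has the form of an optimistic (proportional-integral) ascent step with optimism coefficient $\omega = c$.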
Abstract
Constrained optimization is a powerful framework for enforcing requirements on neural networks. These constrained deep learning problems are typically solved using first-order methods on their min-max Lagrangian formulation, but such approaches often suffer from oscillations and can fail to find all local solutions. While the Augmented Lagrangian method (ALM) addresses these issues, practitioners often favor dual optimistic ascent schemes (PI control) on the standard Lagrangian, which perform well empirically but lack formal guarantees. In this paper, we establish a previously unknown equivalence between these approaches: dual optimistic ascent on the Lagrangian is equivalent to gradient descent-ascent on the Augmented Lagrangian. This finding allows us to transfer the robust theoretical guarantees of the ALM to the dual optimistic setting, proving it converges linearly to all local solutions. Furthermore, the equivalence provides principled guidance for tuning the optimism hyper-parameter. Our work closes a critical gap between the empirical success of dual optimistic methods and their theoretical foundation in the single-step, first-order regime commonly used in constrained deep learning.
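The claimed equivalence for equality constraints can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation or experiment: the toy objective `f(x) = x**4/4 - x**2/2`, the constraint `h(x) = x - 0.5`, the step sizes, the update order (dual step first, then primal step), and the convention that the "previous" constraint value starts at zero are all assumptions made here for illustration. It runs single-step gradient descent-ascent on the Augmented Lagrangian with penalty `c`, runs dual optimistic ascent on the standard Lagrangian with `omega = c`, and compares the primal iterates.

```python
def f_grad(x):
    # Gradient of the hypothetical nonconvex objective f(x) = x**4 / 4 - x**2 / 2.
    return x**3 - x


def h(x):
    # Hypothetical equality constraint h(x) = x - 0.5 = 0 (only x = 0.5 is feasible).
    return x - 0.5


def h_grad(x):
    # Derivative of h, constant for this toy constraint.
    return 1.0


def alm_gda(x0, c, eta_x, eta_dual, steps):
    """Single-step gradient descent-ascent on the Augmented Lagrangian."""
    x, mu = x0, 0.0
    xs = [x]
    for _ in range(steps):
        mu = mu + eta_dual * h(x)      # plain gradient ascent on the multiplier
        effective = mu + c * h(x)      # multiplier plus penalty term seen by x
        x = x - eta_x * (f_grad(x) + effective * h_grad(x))
        xs.append(x)
    return xs


def dual_optimistic(x0, omega, eta_x, eta_dual, steps):
    """Dual optimistic ascent (PI control) on the standard Lagrangian."""
    x, lam = x0, 0.0
    h_prev = 0.0  # assumed convention: zero violation recorded before step 0
    xs = [x]
    for _ in range(steps):
        h_now = h(x)
        # Integral term (eta_dual * h_now) plus proportional/optimistic correction.
        lam = lam + eta_dual * h_now + omega * (h_now - h_prev)
        h_prev = h_now
        x = x - eta_x * (f_grad(x) + lam * h_grad(x))
        xs.append(x)
    return xs


xs_alm = alm_gda(x0=0.0, c=1.0, eta_x=0.1, eta_dual=0.1, steps=500)
xs_pi = dual_optimistic(x0=0.0, omega=1.0, eta_x=0.1, eta_dual=0.1, steps=500)

# Largest difference between the two primal trajectories.
max_gap = max(abs(a - b) for a, b in zip(xs_alm, xs_pi))
print("max primal gap:", max_gap)
print("final x (PI):", xs_pi[-1])
```

Under these assumptions the two trajectories should agree up to floating-point rounding, since the optimistic multiplier `lam` tracks `mu + c * h(x)` at every step. Choosing `omega` different from `c` breaks this match, which is one way to read the guidance that tuning the optimism coefficient plays the role of penalty scheduling in ALM.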
