On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

Motahareh Sohrabi; Juan Ramirez; Tianyue H. Zhang; Simon Lacoste-Julien; Jose Gallego-Posada

TL;DR

Constrained optimization of neural networks often suffers from unstable gradient descent-ascent dynamics. This work introduces νPI, a PI-like multiplier updater augmented with an exponential moving average, to stabilize the dynamics of the Lagrange multipliers. Through a unifying mapping, νPI generalizes momentum methods (Polyak, Nesterov) and optimistic gradient (OG). The theoretical analysis describes the continuous-time dynamics as an oscillator and gives conditions for critical damping, which plain gradient ascent (GA) on the multipliers does not provide. Experiments on hard-margin SVMs, fairness, and sparsity tasks show improved stability, and the paper offers practical guidance on choosing the hyperparameters. Overall, νPI is presented as a reliable mechanism with predictable hyperparameters for enforcing constraints in large-scale, nonconvex learning problems, with implications for safety, fairness, and model compression.

Abstract

Constrained optimization offers a powerful framework to prescribe desired behaviors in neural network models. Typically, constrained problems are solved via their min-max Lagrangian formulations, which exhibit unstable oscillatory dynamics when optimized using gradient descent-ascent. The adoption of constrained optimization techniques in the machine learning community is currently limited by the lack of reliable, general-purpose update schemes for the Lagrange multipliers. This paper proposes the $\nu$PI algorithm and contributes an optimization perspective on Lagrange multiplier updates based on PI controllers, extending the work of Stooke, Achiam and Abbeel (2020). We provide theoretical and empirical insights explaining the inability of momentum methods to address the shortcomings of gradient descent-ascent, and contrast this with the empirical success of our proposed $\nu$PI controller. Moreover, we prove that $\nu$PI generalizes popular momentum methods for single-objective minimization. Our experiments demonstrate that $\nu$PI reliably stabilizes the multiplier dynamics and its hyperparameters enjoy robust and predictable behavior.
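The summary above describes the multiplier updater only in words, so the following Python sketch shows one plausible reading of a PI controller augmented with an exponential moving average. It rests on assumptions that are not stated in this text: the proportional term acts on an EMA of the constraint violation, the integral term acts on the running sum of raw violations, and the multiplier is clipped at zero as for an inequality constraint. The class name `EmaPIUpdater` and its field names are hypothetical and chosen only to echo the symbols $\nu$, $\kappa_p$, $\kappa_i$.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EmaPIUpdater:
    """Illustrative PI-style multiplier updater (an assumed form, not the paper's listing)."""

    kappa_p: float          # proportional gain (assumed to act on the EMA of the error)
    kappa_i: float          # integral gain (acts on the running sum of errors)
    nu: float = 0.0         # EMA coefficient; nu = 0 means no smoothing
    theta0: float = 0.0     # initial value of the multiplier
    _ema: Optional[float] = field(init=False, default=None)
    _integral: float = field(init=False, default=0.0)

    def step(self, violation: float) -> float:
        """Consume one constraint violation g(x_t) and return the next multiplier."""
        if self._ema is None:
            # Assumption: the EMA is initialized at the first observed error.
            self._ema = violation
        else:
            self._ema = self.nu * self._ema + (1.0 - self.nu) * violation
        self._integral += violation
        theta = self.theta0 + self.kappa_p * self._ema + self.kappa_i * self._integral
        # Assumption: inequality constraint, so the multiplier is kept non-negative.
        return max(theta, 0.0)


# With kappa_p = 0 the rule reduces to gradient ascent with step size kappa_i,
# as long as the clipping at zero is inactive.
ga_like = EmaPIUpdater(kappa_p=0.0, kappa_i=0.1)
for g in [1.0, 0.5, -0.25]:
    print(ga_like.step(g))
```

Setting `kappa_p = 0` makes the integral term the only active one, which is why the sketch then coincides with plain gradient ascent on the multiplier. The clipping here acts only on the returned value and does not reset the accumulated sum, a simplification that a practical implementation may want to revisit.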

Paper Structure (34 sections, 5 theorems, 41 equations, 24 figures, 16 tables, 3 algorithms)

Introduction
Related Works
Lagrangian Optimization
$\nu$PI Control for Constrained Optimization
$\nu$PI algorithm
Connections to optimization methods
Interpreting the updates of $\nu$PI
Oscillator dynamics
Practical remarks
Experiments
Hard-margin SVMs
Fairness
Sparsity
Conclusion
Appendix
...and 19 more sections

Key Result

Theorem 1

(Proof in the appendix, label appx:nupi_momentum_connections.) Under the same initialization $\boldsymbol{\theta}_0$, UnifiedMomentum$(\alpha, \beta\neq 1, \gamma)$ is a special case of the $\boldsymbol{\nu}$PI algorithm with the hyperparameter choices:

Figures (24)

Figure 1: Dynamics for different dual optimizers on a hard-margin SVM problem (eq:svm_problem). Amongst the tested methods, $\boldsymbol{\nu}$PI is the only method to successfully converge to the optimal dual variables. Each optimizer uses the best hyperparameters found after a grid search aiming to minimize the distance to the optimal $\boldsymbol{\lambda}^*$ after 5,000 steps. For improved readability, the plot shows the first 3,000 steps. Constraint 64 corresponds to a support vector. All methods achieved perfect training accuracy.
Figure 2: Constraint dynamics for GA, Polyak and $\boldsymbol{\nu}$PI in a sparsity task (see the Sparsity section). Constrained optimal solutions for this problem lie at the boundary of the feasible set. The excessive growth in the value of the multiplier for GA causes the constraint to overshoot into the interior of the feasible set. The improved multiplier updates of the $\boldsymbol{\nu}$PI algorithm remove the overshoot in the constraint and multiplier.
Figure 3: $\boldsymbol{\nu}$PI control pipeline for updating the Lagrange multipliers in a constrained optimization problem. We consider the update on the primal variables as a black-box procedure that receives the multipliers $\boldsymbol{\lambda}_t$ and primal variables $\boldsymbol{x}_{t-1}$ as input, and returns an updated $\boldsymbol{x}_{t}.$ The multiplier update is executed by the controller, using the constraint violations as the error signal.
Figure 4: Left: Hyperparameter choices from Theorem 1 for which $\boldsymbol{\nu}$PI$(\nu, \kappa_p, \kappa_i)$ realizes Polyak$(\alpha=\frac{1}{2}, \beta)$ and Nesterov$(\alpha=\frac{1}{2}, \beta)$. Right: Zoom on the range $-1 \le \beta \le 0.25$. Polyak comprises a limited surface in the $(\nu, \kappa_p, \kappa_i)$ space, leaving configurations outside this surface unexplored. Note how positive (resp. negative) values of $\beta$ result in negative (resp. positive) values of $\kappa_p$, colored in red (resp. blue). Colored paths correspond to different values of $\alpha$. The dashed curves match between both plots.
Figure 5: Comparing the update of $\boldsymbol{\nu}$PI relative to GA. $\boldsymbol{\nu}$PI increases the multipliers faster than GA when the constraint violation is large, enhancing convergence speed; and proactively decreases them near the feasible set, preventing overshoot. The blue, yellow, and red regions correspond to cases in which the updates performed by the $\boldsymbol{\nu}$PI algorithm are faster, slower, or in the opposite direction than those of GA, respectively. This plot illustrates the case $\xi_{t-1} > 0$.
...and 19 more figures
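The pipeline in the Figure 3 caption (a black-box primal update that receives $\boldsymbol{\lambda}_t$ and $\boldsymbol{x}_{t-1}$, followed by a controller that reads the constraint violation as its error signal) can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the problem (minimize $\frac{1}{2}x^2$ subject to $1 - x \le 0$, whose solution is $x^* = 1$, $\lambda^* = 1$), the plain gradient step used as the primal update, the gains, and the function name `run_loop`. The controller form, with the proportional term acting on an EMA of the violation, is likewise an assumed reading and not the paper's listing.

```python
def run_loop(kappa_p, kappa_i, nu=0.0, lr=0.1, steps=2000):
    """Toy constrained problem: minimize 0.5 * x**2 subject to g(x) = 1 - x <= 0."""
    x, lam = 0.0, 0.0            # primal variable and multiplier
    ema, integral = None, 0.0    # controller state
    xs = []
    for _ in range(steps):
        # Primal update (treated as a black box): one gradient step on the
        # Lagrangian 0.5 * x**2 + lam * (1 - x), using the current multiplier.
        x = x - lr * (x - lam)

        # Error signal for the controller: the constraint violation g(x_t).
        violation = 1.0 - x

        # Controller update (assumed PI form with an EMA on the error).
        ema = violation if ema is None else nu * ema + (1.0 - nu) * violation
        integral += violation
        lam = max(0.0, kappa_p * ema + kappa_i * integral)

        xs.append(x)
    return x, lam, xs


# Integral-only controller, i.e. gradient ascent on the multiplier.
x_ga, lam_ga, xs_ga = run_loop(kappa_p=0.0, kappa_i=0.1)
# Same integral gain with an added proportional term.
x_pi, lam_pi, xs_pi = run_loop(kappa_p=1.0, kappa_i=0.1)

print("GA: x =", x_ga, "lambda =", lam_ga, "peak x =", max(xs_ga))
print("PI: x =", x_pi, "lambda =", lam_pi, "peak x =", max(xs_pi))
```

On this linear toy problem both runs settle at $x = 1$, $\lambda = 1$, but the integral-only run oscillates around the constraint boundary on the way there, so its peak value of `x` exceeds 1. This mirrors, in a very simplified setting, the overshoot behavior the captions attribute to GA. It is not a reproduction of the paper's experiments.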

Theorems & Definitions (9)

Theorem 1
Lemma 2
Proof of Lemma 2 (thm:nupi_update_recursive)
Lemma 3
Proof of Lemma 3 (thm:um_update_recursive)
Theorem 1 (restated)
Proof of Theorem 1 (thm:um_as_nupi)
Theorem 4
Proof of Theorem 4 (thm:oscillator_flow)
