On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems
Michael Muehlebach, Michael I. Jordan
TL;DR
This work introduces a velocity-level framework for constrained optimization in which constraints are imposed on forward increments (velocities) rather than on positions. This permits simple, local, convex approximations of the feasible set and avoids projections onto the full feasible set. The authors analyze both the continuous-time gradient flow and its discrete-time Euler approximation, proving convergence to stationary points under standard constraint qualifications and, in the convex and strongly convex settings, exponential rates; the proofs rely on duality arguments and Moreau time-stepping. Empirically, the cost per iteration scales roughly as $\mathcal{O}(n^2)$ on dense QPs, and the method delivers substantial speedups over interior-point solvers such as CVXOPT on large problems, while handling nonlinear constraints and tolerating infeasible iterates. The approach is motivated by non-smooth mechanics and offers a unifying perspective that connects constraint handling with tangent-cone concepts, opening avenues for extensions to line-search, momentum, accelerated, and Newton-type variants.
Abstract
We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire feasible set are avoided, in stark contrast to projected gradient methods or the Frank-Wolfe method, and (ii) iterates are allowed to become infeasible, which differs from active set or feasible direction methods, where the descent motion stops as soon as a new constraint is encountered. The resulting algorithmic procedure is simple to implement even when constraints are nonlinear, and is suitable for large-scale constrained optimization problems in which the feasible set fails to have a simple structure. The key underlying idea is that constraints are expressed in terms of velocities instead of positions, which has the algorithmic consequence that optimizations over feasible sets at each iteration are replaced with optimizations over local, sparse convex approximations. In particular, this means that at each iteration only constraints that are violated are taken into account. The result is a simplified suite of algorithms and an expanded range of possible applications in machine learning.
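The velocity-level idea can be illustrated with a minimal sketch. It assumes a single inequality constraint written as $g(x) \ge 0$ and a velocity constraint of the form $\nabla g(x)^\top v + \alpha\, g(x) \ge 0$ that is imposed only when $g(x) \le 0$. With one constraint, the velocity closest to $-\nabla f(x)$ has a closed form (a projection onto a half-space). The function names, the parameters `alpha` and `step`, and the toy problem are illustrative assumptions, not taken from the text above.

```python
import numpy as np

def velocity_step(x, grad_f, g, grad_g, alpha=1.0, step=0.1):
    """One forward-Euler step; the constraint g(x) >= 0 acts on the velocity.

    Illustrative sketch for a single constraint (assumed form, see lead-in).
    """
    v = -grad_f(x)                      # unconstrained descent velocity
    gx = g(x)
    if gx <= 0.0:                       # constraint is active or violated
        a = grad_g(x)
        # Project v onto the half-space {w : a @ w + alpha * gx >= 0}.
        lam = max(0.0, -(a @ v + alpha * gx) / (a @ a))
        v = v + lam * a
    return x + step * v

# Toy problem (hypothetical): minimize 0.5 * ||x||^2 subject to x0 + x1 >= 1.
a_vec = np.array([1.0, 1.0])
grad_f = lambda x: x
g = lambda x: a_vec @ x - 1.0
grad_g = lambda x: a_vec

x_start = np.array([2.0, -3.0])         # infeasible starting point, g = -2
x_final = x_start.copy()
for _ in range(200):
    x_final = velocity_step(x_final, grad_f, g, grad_g)

print(x_final)                          # approaches the minimizer (0.5, 0.5)
```

On this toy problem the iterates start outside the feasible set and approach the constrained minimizer without any projection of the position onto the feasible set. With several constraints, the closed-form projection would presumably be replaced by a small quadratic program over the velocities, restricted to the violated constraints.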
