On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems

Michael Muehlebach, Michael I. Jordan

TL;DR

This work introduces a velocity-level constrained optimization framework in which constraints are enforced on forward increments rather than positions, enabling simple, local, convex approximations of the feasible set and avoiding full projections. The authors analyze both the continuous-time gradient flow and its discrete-time Euler approximation. They prove convergence to stationary points under standard constraint qualifications and exponential rates in the convex and strongly convex settings, with duality tools and Moreau time-stepping guiding the proofs. Empirically, the method scales roughly as $\mathcal{O}(n^2)$ per iteration on dense QPs and delivers substantial speedups over interior-point solvers such as CVXOPT on large problems, while handling nonlinear constraints and infeasible iterates. The approach is motivated by non-smooth mechanics and offers a unifying perspective that connects constraint handling with tangent-cone concepts, opening avenues for extensions to line-search, momentum, accelerated, and Newton-type methods.

Abstract

We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire feasible set are avoided, in stark contrast to projected gradient methods or the Frank-Wolfe method, and (ii) iterates are allowed to become infeasible, which differs from active set or feasible direction methods, where the descent motion stops as soon as a new constraint is encountered. The resulting algorithmic procedure is simple to implement even when constraints are nonlinear, and is suitable for large-scale constrained optimization problems in which the feasible set fails to have a simple structure. The key underlying idea is that constraints are expressed in terms of velocities instead of positions, which has the algorithmic consequence that optimizations over feasible sets at each iteration are replaced with optimizations over local, sparse convex approximations. In particular, this means that at each iteration only constraints that are violated are taken into account. The result is a simplified suite of algorithms and an expanded range of possible applications in machine learning.
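The velocity-level idea in the abstract can be sketched for a single constraint $g(x)\geq 0$. This is a minimal illustration, not the authors' implementation. It assumes the velocity constraint has the form $\nabla g(x)^\top v + \alpha g(x) \geq 0$ and is imposed only when $g(x)\leq 0$; that form is chosen to be consistent with the parameter $\alpha$ and the increment $(x_{k+1}-x_k)/T$ in the figure captions, but it is not stated in this excerpt. The function name `velocity_step`, the demo problem, and all parameter values are hypothetical.

```python
import numpy as np


def velocity_step(x, grad_f, g, grad_g, alpha=1.0, T=0.05):
    """One Euler step x + T*v for a single constraint g(x) >= 0.

    Assumed velocity constraint (not confirmed by the excerpt):
    grad_g(x) @ v + alpha * g(x) >= 0, imposed only when g(x) <= 0.
    """
    u = -grad_f(x)  # unconstrained descent velocity
    gx = g(x)
    if gx > 0.0:
        # Strictly feasible point: the velocity is unrestricted.
        return x + T * u
    a = grad_g(x)  # assumed nonzero wherever g(x) <= 0
    # Closed-form projection of u onto the half-space
    # {v : a @ v + alpha * g(x) >= 0}; no projection onto the feasible set.
    slack = a @ u + alpha * gx
    lam = max(0.0, -slack) / (a @ a)
    return x + T * (u + lam * a)


def solve_demo(num_steps=2000):
    """Minimize 0.5*||x - c||^2 over the unit ball from an infeasible start."""
    c = np.array([2.0, 0.0])
    x = np.array([0.0, 3.0])  # infeasible initial point, which is allowed
    for _ in range(num_steps):
        x = velocity_step(
            x,
            grad_f=lambda z: z - c,
            g=lambda z: 1.0 - z @ z,      # unit ball written as g(x) >= 0
            grad_g=lambda z: -2.0 * z,
        )
    return x


if __name__ == "__main__":
    print(solve_demo())  # expected to approach the minimizer (1, 0)
```

Each step solves only a projection onto one half-space in velocity space, which is why the nonlinear ball constraint needs no special treatment. With several violated constraints, the closed-form projection would be replaced by a small quadratic program over the corresponding half-spaces.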

Paper Structure

This paper contains 28 sections, 13 theorems, 101 equations, 11 figures, 1 table, 2 algorithms.

Key Result

Proposition 2

(constrained gradient flow) Let $x:[0,\infty) \rightarrow \mathbb{R}^n$ be an absolutely continuous trajectory with a piecewise continuous derivative. Then, for any $x(0)\in C$, the following are equivalent: where $\dot{x}(t)^+$ denotes the right-hand derivative of $x$ at $t$. For any $x(0)\in \mathbb{R}^{n}$, eq:velLeveltmp and eq:velLeveltmp2 are equivalent and lead to a unique trajectory $x(t)

Figures (11)

  • Figure 1: The figure contrasts position constraints with velocity constraints. The leftmost sketch illustrates the position constraint, where $x(t)$ is constrained to the feasible set as indicated by the shaded region. The center and right figures illustrate the induced constraints on the velocity $\dot{x}(t)^+$ (which will be precisely defined below). If $x(t)$ is in the interior of the feasible set, there are no restrictions on the forward velocity, as indicated with the shaded ball without border (center). The figure on the right illustrates the case where $x(t)$ lies on the boundary of the feasible set. As a result, $\dot{x}(t)^+$ is constrained to lie in the cone indicated by the shaded region. In the discrete-time case $\dot{x}(t)^+$ is replaced with $(x_{k+1}-x_k)/T$.
  • Figure 2: This figure shows the values of $\nabla_x l$, $F_\alpha$, and $d$ for $\alpha=1/10$ (left column) and $\alpha=4/5$ (right column). Top row: The solid thick black line represents $\nabla_x l$, which is discontinuous at the origin, where it takes the value zero (the origin is the minimizer of \ref{eq:simpEx}). For values $x\leq 0$, $\nabla_x l$ is given by $\min \{ \nabla f(x), \alpha x\}$ and for values $x\geq 2$, $\nabla_x l$ is given by $\max \{ \nabla f(x), \alpha (x-2) \}$, which is represented by the lines in blue and in red. Middle row: The solid thick black line represents $F_\alpha$, which is continuous and has its minimum at the origin (the origin is the minimizer of \ref{eq:simpEx}). The objective function $f$ is indicated with dashed lines. Last row: The function $d$ is discontinuous at the origin for $\alpha \neq 1/5$, unbounded below for $\alpha < 1/5$, and unbounded above for $\alpha > 1/5$. For $\alpha < 1/5$, $d(x)$ is upper bounded by $f_{I_{x}}^*$, that is, $d(x) \leq f^* =f^*_{\{1\}}= 0.1$ for $x\leq 0$ and $d(x)\leq f^*_{\{ \}} = f^*_{\{2\}}=0$ for $x>0$, where $g_1(x)=x$ and $g_2(x)=2-x$. As we will show in Section \ref{Sec:Discrete}, this holds more generally provided that $f$ and $C$ are convex.
  • Figure 3: The figure illustrates the concept of a free-body diagram, where the geometric boundary condition $g(x)\geq 0$, as shown on the left, is replaced by the constraint forces, $-R\in N_C(x)$, as shown on the right. We note that $x_1$ is in static equilibrium, since $-\nabla f(x_1)$ and $R_1$ cancel, whereas $x_2$ is not.
  • Figure 4: The figure summarizes the analogies between constrained optimization and non-smooth mechanics. On the left, constraint qualifications are assumed to hold, ensuring that the set $C$ is regular in the sense of Clarke. On the right, the set $C$ fails to be regular, for example due to a reentrant (inward-facing) corner. In that case, the notion of equilibrium needs to be extended by an appropriate closure of $N_C(x)$; see, for example, RockafellarWets. The resulting equilibrium condition is no longer sufficient for stationarity, and its equivalence to the Karush-Kuhn-Tucker conditions breaks down RockafellarWets. Moreover, the principle of d'Alembert-Lagrange is no longer a consequence of the principle of virtual work and therefore fails to characterize static equilibria when $C$ is not regular Panagiotopoulous. There are important examples of mechanical systems where $C$ fails to be regular; see, for example, Glocker.
  • Figure 5: This figure shows the values of $\nabla_x l$ (solid thick line) for $\alpha=1/10$ (left) and $\alpha=4/5$ (right), where $\epsilon_\text{g}=0.2$. We note that the discontinuity of $\nabla_x l$ is now at $\epsilon_\text{g}>0$, which means that the origin is an asymptotically stable equilibrium in the sense of Lyapunov. The parameter $\epsilon_\text{g}$ has no effect on the constraint $x\geq 2$. The original gradient $\nabla f$ is again shown in blue (dotted) and the functions $\alpha x$ and $\alpha (x-2)$ are shown in red (dashed).
  • ...and 6 more figures

Theorems & Definitions (20)

  • Definition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Lemma 5
  • Lemma 6
  • Claim 1
  • Claim 2
  • Lemma 7
  • Claim 3
  • ...and 10 more