Nonsmooth Projection-Free Optimization with Functional Constraints

Kamiar Asgari, Michael J. Neely

TL;DR

This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set and competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems.

Abstract

This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank-Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an $ε$-suboptimal solution in $\mathcal{O}(ε^{-2})$ iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle (LMO) call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.
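The abstract's contrast between LMO-based and projection-based methods can be made concrete with a small sketch. This is not the paper's algorithm, because the separation scheme and the Lagrange multiplier update rule are not given in this summary. It shows only two things. First, it shows what a single Linear Minimization Oracle call computes, using the ℓ1 ball as an assumed example feasible set. Second, it shows classic Frank-Wolfe on a smooth quadratic, which is the setting the abstract says Frank-Wolfe is designed for. The names `lmo_l1_ball` and `frank_wolfe` are hypothetical, as are the test objective and the radius.

```python
from typing import Callable, List

Vector = List[float]


def lmo_l1_ball(g: Vector, radius: float = 1.0) -> Vector:
    """Linear Minimization Oracle for the l1 ball {v : ||v||_1 <= radius}.

    Returns argmin_v <g, v>. All mass goes on the coordinate with the largest
    |g_i|, signed opposite to g_i, so that <g, v> = -radius * max_i |g_i|.
    """
    i = max(range(len(g)), key=lambda k: abs(g[k]))
    v = [0.0] * len(g)
    if g[i] > 0:
        v[i] = -radius
    elif g[i] < 0:
        v[i] = radius
    # If g == 0, every feasible v is optimal and v = 0 is returned.
    return v


def frank_wolfe(grad: Callable[[Vector], Vector], x0: Vector,
                radius: float = 1.0, iters: int = 200) -> Vector:
    """Classic Frank-Wolfe for a *smooth* objective over the l1 ball.

    Each iteration makes one gradient call and one LMO call and never
    projects. x0 is assumed to lie inside the ball.
    """
    x = list(x0)
    for t in range(iters):
        v = lmo_l1_ball(grad(x), radius)
        step = 2.0 / (t + 2.0)  # standard open-loop step size
        # A convex combination of two feasible points stays feasible.
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    b = [0.3, -0.2, 0.0]  # assumed target point inside the unit l1 ball
    # Gradient of the smooth test objective 0.5 * ||x - b||^2.
    grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
    print(frank_wolfe(grad, [0.0, 0.0, 0.0]))
```

Feasibility here comes for free: every iterate is a convex combination of points returned by the LMO, so no projection is ever needed. If the gradient in this loop is replaced by a subgradient of a nonsmooth objective, the standard Frank-Wolfe analysis no longer applies. That is the gap the abstract says the proposed method addresses.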

Paper Structure

This paper contains 23 sections, 12 theorems, 144 equations, 10 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

If Assumption assum:SFO holds, then the functions $f: \mathbb{V} \to \mathbb{R}$, $h: \mathbb{V} \to \mathbb{R}^m$, and $h_i: \mathbb{V} \to \mathbb{R}$ (for all $i \in \{1, \ldots, m\}$) are Lipschitz continuous over the set $\mathcal{Y}$, with Lipschitz constants not exceeding $L$, $G$, and …
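The standard argument behind a statement like Lemma 1 is that bounded subgradients imply Lipschitz continuity. The derivation below sketches this argument for the scalar function $f$. This summary does not state Assumption assum:SFO, so the derivation assumes that every subgradient of $f$ at points of $\mathcal{Y}$ has Euclidean norm at most $L$. The same steps would apply to each $h_i$ under an analogous bound.

```latex
% Assumed: for every x \in \mathcal{Y} and every g_x \in \partial f(x), \|g_x\|_2 \le L.
\begin{align*}
f(y) &\ge f(x) + \langle g_x,\, y - x \rangle
  && \text{(subgradient inequality at } x\text{)}\\
\Rightarrow\; f(x) - f(y) &\le \langle g_x,\, x - y \rangle
  \le \|g_x\|_2 \, \|x - y\|_2 \le L \, \|x - y\|_2
  && \text{(Cauchy--Schwarz)}\\
f(y) - f(x) &\le L \, \|x - y\|_2
  && \text{(same steps with } x \leftrightarrow y\text{)}\\
\Rightarrow\; |f(x) - f(y)| &\le L \, \|x - y\|_2
  \quad \forall\, x, y \in \mathcal{Y}.
\end{align*}
```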

Figures (10)

  • Figure 1: Expected loss as a function of the number of iterations for $\gamma = 350$, compared to noiseless data.
  • Figure 2: Computational time versus error for inexact LMO with $\gamma = 350$, demonstrating the computational efficiency.
  • Figure 3: Performance of four algorithms at $T=300$ iterations. Performance dips for very small or large $\gamma$.
  • Figure 4: The directed acyclic graph used in the experiment has nonnegative capacities on each edge. The objective is to route a fixed amount of flow from the source node $s$ to the target node $t$ at the minimum possible cost.
  • Figure 5: Cost versus the number of iterations $T$, shown on a log-log scale for all four formulations. There is no significant difference between the two choices of $\mathcal{Y}$. Formulations 3 and 4 converge more slowly than Formulations 1 and 2, but they compensate with a faster LMO (not depicted here).
  • ...and 5 more figures

Theorems & Definitions (34)

  • Remark 1
  • Definition 1
  • Lemma 1
  • Proof
  • Lemma 2: Lagrange Multipliers
  • Proof
  • Remark 2
  • Theorem 1: Objective gap
  • Theorem 2: Constraint violation
  • Remark 3
  • ...and 24 more