Frank-Wolfe algorithm for DC optimization problem

R. Díaz Millán; O. P. Ferreira; J. Ugon

TL;DR

The paper tackles constrained DC optimization with $f(x)=g(x)-h(x)$ over a compact convex set ${\cal C}$, where $g$ is differentiable and convex with Lipschitz gradient and $h$ is convex (possibly nondifferentiable). It develops two Frank–Wolfe variants with adaptive stepsizes: the first relies on the Lipschitz condition on $\nabla g$, and the second uses finite-difference approximations of $\nabla g$ with a relative-error model. The authors introduce piecewise-star-convexity to capture cellwise favorable geometry, proving that accumulation points are Clarke-stationary and, within each cell, correspond to cellwise minima with ${\cal O}(1/k)$ convergence for the objective value and the Frank–Wolfe gap; the second variant achieves an ${\cal O}(1/\sqrt{k})$ rate for the duality gap. The results extend convex-rate guarantees to a broad nonconvex, nonsmooth DC setting, yielding projection-free, scalable algorithms with adaptive stepsizes for large-scale problems.
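To make the first variant concrete, the following is a minimal Python sketch of a Frank–Wolfe iteration for $f=g-h$ over a compact convex set. It is an illustration under stated assumptions, not the paper's algorithm: the function name `dc_frank_wolfe`, the doubling rule for the local Lipschitz estimate `L`, and the toy problem (a quadratic $g$, $h=\|\cdot\|_1$, and the box $[-1,1]^2$) are all hypothetical choices made here. The sketch only uses the structure described above: a gradient of $g$, a subgradient of $h$, and a linear minimization oracle over ${\cal C}$.

```python
import numpy as np

def dc_frank_wolfe(g, grad_g, subgrad_h, lmo, x0, L0=1.0, max_iter=1000, tol=1e-8):
    """Frank-Wolfe sketch for min g(x) - h(x) over a compact convex set.

    lmo(w) must return a minimizer of <w, p> over the feasible set.
    The backtracking rule for L is a generic choice (an assumption),
    not the step-size rule of the paper.
    """
    x = np.asarray(x0, dtype=float)
    L = L0
    gaps = []
    for _ in range(max_iter):
        grad = grad_g(x)
        w = grad - subgrad_h(x)          # element of grad g(x) - subdiff h(x)
        p = lmo(w)                       # linear minimization step
        d = p - x
        gap = float(w @ (x - p))         # Frank-Wolfe gap, nonnegative
        gaps.append(gap)
        if gap <= tol:
            break
        dd = float(d @ d)
        gx, slope = g(x), float(grad @ d)
        while True:
            lam = min(1.0, gap / (L * dd))
            model = gx + lam * slope + 0.5 * L * lam**2 * dd
            if g(x + lam * d) <= model + 1e-12:
                break                    # quadratic upper model holds for this L
            L *= 2.0                     # otherwise increase the estimate
        x = x + lam * d
    return x, gaps

# Toy instance (hypothetical): g(x) = 0.5*||x - a||^2, h(x) = ||x||_1, C = [-1, 1]^2.
a = np.array([0.3, -0.2])
g = lambda x: 0.5 * float((x - a) @ (x - a))
grad_g = lambda x: x - a
subgrad_h = np.sign                      # a valid subgradient of the l1 norm
lmo = lambda w: -np.sign(w)              # minimizes <w, p> over the box

x_star, gaps = dc_frank_wolfe(g, grad_g, subgrad_h, lmo, np.array([0.5, -0.5]))
```

Because $h$ is convex, $h(x+\lambda d)\ge h(x)+\lambda\langle u,d\rangle$ for any subgradient $u$ of $h$ at $x$, so the accepted step decreases $f$ by at least $\lambda\,\omega-\tfrac{L}{2}\lambda^{2}\|d\|^{2}$, where $\omega$ is the gap computed in the loop.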

Abstract

In the present paper, we formulate two versions of the Frank--Wolfe algorithm, or conditional gradient method, with an adaptive step size to solve the DC optimization problem. The DC objective function consists of two components: the first is assumed to be differentiable with a Lipschitz continuous gradient, while the second is only assumed to be convex. The second version is based on the first and employs finite differences to approximate the gradient of the first component of the objective function. In contrast to past formulations that used the curvature/Lipschitz-type constant of the objective function, the step size computed does not require any constant associated with the components. For the first version, we establish that the algorithm is well-defined and that every limit point of the generated sequence is a stationary point of the problem. We also introduce the class of weak-star-convex functions and show that, although these functions are non-convex in general, the rate of convergence of the first version of the algorithm when minimizing these functions is ${\cal O}(1/k)$. The finite difference used to approximate the gradient in the second version of the Frank-Wolfe algorithm is computed with a step size that is adaptively updated using the two previous iterations. Unlike previous applications of finite differences in the Frank-Wolfe algorithm, which provided approximate gradients with absolute error, the one used here provides a relative error, simplifying the analysis of the algorithm. In this case, we show that all limit points of the sequence generated by the second version of the Frank-Wolfe algorithm are stationary points of the problem under consideration, and we establish that the rate of convergence for the duality gap is ${\cal O}(1/\sqrt{k})$.
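The distinction between absolute and relative error can be illustrated with a plain forward-difference gradient. The sketch below is an assumption-laden illustration, not the paper's scheme: the helper `forward_difference_gradient`, the quadratic test function, and the fixed increments `t` are hypothetical. For a function whose gradient is $L$-Lipschitz, a fixed increment $t$ gives the absolute bound $\|\hat{\nabla} g(x)-\nabla g(x)\|\le \tfrac{1}{2}L\,t\sqrt{n}$; the adaptive choice of $t$ from the two previous iterations that yields a relative error is specific to the paper and is not reproduced here.

```python
import numpy as np

def forward_difference_gradient(g, x, t):
    """Approximate the gradient of g at x by forward differences with increment t."""
    x = np.asarray(x, dtype=float)
    gx = g(x)
    approx = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = t                      # perturb only coordinate i
        approx[i] = (g(x + step) - gx) / t
    return approx

# Hypothetical test function: g(x) = 0.5 * x^T A x, so grad g(x) = A x.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
g_quad = lambda x: 0.5 * float(x @ A @ x)
x_test = np.array([0.3, -0.7])
exact = A @ x_test
L_quad = float(np.linalg.eigvalsh(A).max())   # Lipschitz constant of the gradient

errors, bounds = [], []
for t in (1e-2, 1e-3):
    approx = forward_difference_gradient(g_quad, x_test, t)
    errors.append(float(np.linalg.norm(approx - exact)))
    bounds.append(0.5 * L_quad * t * np.sqrt(x_test.size))
```

With a fixed increment the error scales linearly in `t` regardless of how close the iterate is to stationarity, which is why an absolute-error model needs extra care in the analysis.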

Paper Structure (8 sections, 18 theorems, 74 equations, 1 figure, 1 algorithm)

Introduction
Preliminaries
The DC optimization problem
Piecewise star-convexity with a nonsmooth DC structure
Frank--Wolfe algorithm
Convergence analysis
Iteration-complexity analysis
Conclusions

Key Result

Theorem 2.1

Let $f:\mathbb{R}^{n}\to\mathbb{R}$ be a locally Lipschitz function. Then, $\partial^{c}f(x)$ is a nonempty, convex, compact subset of $\mathbb{R}^{n}$ and $\|v\|\leq {\cal C} _{x},$ for all $v\in \partial^{c}f(x)$, where ${\cal C} _{x}>0$ is the Lipschitz constant of $f$ around $x$. Moreover, $f^{\

Figures (1)

Figure 1: Plot of the function $f(x) = \min_{c \in S} (x-c)^2$ for $S = \{0, \tfrac{1}{8}, \tfrac{1}{7}, \tfrac{1}{6}, \tfrac{1}{5}, \tfrac{1}{4}, \tfrac{1}{3}, \tfrac{1}{2}, 1\}$. The black segments indicate the active arc of the lower envelope, the dashed lines mark the boundaries of $V(x_x^*)$, and the solid points on the $x$-axis represent the centers $c \in S$.
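The function in Figure 1 can be evaluated directly, and it admits a simple DC decomposition: expanding the square gives $\min_{c\in S}(x-c)^2 = x^2 - \max_{c\in S}\,(2cx-c^2)$, where $x^2$ is smooth and convex and the maximum of affine functions is convex. The Python sketch below checks this identity numerically; the names `f_min`, `g_part`, and `h_part` are hypothetical, and this is one possible decomposition rather than necessarily the one used in the paper.

```python
import numpy as np

# Centers from the caption of Figure 1.
S = np.array([0, 1/8, 1/7, 1/6, 1/5, 1/4, 1/3, 1/2, 1])

def f_min(x):
    """f(x) = min over c in S of (x - c)^2, evaluated elementwise."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.min((x[:, None] - S[None, :]) ** 2, axis=1)

def g_part(x):
    """Smooth convex component g(x) = x^2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return x ** 2

def h_part(x):
    """Convex piecewise-linear component h(x) = max over c of (2 c x - c^2)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.max(2.0 * x[:, None] * S[None, :] - S[None, :] ** 2, axis=1)

xs = np.linspace(-0.5, 1.5, 401)
max_mismatch = float(np.max(np.abs(f_min(xs) - (g_part(xs) - h_part(xs)))))
```

Each center $c\in S$ is a local minimizer with value zero, and the kinks of $h$ sit at the midpoints between consecutive centers, which is where the active piece of the lower envelope changes.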

Theorems & Definitions (40)

Theorem 2.1: clarke1983optimization
Theorem 2.2: clarke1983optimization
Theorem 2.3
Proposition 2.4
Proposition 2.5
Proposition 2.6: Lemarechal
Proposition 2.7: Lemarechal
Lemma 2.8
Proposition 3.1
proof
...and 30 more
