Frank-Wolfe algorithm for DC optimization problem
R. Díaz Millán, O. P. Ferreira, J. Ugon
TL;DR
The paper tackles constrained DC optimization, minimizing $f(x)=g(x)-h(x)$ over a compact convex set ${\cal C}$, where $g$ is differentiable and convex with a Lipschitz continuous gradient and $h$ is convex (possibly nondifferentiable). It develops two Frank–Wolfe variants with adaptive step sizes: the analysis of the first relies on the Lipschitz continuity of $\nabla g$, while the second replaces $\nabla g$ with finite-difference approximations under a relative-error model. The authors introduce piecewise-star-convexity to capture favorable cellwise geometry. They prove that accumulation points are Clarke-stationary and, within each cell, correspond to cellwise minima, with ${\cal O}(1/k)$ convergence of the objective value and the Frank–Wolfe gap; the second variant achieves a ${\cal O}(1/\sqrt{k})$ rate for the duality gap. The results extend convex-rate guarantees to a broad nonconvex, nonsmooth DC setting, yielding projection-free algorithms with adaptive step sizes suited to large-scale problems.
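The descent mechanism behind a Frank–Wolfe step for $f=g-h$ can be made concrete with the standard DC linearization. The display below is a sketch under the assumptions just stated ($\nabla g$ is $L$-Lipschitz, $h$ is convex); the symbols $w_k$, $p_k$, $d_k$, $\theta$, and $\lambda_k$ are our own notation and need not match the paper's, and the step $\lambda_k$ uses the constant $L$, which an adaptive rule would replace with an estimate.

$$
\begin{aligned}
& w_k \in \partial h(x_k), \qquad
  p_k \in \operatorname*{argmin}_{p \in {\cal C}} \langle \nabla g(x_k) - w_k,\, p \rangle, \qquad
  d_k = p_k - x_k, \\
& \theta(x_k) = \langle \nabla g(x_k) - w_k,\, x_k - p_k \rangle \;\ge\; 0, \\
& g(x_k + \lambda d_k) \le g(x_k) + \lambda \langle \nabla g(x_k), d_k \rangle + \tfrac{L}{2}\lambda^2 \|d_k\|^2,
  \qquad -h(x_k + \lambda d_k) \le -h(x_k) - \lambda \langle w_k, d_k \rangle, \\
& \Longrightarrow\; f(x_k + \lambda d_k) \le f(x_k) - \lambda\, \theta(x_k) + \tfrac{L}{2}\lambda^2 \|d_k\|^2,
  \qquad \lambda \in [0,1], \\
& \lambda_k = \min\Big\{1,\ \frac{\theta(x_k)}{L\|d_k\|^2}\Big\}
  \;\Longrightarrow\;
  f(x_{k+1}) \le f(x_k) - \tfrac12 \min\Big\{\theta(x_k),\ \frac{\theta(x_k)^2}{L\|d_k\|^2}\Big\}.
\end{aligned}
$$

Here $\theta(x_k)$ is the Frank–Wolfe gap: it vanishes exactly when $\langle \nabla g(x_k)-w_k,\, p-x_k\rangle \ge 0$ for all $p\in{\cal C}$, which is why it serves as a stationarity measure, and the second inequality on the third line is just the subgradient inequality for the convex function $h$.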
Abstract
In this paper, we formulate two versions of the Frank–Wolfe algorithm (also known as the conditional gradient method) with adaptive step sizes for solving the DC optimization problem. The DC objective function consists of two components: the first is assumed to be differentiable with a Lipschitz continuous gradient, while the second is only assumed to be convex. The second version builds on the first and uses finite differences to approximate the gradient of the first component of the objective function. Unlike previous formulations, which use a curvature or Lipschitz-type constant of the objective function, the computed step size does not require any constant associated with the components. For the first version, we establish that the algorithm is well defined and that every limit point of the generated sequence is a stationary point of the problem. We also introduce the class of weak-star-convex functions and show that, although these functions are nonconvex in general, the first version of the algorithm minimizes them at a rate of ${\cal O}(1/k)$. In the second version, the finite difference used to approximate the gradient is computed with a step size that is updated adaptively using the two previous iterates. Unlike previous uses of finite differences in the Frank–Wolfe algorithm, which yield approximate gradients with an absolute error, the approximation used here has a relative error, which simplifies the analysis of the algorithm. For this version, we show that all limit points of the generated sequence are stationary points of the problem under consideration and that the duality gap converges at a rate of ${\cal O}(1/\sqrt{k})$.
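A Frank–Wolfe iteration for $f=g-h$ whose step size needs no user-supplied constant can be sketched in Python with NumPy. This is a minimal sketch, not the authors' method: the adaptive step here is an assumed backtracking rule that halves and then doubles an estimate `L` of the Lipschitz constant of $\nabla g$ until a quadratic upper bound on $g$ holds. The function `dc_frank_wolfe`, its parameters, and the toy problem (box ${\cal C}=[-1,1]^3$, $g(x)=\tfrac12\|x-a\|^2$, $h(x)=\|x\|_1$) are hypothetical illustrations, and the finite-difference variant is not shown.

```python
import numpy as np


def dc_frank_wolfe(g, grad_g, h, subgrad_h, lmo, x0, L0=1.0, tol=1e-8, max_iter=1000):
    """Sketch of Frank-Wolfe for min g(x) - h(x) over a compact convex set C.

    The step size uses a backtracking estimate L of the Lipschitz constant of
    grad_g (an assumed adaptive rule, not necessarily the paper's), so no
    constant needs to be known in advance.
    """
    x = np.asarray(x0, dtype=float)
    L = L0
    history = [g(x) - h(x)]
    gap = np.inf
    for _ in range(max_iter):
        w = subgrad_h(x)                      # any element of the subdifferential of h at x
        gx, gradx = g(x), grad_g(x)
        c = gradx - w                         # gradient of the convex model g(.) - <w, .>
        p = lmo(c)                            # p in argmin_{y in C} <c, y>
        d = p - x
        gap = float(c @ (x - p))              # Frank-Wolfe gap theta(x) >= 0
        if gap <= tol:                        # (approximately) stationary: stop
            break
        dd = float(d @ d)
        slope = float(gradx @ d)
        L = L / 2.0                           # try a smaller estimate first ...
        while True:                           # ... then double until the upper bound holds
            lam = min(1.0, gap / (L * dd))
            x_new = (1.0 - lam) * x + lam * p  # convex combination, so x_new stays in C
            if g(x_new) <= gx + lam * slope + 0.5 * L * lam**2 * dd + 1e-12:
                break
            L *= 2.0
        x = x_new
        history.append(g(x) - h(x))           # non-increasing (up to the 1e-12 slack) since h is convex
    return x, history, gap


# Hypothetical toy problem: C = [-1, 1]^3, g(x) = 0.5 ||x - a||^2, h(x) = ||x||_1.
a = np.array([0.5, -0.3, 0.0])
x, hist, gap = dc_frank_wolfe(
    g=lambda z: 0.5 * float((z - a) @ (z - a)),
    grad_g=lambda z: z - a,
    h=lambda z: float(np.abs(z).sum()),
    subgrad_h=np.sign,                          # sign(0) = 0 is a valid subgradient of |.| at 0
    lmo=lambda c: np.where(c > 0, -1.0, 1.0),   # box vertex minimizing <c, y>
    x0=np.zeros(3),
)
print(x, hist[-1], gap)
```

On this toy problem each coordinate of $f$ is minimized at an endpoint of $[-1,1]$, so the global minimizers are $(1,-1,\pm 1)$ with value $-2.13$; the sketch stops at $(1,-1,1)$ once the Frank–Wolfe gap reaches zero. Only `lmo` depends on the geometry of ${\cal C}$, which is what makes the method projection-free.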
