Higher-order tensor methods for minimizing difference of convex functions
Ion Necoara
TL;DR
This work introduces a higher-order difference-of-convex optimization framework (HO-DC) for minimizing $F(x)=f(x)+\psi(x)-g(x)$, where $\psi$ is convex (possibly nondifferentiable) and $f,g$ are convex with $p$- and $q$-order smoothness, respectively. At each iteration, HO-DC builds a surrogate model from regularized higher-order Taylor approximations of $f$ and $g$ and obtains $x_{k+1}$ by minimizing this surrogate so that the objective decreases. The authors prove that any limit point of the HO-DC sequence is a stationary point, that $F(x_k)$ decreases monotonically, and that the minimum subgradient norm $\min_{i<k} S_F(x_i)$ decays as $O\left(k^{-\frac{2\min(p,q)}{p+q+2}}\right)$; under the Kurdyka-Łojasiewicz (KL) property, the whole sequence converges at a linear or sublinear rate, depending on the KL exponent $r>1$. For $p,q\in\{1,2\}$, the subproblem reduces to a one-dimensional convex problem or a cubic-regularized Newton step, which makes the method practical; the framework recovers several existing DC algorithms (including proximal DCA) and extends them to higher-order settings. An adaptive variant, AH-DC, uses a line search over the regularization parameters to ensure descent without exact knowledge of the Lipschitz constants.
Abstract
Higher-order tensor methods were recently proposed for minimizing smooth convex and nonconvex functions. Higher-order algorithms accelerate the convergence of the classical first-order methods thanks to the higher-order derivatives used in the updates. The purpose of this paper is twofold. First, we show that the higher-order algorithmic framework can be generalized and successfully applied to (nonsmooth) difference of convex functions, namely, those that can be expressed as the difference of two smooth convex functions plus a possibly nonsmooth convex one. We also provide examples where the subproblem can be solved efficiently, even globally. Second, we derive a complete convergence analysis for our higher-order difference of convex functions (HO-DC) algorithm. In particular, we prove that any limit point of the HO-DC iterative sequence is a critical point of the problem under consideration, that the corresponding objective value is monotonically decreasing, and that the minimum value of the norms of its subgradients converges globally to zero at a sublinear rate. Sublinear or linear convergence rates of the iterates are obtained under the Kurdyka-Łojasiewicz property.
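To make the iteration concrete, the following is a minimal sketch of the simplest first-order case ($p=q=1$), which takes the form of a proximal-DCA-type step; it is an illustration under stated assumptions, not the paper's implementation. The test problem is hypothetical: $f(x)=\tfrac12\|Ax-b\|^2$, $\psi(x)=\lambda\|x\|_1$, and $g(x)=\mu\sum_i\sqrt{x_i^2+\varepsilon}$. The function names and the choice $M=\|A\|_2^2$ for the regularization constant are likewise assumptions. Linearizing $f$ and $g$ at $x_k$ and adding $\tfrac{M}{2}\|y-x_k\|^2$ gives a surrogate that majorizes $F$ whenever $M$ is at least the Lipschitz constant of $\nabla f$, so each step cannot increase $F$.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def first_order_dc(A, b, lam, mu, eps=1e-3, iters=200):
    """First-order (p = q = 1) DC iteration on an assumed test problem:
    F(x) = 0.5*||Ax - b||^2 + lam*||x||_1 - mu*sum(sqrt(x^2 + eps)).
    """
    # Assumption: M = ||A||_2^2, the Lipschitz constant of grad f.
    M = np.linalg.norm(A, 2) ** 2

    def F(x):
        f = 0.5 * np.sum((A @ x - b) ** 2)
        psi = lam * np.sum(np.abs(x))
        g = mu * np.sum(np.sqrt(x ** 2 + eps))
        return f + psi - g

    x = np.zeros(A.shape[1])
    values = [F(x)]
    for _ in range(iters):
        grad_f = A.T @ (A @ x - b)
        grad_g = mu * x / np.sqrt(x ** 2 + eps)
        # Minimize the surrogate: linearized f - linearized g
        # + (M/2)*||y - x||^2 + psi(y), which has a closed form.
        x = soft_threshold(x - (grad_f - grad_g) / M, lam / M)
        values.append(F(x))
    return x, values
```

Calling `first_order_dc(A, b, lam=0.5, mu=0.3)` on a random `A` and `b` returns the final iterate and the sequence of objective values, which should be nonincreasing up to floating-point error. Higher-order cases ($p$ or $q$ equal to 2 and above) replace the closed-form step with a regularized Taylor-model subproblem and are not covered by this sketch.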
