Restart-Free (Accelerated) Gradient Sliding Methods for Strongly Convex Composite Optimization

Xinming Wu, Zi Xu, Huiling Zhang

TL;DR

The paper addresses composite convex optimization with a smooth component $f$, a nonsmooth component $h$, and a proximable, strongly convex term $\chi$. It proposes restart-free stochastic gradient sliding (RF-SGS) to avoid the overhead of restarts. For structured max-form nonsmooth terms, it introduces RF-ASGS, a restart-free accelerated scheme for bilinear saddle-point problems that works with smooth approximations $h_\eta$ under carefully chosen parameters. The authors prove that RF-SGS attains an $\epsilon$-solution with $O\left(\log\frac{1}{\epsilon}\right)$ gradient evaluations of $\nabla f$ and $O\left(\frac{1}{\epsilon}\right)$ subgradient evaluations of $h'$, while RF-ASGS needs $O\left(\sqrt{L/\mu}\,\log\frac{1}{\epsilon}\right)$ gradient evaluations of $\nabla f$ and $O\left(\|K\|/\sqrt{\epsilon}\right)$ operator evaluations with $K$ and $K^T$. Numerical experiments on portfolio optimization and total-variation denoising show smoother convergence and competitive performance relative to restart-based methods, which supports the practical appeal of the restart-free framework.
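The smoothing step behind $h_\eta$ can be illustrated on the simplest max-form term. The sketch below assumes $h(x) = \max_{\|y\|_\infty \le 1} \langle Kx, y\rangle = \|Kx\|_1$ and a quadratic prox term $\frac{\eta}{2}\|y\|^2$ (standard Nesterov smoothing); both choices and the helper name `smoothed_max_form` are illustrative assumptions, not the paper's exact construction. It shows why each gradient of $h_\eta$ costs one product with $K$ and one with $K^T$, and why $\nabla h_\eta$ is $\|K\|^2/\eta$-Lipschitz.

```python
import numpy as np


def smoothed_max_form(K: np.ndarray, eta: float):
    """Nesterov-style smoothing of h(x) = max_{||y||_inf <= 1} <Kx, y> = ||Kx||_1.

    Returns (h_eta, grad_h_eta, lipschitz), where
    h_eta(x) = max_{||y||_inf <= 1} <Kx, y> - (eta / 2) * ||y||^2.
    """
    def y_star(Kx: np.ndarray) -> np.ndarray:
        # The smoothed inner maximization separates by coordinate:
        # its maximizer is Kx / eta clipped to the box [-1, 1].
        return np.clip(Kx / eta, -1.0, 1.0)

    def h_eta(x: np.ndarray) -> float:
        Kx = K @ x
        y = y_star(Kx)
        return float(y @ Kx - 0.5 * eta * (y @ y))

    def grad_h_eta(x: np.ndarray) -> np.ndarray:
        # Danskin's theorem: grad h_eta(x) = K^T y*(x).
        # Cost: one product with K and one with K^T.
        return K.T @ y_star(K @ x)

    # grad h_eta is Lipschitz with constant ||K||_2^2 / eta.
    lipschitz = np.linalg.norm(K, 2) ** 2 / eta
    return h_eta, grad_h_eta, lipschitz
```

For this choice, $h_\eta \le h \le h_\eta + \eta m/2$, where $m$ is the number of rows of $K$. A smaller $\eta$ therefore tightens the approximation at the price of a larger Lipschitz constant, and $\eta$ is the knob that trades one against the other.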

Abstract

In this paper, we study a class of composite optimization problems whose objective function is the sum of a general smooth component, a general nonsmooth component, and a relatively simple nonsmooth term. While restart strategies are commonly employed in first-order methods to achieve optimal convergence under strong convexity, they introduce structural complexity and practical overhead, making algorithm design and nesting cumbersome. To address this, we propose a restart-free stochastic gradient sliding algorithm that eliminates the need for explicit restart phases when the simple nonsmooth component is strongly convex. Through a novel and carefully designed parameter selection strategy, we prove that the proposed algorithm achieves an $\epsilon$-solution with only $\mathcal{O}(\log(\frac{1}{\epsilon}))$ gradient evaluations for the smooth component and $\mathcal{O}(\frac{1}{\epsilon})$ stochastic subgradient evaluations for the nonsmooth component, matching the optimal complexity of existing multi-phase (restart-based) methods. Moreover, for the case where the nonsmooth component is structured, so that the overall problem can be reformulated as a bilinear saddle-point problem, we develop a restart-free accelerated stochastic gradient sliding algorithm. We show that the resulting method requires only $\mathcal{O}(\log(\frac{1}{\epsilon}))$ gradient computations for the smooth component while preserving an overall iteration complexity of $\mathcal{O}(\frac{1}{\sqrt{\epsilon}})$ for solving the corresponding saddle-point problems. Our work thus provides simpler, restart-f
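The gradient sliding idea described in the abstract (evaluate $\nabla f$ once per outer step, then run many cheap subgradient steps on $h$ against that frozen linearization of $f$) can be sketched as follows. This is a minimal deterministic sketch modeled on the generic, restart-free-agnostic gradient sliding template. The schedules $\gamma_k = 2/(k+1)$, $\beta_k = 2L/k$, $T_k = \lceil M^2 N k^2/(D L^2)\rceil$, $p_t = t/2$, $\theta_t = 2(t+1)/(t(t+3))$, the Euclidean prox terms, the choice $\chi(x) = \frac{\mu}{2}\|x\|^2$, and the function name `gradient_sliding` are all assumptions for illustration, not the paper's RF-SGS parameter selection. Here $M$ is an assumed bound on the subgradients of $h$ and $D$ an assumed scaling constant.

```python
import numpy as np


def gradient_sliding(grad_f, subgrad_h, L, M, mu, x0, N, D=1.0):
    """Illustrative gradient-sliding loop for min_x f(x) + h(x) + chi(x).

    Assumptions (not the paper's RF-SGS): chi(x) = (mu/2)||x||^2, Euclidean
    prox terms, deterministic subgradients of h, and the schedules below.
    Returns (x_bar, n_grad, n_subgrad).
    """
    x = np.array(x0, dtype=float)   # prox center passed between outer steps
    x_bar = x.copy()                # output (weighted average) sequence
    n_grad = n_subgrad = 0
    for k in range(1, N + 1):
        gamma = 2.0 / (k + 1)
        beta = 2.0 * L / k
        T = int(np.ceil(M**2 * N * k**2 / (D * L**2)))
        x_under = (1 - gamma) * x_bar + gamma * x
        g = grad_f(x_under)         # the only evaluation of grad f in step k
        n_grad += 1
        u = x.copy()
        u_tilde = x.copy()
        for t in range(1, T + 1):
            p = t / 2.0
            theta = 2.0 * (t + 1) / (t * (t + 3))
            s = subgrad_h(u)        # cheap subgradient of h at u_{t-1}
            n_subgrad += 1
            # Closed-form minimizer over v of
            # <g + s, v> + beta/2 ||v - x||^2 + beta*p/2 ||v - u||^2 + mu/2 ||v||^2
            u = (beta * x + beta * p * u - g - s) / (beta * (1 + p) + mu)
            u_tilde = (1 - theta) * u_tilde + theta * u
        x = u
        x_bar = (1 - gamma) * x_bar + gamma * u_tilde
    return x_bar, n_grad, n_subgrad


# Toy usage: f(x) = 0.5||x - a||^2 (L = 1), h(x) = lam * ||x||_1 (M = lam * sqrt(3)).
a = np.array([1.0, -0.3, 0.2])
lam, mu = 0.5, 0.1
x_bar, n_grad, n_subgrad = gradient_sliding(
    lambda x: x - a, lambda x: lam * np.sign(x),
    L=1.0, M=lam * np.sqrt(3), mu=mu, x0=np.zeros(3), N=10)
```

The counters make the sliding visible: `n_grad` equals the number of outer iterations $N$, while `n_subgrad` grows like $\sum_k T_k$. The paper's RF-SGS relies on its own parameter selection to handle the strongly convex case in a single run, which this sketch does not reproduce.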

Paper Structure

This paper contains 13 sections, 9 theorems, 111 equations, 3 figures, and 2 algorithms.

Key Result

Lemma 2.1

If $q$ is a $\mu$-strongly convex function and …, then for all $u \in X$ we have …

Figures (3)

  • Figure 1: Numerical results of the two tested algorithms for solving portfolio optimization problem.
  • Figure 2: Top-left: the noisy input image of size $128 \times 128$, with additive zero-mean Gaussian noise ($\sigma = 0.05$). Top-right: gradient norms of the two algorithms versus CPU time. Bottom-right and bottom-left: denoised images using $\tau = 16$ from M-AGS and RF-ASGS, respectively.
  • Figure 3: Top-left: the noisy input image of size $128 \times 128$, with additive zero-mean Gaussian noise ($\sigma = 0.01$). Top-right: gradient norms of the two algorithms versus CPU time. Bottom-right and bottom-left: denoised images using $\tau = 24$ from M-AGS and RF-ASGS, respectively.
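The total-variation denoising experiments fit the bilinear saddle-point setting because a TV penalty is itself a max-form term. The sketch below assumes anisotropic TV with forward differences and a Neumann boundary, $\mathrm{TV}(x) = \|Kx\|_1 = \max_{\|y\|_\infty \le 1}\langle Kx, y\rangle$. The captions do not state which TV variant or boundary rule the paper uses, and the helper names `forward_diff_operator` and `tv` are hypothetical. The sketch builds $K$ densely for a small $n \times n$ image; for this operator $\|K\|^2 \le 8$, which is the quantity that enters an $O(\|K\|/\sqrt{\epsilon})$ operator-evaluation count.

```python
import numpy as np


def forward_diff_operator(n: int) -> np.ndarray:
    """Dense anisotropic 2-D forward-difference operator K for an n x n image.

    The first n*n rows are vertical differences, the last n*n horizontal ones;
    differences across the last row/column are zero (Neumann boundary).
    """
    d = np.diag(-np.ones(n)) + np.diag(np.ones(n - 1), 1)
    d[-1, :] = 0.0                 # no difference past the boundary
    I = np.eye(n)
    Dv = np.kron(d, I)             # x[i+1, j] - x[i, j] on row-major ravel
    Dh = np.kron(I, d)             # x[i, j+1] - x[i, j] on row-major ravel
    return np.vstack([Dv, Dh])


def tv(x_img: np.ndarray, K: np.ndarray) -> float:
    # Anisotropic TV(x) = ||K x||_1 = max_{||y||_inf <= 1} <K x, y>
    return float(np.abs(K @ x_img.ravel()).sum())
```

A dense $K$ is only practical for tiny images. For a $128 \times 128$ image one would instead apply $K$ and $K^T$ as array differences (for example with `np.diff`) without forming the matrix.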

Theorems & Definitions (18)

  • Lemma 2.1
  • Proof 1
  • Lemma 2.2
  • Theorem 2.8
  • Proof 2
  • Theorem 2.9
  • Proof 3
  • Remark 2.10
  • Remark 2.11
  • Lemma 3.3
  • ...and 8 more