Complexity of Projected Gradient Methods for Strongly Convex Optimization with Hölder Continuous Gradient Terms

Xiaojun Chen, C. T. Kelley, Lei Wang

TL;DR

This work addresses constrained strongly convex optimization where the objective is a sum of $m$ components with Hölder continuous gradients, yet the sum itself need not have a globally Hölder continuous gradient. It develops three first-order schemes: a Basic Projected Gradient Descent Method (PGDM) with a fixed stepsize, a Universal Primal Gradient Method (UPGM) with adaptive line search, and a Universal Fast Gradient Method (UFGM) that uses estimating sequences for acceleration. The key theoretical contribution is that the iteration complexity is governed by $\hat{\alpha} = \min_i \alpha_i$, with bounds $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha}-1)/(1+\hat{\alpha})})$ for the fixed stepsize and $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha}-1)/(1+3\hat{\alpha})})$ for the universal/fast variants. Both bounds reduce to the classical $O(\log(\varepsilon^{-1}))$ rate when $\hat{\alpha} = 1$, and the universal bound is the smaller of the two when $\hat{\alpha} < 1$. The paper also reports numerical experiments on elliptic PDEs with non-Lipschitz terms to illustrate how the Hölder exponents influence convergence and to corroborate the theoretical rates. These findings offer practical guidance for solving strongly convex problems with non-Lipschitz terms and a composite gradient structure.
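
As a quick check of what these exponents mean, the worked arithmetic below plugs two values of $\hat{\alpha}$ into the bounds stated above. The value $\hat{\alpha} = 1/2$ is only an illustrative assumption and is not taken from the paper's experiments.

```latex
\begin{align*}
\hat{\alpha} = 1:&\quad \frac{2(\hat{\alpha}-1)}{1+\hat{\alpha}} = \frac{2(\hat{\alpha}-1)}{1+3\hat{\alpha}} = 0
  &&\Rightarrow\ O(\log(\varepsilon^{-1})) \text{ for both bounds},\\
\hat{\alpha} = \tfrac{1}{2}:&\quad \frac{2(\tfrac{1}{2}-1)}{1+\tfrac{1}{2}} = -\frac{2}{3}
  &&\Rightarrow\ O(\log(\varepsilon^{-1})\,\varepsilon^{-2/3}) \text{ (fixed stepsize)},\\
&\quad \frac{2(\tfrac{1}{2}-1)}{1+\tfrac{3}{2}} = -\frac{2}{5}
  &&\Rightarrow\ O(\log(\varepsilon^{-1})\,\varepsilon^{-2/5}) \text{ (universal stepsize)}.
\end{align*}
```

For small $\varepsilon$, $\varepsilon^{-2/5}$ grows more slowly than $\varepsilon^{-2/3}$, which is the sense in which the universal scheme improves the bound.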

Abstract

This paper studies the complexity of projected gradient descent methods for a class of strongly convex constrained optimization problems where the objective function is expressed as a summation of $m$ component functions, each possessing a gradient that is Hölder continuous with an exponent $\alpha_i \in (0, 1]$. Under this formulation, the gradient of the objective function may fail to be globally Hölder continuous, thereby rendering existing complexity results inapplicable to this class of problems. Our theoretical analysis reveals that, in this setting, the complexity of projected gradient methods is determined by $\hat{\alpha} = \min_{i \in \{1, \dotsc, m\}} \alpha_i$. We first prove that, with an appropriately fixed stepsize, the complexity bound for finding an approximate minimizer with a distance to the true minimizer less than $\varepsilon$ is $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha} - 1)/(1 + \hat{\alpha})})$, which extends the well-known complexity result for $\hat{\alpha} = 1$. Next we show that the complexity bound can be improved to $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha} - 1)/(1 + 3\hat{\alpha})})$ if the stepsize is updated by the universal scheme. We illustrate our complexity results by numerical examples arising from elliptic equations with a non-Lipschitz term.
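
To make the fixed-stepsize setting concrete, here is a minimal Python sketch of projected gradient descent on a toy problem in the spirit of the class described above. Everything specific in it is an assumption for illustration and does not come from the paper: the objective $f(u) = \tfrac{1}{2}\|u - b\|^2 + \tfrac{1}{1+\alpha}\sum_i |u_i|^{1+\alpha}$, the box constraint, the stepsize value, and the function names. In particular, the paper's stepsize rule depends on the target accuracy $\varepsilon$ and is not reproduced here.

```python
import math


def project_box(u, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(x, lo), hi) for x in u]


def grad(u, b, alpha):
    """Gradient of the hypothetical toy objective
        f(u) = 0.5 * ||u - b||^2 + sum_i |u_i|^(1 + alpha) / (1 + alpha).
    The quadratic term has a Lipschitz gradient (exponent 1); the second term
    has a Hoelder continuous gradient with exponent alpha in (0, 1].
    """
    return [x - bi + math.copysign(abs(x) ** alpha, x) for x, bi in zip(u, b)]


def projected_gradient_descent(b, alpha, lo, hi, step, iters):
    """Iterate u <- P(u - step * grad f(u)) with a fixed, hand-picked stepsize."""
    u = project_box([0.0] * len(b), lo, hi)
    for _ in range(iters):
        g = grad(u, b, alpha)
        u = project_box([x - step * gi for x, gi in zip(u, g)], lo, hi)
    return u


if __name__ == "__main__":
    # Illustrative data only; step=0.05 is an assumed value, not the paper's rule.
    u = projected_gradient_descent(
        [2.0, -0.5, 0.1], alpha=0.5, lo=-1.0, hi=1.0, step=0.05, iters=2000
    )
    print(u)
```

A simple way to judge the result is the fixed-point residual $\|u - P(u - \nabla f(u))\|$, which vanishes exactly at the constrained minimizer.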

Paper Structure

This paper contains 11 sections, 10 theorems, 106 equations, 5 figures.

Key Result

Proposition 2.1

Let $\delta > 0$ and Then for all $\mathbf{u}, \mathbf{v} \in \Omega$, we have

Figures (5)

  • Figure 1: Numerical performance of Algorithm \ref{alg:gd} for problem \ref{opt:test}.
  • Figure 2: Numerical performance of Algorithm \ref{alg:upgm} for problem \ref{opt:test} with different values of $\alpha$.
  • Figure 3: Numerical performance of Algorithm \ref{alg:ufgm} for problem \ref{opt:test} with smaller stepsizes.
  • Figure 4: Numerical performance of Algorithm \ref{alg:ufgm} for problem \ref{opt:test} with larger stepsizes.
  • Figure 5: Numerical performance of Algorithm \ref{alg:gd} and Algorithm \ref{alg:ufgm} for problem \ref{opt:test2} with different values of $\alpha$.

Theorems & Definitions (23)

  • Example 1
  • Proposition 2.1
  • proof
  • Theorem 2.2
  • proof
  • Theorem 3.1
  • proof
  • Corollary 3.2
  • proof
  • Lemma 4.1
  • ...and 13 more