Complexity of Projected Gradient Methods for Strongly Convex Optimization with Hölder Continuous Gradient Terms

Xiaojun Chen; C. T. Kelley; Lei Wang

TL;DR

This work addresses constrained strongly convex optimization where the objective is a sum of $m$ components, each with a Hölder continuous gradient of exponent $\alpha_i \in (0, 1]$, while the gradient of the sum need not be globally Hölder continuous. It develops three first-order schemes: a Basic Projected Gradient Descent Method (PGDM) with a fixed stepsize, a Universal Primal Gradient Method (UPGM) with adaptive line search, and a Universal Fast Gradient Method (UFGM) that uses estimating sequences for acceleration. The key theoretical contribution is that the iteration complexity is governed by $\hat{\alpha} = \min_i \alpha_i$, with bounds $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha}-1)/(1+\hat{\alpha})})$ for the fixed stepsize and $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha}-1)/(1+3\hat{\alpha})})$ for the universal/fast variants. Both bounds reduce to the classical $O(\log(\varepsilon^{-1}))$ at $\hat{\alpha} = 1$, and the second is strictly smaller than the first when $\hat{\alpha} < 1$. Numerical experiments on elliptic PDEs with non-Lipschitz terms illustrate how the Hölder exponents influence convergence and how the observed behavior compares with the theoretical rates.
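To make the fixed-stepsize scheme concrete, here is a minimal sketch of projected gradient descent on a toy problem. It is not the paper's algorithm or test problem: the objective, the data `C`, the box constraint, and the stepsize `0.1` are all illustrative assumptions, and the stepsize is hand-picked rather than derived from the paper's $\varepsilon$-dependent rule. The objective is a sum of two components whose gradients have Hölder exponents $1$ and $0.5$, so $\hat{\alpha} = 0.5$ in the notation above.

```python
import math

ALPHA = 0.5            # Hölder exponent of the non-Lipschitz gradient term (assumed)
C = [0.75, 3.0, -1.0]  # hypothetical data vector for the quadratic component


def grad_f(u):
    """Gradient of f(u) = 0.5*||u - C||^2 + sum_i |u_i|^(1+ALPHA) / (1+ALPHA).

    The first component has a Lipschitz gradient (exponent 1); the second has
    gradient sign(u_i)*|u_i|^ALPHA, which is Hölder continuous with exponent ALPHA.
    """
    return [ui - ci + math.copysign(abs(ui) ** ALPHA, ui) for ui, ci in zip(u, C)]


def project_box(u, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n (componentwise clipping)."""
    return [min(max(ui, lo), hi) for ui in u]


def projected_gradient_descent(grad, project, u0, stepsize, num_iters):
    """Iterate u <- P(u - stepsize * grad(u)) for a fixed number of steps."""
    u = project(u0)
    for _ in range(num_iters):
        g = grad(u)
        u = project([ui - stepsize * gi for ui, gi in zip(u, g)])
    return u


u_final = projected_gradient_descent(grad_f, project_box, [0.5, 0.5, 0.5], 0.1, 500)
print(u_final)
```

Because this toy objective is separable, its constrained minimizer can be checked by hand: the first coordinate solves $u + \sqrt{u} = 0.75$, giving $u = 0.25$, while the other two coordinates are pushed onto the box boundaries $1$ and $0$. The iterates should approach $(0.25, 1, 0)$.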

Abstract

This paper studies the complexity of projected gradient descent methods for a class of strongly convex constrained optimization problems where the objective function is expressed as a summation of $m$ component functions, each possessing a gradient that is Hölder continuous with an exponent $\alpha_i \in (0, 1]$. Under this formulation, the gradient of the objective function may fail to be globally Hölder continuous, thereby rendering existing complexity results inapplicable to this class of problems. Our theoretical analysis reveals that, in this setting, the complexity of projected gradient methods is determined by $\hat{\alpha} = \min_{i \in \{1, \dotsc, m\}} \alpha_i$. We first prove that, with an appropriately fixed stepsize, the complexity bound for finding an approximate minimizer with a distance to the true minimizer less than $\varepsilon$ is $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha} - 1)/(1 + \hat{\alpha})})$, which extends the well-known complexity result for $\hat{\alpha} = 1$. Next we show that the complexity bound can be improved to $O(\log(\varepsilon^{-1}) \varepsilon^{2(\hat{\alpha} - 1)/(1 + 3\hat{\alpha})})$ if the stepsize is updated by the universal scheme. We illustrate our complexity results by numerical examples arising from elliptic equations with a non-Lipschitz term.
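As a quick worked check of the two exponents (an illustration, not taken from the paper), evaluate them at $\hat{\alpha} = 1$ and at the sample value $\hat{\alpha} = 1/2$:

$$
\hat{\alpha} = 1: \quad \frac{2(\hat{\alpha} - 1)}{1 + \hat{\alpha}} = 0, \qquad \frac{2(\hat{\alpha} - 1)}{1 + 3\hat{\alpha}} = 0,
$$

$$
\hat{\alpha} = \tfrac{1}{2}: \quad \frac{2(\hat{\alpha} - 1)}{1 + \hat{\alpha}} = \frac{-1}{3/2} = -\frac{2}{3}, \qquad \frac{2(\hat{\alpha} - 1)}{1 + 3\hat{\alpha}} = \frac{-1}{5/2} = -\frac{2}{5}.
$$

So at $\hat{\alpha} = 1$ both bounds become $O(\log(\varepsilon^{-1}))$, while at $\hat{\alpha} = 1/2$ the fixed-stepsize bound is $O(\log(\varepsilon^{-1})\, \varepsilon^{-2/3})$ and the universal-scheme bound is $O(\log(\varepsilon^{-1})\, \varepsilon^{-2/5})$.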

Paper Structure (11 sections, 10 theorems, 106 equations, 5 figures)

Introduction
Basic Projected Gradient Descent Method with a Fixed Stepsize
Universal Primal Gradient Method
Universal Fast Gradient Method
Numerical Experiments
Two-dimensional PDE with a non-Lipschitz term
Numerical results for Algorithm \ref{alg:gd} (PGDM)
Numerical results for Algorithm \ref{alg:upgm} (UPGM)
Numerical results for Algorithm \ref{alg:ufgm} (UFGM)
Semi-linear elliptic problem with a constraint
Conclusion

Key Result

Proposition 2.1

Let $\delta > 0$ and Then for all $\mathbf{u}, \mathbf{v} \in \Omega$, we have

Figures (5)

Figure 1: Numerical performance of Algorithm \ref{alg:gd} (PGDM) for problem \ref{opt:test}.
Figure 2: Numerical performance of Algorithm \ref{alg:upgm} (UPGM) for problem \ref{opt:test} with different values of $\alpha$.
Figure 3: Numerical performance of Algorithm \ref{alg:ufgm} (UFGM) for problem \ref{opt:test} with smaller stepsizes.
Figure 4: Numerical performance of Algorithm \ref{alg:ufgm} (UFGM) for problem \ref{opt:test} with larger stepsizes.
Figure 5: Numerical performance of Algorithm \ref{alg:gd} (PGDM) and Algorithm \ref{alg:ufgm} (UFGM) for problem \ref{opt:test2} with different values of $\alpha$.

Theorems & Definitions (23)

Example 1
Proposition 2.1
proof
Theorem 2.2
proof
Theorem 3.1
proof
Corollary 3.2
proof
Lemma 4.1
...and 13 more
