Adaptive Conditional Gradient Descent

Abbas Khademi, Antonio Silveti-Falls

TL;DR

This work tackles the challenge of step-size selection for projection-free first-order optimization with linear minimization oracles, unifying Conditional Gradient and non-Euclidean Normalized Steepest Descent. It introduces Adaptive Conditional Gradient Descent, which uses a gradient-difference-based local Lipschitz estimate, warm-starts the backtracking, and offers two scaling variants to adapt to local curvature. The authors provide convergence guarantees across non-convex, quasar-convex, and strongly convex settings and validate the approach through extensive experiments on constrained and unconstrained problems, showing faster convergence and competitive or superior solution quality. The method is practical, versatile across geometries, and particularly valuable for large-scale, geometry-aware optimization where global Lipschitz constants are unknown or misleading.

Abstract

Selecting an effective step-size is a fundamental challenge in first-order optimization, especially for problems with non-Euclidean geometries. This paper presents a novel adaptive step-size strategy for optimization algorithms that rely on linear minimization oracles, as used in the Conditional Gradient or non-Euclidean Normalized Steepest Descent algorithms. Using a simple heuristic to estimate a local Lipschitz constant for the gradient, we can determine step-sizes that guarantee sufficient decrease at each iteration. More precisely, we establish convergence guarantees for our proposed Adaptive Conditional Gradient Descent algorithm, which covers as special cases both the classical Conditional Gradient algorithm and non-Euclidean Normalized Steepest Descent algorithms with adaptive step-sizes. Our analysis covers optimization of continuously differentiable functions in non-convex, quasar-convex, and strongly convex settings, achieving convergence rates that match state-of-the-art theoretical bounds. Comprehensive numerical experiments validate our theoretical findings and illustrate the practical effectiveness of Adaptive Conditional Gradient Descent. The results exhibit competitive performance, underscoring the potential of the adaptive step-size for applications.
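The idea summarized in the abstract, estimating a local Lipschitz constant and choosing a step-size that guarantees sufficient decrease, can be sketched for the Conditional Gradient case as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the Euclidean gradient-difference ratio used to warm-start the estimate, the doubling backtracking, the quadratic sufficient-decrease test, the numerical slack, and all names (`adaptive_cg`, `lmo_l1_ball`, `L0`) are hypothetical choices made for this example.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle: argmin of <grad, v> over the l1 ball."""
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        v[i] = -radius * math.copysign(1.0, grad[i])
    return v

def adaptive_cg(f, grad_f, x0, lmo, L0=1.0, iters=100, tol=1e-10):
    """Conditional Gradient with a backtracked local Lipschitz estimate (sketch)."""
    x, L = list(x0), L0
    x_prev = g_prev = None
    for _ in range(iters):
        g = grad_f(x)
        # Assumed warm start: ratio of gradient difference to iterate difference.
        if x_prev is not None:
            dx = [a - b for a, b in zip(x, x_prev)]
            dg = [a - b for a, b in zip(g, g_prev)]
            nx = math.sqrt(dot(dx, dx))
            if nx > 0.0:
                L = max(math.sqrt(dot(dg, dg)) / nx, 1e-12)
        v = lmo(g)
        d = [vi - xi for vi, xi in zip(v, x)]
        gap = -dot(g, d)          # Frank-Wolfe gap, nonnegative
        dd = dot(d, d)
        if gap <= tol or dd == 0.0:
            break
        fx = f(x)
        while True:
            # Minimizer of the quadratic model, clipped to stay feasible.
            t = min(1.0, gap / (L * dd))
            x_new = [xi + t * di for xi, di in zip(x, d)]
            # Sufficient decrease against the quadratic upper model (small slack).
            if f(x_new) <= fx - t * gap + 0.5 * L * t * t * dd + 1e-12:
                break
            L *= 2.0              # estimate was too small: backtrack
        x_prev, g_prev, x = x, g, x_new
    return x

# Example: project b onto the unit l1 ball by minimizing 0.5 * ||x - b||^2.
b = [2.0, 0.5, -0.3]
f = lambda x: 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_star = adaptive_cg(f, grad_f, [0.0, 0.0, 0.0], lmo_l1_ball)
```

For this particular `b`, the projection onto the unit $\ell^1$ ball is the vertex $(1, 0, 0)$, which the sketch reaches after a single full step. Every iterate stays feasible because each update is a convex combination of the current point and an LMO output.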

Paper Structure

This paper contains 42 sections, 8 theorems, 105 equations, 16 figures, 14 tables, and 3 algorithms.

Key Result

Corollary

Let $k\in\mathbb{N}$ and consider the step-size $t_k$, iterate $x^k$, and direction $d^k$ generated by the Adaptive Conditional Gradient Descent algorithm (ACGD). Then,

Figures (16)

  • Figure 1: Illustration of one step of the Conditional Gradient algorithm. Starting at $x^k$, the gradient $\nabla f(x^k)$ is computed and used to compute $v^k$, an output of the LMO. The next iterate $x^{k+1}$ is obtained by moving along the line segment connecting $x^k$ to $v^k$. The blue dashed line represents a supporting hyperplane to the set $\mathcal{C}$ dictated by its normal direction $-\nabla f(x^k)$.
  • Figure 2: One step of the Normalized Steepest Descent method using the $\ell^1$ unit-ball centered at $x^k$. Starting at $x^k$, the gradient $\nabla f(x^k)$ is computed and used to compute $v^k$, an output of the LMO over $\mathcal{C}=\mathcal{B}_{\ell^1}(1)$. The next iterate $x^{k+1}$ is then found by adding $t_kv^k$ to $x^k$. The blue dashed line represents the supporting hyperplane to the set $x^k+\mathcal{B}_{\ell^1}(1)$ dictated by its normal direction $-\nabla f(x^k)$.
  • Figure 3: The local Lipschitz constant found at each iteration with the pure backtracking procedure used in [pedregosa2020linearly] oscillates heavily between two values, roughly $0.0065$ and $0.0030$.
  • Figure 4: Convergence behavior and computational efficiency for the Lasso problem.
  • Figure 5: Convergence behavior and computational efficiency for the Matrix Balancing problem.
  • ...and 11 more figures
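The two update rules described in the captions of Figures 1 and 2 differ only in how the LMO output $v^k$ is used. The sketch below contrasts them on the $\ell^1$ unit ball; the function names and the fixed step-size `t` are hypothetical, chosen only for illustration, and the step-size rule itself is not modeled here.

```python
import math

def lmo_l1_ball(grad, radius=1.0):
    """Return a minimizer of <grad, v> over the l1 ball (a signed vertex)."""
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    v = [0.0] * len(grad)
    if grad[i] != 0.0:
        v[i] = -radius * math.copysign(1.0, grad[i])
    return v

def cg_step(x, grad, t):
    """Conditional Gradient: move along the segment from x^k to v^k."""
    v = lmo_l1_ball(grad)
    return [xi + t * (vi - xi) for xi, vi in zip(x, v)]

def nsd_step(x, grad, t):
    """Normalized Steepest Descent: add t_k * v^k to x^k."""
    v = lmo_l1_ball(grad)
    return [xi + t * vi for xi, vi in zip(x, v)]

x_k = [0.5, 0.0]
grad_k = [1.0, -3.0]          # largest magnitude in coordinate 1, so v^k = (0, 1)
x_cg = cg_step(x_k, grad_k, t=0.25)    # [0.375, 0.25]
x_nsd = nsd_step(x_k, grad_k, t=0.25)  # [0.5, 0.25]
```

The Conditional Gradient step shrinks the current iterate toward the vertex and so stays inside $\mathcal{C}$ for $t_k \in [0, 1]$, whereas the Normalized Steepest Descent step translates the iterate and is meant for the unconstrained setting.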

Theorems & Definitions (20)

  • Remark
  • Remark
  • Corollary
  • Proof
  • Corollary
  • Proof
  • Remark
  • Lemma
  • Proof
  • Theorem 1: Unconstrained Non-convex Convergence Rate
  • ...and 10 more