A Novel Unified Parametric Assumption for Nonconvex Optimization

Artem Riabinin, Ahmed Khaled, Peter Richtárik

TL;DR

The paper introduces a novel unified parametric assumption that links the gradient to a projection onto a subset of minimizers through a nonnegative progress function, enabling unified convergence analysis for nonconvex optimization. It develops deterministic and stochastic convergence theorems for gradient-based methods, and shows that classical convex guarantees emerge as a special case while subsuming several nonconvex conditions such as weak quasi-convexity and aiming. The framework is demonstrated across multiple function classes and extended to stochastic settings, with problem formulations and proofs complemented by experiments on half-space learning, Fashion-MNIST with an MLP, and CIFAR-10 with ResNet to validate practical relevance. Overall, the work provides a flexible, theoretically grounded bridge between convex optimization and nonconvex practice, guiding algorithm design and analysis in complex landscapes.

Abstract

Nonconvex optimization is central to modern machine learning, but the general framework of nonconvex optimization yields weak convergence guarantees that are too pessimistic compared to practice. On the other hand, while convexity enables efficient optimization, it is of limited applicability to many practical problems. To bridge this gap and better understand the practical success of optimization algorithms in nonconvex settings, we introduce a novel unified parametric assumption. Our assumption is general enough to encompass a broad class of nonconvex functions while also being specific enough to enable the derivation of a unified convergence theorem for gradient-based methods. Notably, by tuning the parameters of our assumption, we demonstrate its versatility in recovering several existing function classes as special cases and in identifying functions amenable to efficient optimization. We derive our convergence theorem for both deterministic and stochastic optimization, and conduct experiments to verify that our assumption can hold practically over optimization trajectories.
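The idea of checking a condition along an optimization trajectory can be illustrated with a minimal sketch. It does not use the paper's parametric assumption, whose exact form is not stated in this summary. Instead it uses weak quasi-convexity, one of the conditions the summary says is subsumed, in the form $\langle \nabla f(x), x - x_\star \rangle \geq \tau\,(f(x) - f_\star)$. The test function $f(x) = x^2 + 3\sin^2(x)$, the starting point, the stepsize, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

# Illustrative nonconvex test function (an assumption, not from the paper).
# Its only stationary point is the global minimizer x_star = 0, with f_star = 0.
def f(x):
    return x**2 + 3.0 * np.sin(x)**2

def grad_f(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x)

def run_gd(x0, stepsize, num_steps):
    """Run gradient descent and record, at each iterate, the ratio
    grad_f(x) * (x - x_star) / (f(x) - f_star).
    A positive lower bound on this ratio along the trajectory is what
    weak quasi-convexity asks for (used here as a stand-in condition)."""
    x_star, f_star = 0.0, 0.0
    x = float(x0)
    values, ratios = [], []
    for _ in range(num_steps):
        gap = f(x) - f_star
        values.append(f(x))
        if gap > 1e-12:  # skip the ratio once the gap is numerically zero
            ratios.append(grad_f(x) * (x - x_star) / gap)
        x = x - stepsize * grad_f(x)
    return np.array(values), np.array(ratios)

# Hypothetical settings: f'' lies in [-4, 8], so stepsize 0.1 < 1/8 is safe.
values, ratios = run_gd(x0=3.0, stepsize=0.1, num_steps=50)
print("smallest observed ratio:", ratios.min())
print("final function value:", values[-1])
```

A ratio that stays bounded away from zero on the visited iterates is only evidence that the condition holds along this particular trajectory, not a proof that it holds globally.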

Paper Structure

This paper contains 37 sections, 7 theorems, 82 equations, 4 figures, 3 tables.

Key Result

Theorem 2.1

Let Assumptions ass:1 and ass:2 be satisfied. Further assume that the stepsize $\gamma^k$ satisfies the relations that hold for all $k \geq 0$, where $0<\alpha<2$, $\beta^k>0$, $\gamma_{\star}>0$, and $x_p := \operatorname{proj}_{\tilde{S}}(x)$. Then we have the following descent inequality that holds for all $k \geq 0$ and where $C^{K} := \frac{\sum_{k=0}^K \gamma^k (2-\alpha)\beta^k}{\alpha c_1 \

Figures (4)

  • Figure 1: Examples of the function $f(x)$, $x \in \mathbb{R}$.
  • Figure 2: Training the half-space learning problem.
  • Figure 3: Training the MLP model with $3$ fully connected layers.
  • Figure 4: Training the ResNet model.

Theorems & Definitions (14)

  • Theorem 2.1
  • Corollary 2.2
  • Corollary 2.3
  • Corollary 2.4
  • Theorem 2.6
  • Corollary 2.7
  • Corollary 2.8
  • proof
  • proof
  • proof
  • ...and 4 more