Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings

Evan Markou, Thalaiyasingam Ajanthan, Stephen Gould

TL;DR

The paper addresses nonconvex optimisation problems whose minimisers form a structured manifold. It introduces reduction mappings that parametrise part of the solution onto that manifold, producing a reduced objective with improved curvature. It proves that affine reductions strictly improve the smoothness constant and that nonlinear reductions yield a further, controlled curvature improvement via a correction term, alongside a strict enhancement of the Morse–Bott constant. Together, these effects give a strictly better condition number and faster local convergence for gradient-based methods under a pullback metric. The framework also accommodates mappings defined by inner optimisation (argmin) and extends classical preconditioning concepts to geometry-aware reductions, providing a principled explanation for empirical gains in structured nonconvex problems. The results imply that carefully designed reduction mappings can serve as a general, intrinsic preconditioner that accelerates convergence in applications such as matrix factorisation, deep learning, and other structured nonconvex tasks. Overall, the work offers a unified geometric lens for exploiting known structure at optimality to achieve provably faster convergence rates.

Abstract

Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
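
As a rough illustration of the conditioning claim, the sketch below runs plain Euclidean gradient descent on the toy quadratic from the Figure 2 caption, $f(x_1,x_2)=x_1^2+10(x_2-x_1)^2$, and on its reduced form obtained from the inner argmin $\Psi(x_1)=\arg\min_{x_2} f(x_1,x_2)=x_1$. The function names, starting point, tolerance, and $1/L$ step sizes are assumptions made for this example, and it does not use the pullback metric from the paper's analysis.

```python
import math

# Toy objective from the Figure 2 caption: f(x1, x2) = x1^2 + 10 (x2 - x1)^2.
def grad_f(x1, x2):
    r = x2 - x1
    return 2.0 * x1 - 20.0 * r, 20.0 * r

# Reduced objective F(x1) = f(x1, Psi(x1)) with Psi(x1) = x1, i.e. F(x1) = x1^2.
def grad_F(x1):
    return 2.0 * x1

def steps_full(x1, x2, step, tol=1e-6, max_iter=100_000):
    """Gradient-descent iterations on f until the gradient norm drops below tol."""
    for k in range(max_iter):
        g1, g2 = grad_f(x1, x2)
        if math.hypot(g1, g2) < tol:
            return k
        x1, x2 = x1 - step * g1, x2 - step * g2
    return max_iter

def steps_reduced(x1, step, tol=1e-6, max_iter=100_000):
    """Gradient-descent iterations on F until |F'(x1)| drops below tol."""
    for k in range(max_iter):
        g = grad_F(x1)
        if abs(g) < tol:
            return k
        x1 = x1 - step * g
    return max_iter

# Smoothness constants: largest Hessian eigenvalue of f, and F'' = 2 for F.
L_full = 21.0 + math.sqrt(401.0)   # Hessian [[22, -20], [-20, 20]]
L_reduced = 2.0

n_full = steps_full(1.0, 0.0, 1.0 / L_full)        # hypothetical start (1, 0)
n_reduced = steps_reduced(1.0, 1.0 / L_reduced)    # same x1 start
print(n_full, n_reduced)
```

Under these assumptions the reduced problem has a single curvature value, so a $1/L$ step reaches the minimiser immediately, while the full problem is slowed by its smallest Hessian eigenvalue and needs far more iterations.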

Paper Structure

This paper contains 41 sections, 22 theorems, 110 equations, 12 figures.

Key Result

Theorem 1

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^2$ function satisfying Assumption assum:main on the compact neighbourhood ${\mathcal{N}}$. Let $\Psi : \mathbb{R}^{n_1} \to \mathbb{R}^{n_2}$ be an affine mapping, and define the reduction mapping $\Phi(x_1) = \bigl(x_1, \Psi(x_1)\bigr)$, the reduced fun… Then, equipping $\mathbb{R}^{n_1}$ with the pullback metric induced by the embedding $\Phi$, the Ri…
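
To make the pullback metric in the statement concrete, here is a short worked form. It assumes the affine map is written as $\Psi(x_1) = A x_1 + b$ with $A \in \mathbb{R}^{n_2 \times n_1}$; the symbols $A$, $b$, $J_\Phi$ and $G$ are notation introduced for this illustration, not taken from the paper.

$$
J_\Phi = \begin{pmatrix} I_{n_1} \\ A \end{pmatrix}, \qquad
G = J_\Phi^\top J_\Phi = I_{n_1} + A^\top A, \qquad
\operatorname{grad} F(x_1) = G^{-1} \nabla F(x_1).
$$

In the scalar case $\Psi(x_1) = x_1$ this gives $G = 2$, so a reduced function $F(x_1) = x_1^2$ with Euclidean second derivative $2$ has curvature $2/2 = 1$ when measured in the pullback metric.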

Figures (12)

  • Figure 1: Illustration of how well-designed reduction mappings iron out worst-case curvature. The opaque surface depicts the graph of the function $f : \mathbb{R}^2 \to \mathbb{R}$, lifted above the ambient domain (grey plane) for visualisation purposes. The function has a single global minimum ${\mathcal{S}} = \{(0,0)\}$. In general, ${\mathcal{S}}$ can be a set of non-isolated points. The blue curve shows the restriction of $f$ along the mapping $\Psi_1: x_1 \mapsto x_1$, where the high-curvature quadratic component cancels, yielding a flatter profile in $x_1$. In contrast, the orange curve corresponds to the mapping $\Psi_2: x_1 \mapsto 0$, which preserves most of the steep curvature of $f$. The grey-green curve traces the restriction along a nonlinear sinusoidal mapping, an example of a poorly designed reduction that introduces additional curvature into the problem. Dashed curves on the ambient domain represent the images of these mappings as one-dimensional submanifolds ${\mathcal{M}}$. The submanifolds have a non-empty intersection with ${\mathcal{S}}$.
  • Figure 2: Curvature profiles for three reduction mappings. The full (unconstrained) function $f(x_1,x_2)=x_1^2+10(x_2-x_1)^2$ exhibits high curvature (approximately 41). Fixing $x_2=0$ yields $F_{\text{fixed}}(x_1) = 11x_1^2$, with curvature 22 at $x_1 = 0$. The bilevel reduction $F_{\text{linear}}(x_1) = x_1^2$ has lower curvature (2 at $x_1 = 0$), while the nonlinear mapping induces strong oscillations and increases the maximum curvature to 82.
  • Figure 3: Visualisation of the ambient function $f(x_1,x_2) = \varphi(x_1) + (x_2 - \sin(x_1))^2$, with $\varphi$ a flat-bottom quartic, and three lifted restriction curves corresponding to different reduction mappings $\Psi_i(x_1)$. The black curve ${\mathcal{S}}$ denotes the global minimisers of $f$, forming a non-isolated solution manifold parametrised by $x_1\in[-0.5,0.5]$ and $x_2 = \sin(x_1)$. Each lifted curve lies on the surface $f$, and the dashed curves on the base plane show the image of each corresponding manifold ${\mathcal{M}}_{F_i}$.
  • Figure 4: Curvature comparison across different reduction mappings applied to the function $f(x_1, x_2) = \varphi(x_1) + (x_2 - \sin(x_1))^2$, where $\varphi$ is flat on $[-0.5, 0.5]$ and quartic outside. The black dashed line represents the maximum eigenvalue of the full Hessian when no reduction is applied. The blue curve shows the curvature induced by fixing $x_2 = 0$, resulting in high curvature due to mismatch with the optimal $x_2 = \sin(x_1)$. The red curve corresponds to the linear reduction $x_2 = x_1$, which aligns more closely with $\sin(x_1)$ and yields lower curvature. The green curve represents the bilevel (implicit) reduction $x_2 = \sin(x_1)$, which eliminates coupling and flattens the function inside the flat region, driving curvature to zero. Each dashed horizontal line indicates the maximum curvature for its corresponding method.
  • Figure 6: Optimisation trajectories for the 2D quadratic problem.
  • ...and 7 more figures
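
The curvature values quoted in the Figure 2 caption can be checked numerically. The following is a minimal sketch in plain Python; the helper names and the finite-difference step are assumptions of the example, and the nonlinear mapping behind the value 82 is not specified in this summary, so it is not reproduced.

```python
import math

def f(x1, x2):
    # Objective from the Figure 2 caption.
    return x1 ** 2 + 10.0 * (x2 - x1) ** 2

def sym2x2_eigs(a, b, d):
    """Eigenvalues (min, max) of the symmetric matrix [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    return mean - rad, mean + rad

def second_derivative(F, x, h=1e-4):
    """Central finite-difference estimate of F''(x)."""
    return (F(x + h) - 2.0 * F(x) + F(x - h)) / (h * h)

# Hessian of f is constant: [[22, -20], [-20, 20]].
lam_min, lam_max = sym2x2_eigs(22.0, -20.0, 20.0)

# Reduced functions: x2 fixed at 0, and x2 = x1 (the bilevel/linear reduction).
F_fixed = lambda x1: f(x1, 0.0)
F_linear = lambda x1: f(x1, x1)

curv_fixed = second_derivative(F_fixed, 0.0)
curv_linear = second_derivative(F_linear, 0.0)

print(f"full: {lam_max:.3f}, fixed: {curv_fixed:.3f}, linear: {curv_linear:.3f}")
```

The largest Hessian eigenvalue is $21 + \sqrt{401} \approx 41.02$, matching the caption's "approximately 41", and the two reduced curvatures come out as 22 and 2.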

Theorems & Definitions (36)

  • Definition 1: Morse–Bott Property
  • Definition 2: Reduction Mapping and Reduced Function
  • Definition 3: Graph Manifold
  • Theorem 1: Sharper Smoothness Constant for Reduced Functions with Affine Mappings
  • Corollary 1: Euclidean Smoothness Bound under Affine Mappings
  • Theorem 2: Sharper Smoothness Constant for Reduced Functions with Nonlinear Mappings
  • Corollary 2: Euclidean Smoothness Bound under Nonlinear Mappings
  • Remark 1: Inner Mappings as Argmin Problems
  • Theorem 3: Strict Improvement of the Morse–Bott Constant under Smooth Reduction
  • Remark 2: Morse–Bott Constant Equivalence [rebjock2024fastconvergence]
  • ...and 26 more