Sharper Convergence Rates for Nonconvex Optimisation via Reduction Mappings
Evan Markou, Thalaiyasingam Ajanthan, Stephen Gould
TL;DR
The paper addresses nonconvex optimisation problems whose minimisers form a structured manifold. It introduces reduction mappings that reparametrise part of the parameter space to lie on the solution manifold, producing a reduced objective with improved curvature. It proves that affine reductions strictly improve the smoothness constant and that nonlinear reductions yield a further, controlled curvature improvement via a correction term, alongside a strict improvement of the Morse–Bott constant; these effects combine to give a strictly better condition number and faster local convergence for gradient-based methods under a pullback metric. The framework also accommodates reduction mappings defined by inner optimisation problems (argmin) and extends classical preconditioning concepts to geometry-aware reductions, providing a principled explanation for empirical gains in structured nonconvex problems. The results imply that carefully designed reduction mappings can act as a general, intrinsic preconditioner that accelerates convergence in applications such as matrix factorisation, deep learning, and other structured nonconvex tasks. Overall, the work offers a unified geometric lens for exploiting known structure at optimality to achieve provably faster convergence rates.
Abstract
Many high-dimensional optimisation problems exhibit rich geometric structures in their set of minimisers, often forming smooth manifolds due to over-parametrisation or symmetries. When this structure is known, at least locally, it can be exploited through reduction mappings that reparametrise part of the parameter space to lie on the solution manifold. These reductions naturally arise from inner optimisation problems and effectively remove redundant directions, yielding a lower-dimensional objective. In this work, we introduce a general framework to understand how such reductions influence the optimisation landscape. We show that well-designed reduction mappings improve curvature properties of the objective, leading to better-conditioned problems and theoretically faster convergence for gradient-based methods. Our analysis unifies a range of scenarios where structural information at optimality is leveraged to accelerate convergence, offering a principled explanation for the empirical gains observed in such optimisation algorithms.
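As a rough illustration of how an inner-optimisation reduction can improve conditioning, the sketch below uses a toy problem that is not taken from the paper: a strongly convex quadratic f(x, y) = 0.5 zᵀHz with z = (x, y). Eliminating y through the linear reduction y*(x) = argmin_y f(x, y) leaves a reduced objective whose Hessian is the Schur complement of H, and by eigenvalue interlacing its condition number is never larger than that of H. The function names `schur_reduction` and `condition_number`, the random test matrix, and the block sizes are assumptions made for this example; the paper's setting (nonconvex objectives with Morse–Bott structure) is more general.

```python
import numpy as np


def schur_reduction(H, k):
    """Hessian of g(x) = min_y 0.5 * z^T H z, with z = (x, y) and x the first k coordinates.

    The inner minimiser is y*(x) = -C^{-1} B^T x, so the reduced Hessian is
    the Schur complement A - B C^{-1} B^T.
    """
    A = H[:k, :k]
    B = H[:k, k:]
    C = H[k:, k:]
    S = A - B @ np.linalg.solve(C, B.T)
    return 0.5 * (S + S.T)  # symmetrise to remove round-off asymmetry


def condition_number(M):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite matrix."""
    eig = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    return eig[-1] / eig[0]


# Hypothetical test problem: a random symmetric positive definite Hessian.
rng = np.random.default_rng(0)
n, k = 6, 3
Q = rng.standard_normal((n, n))
H = Q @ Q.T + 0.1 * np.eye(n)

S = schur_reduction(H, k)
kappa_full = condition_number(H)
kappa_reduced = condition_number(S)

print(f"condition number of full objective:    {kappa_full:.3f}")
print(f"condition number of reduced objective: {kappa_reduced:.3f}")
```

In this quadratic case the improvement follows because the inverse of the Schur complement is a principal submatrix of the inverse of H, so the reduced eigenvalues are bracketed by the extreme eigenvalues of H. Gradient descent on the reduced objective can therefore use a step size and enjoy a linear rate at least as good as on the full problem.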
