Loss-Transformation Invariance in the Damped Newton Method

Alexander Shestakov, Sushil Bohara, Samuel Horváth, Martin Takáč, Slavomír Hanzely

TL;DR

The paper investigates whether convexity is required for fast Newton convergence and introduces loss-transformation invariance, proving that the stepsized Newton method is invariant under monotone transformations $L=\phi\circ f$ up to a multiplicative step-size factor. This enables convexification and star-convexification of pseudoconvex losses without changing the iterate sequence, by selecting $\phi$ to control the Hessian via $\nabla^2 L(x) \propto \nabla^2 f(x) + r(x)\nabla f(x)\nabla f(x)^T$. The authors derive a transformation-induced stepsize schedule that transfers iterates back to the original objective, and they provide theoretical and practical insights into unconventional stepsizes (including values greater than $1$ and negative values). Numerical experiments demonstrate phenomena such as descent-sign reversal, shifts in convergence neighborhoods, and the ability to recover convergence through stepsize rescheduling on both synthetic and benchmark losses. The work offers a principled path to applying Newton-type methods beyond convex settings through loss transformations that preserve iteration trajectories.

Abstract

The Newton method is a powerful optimization algorithm, valued for its rapid local convergence and elegant geometric properties. However, its theoretical guarantees are usually limited to convex problems. In this work, we ask whether convexity is truly necessary. We introduce the concept of loss-transformation invariance, showing that damped Newton methods are unaffected by monotone transformations of the loss, apart from a simple rescaling of the step size. This insight allows difficult losses to be replaced with easier transformed versions, enabling convexification of many nonconvex problems while preserving the same sequence of iterates. Our analysis also explains the effectiveness of unconventional stepsizes in Newton's method, including values greater than one and even negative steps.
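
The invariance claim can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: the quadratic test loss, the choice $\phi(t)=e^t$, and all function names are hypothetical. For $L=\phi\circ f$ the chain rule gives $\nabla L = \phi'(f)\nabla f$ and $\nabla^2 L = \phi'(f)\left(\nabla^2 f + r\,\nabla f\nabla f^T\right)$ with $r=\phi''(f)/\phi'(f)$, and the Sherman-Morrison identity then suggests that a full Newton step on $L$ equals a Newton step on $f$ rescaled by $1/(1 + r\,\nabla f^T[\nabla^2 f]^{-1}\nabla f)$, provided that denominator is nonzero.

```python
import numpy as np

# Hypothetical test problem (an assumption, not from the paper):
# f(x) = 0.5 x^T A x + b^T x with A symmetric positive definite, phi(t) = exp(t).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

def grad_f(x):
    return A @ x + b

def hess_f(x):
    return A

def newton_step_on_L(x):
    """Full Newton step for L = exp(f), built from L's own gradient and Hessian."""
    g, H, scale = grad_f(x), hess_f(x), np.exp(f(x))
    grad_L = scale * g                     # phi'(f) * grad f
    hess_L = scale * (H + np.outer(g, g))  # phi'(f) * (H + r g g^T), r = 1 for exp
    return np.linalg.solve(hess_L, grad_L)

def damped_newton_step_on_f(x, r=1.0):
    """Newton step for f, rescaled by alpha = 1 / (1 + r g^T H^{-1} g)."""
    g, H = grad_f(x), hess_f(x)
    direction = np.linalg.solve(H, g)
    alpha = 1.0 / (1.0 + r * g @ direction)
    return alpha * direction, alpha

x = np.array([0.7, -1.3])
step_L = newton_step_on_L(x)
step_f, alpha = damped_newton_step_on_f(x)
print(np.allclose(step_L, step_f), alpha)
```

Because the Hessian of `f` is positive definite here and $r=1$, the rescaling factor `alpha` lies strictly between 0 and 1; other choices of $\phi$ can push it above 1 or below 0, which matches the unconventional stepsizes the abstract mentions.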

Paper Structure

This paper contains 24 sections, 14 theorems, 59 equations, 4 figures, 3 tables.

Key Result

Lemma 1

Function $f$ is (strictly) pseudoconvex if and only if the following conditions hold:

Figures (4)

  • Figure 1: Toy example: effect of transformation-induced stepsizes. Loss $f(x) = \ln(1+x^2)$ initialized at $x_0 = 0.8$. Left: the classical Newton method diverges. Middle: Newton method on the surrogate $L(x) = 2x \arctan(x)$ converges to the minimizer. Right: Newton method on the original loss with a transformation-induced stepsize schedule also converges.
  • Figure 2: Regions where the Newton step changes sign after a polynomial loss transformation.
  • Figure 3: Convergence regions under polynomial transformations $\phi(f)=f^r$ on Beale and Goldstein-Price functions.
  • Figure 4: Regions where the Newton step changes sign after a logarithmic loss transformation.
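
The toy example in the Figure 1 caption can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' experiment code: the function names and the iteration count are assumptions, and the stepsize is computed as the ratio of the two one-dimensional Newton steps, which in one dimension coincides with the rescaling $\phi'(f)\,f''/L''$ implied by the chain rule. Points where $f''(x)=0$ (that is, $x=\pm 1$) and the exact minimizer $x=0$ are not handled.

```python
import math

def step_f(x):
    """Newton step f'(x)/f''(x) for the original loss f(x) = ln(1 + x^2)."""
    d1 = 2 * x / (1 + x * x)
    d2 = 2 * (1 - x * x) / (1 + x * x) ** 2
    return d1 / d2

def step_L(x):
    """Newton step L'(x)/L''(x) for the surrogate L(x) = 2 x arctan(x)."""
    d1 = 2 * math.atan(x) + 2 * x / (1 + x * x)
    d2 = 4 / (1 + x * x) ** 2  # always positive, so L is convex
    return d1 / d2

def damped_update(x):
    """Newton update on f with the stepsize induced by the surrogate."""
    alpha = step_L(x) / step_f(x)  # hypothetical helper; undefined if step_f(x) == 0
    return x - alpha * step_f(x)

def iterate(update, x0=0.8, iters=5):
    xs = [x0]
    for _ in range(iters):
        xs.append(update(xs[-1]))
    return xs

xs_f = iterate(lambda x: x - step_f(x))  # classical Newton on f
xs_L = iterate(lambda x: x - step_L(x))  # classical Newton on L
xs_damped = iterate(damped_update)       # damped Newton on f

for name, xs in [("f", xs_f), ("L", xs_L), ("f, induced stepsize", xs_damped)]:
    print(name, [round(v, 4) for v in xs])
```

Starting from $x_0 = 0.8$, the first trajectory moves away from the minimizer at $0$, while the second and third approach it and agree up to rounding, in line with the three panels described in the caption.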

Theorems & Definitions (28)

  • Definition 1: Convexity
  • Definition 2: Pseudoconvexity
  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Lemma 2
  • Claim 1
  • Claim 2
  • Example 1
  • Definition 3
  • ...and 18 more