Loss-Transformation Invariance in the Damped Newton Method
Alexander Shestakov, Sushil Bohara, Samuel Horváth, Martin Takáč, Slavomír Hanzely
TL;DR
The paper asks whether convexity is required for fast Newton convergence and introduces loss-transformation invariance: the stepsized Newton method is proven invariant under monotone transformations $L=\phi\circ f$, up to a multiplicative stepsize factor. Choosing $\phi$ controls the Hessian through $\nabla^2 L(x) \propto \nabla^2 f(x) + r(x)\nabla f(x)\nabla f(x)^\top$, where by the chain rule $r(x) = \phi''(f(x))/\phi'(f(x))$, which enables convexification and star-convexification of pseudoconvex losses without changing the iterate sequence. The authors derive a transformation-induced stepsize schedule that transfers iterates back to the original objective, and they provide theoretical and practical insights into unconventional stepsizes, including values greater than one and negative values. Numerical experiments on synthetic and benchmark losses demonstrate descent-sign reversal, shifts in convergence neighborhoods, and recovery of convergence through stepsize rescheduling. The work offers a principled path to applying Newton-type methods beyond convex settings through loss transformations that preserve iteration trajectories.
Abstract
The Newton method is a powerful optimization algorithm, valued for its rapid local convergence and elegant geometric properties. However, its theoretical guarantees are usually limited to convex problems. In this work, we ask whether convexity is truly necessary. We introduce the concept of loss-transformation invariance, showing that damped Newton methods are unaffected by monotone transformations of the loss, apart from a simple rescaling of the stepsize. This insight allows difficult losses to be replaced with easier transformed versions, enabling convexification of many nonconvex problems while preserving the same sequence of iterates. Our analysis also explains the effectiveness of unconventional stepsizes in Newton's method, including values greater than one and even negative steps.
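The invariance claim can be checked numerically on a toy problem. The sketch below is an illustration, not the authors' code: the quadratic objective, the helper names (`solve2`, `f_grad_hess`, `newton_dir_transformed`, `stepsize_factor`), and the test transformations are all assumptions made for this example. It relies on the chain rule $\nabla^2 L = \phi'(f)\nabla^2 f + \phi''(f)\nabla f\nabla f^\top$ and the Sherman-Morrison identity, which together give $[\nabla^2 L]^{-1}\nabla L = c\,[\nabla^2 f]^{-1}\nabla f$ with $c = 1/(1 + r\,\nabla f^\top[\nabla^2 f]^{-1}\nabla f)$ and $r = \phi''(f)/\phi'(f)$, assuming $\nabla^2 f$ is invertible, $\phi'(f)\neq 0$, and the denominator is nonzero. The scalar $c$ is our derivation of the rescaling and may be written differently in the paper.

```python
import math


def solve2(A, b):
    """Solve the 2x2 linear system A z = b by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(a22 * b[0] - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]


def f_grad_hess(x):
    """Toy objective (an assumption): f(x) = x0^2 + x0*x1 + 2*x1^2 + 1."""
    x0, x1 = x
    f = x0 ** 2 + x0 * x1 + 2 * x1 ** 2 + 1
    g = [2 * x0 + x1, x0 + 4 * x1]
    H = [[2.0, 1.0], [1.0, 4.0]]
    return f, g, H


def newton_dir_original(x):
    """Newton direction [hess f]^{-1} grad f of the original loss."""
    _, g, H = f_grad_hess(x)
    return solve2(H, g)


def newton_dir_transformed(x, dphi, d2phi):
    """Newton direction of L = phi(f), built from the chain rule."""
    f, g, H = f_grad_hess(x)
    p1, p2 = dphi(f), d2phi(f)
    grad_L = [p1 * gi for gi in g]
    hess_L = [[p1 * H[i][j] + p2 * g[i] * g[j] for j in range(2)]
              for i in range(2)]
    return solve2(hess_L, grad_L)


def stepsize_factor(x, dphi, d2phi):
    """Scalar c such that the transformed direction equals c times the original."""
    f, g, H = f_grad_hess(x)
    r = d2phi(f) / dphi(f)
    d = solve2(H, g)
    return 1.0 / (1.0 + r * (g[0] * d[0] + g[1] * d[1]))


if __name__ == "__main__":
    x = [1.0, -0.5]
    # phi(t) = exp(t), so phi' = phi'' = exp and r = 1.
    d_f = newton_dir_original(x)
    d_L = newton_dir_transformed(x, math.exp, math.exp)
    c = stepsize_factor(x, math.exp, math.exp)
    print("original direction:   ", d_f)
    print("transformed direction:", d_L)
    print("rescaled original:    ", [c * di for di in d_f])
```

Because the two directions are parallel, a Newton step of size $\alpha$ on $L$ coincides with a step of size $c\,\alpha$ on $f$. When $r$ is negative, $c$ can exceed one or change sign, which is consistent with the unconventional stepsizes mentioned in the abstract.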
