Gradient flow in parameter space is equivalent to linear interpolation in output space
Thomas Chen, Patrícia Muñoz Ewald
TL;DR
This work studies how gradient flow in parameter space relates to the flow it induces in output space for overparameterized neural networks. The authors construct a one-parameter family of vector fields interpolating between the standard parameter-space gradient flow and an adapted flow that induces a constrained Euclidean gradient flow in output space, while preserving equilibria. Under the $L^{2}$ loss, when the Jacobian $D$ of the outputs with respect to the parameters has full rank, a time reparametrization makes the output-space dynamics linear in time, moving straight toward a global minimum. For rank-deficient $D$, they quantify the deviation from linear interpolation. For the cross-entropy loss with positive label components, they identify invariant output-space manifolds with a unique global minimum on each fiber. The results highlight the central role of the rank of the Jacobian (and of the neural tangent kernel, NTK) in shaping optimization trajectories and offer tools to steer training via output-space geometry, with connections to neural collapse and to reinterpretations of the NTK.
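The full-rank $L^{2}$ statement can be sketched in a few lines. The notation here is assumed for illustration and is not taken from the text above: $\theta$ denotes the parameters, $x(\theta) \in \mathbb{R}^{n}$ the stacked outputs on the fixed training data, $y$ the labels, $s$ the flow time, and $\mathcal{C}(x) = \tfrac{1}{2}|x - y|^{2}$ the loss (a different normalization would only rescale time). The adapted flow is written with a Moore-Penrose pseudoinverse, which is one natural way to realize it and should be read as an assumption.

$$
\begin{aligned}
\text{standard flow:}\quad & \partial_s \theta = -\nabla_\theta \mathcal{C} = -D^{T}\,\nabla_x \mathcal{C}
&&\Longrightarrow\quad \partial_s x = -D D^{T}\,\nabla_x \mathcal{C}, \\
\text{adapted flow:}\quad & \partial_s \theta = -(D^{T} D)^{+}\,\nabla_\theta \mathcal{C} = -D^{+}\,\nabla_x \mathcal{C}
&&\Longrightarrow\quad \partial_s x = -D D^{+}\,\nabla_x \mathcal{C} = -P\,\nabla_x \mathcal{C},
\end{aligned}
$$

where $P = D D^{+}$ is the orthogonal projection onto the range of $D$. If $D$ has rank $n$, then $P = I$ and

$$
\partial_s x = -(x - y)
\quad\Longrightarrow\quad
x(s) = y + e^{-s}\,(x_0 - y).
$$

Setting $t = 1 - e^{-s} \in [0, 1)$ gives

$$
x(t) = (1 - t)\,x_0 + t\,y,
$$

that is, a straight line from the initial output $x_0$ to the labels $y$. In the standard flow the matrix $D D^{T}$ plays the role of the NTK, which is why the output-space path is generally curved there.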
Abstract
We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
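The $L^{2}$ claim can be checked numerically on a toy problem. The sketch below is illustrative only and makes several assumptions that do not come from the paper: a one-hidden-layer tanh network with scalar input and output, three training points, the loss $\tfrac{1}{2}|x - y|^{2}$, and an adapted flow realized as $\partial_s \theta = -D^{+}(x - y)$ with a Moore-Penrose pseudoinverse. All function names are hypothetical. It integrates the flow with a classical Runge-Kutta scheme and measures how far the outputs stray from the line $(1 - t)\,x_0 + t\,y$ with $t = 1 - e^{-s}$.

```python
import numpy as np


def outputs_and_jacobian(theta, u):
    """Toy model (an assumption): x_i = sum_j w2_j * tanh(w1_j * u_i + b1_j).

    Returns the stacked outputs x(theta) and the Jacobian D = dx/dtheta.
    """
    w1, b1, w2 = np.split(theta, 3)         # three equal-sized parameter blocks
    h = np.tanh(np.outer(u, w1) + b1)       # hidden activations, shape (N, width)
    x = h @ w2                              # one scalar output per training input
    g = (1.0 - h**2) * w2                   # derivative w.r.t. each pre-activation
    D = np.hstack([g * u[:, None], g, h])   # columns ordered as (w1, b1, w2)
    return x, D


def adapted_field(theta, u, y):
    """Assumed adapted vector field for the L2 loss: -D^+ (x - y)."""
    x, D = outputs_and_jacobian(theta, u)
    return -np.linalg.pinv(D) @ (x - y)


def rk4_step(theta, ds, u, y):
    """One classical Runge-Kutta step of size ds in the flow time s."""
    k1 = adapted_field(theta, u, y)
    k2 = adapted_field(theta + 0.5 * ds * k1, u, y)
    k3 = adapted_field(theta + 0.5 * ds * k2, u, y)
    k4 = adapted_field(theta + ds * k3, u, y)
    return theta + ds * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def run(num_steps=1500, ds=0.002, width=8, seed=0):
    rng = np.random.default_rng(seed)
    u = np.array([-1.0, 0.0, 1.0])          # fixed training inputs (arbitrary)
    y = np.array([1.0, -1.0, 0.5])          # labels (arbitrary)
    theta = rng.normal(size=3 * width)      # 24 parameters, 3 outputs

    x0, D0 = outputs_and_jacobian(theta, u)
    rank0 = np.linalg.matrix_rank(D0)       # should equal the number of outputs
    res0 = float(np.linalg.norm(x0 - y))

    max_dev = 0.0
    for k in range(1, num_steps + 1):
        theta = rk4_step(theta, ds, u, y)
        x, _ = outputs_and_jacobian(theta, u)
        t = 1.0 - np.exp(-k * ds)           # reparametrized time in [0, 1)
        line = (1.0 - t) * x0 + t * y       # linear interpolation in output space
        max_dev = max(max_dev, float(np.max(np.abs(x - line))))

    res_final = float(np.linalg.norm(x - y))
    return rank0, t, max_dev, res0, res_final


if __name__ == "__main__":
    rank0, t, max_dev, res0, res_final = run()
    print("rank of D at initialization:", rank0)
    print("final reparametrized time t:", t)
    print("max deviation from the straight line:", max_dev)
    print("residual norm, initial and final:", res0, res_final)
```

If the Jacobian stays full rank along the trajectory, the reported deviation should be at the level of the integrator error, and the residual norm should shrink by the factor $1 - t$. Replacing `np.linalg.pinv(D)` with `D.T` recovers the standard gradient flow, for which the output path is in general no longer a straight line.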
