Gradient flow in parameter space is equivalent to linear interpolation in output space

Thomas Chen; Patrícia Muñoz Ewald

TL;DR

This work studies how gradient flows in parameter space relate to equivalent flows in output space for overparameterized neural networks. The authors construct a one-parameter family of interpolating vector fields that connect the standard parameter-space gradient flow to an adapted flow inducing a constrained Euclidean gradient flow in output space, preserving equilibria. Under the $L^{2}$ loss, a full-rank Jacobian $D$ yields a time reparametrization that makes output-space dynamics linear in time toward a global minimum; for rank-deficient $D$, they quantify the deviation from linear interpolation, and for cross-entropy with positive labels they identify invariant output-space manifolds with a unique global minimum on each fiber. The results highlight the central role of the Jacobian rank (and NTK) in shaping optimization trajectories and offer tools to steer training via output-space geometry, with connections to neural-collapse phenomena and NTK reinterpretations.
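The invariant-manifold statement for cross-entropy can be illustrated directly in output space. The sketch below is not the paper's formula. It assumes a single sample whose label vector `y` has positive components summing to 1, and the loss $C(x) = -\sum_i y_i \log \mathrm{softmax}(x)_i$. Under those assumptions the gradient is $\mathrm{softmax}(x) - y$, its components sum to zero, so each hyperplane $\{\sum_i x_i = c\}$ is invariant under the Euclidean gradient flow, and the only minimizer on it is $\log y$ shifted by a constant. All function names are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

def cross_entropy_grad(x, y):
    # Gradient of C(x) = -sum_i y_i * log softmax(x)_i, assuming sum(y) == 1.
    return softmax(x) - y

def output_space_flow(x0, y, dt=0.05, steps=2000):
    # Forward-Euler discretisation of the output-space flow dx/dt = -grad C(x).
    x = x0.copy()
    for _ in range(steps):
        x = x - dt * cross_entropy_grad(x, y)
    return x

def predicted_minimum(x0, y):
    # Point of the hyperplane {sum(x) == sum(x0)} at which softmax(x) == y.
    log_y = np.log(y)
    shift = (x0.sum() - log_y.sum()) / x0.size
    return log_y + shift

y = np.array([0.5, 0.3, 0.2])    # assumed label with positive components
x0 = np.array([2.0, -1.0, 0.5])  # arbitrary initial output
x_end = output_space_flow(x0, y)
x_star = predicted_minimum(x0, y)

print("sum of outputs before/after:", x0.sum(), x_end.sum())
print("distance to predicted minimum:", np.linalg.norm(x_end - x_star))
```

Changing `x0` moves the flow to a different hyperplane and hence to a different shifted copy of `log(y)`, which is one way to read "a unique global minimum on each fiber".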

Abstract

We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
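A small numerical sketch can make the $L^{2}$ claim concrete. It assumes the adapted flow has the pseudoinverse form $\partial_t \underline{\theta} = -(D^{T}D)^{+}\nabla_{\underline{\theta}} C = -D^{+}(\underline{x}-\underline{y})$ with $C = \tfrac{1}{2}|\underline{x}-\underline{y}|^{2}$; the paper's normalization may differ, which would only rescale time. The map `outputs` is a hypothetical toy stand-in for a network evaluated on fixed training data, chosen so that its Jacobian is full rank, and every name in the block is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 3, 12  # n plays the role of Q*N outputs, K >= n parameters
A = rng.standard_normal((n, K))
B = rng.standard_normal((n, K))

def outputs(theta):
    # Toy nonlinear model x(theta); not the architecture from the paper.
    return B @ theta + 0.1 * A @ np.tanh(theta)

def jacobian(theta):
    # D = dx/dtheta, shape (n, K); column j is scaled by 1 - tanh(theta_j)^2.
    return B + 0.1 * A * (1.0 - np.tanh(theta) ** 2)

def adapted_field(theta, y):
    # -(D^T D)^+ grad_theta C equals -D^+ (x - y) for C = 0.5 * |x - y|^2.
    return -np.linalg.pinv(jacobian(theta)) @ (outputs(theta) - y)

def rk4_step(f, theta, dt):
    k1 = f(theta)
    k2 = f(theta + 0.5 * dt * k1)
    k3 = f(theta + 0.5 * dt * k2)
    k4 = f(theta + dt * k3)
    return theta + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def run_adapted_flow(theta0, y, t_final=5.0, steps=2000):
    # Integrate in t and compare with the line (1 - s) x0 + s y, s = 1 - exp(-t).
    dt = t_final / steps
    theta, x0 = theta0.copy(), outputs(theta0)
    max_dev = 0.0
    for k in range(1, steps + 1):
        theta = rk4_step(lambda th: adapted_field(th, y), theta, dt)
        s = 1.0 - np.exp(-k * dt)
        line = (1.0 - s) * x0 + s * y
        max_dev = max(max_dev, np.linalg.norm(outputs(theta) - line))
    return theta, max_dev

theta0 = rng.standard_normal(K)
y = rng.standard_normal(n)
theta_end, max_dev = run_adapted_flow(theta0, y)
initial_gap = np.linalg.norm(outputs(theta0) - y)
final_gap = np.linalg.norm(outputs(theta_end) - y)

print(f"max deviation from the straight line: {max_dev:.2e}")
print(f"gap to target: {initial_gap:.3f} -> {final_gap:.3f}")
```

The parameters follow a curved path, but as long as the Jacobian stays full rank the outputs should stay on the segment between the initial output and the target, up to integration error.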

Paper Structure (11 sections, 3 theorems, 132 equations)

Introduction
Main results
Adapted gradient flow
Reparametrization for the $L^{2}$ loss
Cross-entropy loss
Applications
Prescribed paths in output space
Final layer collapse
Tangent kernel
Some related work
Pseudoinverse

Key Result

lemma 1

Let $K \geq QN$, and let $\underline{x}(s)=\underline{x}(\underline{\theta}(s))$ be defined as in (xund). Assume $\underline{x}(\underline{\theta})$ has Lipschitz continuous derivatives. When $D$ is full rank, setting [...] yields [...]. If we allow for $\operatorname{rank}(D) \leq QN$, then letting [...] for $\psi$ satisfying [...] results in [...].
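The mechanism behind the full-rank case can be sketched in a few lines. This is a hedged reconstruction, not the lemma's own display: it assumes the adapted flow has the pseudoinverse form below and the normalization $C = \tfrac{1}{2}|\underline{x}-\underline{y}|^{2}$, and it writes $\underline{x}_0$ for the initial output and $\underline{y}$ for the target. A different constant in front of the loss would change only the rate in the exponential, and hence the definition of $s$.

```latex
% Assumptions: C = (1/2)|x - y|^2 and rank(D) = QN, so that D D^{+} = 1.
\begin{align*}
\partial_t \underline{\theta}
  &= -(D^{T}D)^{+}\,\nabla_{\underline{\theta}} C
   = -D^{+}\,(\underline{x}-\underline{y}), \\
\partial_t \underline{x}
  &= D\,\partial_t \underline{\theta}
   = -DD^{+}\,(\underline{x}-\underline{y})
   = -(\underline{x}-\underline{y}), \\
\underline{x}(t)
  &= \underline{y} + e^{-t}\,(\underline{x}_0-\underline{y}), \\
s := 1-e^{-t}
  \;&\Longrightarrow\;
  \underline{x}(s) = (1-s)\,\underline{x}_0 + s\,\underline{y},
  \qquad s \in [0,1).
\end{align*}
```

When $D$ is rank-deficient, $DD^{+}$ is only the orthogonal projector onto the range of $D$, so the second line no longer closes and the output path can leave the straight segment.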

Theorems & Definitions (13)

remark 1
lemma 1: chen23
proof
remark 2
proof
lemma 2
proof
remark 3
proof
remark 4
...and 3 more
