Derivation of effective gradient flow equations and dynamical truncation of training data in Deep Learning
Thomas Chen
TL;DR
This work derives explicit gradient-flow equations for the cumulative weights and biases of a deep ReLU network under the Euclidean cost in the input layer, assuming that the weights are adapted to the coordinate system distinguished by the activations. It shows that gradient flow acts as a dynamical truncation of training data in input space, causing data clusters to shrink and, in favorable cases, collapse to points. This offers an interpretable view linked to neural collapse. The analysis covers both cluster-separated truncations, for which explicit ODEs are obtained, and the general case without separation. It also connects these dynamics to standard cost scenarios in which a spectral gap ensures exponential convergence. Overall, the results provide a rigorous, geometry-driven explanation of how the structure of the training data evolves under gradient descent, and they illuminate interpretability questions in supervised learning.
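The link between a spectral gap and exponential convergence can be illustrated with a generic toy model. This is a minimal sketch under the assumption of a quadratic cost C(x) = (1/2) x^T H x with a diagonal, positive definite H; it is not the paper's gradient-flow system. The gradient flow dx/dt = -H x then has the exact solution x_i(t) = exp(-lambda_i t) x_i(0), so |x(t)| <= exp(-lambda_min t) |x(0)|. The names `gradient_flow_quadratic` and `exact_solution` are hypothetical helpers introduced only for this illustration.

```python
import math

def gradient_flow_quadratic(eigs, x0, t, steps=10000):
    """Forward-Euler integration of dx/dt = -H x with H = diag(eigs).

    This is gradient flow for the toy cost C(x) = 0.5 * sum(lam_i * x_i**2).
    """
    dt = t / steps
    x = list(x0)
    for _ in range(steps):
        # One Euler step: x <- x - dt * grad C(x), where grad C(x)_i = lam_i * x_i
        x = [xi - dt * lam * xi for xi, lam in zip(x, eigs)]
    return x

def exact_solution(eigs, x0, t):
    """Closed-form solution x_i(t) = exp(-lam_i * t) * x_i(0)."""
    return [math.exp(-lam * t) * xi for xi, lam in zip(x0, eigs)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

eigs = [0.5, 2.0]   # spectrum of H; the smallest eigenvalue (0.5) sets the rate
x0 = [1.0, 1.0]
t = 4.0

approx = gradient_flow_quadratic(eigs, x0, t)
exact = exact_solution(eigs, x0, t)
# Spectral-gap bound: |x(t)| <= exp(-lambda_min * t) * |x(0)|
bound = math.exp(-min(eigs) * t) * norm(x0)
```

The Euler integration is included only to show that a discretized gradient descent tracks the continuous flow for small step sizes. The decay rate itself comes from the smallest eigenvalue.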
Abstract
We derive explicit equations governing the cumulative biases and weights in Deep Learning with the ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the interpretability question in supervised learning.
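To make the word "truncated" concrete, here is a minimal sketch of how a ReLU can act as a truncation map on input data. It assumes the simplest adapted setting: the weight matrix is the identity in the coordinate system of the activations, and only a bias b is applied. The map x -> ReLU(x + b) - b then equals the componentwise max(x_i, -b_i). Coordinates below the threshold -b_i are pushed onto it, so a cluster can only shrink, and it collapses to a single point once every coordinate of every point is truncated. The helpers `truncate`, `diameter` and `count_truncated` and the sample cluster are hypothetical. They illustrate the geometric mechanism only, not the paper's gradient-flow equations.

```python
import math

def relu(v):
    return [max(c, 0.0) for c in v]

def truncate(x, b):
    """Shift by bias b, apply ReLU, shift back: equals max(x_i, -b_i) componentwise."""
    shifted = relu([xi + bi for xi, bi in zip(x, b)])
    return [si - bi for si, bi in zip(shifted, b)]

def diameter(points):
    """Largest pairwise Euclidean distance within a cluster."""
    return max(math.dist(p, q) for p in points for q in points)

def count_truncated(points, b):
    """Number of points with at least one coordinate below the threshold -b_i."""
    return sum(any(xi < -bi for xi, bi in zip(p, b)) for p in points)

cluster = [[0.2, 0.1], [0.5, 0.4], [0.3, 0.6], [0.6, 0.2]]

# Partial truncation: thresholds at x1 = 0.4 and x2 = 0.3.
b_partial = [-0.4, -0.3]
partial = [truncate(p, b_partial) for p in cluster]

# Full truncation: every coordinate lies below 0.7, so all points map to (0.7, 0.7).
b_full = [-0.7, -0.7]
full = [truncate(p, b_full) for p in cluster]
```

Because max(., c) is monotone and 1-Lipschitz in each coordinate, the truncation never increases pairwise Euclidean distances. This is why the cluster diameter can only decrease as more points are truncated.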
