Low Rank Gradients and Where to Find Them

Rishi Sonthalia; Michael Murray; Guido Montúfar

TL;DR

We study gradient structure in two-layer networks trained on anisotropic, ill-conditioned data with spikes. The central finding is that the input-weight gradient is generically well approximated by a rank-two matrix formed by a residue-aligned term and a data-spike-aligned term, with an interpolant capturing their interaction. Activation choices, scaling regimes (mean-field vs. NTK), and regularizers (weight decay, input noise, Jacobian penalties) modulate the two components, leading to regimes where one component dominates or both coexist. These insights illuminate how feature learning is guided by data structure and regularization, and are validated on synthetic data and real embeddings (MNIST/CIFAR).

Abstract

This paper investigates low-rank structure in the gradients of the training loss for two-layer neural networks while relaxing the usual isotropy assumptions on the training data and parameters. We consider a spiked data model in which the bulk can be anisotropic and ill-conditioned. We do not require the data and weight matrices to be independent, and we analyze both the mean-field and neural-tangent-kernel scalings. We show that the gradient with respect to the input weights is approximately low rank and is dominated by two rank-one terms: one aligned with the bulk data-residue, and another aligned with the rank-one spike in the input data. We characterize how properties of the training data, the scaling regime and the activation function govern the balance between these two components. We also demonstrate that standard regularizers, such as weight decay, input noise and Jacobian penalties, selectively modulate these components. Experiments on synthetic and real data corroborate our theoretical predictions.
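To make the object of study concrete, the following is a minimal NumPy sketch, not the paper's experimental code. It builds a toy spiked data matrix, forms the input-weight gradient of a two-layer network under squared loss, and measures how much of the gradient's squared Frobenius norm sits in its top two singular values. Everything specific here is an assumption made for illustration: the function names, the tanh activation, the 1/m output scaling, the linearly decaying bulk spectrum, the synthetic target, and the problem sizes. The fraction it prints depends on those choices and should not be read as a reproduction of any reported result.

```python
import numpy as np

def spiked_data(n, d, spike_strength, rng):
    """Rows are samples: anisotropic Gaussian bulk plus a rank-one spike (toy model)."""
    scales = np.linspace(1.0, 0.05, d)        # assumed ill-conditioned bulk spectrum
    bulk = rng.standard_normal((n, d)) * scales
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                    # unit-norm spike direction
    coeffs = rng.standard_normal(n)           # per-sample spike coefficients
    X = bulk + spike_strength * np.outer(coeffs, u)
    return X / np.sqrt(d), u

def forward(W, a, X):
    """f(x) = a^T tanh(W x) / m, with an assumed 1/m output scaling."""
    return np.tanh(X @ W.T) @ a / W.shape[0]

def loss(W, a, X, y):
    """Mean squared error with the conventional factor 1/2."""
    r = forward(W, a, X) - y
    return 0.5 * np.mean(r ** 2)

def input_weight_gradient(W, a, X, y):
    """Gradient of `loss` with respect to W, shape (m, d)."""
    n, m = X.shape[0], W.shape[0]
    pre = X @ W.T                             # pre-activations, shape (n, m)
    r = np.tanh(pre) @ a / m - y              # residuals, shape (n,)
    act_grad = 1.0 - np.tanh(pre) ** 2        # tanh'(pre), shape (n, m)
    # G[j, :] = (1/n) * sum_i r_i * (a_j / m) * tanh'(w_j . x_i) * x_i
    return ((r[:, None] * act_grad) * (a / m)).T @ X / n

def top_k_energy(G, k=2):
    """Fraction of squared Frobenius norm captured by the top-k singular values."""
    s = np.linalg.svd(G, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))

rng = np.random.default_rng(0)
n, d, m = 400, 50, 64                         # illustrative sizes
X, u = spiked_data(n, d, spike_strength=5.0, rng=rng)
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)
beta = rng.standard_normal(d)
y = np.tanh(X @ beta)                         # hypothetical single-index target

G = input_weight_gradient(W, a, X, y)
frac = top_k_energy(G, k=2)
print(f"top-2 share of squared Frobenius norm: {frac:.3f}")
```

Varying `spike_strength`, the bulk spectrum in `scales`, or the activation is one simple way to probe how the top-two share responds to the data and the architecture in this toy setting.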
