Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension

Kedar Karhadkar; Michael Murray; Guido Montúfar

TL;DR

This work addresses the problem of bounding the smallest eigenvalue of the neural tangent kernel (NTK) for neural networks trained with gradient descent. It introduces a geometry-driven approach based on a hemisphere transform and spherical-harmonic decomposition to obtain lower and upper bounds that depend on data collinearity, rather than distributional data assumptions, and that hold even when the input dimension $d_0$ is fixed. For shallow ReLU networks, the authors show $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^2)$ under width $d_1 = \tilde{\Omega}(\|\mathbf{X}\|^2 d_0^3 \delta^{-2})$, with an upper bound of $O(\delta')$, and extend these insights to deep networks under a pyramidal width condition to obtain $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^{4})$ (scaling with depth as $O(L)$). A corollary recovers known rates in the uniform-sphere setting, showing tightness up to logarithmic factors for certain regimes. This work broadens NTK conditioning analysis beyond distributional data assumptions and high-dimensional regimes, enabling global convergence and memorization results under more general data geometry.

Abstract

Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. However, existing results require distributional assumptions on the data and are limited to a high-dimensional setting, where the input dimension $d_0$ scales at least logarithmically in the number of samples $n$. In this work we remove both of these requirements and instead provide bounds in terms of a measure of the collinearity of the data: notably these bounds hold with high probability even when $d_0$ is held constant versus $n$. We prove our results through a novel application of the hemisphere transform.
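To make the central quantity concrete, the following minimal sketch computes the empirical NTK Gram matrix of a shallow ReLU network on points of the unit sphere and reports its smallest eigenvalue. The parameterization $f(\mathbf{x}) = d_1^{-1/2}\, \mathbf{v}^\top \mathrm{relu}(\mathbf{W}\mathbf{x})$ with standard Gaussian weights, the choice to differentiate with respect to both layers, and the helper names `sample_sphere` and `shallow_relu_ntk` are assumptions made for illustration; the excerpt above does not specify the paper's exact setup.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def shallow_relu_ntk(X, W, v):
    """Empirical NTK Gram matrix of f(x) = v^T relu(W x) / sqrt(d1).

    Assumed parameterization (not taken from the paper): gradients are
    taken with respect to both the inner weights W and outer weights v.
    """
    d1 = W.shape[0]
    pre = X @ W.T                     # (n, d1) pre-activations
    act = np.maximum(pre, 0.0)        # relu activations
    der = (pre > 0).astype(X.dtype)   # relu derivative, set to 0 at the kink
    k_outer = act @ act.T / d1        # contribution of gradients w.r.t. v
    k_inner = ((der * v**2) @ der.T) * (X @ X.T) / d1  # gradients w.r.t. W
    return k_inner + k_outer

rng = np.random.default_rng(0)
n, d0, d1 = 20, 5, 2000               # illustrative sizes only
X = sample_sphere(n, d0, rng)
W = rng.standard_normal((d1, d0))
v = rng.standard_normal(d1)

K = shallow_relu_ntk(X, W, v)
lam_min = np.linalg.eigvalsh(K)[0]    # eigvalsh returns ascending eigenvalues
print(f"lambda_min(K) = {lam_min:.4f}")
```

Keeping $d_0$ small while increasing $n$ in this sketch mimics the fixed-dimension regime that the abstract highlights, although a numerical experiment of this kind only illustrates the bounds and does not verify them.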

Paper Structure (39 sections, 45 theorems, 319 equations)

Introduction
Main contributions.
Notations.
Related work
Prior work on the NTK.
Prior work on the smallest eigenvalue of the NTK.
Shallow networks
Proof outline for Theorem \ref{thm:shallow-main}
1) Bound the smallest eigenvalue in terms of the infinite-width limit.
2) Interpret the infinite-width kernel in terms of a hemisphere transform.
3) Bound the hemisphere transform norm via spherical harmonics.
4) Bound the hemisphere transform and spherical harmonics on the data.
5) Upper bound.
From shallow to deep neural networks
Proof outline for Theorem \ref{thm:deep-main}
...and 24 more sections
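Step 2 of the shallow proof outline links the infinite-width kernel to a hemisphere transform. One standard fact behind that kind of link is that, for a direction $\mathbf{w}$ drawn uniformly at random, the probability that two unit vectors $\mathbf{x}, \mathbf{y}$ both lie in the hemisphere $\{\mathbf{z} : \mathbf{w}^\top \mathbf{z} > 0\}$ equals $(\pi - \theta)/(2\pi)$, where $\theta$ is the angle between them; this is the expected product of ReLU derivatives. The sketch below checks this identity by Monte Carlo. The function names are hypothetical, and how the paper's own argument uses the transform is not shown in this excerpt.

```python
import numpy as np

def hemisphere_overlap_mc(x, y, num_samples, rng):
    """Monte Carlo estimate of P(w.x > 0 and w.y > 0).

    A standard Gaussian w has a uniformly distributed direction, and the
    signs of w.x and w.y do not depend on the length of w.
    """
    w = rng.standard_normal((num_samples, x.shape[0]))
    return np.mean((w @ x > 0) & (w @ y > 0))

def hemisphere_overlap_exact(x, y):
    """Closed form (pi - theta) / (2 pi) for unit vectors x and y."""
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return (np.pi - theta) / (2.0 * np.pi)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0])
y = np.array([np.cos(1.0), np.sin(1.0), 0.0])   # angle of 1 radian from x

estimate = hemisphere_overlap_mc(x, y, 200_000, rng)
exact = hemisphere_overlap_exact(x, y)
print(f"Monte Carlo: {estimate:.4f}, closed form: {exact:.4f}")
```

As $\mathbf{y}$ approaches $\mathbf{x}$ or $-\mathbf{x}$, the overlap tends to $1/2$ or $0$ respectively, which gives some intuition for why near-collinear data points make the kernel matrix close to singular.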

Key Result

Theorem 1

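Let $d \geq 3$, $\epsilon \in (0,1)$, and $\delta, \delta' \in (0, \sqrt{2})$. Suppose that ${\bm{x}}_1, \ldots, {\bm{x}}_n \in \mathbb{S}^{d-1}$ are $\delta$-separated and $\min_{i \neq k}\|{\bm{x}}_i - {\bm{x}}_k\| \leq \delta'$. Define $\lambda$ as the shallow-network lower bound stated in the TL;DR, which is of order $d^{-3}\delta^{2}$ up to logarithmic factors. If $d_1 \gtrsim \frac{\|{\bm{X}}\|^2}{\lambda}\log \frac{n}{\epsilon}$, then with probability at least $1 - \epsilon$, $\lambda \lesssim \lambda_{\min}({\bm{K}}) \lesssim \delta'$.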
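The two data-dependent quantities in the theorem can be read off from pairwise inner products. The sketch below computes the largest admissible $\delta$ and the smallest admissible $\delta'$ for a given dataset. It assumes that "$\delta$-separated" means $\min(\|{\bm{x}}_i - {\bm{x}}_k\|, \|{\bm{x}}_i + {\bm{x}}_k\|) \geq \delta$ for all $i \neq k$, which fits the abstract's description of a collinearity measure but is not spelled out in this excerpt; the function name `separation_stats` is hypothetical.

```python
import numpy as np

def separation_stats(X):
    """Return (delta, delta_prime) for unit-norm rows of X.

    delta:       min over pairs of min(||x_i - x_k||, ||x_i + x_k||)
                 (assumed definition of delta-separation).
    delta_prime: min over pairs of ||x_i - x_k||.
    """
    n = X.shape[0]
    gram = X @ X.T
    iu = np.triu_indices(n, k=1)              # each unordered pair once
    g = np.clip(gram[iu], -1.0, 1.0)          # guard against rounding error
    dist_minus = np.sqrt(2.0 - 2.0 * g)       # ||x_i - x_k|| on the sphere
    dist_plus = np.sqrt(2.0 + 2.0 * g)        # ||x_i + x_k|| on the sphere
    delta = np.minimum(dist_minus, dist_plus).min()
    delta_prime = dist_minus.min()
    return delta, delta_prime

# Orthonormal points: every pair is at distance sqrt(2) in both senses.
delta, delta_prime = separation_stats(np.eye(3))
print(delta, delta_prime)

# Antipodal pair: far apart in distance but perfectly collinear.
X_anti = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
print(separation_stats(X_anti))
```

Under the assumed definition, the antipodal example has $\delta = 0$ even though the two points are at distance 2, so the lower bound degenerates for collinear data regardless of how far apart the points are.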

Theorems & Definitions (77)

Theorem 1
Corollary 1
Lemma 1
Lemma 1
Lemma 1
Lemma 1
Lemma 1
Theorem 2
Lemma 2
Lemma 2
...and 67 more
