Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension

Kedar Karhadkar, Michael Murray, Guido Montúfar

TL;DR

This work addresses the problem of bounding the smallest eigenvalue of the neural tangent kernel (NTK) for neural networks trained with gradient descent. It introduces a geometry-driven approach based on a hemisphere transform and spherical-harmonic decomposition to obtain lower and upper bounds that depend on data collinearity, rather than distributional data assumptions, and that hold even when the input dimension $d_0$ is fixed. For shallow ReLU networks, the authors show $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^2)$ under width $d_1 = \tilde{\Omega}(\|\mathbf{X}\|^2 d_0^3 \delta^{-2})$, with an upper bound of $O(\delta')$, and extend these insights to deep networks under a pyramidal width condition to obtain $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^{4})$ (scaling with depth as $O(L)$). A corollary recovers known rates in the uniform-sphere setting, showing tightness up to logarithmic factors for certain regimes. This work broadens NTK conditioning analysis beyond distributional data assumptions and high-dimensional regimes, enabling global convergence and memorization results under more general data geometry.

Abstract

Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. However, existing results require distributional assumptions on the data and are limited to a high-dimensional setting, where the input dimension $d_0$ scales at least logarithmically in the number of samples $n$. In this work we remove both of these requirements and instead provide bounds in terms of a measure of the collinearity of the data: notably these bounds hold with high probability even when $d_0$ is held constant versus $n$. We prove our results through a novel application of the hemisphere transform.
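As a rough illustration of what a collinearity measure on spherical data can look like, the sketch below computes the smallest distance between any point and any other point or its antipode. The definition used here, $\min_{i \neq k} \min(\|x_i - x_k\|, \|x_i + x_k\|)$, is an assumption for illustration; the exact definition of $\delta$-separation is not given in this extract. The function name `separation` is hypothetical.

```python
import numpy as np

def separation(X):
    """Smallest value of min(||x_i - x_k||, ||x_i + x_k||) over pairs i != k.

    Assumes the rows of X are unit vectors. This definition is an
    illustrative assumption, not taken from the paper's text.
    """
    G = X @ X.T  # pairwise inner products
    # For unit vectors, ||x_i -/+ x_k||^2 = 2 -/+ 2<x_i, x_k>,
    # so the smaller of the two squared distances is 2 - 2|<x_i, x_k>|.
    sq = 2.0 - 2.0 * np.abs(G)
    np.fill_diagonal(sq, np.inf)  # ignore the i == k entries
    return float(np.sqrt(np.clip(sq, 0.0, None).min()))

# Example: n points drawn uniformly on the sphere in a fixed low dimension.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
delta = separation(X)
```

Under this definition the value never exceeds $\sqrt{2}$ (attained by orthogonal points) and drops to zero when two points coincide or are antipodal, which matches the range $\delta \in (0, \sqrt{2})$ used in Theorem 1.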

Paper Structure (39 sections, 45 theorems, 319 equations)

Key Result

Theorem 1

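Let $d \geq 3$, $\epsilon \in (0,1)$, and $\delta, \delta' \in (0, \sqrt{2})$. Suppose that $\mathbf{x}_1, \ldots, \mathbf{x}_n \in \mathbb{S}^{d-1}$ are $\delta$-separated and $\min_{i \neq k}\|\mathbf{x}_i - \mathbf{x}_k\| \leq \delta'$. Define $\lambda$ [displayed definition missing from the source]. If $d_1 \gtrsim \frac{\|\mathbf{X}\|^2}{\lambda}\log \frac{n}{\epsilon}$, then with probability at least $1 - \epsilon$, [displayed bounds missing from the source; the TL;DR summarizes them as $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^2)$ and $\lambda_{\min}(\mathbf{K}) = O(\delta')$].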
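To make the quantity in the theorem concrete, here is a minimal sketch that builds a finite-width NTK Gram matrix for a shallow ReLU network and reads off its smallest eigenvalue. It assumes the network $f(x) = \frac{1}{\sqrt{d_1}} \sum_j v_j\,\mathrm{relu}(\langle w_j, x\rangle)$ with standard Gaussian $w_j$ and $v_j$ and both layers trained. The parameterization and normalization used in the paper are not stated in this extract, so these choices, along with the helper name `shallow_relu_ntk`, are assumptions.

```python
import numpy as np

def shallow_relu_ntk(X, W, v):
    """NTK Gram matrix of f(x) = v . relu(W x) / sqrt(d_1), training W and v.

    The 1/sqrt(d_1) scaling and training both layers are assumptions
    made for illustration.
    """
    d1 = W.shape[0]
    pre = X @ W.T                    # (n, d1) pre-activations <w_j, x_i>
    act = np.maximum(pre, 0.0)       # gradient of f w.r.t. v, up to 1/sqrt(d1)
    gate = (pre > 0).astype(X.dtype) * v  # v_j * relu'(<w_j, x_i>)
    K_outer = act @ act.T                 # contribution of the outer weights v
    K_inner = (gate @ gate.T) * (X @ X.T) # contribution of the inner weights W
    return (K_outer + K_inner) / d1

rng = np.random.default_rng(0)
n, d, d1 = 20, 3, 4096               # fixed input dimension, wide hidden layer
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # place the data on the sphere
W = rng.standard_normal((d1, d))
v = rng.standard_normal(d1)

K = shallow_relu_ntk(X, W, v)
lam_min = np.linalg.eigvalsh(K)[0]   # eigvalsh returns eigenvalues in ascending order
```

Because `K` is a Gram matrix of parameter gradients it is positive semidefinite by construction, so `lam_min` is nonnegative up to rounding. Repeating the experiment with two data points moved closer together is one way to observe empirically how the smallest eigenvalue shrinks as the separation decreases.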

Theorems & Definitions (77)

  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Lemma 1
  • Lemma 1
  • Lemma 1
  • Lemma 1
  • Theorem 2
  • Lemma 2
  • Lemma 2
  • ...and 67 more