Bounds for the smallest eigenvalue of the NTK for arbitrary spherical data of arbitrary dimension
Kedar Karhadkar, Michael Murray, Guido Montúfar
TL;DR
This work bounds the smallest eigenvalue of the neural tangent kernel (NTK), a key ingredient in the analysis of neural networks trained with gradient descent. It introduces a geometry-driven approach, based on the hemisphere transform and a spherical-harmonic decomposition, that yields lower and upper bounds depending on a measure $\delta$ of the collinearity of the data rather than on distributional assumptions, and that hold even when the input dimension $d_0$ is fixed. For shallow ReLU networks, the authors show $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^2)$ under the width condition $d_1 = \tilde{\Omega}(\|\mathbf{X}\|^2 d_0^3 \delta^{-2})$, together with an upper bound of $O(\delta')$. They extend these results to deep networks under a pyramidal width condition, obtaining $\lambda_{\min}(\mathbf{K}) = \tilde{\Omega}(d_0^{-3}\delta^{4})$ (scaling with depth as $O(L)$). A corollary recovers known rates in the uniform-sphere setting, showing that the bounds are tight up to logarithmic factors in certain regimes. These results extend NTK conditioning analysis to fixed-dimensional data without distributional assumptions, which in turn supports global convergence and memorization results under more general data geometry.
Abstract
Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. However, existing results require distributional assumptions on the data and are limited to a high-dimensional setting, where the input dimension $d_0$ scales at least logarithmically in the number of samples $n$. In this work we remove both of these requirements and instead provide bounds in terms of a measure of the collinearity of the data: notably these bounds hold with high probability even when $d_0$ is held constant versus $n$. We prove our results through a novel application of the hemisphere transform.
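To make the two quantities in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code. It computes the empirical NTK Gram matrix of a shallow ReLU network on points of the unit sphere, its smallest eigenvalue, and one plausible collinearity measure. Several choices here are assumptions that the text above does not specify: the network parametrization $f(x) = d_1^{-1/2} \sum_j v_j \,\mathrm{relu}(w_j^\top x)$ with both layers trained, the Gaussian initialization, and the separation $\min_{i \neq k} \min(\|x_i - x_k\|, \|x_i + x_k\|)$ used as a stand-in for $\delta$. The function names `sample_sphere`, `separation`, and `shallow_relu_ntk` are hypothetical.

```python
import numpy as np


def sample_sphere(n, d0, rng):
    """Draw n points uniformly from the unit sphere in R^{d0}."""
    x = rng.standard_normal((n, d0))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def separation(X):
    """Assumed collinearity measure: min over pairs of min(|x_i - x_k|, |x_i + x_k|).

    For unit vectors this equals sqrt(2 - 2 * |<x_i, x_k>|).
    """
    iu = np.triu_indices(len(X), k=1)          # distinct pairs i < k
    g = np.abs((X @ X.T)[iu])
    return float(np.sqrt(np.maximum(2.0 - 2.0 * g, 0.0)).min())


def shallow_relu_ntk(X, W, v):
    """Empirical NTK Gram matrix of f(x) = d1^{-1/2} * sum_j v_j relu(w_j . x).

    Gradients are taken with respect to both W (d1 x d0) and v (d1,).
    """
    d1 = W.shape[0]
    pre = X @ W.T                              # pre-activations, shape (n, d1)
    act = np.maximum(pre, 0.0)                 # relu(pre)
    gate = (pre > 0).astype(X.dtype)           # relu'(pre)
    k_outer = act @ act.T                      # contribution of the outer weights v
    k_inner = (X @ X.T) * ((gate * v**2) @ gate.T)  # contribution of the inner weights W
    K = (k_outer + k_inner) / d1
    return 0.5 * (K + K.T)                     # remove rounding asymmetry


rng = np.random.default_rng(0)
n, d0, d1 = 50, 3, 2000                        # d0 held fixed and small
X = sample_sphere(n, d0, rng)
W = rng.standard_normal((d1, d0))
v = rng.standard_normal(d1)

K = shallow_relu_ntk(X, W, v)
lam_min = np.linalg.eigvalsh(K)[0]             # eigenvalues come back in ascending order
print(f"separation = {separation(X):.4f}, lambda_min(K) = {lam_min:.3e}")
```

Under this setup, a dataset containing a repeated point has separation zero and a Gram matrix with two identical rows, so its smallest eigenvalue is zero up to rounding. This is the degenerate end of the collinearity-dependent behaviour that the bounds describe.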
