Efficient Analysis of the Distilled Neural Tangent Kernel
Jamie Mahowald, Brian Bell, Alex Ho, Michael Geyer
TL;DR
This work addresses the prohibitive cost of computing NTKs for large neural networks by introducing the Distilled Neural Tangent Kernel (DNTK), a pipeline that combines dataset distillation, random projection, and gradient distillation to compress data, gradients, and inducing points. By exploiting redundancy at the data, parameter, and gradient-subspace levels, DNTK reduces computation and storage by up to several orders of magnitude while preserving kernel structure and predictive accuracy. The approach rests on a theoretical framework linking bilevel dataset distillation to tangent-feature subspaces, and on a spectral analysis showing that per-class NTKs have low effective rank. Empirical results on ImageNette with ResNet-18 show high fidelity with far fewer gradients and a much smaller kernel, enabling scalable NTK-based analyses for practical networks and datasets.
Abstract
Neural tangent kernel (NTK) methods are computationally limited by the need to evaluate large Jacobians across many data points. Existing approaches reduce this cost primarily by projecting and sketching the Jacobian. We show that NTK computation can also be reduced by compressing the data dimension itself using NTK-tuned dataset distillation. We demonstrate that the neural tangent space spanned by the input data can be induced by dataset distillation, yielding a 20-100$\times$ reduction in required Jacobian calculations. We further show that per-class NTK matrices have low effective rank that is preserved by this reduction. Building on these insights, we propose the distilled neural tangent kernel (DNTK), which combines NTK-tuned dataset distillation with state-of-the-art projection methods to reduce NTK computational complexity by up to five orders of magnitude while preserving kernel structure and predictive performance.
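To make the quantities in the abstract concrete, the following is a minimal NumPy sketch of an empirical NTK built from per-example Jacobians, a Gaussian random projection of the parameter dimension, and an effective-rank estimate. It is an illustration under stated assumptions, not the paper's pipeline: the two-layer tanh network, all sizes, the helper names, and the entropy-based definition of effective rank are choices made here for illustration, and the dataset-distillation step is not implemented.

```python
import numpy as np

def per_example_jacobian(W, v, X):
    """Rows are gradients of f(x) = v . tanh(W x) with respect to (W, v).

    Toy model assumed for illustration only.
    """
    H = np.tanh(X @ W.T)                                   # (n, h) hidden activations
    # df/dW[i, j] = v[i] * (1 - H[i]^2) * x[j]
    dW = (v * (1.0 - H**2))[:, :, None] * X[:, None, :]    # (n, h, d)
    # df/dv[i] = H[i]
    return np.concatenate([dW.reshape(len(X), -1), H], axis=1)

def empirical_ntk(J):
    """Empirical NTK Gram matrix K = J J^T over the examples in J."""
    return J @ J.T

def project_jacobian(J, k, rng):
    """Sketch the parameter dimension down to k with a Gaussian random matrix."""
    R = rng.standard_normal((J.shape[1], k)) / np.sqrt(k)
    return J @ R

def effective_rank(K):
    """Exponential of the Shannon entropy of the normalized eigenvalues.

    This is one common definition, assumed here; the paper may use another.
    """
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
n, d, h, k = 64, 16, 32, 256          # hypothetical sizes
X = rng.standard_normal((n, d))
W = rng.standard_normal((h, d)) / np.sqrt(d)
v = rng.standard_normal(h) / np.sqrt(h)

J = per_example_jacobian(W, v, X)     # (n, h*d + h)
K = empirical_ntk(J)                  # exact kernel on these examples
K_proj = empirical_ntk(project_jacobian(J, k, rng))  # kernel from sketched gradients
rel_err = np.linalg.norm(K - K_proj) / np.linalg.norm(K)
erank = effective_rank(K)
```

The exact kernel costs one full gradient per example, each of length equal to the parameter count, while the sketched version stores only `k` numbers per example. Shrinking the number of rows in `J`, which is the role the abstract assigns to distillation, is a separate reduction that compounds with the projection.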
