The Positivity of the Neural Tangent Kernel
Luís Carvalho, João L. Costa, José Mourão, Gonçalo Oliveira
TL;DR
The paper proves that the Neural Tangent Kernel (NTK) of wide feedforward networks of any depth is strictly positive definite whenever the activation function is non-polynomial, under standard initialization. The proof combines three ingredients: a warm-up analysis of the one-hidden-layer case; a novel characterization of polynomial functions, showing that certain linear identities force the activation to be a polynomial; and an inductive argument that propagates positivity through the NTK recurrences from layer to layer. The result gives conditions under which gradient descent drives the training loss to zero in the infinite-width limit, and it removes several data- or architecture-specific restrictions imposed in earlier work. It also clarifies the role of biases in ensuring positivity and broadens the class of activations for which memorization is guaranteed in wide networks, which bears on the study of learning dynamics and generalization in deep learning.
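The link between positivity and zero training loss can be made explicit with a short, standard calculation. The sketch below is not taken from the paper: it assumes squared loss, gradient flow with unit learning rate, and a kernel frozen at its infinite-width limit, and the symbols K, f_t, y, and N are introduced here only for illustration.

```latex
% Assumed setting: distinct training inputs x_1, ..., x_N with labels y in R^N,
% network outputs f_t in R^N at training time t, and NTK \Theta.
\[
  K_{ij} = \Theta(x_i, x_j), \qquad
  L(t) = \tfrac{1}{2}\,\lVert f_t - y \rVert^2, \qquad
  \frac{d}{dt}\,(f_t - y) = -K\,(f_t - y).
\]
% Strict positive definiteness means c^T K c > 0 for every nonzero c,
% i.e. \lambda_{\min}(K) > 0, which gives exponential decay of the residual:
\[
  \lVert f_t - y \rVert \;\le\; e^{-\lambda_{\min}(K)\,t}\,\lVert f_0 - y \rVert
  \;\longrightarrow\; 0 \quad (t \to \infty).
\]
```

If K had a zero eigenvalue, the component of the residual along the corresponding eigenvector would never decay, so some label vectors could not be fit.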
Abstract
The Neural Tangent Kernel (NTK) has emerged as a fundamental concept in the study of wide Neural Networks. In particular, it is known that the positivity of the NTK is directly related to the memorization capacity of sufficiently wide networks, i.e., to the possibility of reaching zero training loss via gradient descent. Here we improve on previous work and obtain a sharp result concerning the positivity of the NTK of feedforward networks of any depth. More precisely, we show that, for any non-polynomial activation function, the NTK is strictly positive definite. Our results are based on a novel characterization of polynomial functions, which is of independent interest.
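The contrast between polynomial and non-polynomial activations can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes a one-hidden-layer network f(x) = v · σ(Wx + b) / √m with standard normal parameters, and it computes a finite-width Monte Carlo estimate of the NTK Gram matrix rather than the limiting kernel. The function name `empirical_ntk`, the sample points, and the width are all hypothetical choices.

```python
import numpy as np

def empirical_ntk(X, activation, derivative, width=4096, seed=0):
    """Finite-width NTK Gram matrix of f(x) = v . act(W x + b) / sqrt(width).

    Assumed parametrization (not from the paper): W, b, v are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))
    b = rng.standard_normal(width)
    v = rng.standard_normal(width)
    Z = X @ W.T + b          # pre-activations, shape (n, width)
    A = activation(Z)        # gradient w.r.t. v, up to the 1/sqrt(width) factor
    D = derivative(Z) * v    # common factor of the gradients w.r.t. W and b
    # Inner products of parameter gradients: the v-part is A A^T, and the
    # (W, b)-part is D D^T weighted entrywise by (x . x' + 1).
    return (A @ A.T + (D @ D.T) * (X @ X.T + 1.0)) / width

# Five distinct points in the plane (an arbitrary illustrative choice).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]])

# Non-polynomial activation: tanh.
K_tanh = empirical_ntk(X, np.tanh, lambda z: 1.0 - np.tanh(z) ** 2)
# Polynomial activation: the identity map (degree 1).
K_lin = empirical_ntk(X, lambda z: z, lambda z: np.ones_like(z))

lam_tanh = np.linalg.eigvalsh(K_tanh)[0]  # smallest eigenvalue
lam_lin = np.linalg.eigvalsh(K_lin)[0]

print("smallest eigenvalue, tanh    :", lam_tanh)
print("smallest eigenvalue, identity:", lam_lin)
```

With the identity activation the Gram matrix factors through the augmented inputs (x, 1), so its rank is at most d + 1 = 3 and it cannot be strictly positive definite on five points. The tanh Gram matrix is expected to have a clearly positive smallest eigenvalue, consistent with the statement of the abstract.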
