The Positivity of the Neural Tangent Kernel

Luís Carvalho; João L. Costa; José Mourão; Gonçalo Oliveira

TL;DR

The paper proves that the Neural Tangent Kernel (NTK) of deep, wide feedforward networks is strictly positive definite for any depth when the activation is non-polynomial, under standard initialization. The authors combine a warm-up analysis of a one-hidden-layer case, a novel polynomial-function characterization showing that certain linear identities force the activation to be a polynomial, and an inductive framework that propagates positivity through the NTK recurrences across layers. This yields conditions under which gradient descent can drive training loss to zero in the infinite-width limit, removing several earlier data- or architecture-specific restrictions. The results illuminate the role of biases in ensuring positivity and broaden the class of activations for which memorization is guaranteed in wide networks, with implications for understanding learning dynamics and generalization in deep learning.

Abstract

The Neural Tangent Kernel (NTK) has emerged as a fundamental concept in the study of wide Neural Networks. In particular, it is known that the positivity of the NTK is directly related to the memorization capacity of sufficiently wide networks, i.e., to the possibility of reaching zero loss in training, via gradient descent. Here we will improve on previous works and obtain a sharp result concerning the positivity of the NTK of feedforward networks of any depth. More precisely, we will show that, for any non-polynomial activation function, the NTK is strictly positive definite. Our results are based on a novel characterization of polynomial functions which is of independent interest.
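
The link between positivity and memorization mentioned above can be sketched with the standard infinite-width argument. The notation is an assumption made for this illustration and is not taken from the paper: $f_t(X)$ is the vector of network outputs on the training inputs at time $t$, $Y$ is the vector of targets, and $\Theta$ is the NTK Gram matrix on the training set, with training by gradient flow on the squared loss.

```latex
\frac{d}{dt}\bigl(f_t(X) - Y\bigr) = -\,\Theta\,\bigl(f_t(X) - Y\bigr)
\quad\Longrightarrow\quad
\bigl\|f_t(X) - Y\bigr\| \;\le\; e^{-\lambda_{\min}(\Theta)\,t}\,\bigl\|f_0(X) - Y\bigr\|
```

If $\Theta$ is strictly positive definite, then $\lambda_{\min}(\Theta) > 0$ and the training error decays to zero, which is the sense in which positivity gives memorization.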

Paper Structure (10 sections, 16 theorems, 87 equations)

Introduction
Feedforward Neural Networks and the Neural Tangent Kernel
Main results
Related Work
Paper overview
The Positivity of the NTK I: warm-up with an instructive special case
Two characterizations of polynomial functions
The Positivity of the NTK II: the general case
Networks with biases
Networks with no biases

Key Result

Theorem 1

Consider an architecture with activated biases, i.e. $\beta\neq 0$, and a continuous, almost everywhere differentiable and non-polynomial activation function $\sigma$. Then, the NTK $\Theta^{(L)}_{\infty}$ is (in the sense of Definition 1) a strictly positive definite kernel for all $L\geq 2$.
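
The statement can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions not stated in this summary: it uses ReLU (continuous, almost everywhere differentiable, non-polynomial), the commonly used NTK layer recursion with a bias term $\beta^2$, and the closed-form arc-cosine expressions for the Gaussian expectations. The paper's exact normalization may differ, and the helper names (`relu_expectations`, `ntk_gram`, `cholesky_succeeds`) are hypothetical. Strict positive definiteness of the Gram matrix on distinct points is tested by attempting a Cholesky factorization.

```python
import math

def relu_expectations(a, b, c):
    """For (u, v) ~ N(0, [[a, c], [c, b]]) return the pair
    (E[relu(u) relu(v)], E[relu'(u) relu'(v)]) via arc-cosine formulas."""
    norm = math.sqrt(a * b)
    cos_t = max(-1.0, min(1.0, c / norm))  # clamp against rounding error
    theta = math.acos(cos_t)
    e_sigma = norm * (math.sin(theta) + (math.pi - theta) * cos_t) / (2 * math.pi)
    e_dsigma = (math.pi - theta) / (2 * math.pi)
    return e_sigma, e_dsigma

def ntk_gram(points, depth, beta):
    """Infinite-width ReLU NTK Gram matrix (assumed recursion, beta != 0)."""
    n, d = len(points), len(points[0])
    # First layer: Sigma^(1)(x, y) = <x, y> / d + beta^2 and Theta^(1) = Sigma^(1).
    sigma = [[sum(p * q for p, q in zip(x, y)) / d + beta ** 2 for y in points]
             for x in points]
    theta = [row[:] for row in sigma]
    for _ in range(depth - 1):
        new_sigma = [[0.0] * n for _ in range(n)]
        new_theta = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                e_s, e_ds = relu_expectations(sigma[i][i], sigma[j][j], sigma[i][j])
                new_sigma[i][j] = e_s + beta ** 2
                # Theta^(l+1) = Theta^(l) * E[sigma' sigma'] + Sigma^(l+1)
                new_theta[i][j] = theta[i][j] * e_ds + new_sigma[i][j]
        sigma, theta = new_sigma, new_theta
    return theta

def cholesky_succeeds(m, tol=1e-12):
    """True if the symmetric matrix m admits a Cholesky factor with
    pivots above tol, i.e. it is numerically strictly positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

# Four distinct inputs in R^2, depth L = 3, bias parameter beta = 0.1.
points = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
gram = ntk_gram(points, depth=3, beta=0.1)
print(cholesky_succeeds(gram))
```

A successful factorization on one finite point set is only a sanity check and says nothing about all point sets; the theorem is what guarantees positivity in general.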

Theorems & Definitions (29)

Definition 1
Theorem 1: Positivity of the NTK for networks with biases
Remark 1
Theorem 2: Positivity of the NTK for networks with no biases
Remark 2
Theorem 3
Theorem 4
proof
Lemma 1
proof
...and 19 more
