A Unified Kernel for Neural Network Learning

Shao-Qun Zhang; Zong-Yi Chen; Yong-Ming Tian; Xun Lu

TL;DR

This work introduces the Unified Neural Kernel (UNK), a kernel induced by the inner product of produced variables and governed by gradient-descent dynamics with a multiplier $\lambda$ on the initial parameters, designed to bridge Neural Network Gaussian Processes (NNGP) and Neural Tangent Kernels (NTK). The UNK recovers the NTK in the $\lambda=0$ or $t=0$ limits and converges to the NNGP as $t\to\infty$ when $\lambda\neq 0$, providing a unified view of neural-kernel learning. The authors establish the existence, limiting behavior, convergence, and uniform tightness of the UNK and present explicit examples (including an $L_2$-regularizer case and a $t'$-based updating scheme) with supporting corollaries; they also validate the approach with MNIST-like experiments, showing improvements in pre-trained model fine-tuning. Overall, the work offers a cohesive theoretical framework for neural-kernel learning with practical implications for kernel-based inference and transfer learning.
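To make the role of the multiplier $\lambda$ concrete, here is a minimal sketch of gradient descent with an extra term proportional to the initial parameters. The exact update rule of the paper is not reproduced in this summary, so the way $\lambda$ enters below (added to the gradient as $\lambda\,\theta_0$), the scalar linear model, and the names `grad_mse` and `descend` are all illustrative assumptions rather than the authors' formulation. The only property the sketch is meant to show is that $\lambda = 0$ reduces to plain gradient descent.

```python
def grad_mse(theta, xs, ys):
    """Gradient of 0.5 * mean((theta * x - y)^2) w.r.t. the scalar theta."""
    n = len(xs)
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / n


def descend(theta0, xs, ys, lam, lr=0.1, steps=200):
    """Gradient descent with a hypothetical lambda * theta0 term.

    ASSUMPTION: the update theta <- theta - lr * (grad + lam * theta0)
    is an illustrative form only; it is not taken from the paper.
    With lam = 0 it is ordinary gradient descent.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * (grad_mse(theta, xs, ys) + lam * theta0)
    return theta


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data with y = 2 * x
    for lam in (0.0, 0.01, 0.5):
        print(lam, descend(1.0, xs, ys, lam))
```

Under this assumed rule the fixed point shifts away from the least-squares solution as $\lambda$ grows, which is one simple way to see how a dependence on the initialization can persist throughout training.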

Abstract

Past decades have witnessed great interest in the distinction and connection between neural network learning and kernel learning. Recent work has made theoretical progress in connecting infinitely wide neural networks and Gaussian processes. Two predominant approaches have emerged: the Neural Network Gaussian Process (NNGP) and the Neural Tangent Kernel (NTK). The former, rooted in Bayesian inference, represents a zero-order kernel, while the latter, grounded in the tangent space of gradient descent, is a first-order kernel. In this paper, we present the Unified Neural Kernel (UNK), which is induced by the inner product of produced variables and characterizes the learning dynamics of neural networks with gradient descent and parameter initialization. The proposed UNK maintains the limiting properties of both NNGP and NTK, exhibiting behaviors akin to NTK with a finite learning step and converging to NNGP as the learning step approaches infinity. We also theoretically characterize the uniform tightness and learning convergence of the UNK, providing comprehensive insights into this unified kernel. Experimental results underscore the effectiveness of our proposed method.
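The zero-order versus first-order distinction can be illustrated on a finite-width toy network. The sketch below is not the UNK itself and is not taken from the paper: it assumes a one-hidden-layer network $f(x) = \frac{1}{\sqrt{n}} \sum_j v_j \tanh(w_j \cdot x)$ with standard Gaussian weights, and the function names are hypothetical. The NNGP-style quantity uses inner products of the hidden outputs (the expectation of $f(x_1) f(x_2)$ over the output weights $v$), while the empirical NTK uses inner products of the parameter gradients.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def init_params(width, dim, seed=0):
    """Standard Gaussian hidden weights W (width x dim) and output weights v."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    v = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, v


def nngp_estimate(W, x1, x2):
    """Zero-order: E_v[f(x1) f(x2)] = (1/n) sum_j tanh(w_j.x1) tanh(w_j.x2)."""
    return sum(math.tanh(dot(w, x1)) * math.tanh(dot(w, x2)) for w in W) / len(W)


def ntk_empirical(W, v, x1, x2):
    """First-order: inner product of the gradients of f w.r.t. (v, W)."""
    # Gradient w.r.t. v_j is tanh(w_j.x) / sqrt(n); its inner product
    # coincides with the NNGP-style estimate above.
    k_v = nngp_estimate(W, x1, x2)
    # Gradient w.r.t. w_j is v_j * (1 - tanh^2(w_j.x)) * x / sqrt(n).
    k_w = 0.0
    for w, vj in zip(W, v):
        d1 = 1.0 - math.tanh(dot(w, x1)) ** 2
        d2 = 1.0 - math.tanh(dot(w, x2)) ** 2
        k_w += vj * vj * d1 * d2
    k_w *= dot(x1, x2) / len(W)
    return k_v + k_w


if __name__ == "__main__":
    W, v = init_params(width=2048, dim=2, seed=0)
    xa, xb = [1.0, 0.5], [-0.3, 0.8]
    print("NNGP estimate:", nngp_estimate(W, xa, xb))
    print("empirical NTK:", ntk_empirical(W, v, xa, xb))
```

For this toy architecture the empirical NTK equals the NNGP-style term plus a contribution from the hidden-layer gradients, so the two kernels differ only through first-order information.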

Paper Structure (27 sections, 10 theorems, 64 equations, 2 figures, 4 tables, 1 algorithm)

Introduction
Preliminary
NNGP and NTK
Neural Network Gaussian Process (NNGP)
Neural Tangent Kernel (NTK)
Related Studies
The Unified Kernel
The Existence of UNK
Examples and Corollaries
Example related to $L_2$ regularizer
Example related to $t'$
Convergence and Uniform Tightness
Experiments
Convergence Effects of Various Multipliers $\lambda$
Correlation between Initialized and Optimized Parameters
...and 12 more sections

Key Result

Theorem 1

For a network of depth $L$ with a Lipschitz activation $\phi$ and in the limit of the layer widths $n_1, \dots, n_{L-1} \to \infty$, Eq. eq:lamda induces a kernel with the following form, for $l\in[L]$ and $t\geq 0$, where $\rho_t$ denotes the correlation multipliers of variables at training epoch $t$, and $\sigma_0^2$ and $\sigma_t^2$ denote the variable variances at training epochs 0 and $t$, respectively ...

Figures (2)

Figure 1: Accuracy curves for various multipliers $\lambda \in \{0.001, 0.01, 0.1, 0, 1, 10\}$, where the x- and y-axes denote the epoch and accuracy, respectively. Training accuracy curves for (a) Baseline-$\Theta_0$, (b) Baseline-$\Theta_{t'}$, and (c) Grid Search. Testing accuracy curves for (e) Baseline-$\Theta_0$, (f) Baseline-$\Theta_{t'}$, and (g) Grid Search. Comparison of (d) training and (h) testing accuracy curves among Baseline-$\Theta_0$, Grid-0.001, and Grid-0.01.
Figure 2: Histograms of the training correlation for (a) Grid-0.001 and (c) Grid-0.01 and of the testing correlation for (b) Grid-0.001 and (d) Grid-0.01, where the x- and y-axes denote the number of instances and the corresponding correlation, respectively.

Theorems & Definitions (11)

Theorem 1
Corollary 1
Corollary 2
Theorem 2
Theorem 3
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
...and 1 more
