Depth-induced NTK: Bridging Over-parameterized Neural Networks and Deep Neural Kernels

Yong-Ming Tian, Shuang Liang, Shao-Qun Zhang, Feng-Lei Fan

TL;DR

This work addresses the gap in understanding depth within neural kernel theory by introducing a depth-induced NTK, NTK_(d), derived from a shortcut-related architecture and provably convergent to a Gaussian process as depth and shortcut count grow. It establishes existence, spectral bounds, and training invariance for NTK_(d), showing the kernel remains well-conditioned and interpretable during training. Empirical results on sine regression and image datasets demonstrate that NTK_(d) achieves competitive performance with the traditional width-based NTK while offering enhanced stability and a clearer link between depth and representation learning. The study advances neural kernel theory by elucidating depth-focused scaling laws and opens pathways to analyze deep networks beyond the infinite-width paradigm.

Abstract

While deep learning has achieved remarkable success across a wide range of applications, the theoretical understanding of its representation learning remains limited. Deep neural kernels provide a principled framework for interpreting over-parameterized neural networks by mapping hierarchical feature transformations into kernel spaces, thereby combining the expressive power of deep architectures with the analytical tractability of kernel methods. Recent advances, particularly neural tangent kernels (NTKs) derived from gradient inner products, have established connections between infinitely wide neural networks and nonparametric Bayesian inference. However, the existing NTK paradigm has been largely confined to the infinite-width regime and overlooks the representational role of network depth. To address this gap, we propose a depth-induced NTK based on a shortcut-related architecture, which converges to a Gaussian process as the network depth approaches infinity. We theoretically analyze the training invariance and spectral properties of the proposed kernel, which stabilizes the kernel dynamics and mitigates degeneration. Experimental results further underscore the effectiveness of the proposed method. Our findings significantly extend the existing landscape of neural kernel theory and provide an in-depth understanding of deep learning and the scaling law.
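To make "derived from gradient inner products" concrete, the following is a minimal sketch of an empirical NTK for a toy two-layer ReLU network. It illustrates the standard width-based construction only; it is not the depth-induced kernel proposed in the paper, whose shortcut architecture is not reproduced here. The network form, the helper names (`init_params`, `param_gradient`, `empirical_ntk`), the width, and the sample inputs are all assumptions chosen for illustration.

```python
import math
import random


def init_params(width, dim, seed=0):
    """Draw standard-normal weights for the toy model
    f(x) = sum_j a_j * relu(w_j . x) / sqrt(width)  (an assumed form)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def param_gradient(x, W, a):
    """Flattened gradient of f(x) with respect to all parameters (a, W)."""
    scale = 1.0 / math.sqrt(len(a))
    grad = []
    for w_j, a_j in zip(W, a):
        pre = sum(w * xi for w, xi in zip(w_j, x))
        active = 1.0 if pre > 0.0 else 0.0
        grad.append(scale * max(pre, 0.0))                   # df/da_j
        grad.extend(scale * a_j * active * xi for xi in x)   # df/dw_j
    return grad


def empirical_ntk(x1, x2, W, a):
    """Kernel value: inner product of the two parameter gradients."""
    g1 = param_gradient(x1, W, a)
    g2 = param_gradient(x2, W, a)
    return sum(u * v for u, v in zip(g1, g2))


# Hypothetical inputs; the Gram matrix is symmetric by construction.
W, a = init_params(width=512, dim=2)
xs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
gram = [[empirical_ntk(p, q, W, a) for q in xs] for p in xs]
```

Because each entry is an inner product of gradient vectors, the resulting Gram matrix is symmetric and positive semi-definite, which is what allows it to be used in kernel regression.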

Paper Structure

This paper contains 25 sections, 12 theorems, 81 equations, 10 figures, 2 tables.

Key Result

Theorem 1

(Existence of Depth-induced NTK) Given the shortcut-related network defined by Eqs. eq:forward and eq:shortcut, if the following conditions hold, the network induces an NTK that converges to a Gaussian distribution as both the number of shortcut connections and the network depth go to infinity, that is, $K\rightarrow +\infty$ and $L\rightarrow +\infty$. Formally, for a

Figures (10)

  • Figure 1: The transition of research focus from width to depth in both neural networks and neural kernels.
  • Figure 2: Illustration of the shortcut-related network architecture that derives an NTK by increasing depth.
  • Figure 3: The schematic diagram of all the theoretical results in this paper.
  • Figure 4: Fitting curves of kernel regression with NTK$_{(w)}$ and NTK$_{(d)}$ on sine function.
  • Figure 5: The error bar plots of the regression accuracy of NTK$_{(w)}$ and NTK$_{(d)}$ on MNIST, Fashion-MNIST, and CIFAR-10 data sets.
  • ...and 5 more figures

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma 4
  • ...and 4 more