Neural Hilbert Ladders: Multi-Layer Neural Networks in Function Space

Zhengdao Chen

TL;DR

This work introduces the Neural Hilbert Ladder (NHL), a hierarchical framework with no bound on width that associates a multi-layer neural network with a hierarchy of RKHSs and defines the corresponding function space $\mathcal{F}^{(L)}$ as an infinite union of RKHSs. It establishes a static correspondence between L-layer NNs and L-level NHLs, derives generalization guarantees via Rademacher complexity, and demonstrates depth-dependent capacity through depth separation under ReLU. In the infinite-width mean-field limit, training induces a non-Markovian dynamics that evolves the NHL kernels, capturing feature learning beyond lazy training. The paper also provides two numerical experiments illustrating feature learning and kernel alignment, situates NHL among kernel-based and mean-field analyses, and outlines directions for future work, including relaxing its assumptions. Overall, NHL offers a rigorous, depth-aware, function-space view of deep networks with quantitative bounds on approximation and generalization, highlighting the role of depth in shaping representational capacity.

Abstract

Characterizing the function space explored by neural networks (NNs) is an important aspect of learning theory. In this work, noticing that a multi-layer NN implicitly generates a hierarchy of reproducing kernel Hilbert spaces (RKHSs) - named a neural Hilbert ladder (NHL) - we define the function space as an infinite union of RKHSs, which generalizes the existing Barron space theory of two-layer NNs. We then establish several theoretical properties of the new space. First, we prove a correspondence between functions expressed by L-layer NNs and those belonging to L-level NHLs. Second, we prove generalization guarantees for learning an NHL with a controlled complexity measure. Third, we derive a non-Markovian dynamics of random fields that governs the evolution of the NHL induced by the training of multi-layer NNs in an infinite-width mean-field limit. Fourth, we show examples of depth separation in NHLs under the ReLU activation function. Finally, we perform numerical experiments to illustrate the feature learning aspect of NN training through the lens of NHLs.
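
To make the idea of a hierarchy of kernels generated by a multi-layer NN concrete, here is a minimal sketch that computes one empirical Gram matrix per hidden layer of a randomly initialized ReLU network. It rests on assumptions that are not stated in this summary: that the level-$l$ kernel is the average of $\sigma(h(\boldsymbol{x}))\,\sigma(h(\boldsymbol{x}'))$ over the pre-activations $h$ of layer $l$, that layers above the first use a mean-field $1/m$ scaling, and that weights are i.i.d. standard Gaussian. The function name `empirical_kernels` and its parameters are hypothetical, and the paper's exact normalization may differ.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def empirical_kernels(xs, widths, seed=0):
    """Return one Gram matrix per hidden layer, evaluated on the inputs xs.

    Assumption (not taken from the source): the level-l kernel is the average
    over neurons i of relu(h_i(x)) * relu(h_i(x')), where h_i are the
    pre-activations of hidden layer l.
    """
    rng = random.Random(seed)
    feats = [list(x) for x in xs]  # features fed into the next layer
    kernels = []
    for layer, m in enumerate(widths):
        n_in = len(feats[0])
        # Assumed initialization: i.i.d. standard Gaussian weights.
        W = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(m)]
        # Assumed mean-field scaling: average (1/n_in) above the first layer.
        scale = 1.0 if layer == 0 else 1.0 / n_in
        pre = [[scale * sum(w * f for w, f in zip(row, fx)) for row in W]
               for fx in feats]
        feats = [[relu(h) for h in hx] for hx in pre]
        # Empirical kernel: average of products of post-activations.
        K = [[sum(a * b for a, b in zip(fa, fb)) / m for fb in feats]
             for fa in feats]
        kernels.append(K)
    return kernels


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for level, K in enumerate(empirical_kernels(xs, widths=[64, 64]), start=1):
        print("level", level, [[round(v, 4) for v in row] for row in K])
```

Each Gram matrix is symmetric and positive semi-definite by construction, since it is an average of outer products of feature vectors. In this sketch the kernels are fixed at initialization; the dynamics described in the abstract concern how such kernels evolve during training.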

Paper Structure (61 sections, 24 theorems, 127 equations, 3 figures)

This paper contains 61 sections, 24 theorems, 127 equations, and 3 figures.

Key Result

Lemma 1

There exists a unique Hilbert space, $\mathcal{H}$, consisting of functions on $\mathcal{X}$ and equipped with the inner product $\langle \cdot , \cdot \rangle_{\mathcal{H}}$, which satisfies the following properties:

Figures (3)

  • Figure 1: Illustration of an NHL, as defined in Definition \ref{def:nhl}. Each $\mathcal{H}^{(l)}$ is an RKHS; each $\mu^{(l)}$ is a probability measure on $\mathcal{H}^{(l)}$; each kernel function $\kappa^{(l)}$ is defined by $\mu^{(l)}$ through \ref{eq:kappa_mu} and, in turn, defines $\mathcal{H}^{(l+1)}$ as its RKHS.
  • Figure 2: Learning trajectories of linear $3$-layer NN versus the NHL dynamics. Solid: $3$-layer linear NNs trained by GD with width $64$ and $8192$. Dashed: numerical integration of the NHL dynamics derived in Section \ref{sec:linear}. Dot-dashed: linear regression (LR) under population loss.
  • Figure 3: Results of GD training of $3$-layer NN with ReLU activation. (a): Target versus learned function. (b): Pre-activation values across neurons in the second hidden layer on two training data points, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$, before and after training. (c): The kernel function of the second hidden layer, $\kappa^{(2)}_{t}(\boldsymbol{x}, \boldsymbol{x}')$, after training (red means a higher value). (d): Training and test errors and the CKA scores of $\kappa^{(1)}_{t}$ and $\kappa^{(2)}_{t}$ with respect to the target function over time, averaged over $10$ runs.
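
The CKA scores mentioned in the Figure 3 caption measure how well a kernel aligns with the target function. A minimal sketch of one common definition, centered kernel alignment between a Gram matrix $K$ and the rank-one target kernel $\boldsymbol{y}\boldsymbol{y}^\top$, is given below. It assumes the paper uses this standard centered form, which this summary does not confirm; the helper names `center`, `frob_inner`, and `cka` are hypothetical.

```python
import math


def center(K):
    """Double-center a square matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)]
            for i in range(n)]


def frob_inner(A, B):
    """Frobenius inner product of two matrices of the same shape."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def cka(K, y):
    """Centered alignment between Gram matrix K and target values y.

    Assumption: the target kernel is the outer product y y^T. Returns 0.0
    when either centered matrix vanishes (for example, constant y).
    """
    Y = [[yi * yj for yj in y] for yi in y]
    Kc, Yc = center(K), center(Y)
    denom = math.sqrt(frob_inner(Kc, Kc) * frob_inner(Yc, Yc))
    return frob_inner(Kc, Yc) / denom if denom > 0.0 else 0.0


if __name__ == "__main__":
    y = [1.0, -1.0, 2.0, 0.0]
    perfect = [[a * b for b in y] for a in y]  # kernel equal to y y^T
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(cka(perfect, y), cka(identity, y))
```

A kernel equal to $\boldsymbol{y}\boldsymbol{y}^\top$ scores $1$, while an uninformative identity kernel on $n$ points scores $1/\sqrt{n-1}$. Under this definition, a score that rises during training indicates a kernel adapting toward the target.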

Theorems & Definitions (31)

  • Lemma 1: Moore-Aronszajn
  • Definition 2
  • Theorem 3
  • Remark 4
  • Proposition 5
  • Lemma 6
  • Proposition 7
  • Theorem 8
  • Theorem 9
  • Theorem 10
  • ...and 21 more