Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods

Bertille Follain, Francis Bach

TL;DR

This work introduces Brownian kernel neural networks (BKerNN), a fusion of kernel methods and mean-field neural networks that learns feature projections via projected Brownian kernels and optimises a Sobolev-based regularisation. It shows that BKerNN can be formulated as both a learned-kernel ridge regression and a neural-network-like infinite-width model, with practical particle-based optimisation and proximal steps. Theoretical results establish high-probability risk bounds under subgaussian data, with dimension-dependent and dimension-independent Gaussian complexity bounds, and explicit rates for risk convergence to the optimum. Empirical evaluations demonstrate BKerNN's robustness and competitive performance against kernel ridge regression and ReLU networks on synthetic and real data, particularly in feature-learning and high-dimensional regimes. The framework also explores a family of penalties to promote feature learning and sparsity, offering practical avenues for adaptivity in multi-index-like models.

Abstract

We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach considers functions as expectations of Sobolev functions over all possible one-dimensional projections of the data. This framework is similar to kernel ridge regression, where the kernel is $\mathbb{E}_w ( k^{(B)}(w^\top x,w^\top x^\prime))$, with $k^{(B)}(a,b) := \min(|a|, |b|)\mathbf{1}_{ab>0}$ the Brownian kernel, and the distribution of the projections $w$ is learnt. This can also be viewed as an infinite-width one-hidden layer neural network, optimising the first layer's weights through gradient descent and explicitly adjusting the non-linearity and weights of the second layer. We introduce a gradient-based computational method for the estimator, called Brownian Kernel Neural Network (BKerNN), using particles to approximate the expectation, where the positive homogeneity of the Brownian kernel leads to improved robustness to local minima. Using Rademacher complexity, we show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O( \min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors). Numerical experiments confirm our optimisation intuitions, and BKerNN outperforms kernel ridge regression, and favourably compares to a one-hidden layer neural network with ReLU activations in various settings and real data sets.
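To make the kernel in the abstract concrete, the following is a minimal NumPy sketch of the particle approximation $\frac{1}{m}\sum_{j=1}^m k^{(B)}(w_j^\top x, w_j^\top x^\prime)$ followed by a plain ridge solve. It is an illustration under stated assumptions, not the authors' BKerNN algorithm: the particles `W` are drawn at random and kept fixed (the paper learns them), there is no unpenalised intercept, and the names `brownian_kernel`, `particle_kernel`, and `fit_ridge` are hypothetical helpers introduced here.

```python
import numpy as np

def brownian_kernel(a, b):
    """k(a, b) = min(|a|, |b|) when a and b share a sign, else 0 (elementwise)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.minimum(np.abs(a), np.abs(b)) * (a * b > 0)

def particle_kernel(X1, X2, W):
    """Average the Brownian kernel over the m projection directions in W.

    X1: (n1, d), X2: (n2, d), W: (d, m). Returns an (n1, n2) matrix.
    """
    P1 = X1 @ W  # projections of X1, shape (n1, m)
    P2 = X2 @ W  # projections of X2, shape (n2, m)
    K = brownian_kernel(P1[:, None, :], P2[None, :, :])  # (n1, n2, m)
    return K.mean(axis=2)

def fit_ridge(K, y, lam):
    """Kernel ridge coefficients: solve (K + n * lam * I) alpha = y."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

# Toy data (assumption): the target depends on a single coordinate.
rng = np.random.default_rng(0)
n, d, m = 50, 5, 20
X = rng.standard_normal((n, d))
y = np.abs(X[:, 0])

# Fixed random unit-norm particles (assumption: the paper optimises these).
W = rng.standard_normal((d, m))
W /= np.linalg.norm(W, axis=0, keepdims=True)

K = particle_kernel(X, X, W)
alpha = fit_ridge(K, y, lam=1e-3)
y_hat = K @ alpha  # in-sample predictions
```

The positive homogeneity mentioned in the abstract is visible directly in this sketch: for any $c > 0$, `particle_kernel(X, X, c * W)` equals `c * particle_kernel(X, X, W)`, since $\min(|ca|, |cb|) = c\min(|a|, |b|)$ and the sign condition is unchanged.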

Paper Structure

This paper contains 61 sections, 18 theorems, 130 equations, 5 figures, 1 algorithm.

Key Result

Lemma 3

$\mathcal{F}_\infty$ is a vector space and $\max(f(0), \Omega_0(f))$ is a norm on $\mathcal{F}_\infty$. For $f \in \mathcal{F}_\infty$, the function $f$ is $1/2$-Hölder continuous with constant $\Omega_0(f)$, i.e., $|f(x) - f(x^\prime)| \leq \Omega_0(f) \sqrt{\|x - x^\prime\|^*}$.
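The exponent $1/2$ can be traced to a one-dimensional Cauchy-Schwarz argument. The sketch below rests on assumptions not stated in the lemma itself: that $f(x) = c + \int g_w(w^\top x)\,\mathrm{d}\mu(w)$ for a probability measure $\mu$ supported on $\|w\| \leq 1$, with each $g_w$ having a square-integrable derivative, and that $\Omega_0(f)$ is the infimum of $\int \|g_w^\prime\|_{L_2}\,\mathrm{d}\mu(w)$ over such representations. For a single $g$ and reals $a, b$:

$$
|g(a) - g(b)| = \left| \int_b^a g^\prime(t)\,\mathrm{d}t \right| \leq \sqrt{|a - b|}\;\|g^\prime\|_{L_2}.
$$

With $a = w^\top x$ and $b = w^\top x^\prime$, the dual-norm inequality gives $|w^\top x - w^\top x^\prime| \leq \|w\|\,\|x - x^\prime\|^* \leq \|x - x^\prime\|^*$, so that

$$
|f(x) - f(x^\prime)| \leq \int |g_w(w^\top x) - g_w(w^\top x^\prime)|\,\mathrm{d}\mu(w) \leq \sqrt{\|x - x^\prime\|^*} \int \|g_w^\prime\|_{L_2}\,\mathrm{d}\mu(w).
$$

Taking the infimum over representations would then yield the stated bound with constant $\Omega_0(f)$.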

Figures (5)

  • Figure 1: MSE across optimisation procedure for different kernels.
  • Figure 2: Influence of parameters: left: $m$, middle: $\lambda$, right: type of penalty.
  • Figure 3: Comparison to neural network on 1D examples.
  • Figure 4: Performance comparison across varying sample sizes and dimensions.
  • Figure 5: Comparison of $R^2$ scores on real data sets.

Theorems & Definitions (22)

  • Definition 1: Infinite-Width Function Space
  • Definition 2: Finite-Width Function Space
  • Lemma 3: Properties of Functions in $\mathcal{F}_\infty$
  • Lemma 4: Function Spaces Included in $\mathcal{F}_\infty$
  • Lemma 5: Kernel Formulation of Finite-Width
  • Lemma 6: Kernel Formulation of Infinite-Width
  • Lemma 7: Optimisation for Fixed Particles
  • Lemma 8: Gradient of $G$
  • Lemma 9: Proximal Operators
  • Definition 10: Gaussian Complexity
  • ...and 12 more