Fredholm integral equations for function approximation and the training of neural networks

Patrick Gelß, Aizhan Issagali, Ralf Kornhuber

TL;DR

This paper introduces a Fredholm integral equation framework for function approximation and for training large, high-dimensional shallow neural networks. It recasts the discrete, nonlinear training problem as a continuous, linear one, which is then solved approximately by Ritz–Galerkin discretization and Tikhonov regularization. By interpreting networks as mean-field-like kernels and employing functional tensor networks, the authors solve the resulting high-dimensional linear systems efficiently with tensor-train methods and then sample discrete network parameters to form predictive models. The approach yields competitive results on banknote authentication, concrete strength prediction, and MNIST classification, demonstrating practical viability without bespoke feature engineering. The work opens avenues for deep Fredholm networks and for further improvements in tensor factorization, sampling strategies, and mean-field analysis, and it offers a scalable alternative to conventional gradient-based training in certain regimes.

Abstract

We present a novel and mathematically transparent approach to function approximation and the training of large, high-dimensional neural networks, based on the approximate least-squares solution of associated Fredholm integral equations of the first kind by Ritz-Galerkin discretization, Tikhonov regularization and tensor-train methods. Practical application to supervised learning problems of regression and classification type confirms that the resulting algorithms are competitive with state-of-the-art neural network-based methods.
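To make the idea concrete, the following is a minimal one-dimensional sketch and not the authors' implementation. It assumes a hypothetical kernel $\psi(x, \eta) = \tanh(\eta_1 x + \eta_2)$ on the parameter box $[-R, R]^2$, piecewise-constant ansatz functions on a uniform grid, midpoint quadrature, and a dense solve of the Tikhonov-regularized normal equations in place of tensor-train methods. The names `fit_fredholm_1d` and `predict`, and all default parameter values, are illustrative choices.

```python
import numpy as np

def fit_fredholm_1d(x, y, n_per_dim=12, radius=4.0, eps=1e-6):
    """Least-squares fit of F(x) ~ integral of psi(x, eta) u(eta) d eta.

    Assumptions (not from the paper): psi(x, eta) = tanh(eta_1 * x + eta_2),
    u is piecewise constant on a uniform grid over [-radius, radius]^2,
    and integrals are approximated by the midpoint rule.
    """
    h = 2.0 * radius / n_per_dim                      # grid cell width
    mid = -radius + h * (np.arange(n_per_dim) + 0.5)  # cell midpoints
    w, b = np.meshgrid(mid, mid, indexing="ij")
    w, b = w.ravel(), b.ravel()                       # one (weight, bias) per cell
    # A[j, i] approximates the integral of psi(x_j, eta) over cell i.
    A = np.tanh(np.outer(x, w) + b) * h**2
    # Tikhonov regularization: minimize |A U - y|^2 + eps * |U|^2.
    n = A.shape[1]
    U = np.linalg.solve(A.T @ A + eps * np.eye(n), A.T @ y)
    return w, b, U, h

def predict(x_new, w, b, U, h):
    """Evaluate the discretized integral operator at new points."""
    return (np.tanh(np.outer(x_new, w) + b) * h**2) @ U

if __name__ == "__main__":
    x_train = np.linspace(0.0, 1.0, 40)
    y_train = x_train**2
    w, b, U, h = fit_fredholm_1d(x_train, y_train)
    residual = predict(x_train, w, b, U, h) - y_train
    print("training RMSE:", np.sqrt(np.mean(residual**2)))
```

The point of the sketch is that the unknown coefficient vector `U` enters linearly, so the fit reduces to a regularized linear system. In higher dimensions the grid over the parameter domain grows exponentially, which is where the tensor-train methods mentioned in the abstract come in.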

Paper Structure (28 sections, 122 equations, 4 figures, 2 tables)

This paper contains 28 sections, 122 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Graphical notation of tensor formats: (a) A tensor in rank-one format, given by the tensor product of $p$ vectors. (b) A tensor in the canonical format, given by the contraction of $p$ matrices on the common rank index. (c) A tensor in TT format, given by a network of pairwise coupled tensors. Here, the first and the last TT core are regarded as matrices, because $r_0 = r_p = 1$. (d) A tensor in FTT format, given by a TT-like network of tensors with one continuous mode. Discrete modes are represented by straight lines and continuous modes by zigzag lines.
  • Figure 2: Graphical notation of the functional decomposition of $\mathbf{\Psi}$ corresponding to the kernel $\psi(x, \eta)$ and data points $(x_j)_{j=1}^M$: (a) Using the canonical format for $\hat{\mathbf{\Psi}}$ with cores coupled by a common rank. (b) Using the FTT format with a chain-like coupling. Black circles represent the cores of $\hat{\mathbf{\Psi}}$, each having one continuous mode represented by a zigzag line. Contraction with $\Delta$ (green circle) merges all free modes of $\hat{\mathbf{\Psi}}$ into one.
  • Figure 3: Graphical notation of the tensor-based counterpart of the (underdetermined) system $A U = b$: The cores of $\mathbf{\Psi}$ (black circles) are contracted with the cores of $\mathbf{\Phi}$ (orange circles) by integrating over common modes. Together with the delta tensor (green circle), this system builds the tensor operator $\mathbf{A}$. The coefficient tensor $\mathbf{U}$ (white circles) is approximated in the TT format. Here, $\hat{\mathbf{\Psi}}$ is given in canonical format but could also be defined as a functional tensor train, see Section \ref{sec: tensor factorization}.
  • Figure 4: Results for the MNIST data set: Classification rates obtained from the Fredholm network over the number $M$ of training images in comparison with Keras networks with increasing number $N_K$ of nodes in the hidden layer.
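The TT format described in the caption of Figure 1 can be illustrated with a short sketch. The code below is a generic TT-SVD with a fixed rank cap, written under the assumption of a small dense tensor with discrete modes only; it does not cover the functional (FTT) case and is not taken from the paper. The function names `tt_svd` and `tt_to_full` are hypothetical.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a dense tensor into TT cores of shape (r_{k-1}, d_k, r_k)."""
    dims = tensor.shape
    cores = []
    r_prev = 1                                   # r_0 = 1
    mat = tensor.reshape(r_prev * dims[0], -1)
    for k in range(len(dims) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))                # cap the TT rank
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        # Carry the remainder on to the next mode.
        mat = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))  # r_p = 1
    return cores

def tt_to_full(cores):
    """Contract neighbouring cores over their shared rank index."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]                       # drop the boundary ranks

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = rng.standard_normal((3, 4, 5))
    cores = tt_svd(T, max_rank=20)
    print([c.shape for c in cores])
    print("max reconstruction error:", np.abs(tt_to_full(cores) - T).max())
```

Because $r_0 = r_p = 1$, the first and last cores are effectively matrices, as noted in the caption. Lowering `max_rank` trades reconstruction accuracy for storage.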

Theorems & Definitions (4)

  • proof
  • proof
  • proof
  • proof