Equidistribution-based training of Free Knot Splines and ReLU Neural Networks

Simone Appella; Simon Arridge; Chris Budd; Teo Deveney; Lisa Maria Kreusser

TL;DR

This work shows that univariate shallow ReLU networks, though theoretically expressive, suffer from severe ill-conditioning under standard $L_2$ training as width grows. By leveraging the formal equivalence between shallow ReLU NNs and Free Knot Splines (FKS), the authors develop a two-level, equidistribution-based training procedure: first optimize knot/breakpoint locations using an equidistribution loss, then solve a well-conditioned linear (or nearly linear) problem for the weights. The key contributions include a rigorous conditioning analysis, an equidistribution-based loss, and effective preconditioning that turns the ReLU training problem into a stable FKS-like problem, achieving near-optimal approximations across a suite of target functions and extending insights to deeper networks. These results offer practical strategies for reliable, fast training of shallow NNs and have implications for PINNs and neural operators in univariate settings, with potential extension to higher dimensions and activations.

Abstract

We consider the problem of univariate nonlinear function approximation using shallow neural networks (NN) with a rectified linear unit (ReLU) activation function. We show that the $L_2$-based approximation problem is ill-conditioned and that the behaviour of the optimisation algorithms used to train these networks degrades rapidly as the width of the network increases. This can lead to significantly poorer approximation in practice than expected from the theoretical expressivity of the ReLU architecture, and than that achieved by traditional methods such as univariate Free Knot Splines (FKS). Univariate shallow ReLU NNs and FKS span the same function space, and thus have the same theoretical expressivity. However, the FKS representation remains well-conditioned as the number of knots increases. We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for ReLU NNs. Using the equidistribution principle, we propose a two-level procedure for training the FKS: we first solve the nonlinear problem of finding the optimal knot locations of the interpolating FKS, and then determine the optimal weights and knots of the FKS by solving a nearly linear, well-conditioned problem. The training of the FKS gives insights into how we can train a ReLU NN effectively, with an equally accurate approximation. We combine the training of the ReLU NN with an equidistribution-based loss to find the breakpoints of the ReLU functions. This is then combined with preconditioning the ReLU NN approximation to find the scalings of the ReLU functions. This fast, well-conditioned and reliable method finds an accurate shallow ReLU NN approximation to a univariate target function. We test this method on a series of regular, singular, and rapidly varying target functions and obtain good results, realising the expressivity of the shallow ReLU network in all cases. We then extend our results to deeper networks.
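The two-level idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the monitor function $(\epsilon^2 + u''(x)^2)^{1/5}$, a standard choice for $L_2$ error of piecewise linear interpolants, places knots by direct quadrature rather than by training an equidistribution loss, and in the second level fits only the weights with the knots frozen. The function names, the regularisation `eps`, and the sample sizes are hypothetical; the target $u(x) = \tanh(100(x-1/4))$ is the one named in the Figure 1 caption.

```python
import numpy as np

def equidistribute_knots(monitor, n_knots, a=0.0, b=1.0, n_quad=20001):
    """Place knots so every interval carries an equal share of the monitor integral."""
    x = np.linspace(a, b, n_quad)
    m = monitor(x)
    # Cumulative trapezoidal integral of the (strictly positive) monitor function.
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(x))))
    # Invert the cumulative integral at equally spaced levels.
    levels = np.linspace(0.0, cum[-1], n_knots)
    knots = np.interp(levels, cum, x)
    knots[0], knots[-1] = a, b
    return knots

def hat_basis(x, knots):
    """Matrix whose column j samples the piecewise linear hat function at knots[j]."""
    eye = np.eye(len(knots))
    return np.stack([np.interp(x, knots, eye[j]) for j in range(len(knots))], axis=1)

def l2_error(values, target, a=0.0, b=1.0):
    """Discrete L2 error on a uniform sample grid."""
    return np.sqrt((b - a) * np.mean((values - target) ** 2))

def u(x):
    return np.tanh(100.0 * (x - 0.25))

def monitor(x, eps=1.0):
    # Assumed monitor (eps^2 + u''^2)^(1/5); u'' is computed analytically for tanh.
    t = np.tanh(100.0 * (x - 0.25))
    d2u = -2.0 * 100.0**2 * t * (1.0 - t**2)
    return (eps**2 + d2u**2) ** 0.2

def two_level_fit(n_knots=64, n_samples=5001):
    xs = np.linspace(0.0, 1.0, n_samples)
    # Level 1: nonlinear step, knot locations from equidistribution.
    knots = equidistribute_knots(monitor, n_knots)
    # Level 2: linear least-squares problem for the weights, knots held fixed.
    B = hat_basis(xs, knots)
    weights, *_ = np.linalg.lstsq(B, u(xs), rcond=None)
    return knots, weights, l2_error(B @ weights, u(xs))
```

Calling `two_level_fit(64)` returns the knots, the fitted weights, and the discrete $L_2$ error. Comparing that error with the error of `np.interp` on 64 uniform knots gives a quick sense of how much the knot placement alone contributes.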

Paper Structure (37 sections, 55 equations, 7 figures, 2 tables)

This paper contains 37 sections, 55 equations, 7 figures, 2 tables.

Introduction
Overview
Classical approximation theory
Free Knot Splines (FKS)
Combining NN and classical adaptive approximation
Contributions
Outline
ReLU NN and free knot splines (FKS) as nonlinear function approximators
ReLU Neural Networks (ReLU NN)
Linear spline approximations
Relations between the ReLU NN and the FKS representations
Equivalence of shallow ReLU NN and the FKS representations
Weights for the IFKS and equivalent ReLU NN
Training the ReLU NN and the FKS
Finding the optimal knots of the interpolating FKS using equidistribution-based loss functions
...and 22 more sections

Figures (7)

Figure 1: (Left) A comparison of the convergence of the approximation of the target function $u(x) = \tanh(100(x-1/4))$ with various architectures. This shows the poor performance of the ReLU NN trained using the regular $L_2^2$ loss function (green), and the better, but still poor, performance of the regular spline with the same loss function (green dashed). We see the still poor performance of the ReLU NN and the FKS trained using the standard $L_2^2$ loss function (blue, blue dashed). We also see the far better performance of the FKS trained with a two-level approach (red dashed), but the much poorer performance of the ReLU NN using the two-level approach without preconditioning (red). A preconditioned ReLU NN architecture trained with the two-level approach shows similar optimal performance to the FKS. (Right) A comparison of the training times and convergence of the weights of the FKS and a shallow ReLU NN for the target function $u(x) = \tanh(100(x-1/4))$ with $N=64$, following the calculation of the knots of the interpolating FKS. The FKS trains very rapidly, whereas the ReLU NN trains slowly due to ill-conditioning.
Figure 2: The condition number of the normal equations for the ReLU NN of width $N$, showing that it is $\mathcal{O}(N^4)$.
Figure 3: Comparison of (top) knot evolution with standard Gaussian initialisation and (bottom) the trained approximation, with the knot points indicated, for different target functions.
Figure 4: Comparison of (top) knot evolution with initialisation in $[0,1]$ and (bottom) the trained approximation, with the knot points indicated, for different target functions.
Figure 5: Comparison of the $L_2^2$ loss for linear spline approximations of four different target functions with different numbers of knots: blue, PWL interpolant on a uniform mesh; orange, optimal IFKS; green, optimal FKS.
...and 2 more figures
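
The conditioning gap referred to in the Figure 2 caption can be probed numerically with a short sketch. It is an illustration under stated assumptions rather than the paper's experiment: the ReLU functions are taken as $\mathrm{ReLU}(x - k_j)$ with unit slope and fixed uniform breakpoints $k_j$ on $[0,1]$, the hat functions are built on the same breakpoints so that both bases span the same space, and the Gram (normal-equation) matrices are approximated by sampling. The function name and sample size are hypothetical.

```python
import numpy as np

def gram_condition_numbers(n, n_samples=4001):
    """Condition numbers of sampled Gram matrices for ReLU and hat bases of size n."""
    x = np.linspace(0.0, 1.0, n_samples)
    grid = np.linspace(0.0, 1.0, n + 1)
    # ReLU basis: ReLU(x - k_j) for breakpoints k_0, ..., k_{n-1}.
    relu = np.maximum(x[:, None] - grid[None, :-1], 0.0)
    # Hat basis on the same grid, dropping the hat at x = 0 so both bases
    # span the piecewise linear functions that vanish at x = 0.
    eye = np.eye(n + 1)
    hats = np.stack([np.interp(x, grid, eye[j]) for j in range(1, n + 1)], axis=1)
    g_relu = relu.T @ relu / n_samples
    g_hat = hats.T @ hats / n_samples
    return np.linalg.cond(g_relu), np.linalg.cond(g_hat)

for n in (8, 16, 32, 64):
    c_relu, c_hat = gram_condition_numbers(n)
    print(f"N={n:3d}  cond(ReLU Gram)={c_relu:.3e}  cond(hat Gram)={c_hat:.3e}")
```

Under these assumptions the hat-function Gram matrix stays well-conditioned as `n` grows, while the ReLU one deteriorates quickly; dividing the ReLU value at `2n` by the value at `n` gives a rough empirical growth rate to set against the $\mathcal{O}(N^4)$ scaling stated in the caption.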

Theorems & Definitions (4)

Definition 1: Free knot splines (FKS)
Definition 2: Fixed knot piecewise linear interpolant (PWL)
Definition 3: Interpolating free knot splines (IFKS)
Example 1
