Equidistribution-based training of Free Knot Splines and ReLU Neural Networks
Simone Appella, Simon Arridge, Chris Budd, Teo Deveney, Lisa Maria Kreusser
TL;DR
This work shows that univariate shallow ReLU networks, though theoretically expressive, become severely ill-conditioned under standard $L_2$ training as their width grows. By leveraging the formal equivalence between shallow ReLU NNs and Free Knot Splines (FKS), the authors develop a two-level, equidistribution-based training procedure: first optimize the knot/breakpoint locations using an equidistribution loss, then solve a well-conditioned linear (or nearly linear) problem for the weights. The key contributions are a rigorous conditioning analysis, an equidistribution-based loss, and a preconditioning that turns the ReLU training problem into a stable FKS-like problem. Together these achieve near-optimal approximations across a suite of target functions, and the insights extend to deeper networks. The results offer practical strategies for reliable, fast training of shallow NNs and have implications for PINNs and neural operators in univariate settings, with potential extension to higher dimensions and other activation functions.
Abstract
We consider the problem of univariate nonlinear function approximation using shallow neural networks (NNs) with a rectified linear unit (ReLU) activation function. We show that the $L_2$-based approximation problem is ill-conditioned and that the behaviour of the optimisation algorithms used in training these networks degrades rapidly as the width of the network increases. This can lead to significantly poorer approximation in practice than would be expected from the theoretical expressivity of the ReLU architecture, and than is achieved by traditional methods such as univariate Free Knot Splines (FKS). Univariate shallow ReLU NNs and FKS span the same function space, and thus have the same theoretical expressivity. However, the FKS representation remains well-conditioned as the number of knots increases. We leverage the theory of optimal piecewise linear interpolants to improve the training procedure for ReLU NNs. Using the equidistribution principle, we propose a two-level procedure for training the FKS: we first solve the nonlinear problem of finding the optimal knot locations of the interpolating FKS, and then determine the optimal weights and knots of the FKS by solving a nearly linear, well-conditioned problem. The training of the FKS gives insights into how we can train a ReLU NN effectively, with an equally accurate approximation. We combine the training of the ReLU NN with an equidistribution-based loss to find the breakpoints of the ReLU functions. This is then combined with preconditioning the ReLU NN approximation to find the scalings of the ReLU functions. This fast, well-conditioned and reliable method finds an accurate shallow ReLU NN approximation to a univariate target function. We test this method on a series of regular, singular, and rapidly varying target functions and obtain good results, realising the expressivity of the shallow ReLU network in all cases. We then extend our results to deeper networks.
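The first level of the procedure, placing knots by equidistribution, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the classical monitor density $|f''|^{2/5}$ for the $L_2$ error of a piecewise linear interpolant (the abstract does not state the monitor function), and the function names, the regulariser `eps`, and the test function $\tanh(50(x - 1/2))$ are all hypothetical choices made for illustration.

```python
import numpy as np


def _trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def equidistributed_knots(d2f, n_knots, a=0.0, b=1.0, n_fine=20001, eps=1e-8):
    """Place knots so each interval carries an equal share of the monitor density.

    Assumption: the density is (|f''| + eps)**(2/5), the classical choice for
    the L2 error of piecewise linear interpolation; eps keeps it positive.
    """
    x = np.linspace(a, b, n_fine)
    m = (np.abs(d2f(x)) + eps) ** 0.4
    # Cumulative integral of the density on the fine grid.
    M = np.concatenate(([0.0], np.cumsum(0.5 * (m[1:] + m[:-1]) * np.diff(x))))
    # Invert the cumulative integral at equally spaced levels.
    levels = np.linspace(0.0, M[-1], n_knots)
    return np.interp(levels, M, x)


def interp_l2_error(f, knots, a=0.0, b=1.0, n_fine=20001):
    """L2 error of the piecewise linear interpolant of f on the given knots."""
    x = np.linspace(a, b, n_fine)
    err = f(x) - np.interp(x, knots, f(knots))
    return np.sqrt(_trapezoid(err ** 2, x))


# Hypothetical rapidly varying target with an interior layer at x = 0.5.
f = lambda x: np.tanh(50.0 * (x - 0.5))
d2f = lambda x: -2.0 * 50.0 ** 2 * np.tanh(50.0 * (x - 0.5)) / np.cosh(50.0 * (x - 0.5)) ** 2

n_knots = 32
knots_unif = np.linspace(0.0, 1.0, n_knots)
knots_equi = equidistributed_knots(d2f, n_knots)

err_unif = interp_l2_error(f, knots_unif)
err_equi = interp_l2_error(f, knots_equi)
print(f"uniform knots:        L2 error = {err_unif:.3e}")
print(f"equidistributed knots: L2 error = {err_equi:.3e}")
```

In this sketch the knot locations come from inverting a cumulative integral, so no gradient-based optimisation is involved. The second level described above, solving for the weights on the fixed knots, would then be a separate linear or nearly linear problem.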
