Function Gradient Approximation with Random Shallow ReLU Networks with Control Applications
Andrew Lamperski, Siddharth Salapaka
TL;DR
The paper develops rigorous guarantees for approximating a function and its gradient simultaneously using shallow ReLU networks whose input parameters are generated randomly, with control applications in mind. Under a Fourier-smoothness assumption and an appropriate sampling distribution, it proves high-probability bounds in which the function error scales as $O\left((1/m)^{1/2}\right)$ and the gradient error scales as $O\left((\log m / m)^{1/2}\right)$, where $m$ is the number of neurons. The approach relies on an integral representation of the target function and an importance-sampling estimator. The analysis combines functional Hoeffding bounds with Rademacher-complexity and VC-dimension arguments. The results are applied to continuous-time policy evaluation problems. Numerical experiments in a simple scalar setting confirm the predicted trends but show that the current constants are conservative. Overall, the work extends prior function-approximation results to guarantees that also cover gradients. This gives a principled basis for control-theoretic applications, while leaving room to improve the constants and scalability.
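The following is a generic illustration of how an integral representation plus importance sampling leads to rates of this kind. The notation is ours and is not necessarily the paper's exact representation. Suppose the target admits a representation over neuron parameters $(w,b)$ with some weight function $c$, and let $p$ be a sampling density that is positive wherever $c$ is nonzero. Dividing by $p$ is the importance-sampling step. Averaging over $m$ independently drawn neurons then gives an unbiased estimator whose pointwise mean-square error is the variance divided by $m$ (assuming that variance is finite):

```latex
\begin{aligned}
f(x) &= \int c(w,b)\,\sigma(w^\top x + b)\,dw\,db
      = \mathbb{E}_{(w,b)\sim p}\!\left[\frac{c(w,b)}{p(w,b)}\,\sigma(w^\top x + b)\right],
      \qquad \sigma(z) = \max\{z, 0\}, \\
\hat f_m(x) &= \frac{1}{m}\sum_{i=1}^{m} \frac{c(w_i,b_i)}{p(w_i,b_i)}\,\sigma(w_i^\top x + b_i),
      \qquad (w_i,b_i) \overset{\text{iid}}{\sim} p, \\
\mathbb{E}\big[(\hat f_m(x) - f(x))^2\big] &= \frac{1}{m}\,
      \operatorname{Var}_{p}\!\left[\frac{c(w,b)}{p(w,b)}\,\sigma(w^\top x + b)\right], \\
\nabla \hat f_m(x) &= \frac{1}{m}\sum_{i=1}^{m} \frac{c(w_i,b_i)}{p(w_i,b_i)}\,
      \mathbf{1}[w_i^\top x + b_i > 0]\,w_i .
\end{aligned}
```

The pointwise identity accounts for the $m^{-1/2}$ scaling at a fixed $x$. Bounds that hold uniformly over a domain and with high probability are where tools like functional Hoeffding bounds and Rademacher/VC-dimension arguments come in. The gradient estimator is an average of weighted step functions $\mathbf{1}[w^\top x + b > 0]$, and this half-space indicator class has finite VC dimension.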
Abstract
Neural networks are widely used to approximate unknown functions in control. A common neural network architecture uses a single hidden layer (i.e., a shallow network), in which the input parameters are fixed in advance and only the output parameters are trained. The typical formal analysis asserts that if output parameters exist to approximate the unknown function with sufficient accuracy, then the desired control performance can be achieved. A long-standing theoretical gap was that no conditions existed to guarantee that, for the fixed input parameters, the required accuracy could be obtained by training the output parameters. Our recent work has partially closed this gap by demonstrating that if input parameters are chosen randomly, then for any sufficiently smooth function, with high probability there are output parameters resulting in $O((1/m)^{1/2})$ approximation errors, where $m$ is the number of neurons. However, some applications, notably continuous-time value function approximation, require that the network approximate both the unknown function and its gradient with sufficient accuracy. In this paper, we show that randomly generated input parameters and trained output parameters result in gradient errors of $O((\log(m)/m)^{1/2})$, and we additionally improve the constants from our prior work. We show how to apply the result to policy evaluation problems.
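The following is a minimal NumPy sketch of the shallow architecture described in the abstract, using a scalar input. The input weights and biases are drawn once at random and then frozen. Only the output weights are fitted, by linear least squares on function values. Several choices here are our own assumptions rather than the paper's construction: the sampling distribution ($w_i \in \{-1,+1\}$, $b_i \sim U[-1,1]$), the target $\sin(\pi x)$, the use of plain least squares, and the helper names (`fit_random_relu_net`, `rms_errors`, and so on). The sketch is not meant to reproduce the paper's rates or constants. It reports root-mean-square errors for both the function and its gradient, and both are computed from the same fitted output weights.

```python
import numpy as np


def relu_features(x, w, b):
    """Hidden-layer outputs relu(w_i * x + b_i); x has shape (n,), result (n, m)."""
    return np.maximum(0.0, np.outer(x, w) + b)


def relu_feature_grads(x, w, b):
    """d/dx relu(w_i * x + b_i) = w_i * 1[w_i * x + b_i > 0], shape (n, m)."""
    return (np.outer(x, w) + b > 0.0) * w


def fit_random_relu_net(f, m, n_train=2000, seed=0):
    """Draw input parameters at random, then fit only the output weights.

    Hypothetical helper: the sampling scheme below is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    # w_i in {-1, +1} and b_i ~ U[-1, 1], so every kink -b_i / w_i lies in [-1, 1].
    w = rng.choice([-1.0, 1.0], size=m)
    b = rng.uniform(-1.0, 1.0, size=m)
    x = np.linspace(-1.0, 1.0, n_train)
    # Output weights c minimize ||relu_features(x) @ c - f(x)||_2 (function values only).
    c, *_ = np.linalg.lstsq(relu_features(x, w, b), f(x), rcond=None)
    return w, b, c


def rms_errors(f, df, w, b, c, n_test=5000):
    """Root-mean-square function and gradient errors on a test grid in [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n_test)
    f_err = relu_features(x, w, b) @ c - f(x)
    g_err = relu_feature_grads(x, w, b) @ c - df(x)
    return np.sqrt(np.mean(f_err**2)), np.sqrt(np.mean(g_err**2))


if __name__ == "__main__":
    f = lambda x: np.sin(np.pi * x)
    df = lambda x: np.pi * np.cos(np.pi * x)
    for m in (10, 40, 160, 640):
        fe, ge = rms_errors(f, df, *fit_random_relu_net(f, m))
        print(f"m={m:4d}  function RMS error={fe:.2e}  gradient RMS error={ge:.2e}")
```

The sketch fits only function values and then checks the derivative. This mirrors the question the abstract raises: whether output weights trained for the function also control its gradient. In this scalar setting, the network is a continuous piecewise-linear function with kinks at $-b_i/w_i$. As a result, both errors typically shrink as $m$ grows and the kinks become denser.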
