Function Gradient Approximation with Random Shallow ReLU Networks with Control Applications

Andrew Lamperski; Siddharth Salapaka

TL;DR

The paper addresses rigorous guarantees for simultaneous function and gradient approximation using shallow ReLU networks with randomly generated input parameters in control settings. It proves high-probability bounds showing the function error scales as $O\left((1/m)^{1/2}\right)$ and the gradient error scales as $O\left(\left(\dfrac{\log m}{m}\right)^{1/2}\right)$ as the neuron count $m$ grows, under a Fourier-smoothness assumption and appropriate sampling. The approach relies on an integral representation and an importance-sampling estimator, with analysis based on functional Hoeffding bounds and Rademacher complexity/VC-dimension arguments, and it demonstrates applicability to policy evaluation problems in continuous time. Numerical experiments illustrate the bounds in a simple scalar setting, confirming the predicted trends but highlighting that constants are currently conservative. Overall, the work extends prior function-approximation results to gradient-aware guarantees, providing a principled pathway for control-theoretic applications while signaling room for improvement in constants and scalability.
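
To make the gap between the two rates concrete, the short sketch below tabulates $(1/m)^{1/2}$ and $(\log m / m)^{1/2}$ for a few neuron counts. The helper names `function_rate` and `gradient_rate` are hypothetical, and all constants are set to 1, which is an assumption made only for illustration; the paper's actual constants are not reproduced here.

```python
import math


def function_rate(m: int) -> float:
    """Rate (1/m)^(1/2) for the function error, with the constant set to 1 (assumption)."""
    return (1.0 / m) ** 0.5


def gradient_rate(m: int) -> float:
    """Rate (log(m)/m)^(1/2) for the gradient error, with the constant set to 1 (assumption)."""
    return (math.log(m) / m) ** 0.5


for m in (10, 100, 1_000, 10_000):
    f, g = function_rate(m), gradient_rate(m)
    # The ratio of the two rates is sqrt(log m), which grows very slowly with m.
    print(f"m={m:>6}  function ~ {f:.4f}  gradient ~ {g:.4f}  ratio = {g / f:.3f}")
```

The two rates differ only by a factor of $\sqrt{\log m}$, so the gradient guarantee is weaker than the function guarantee by a slowly growing logarithmic term.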

Abstract

Neural networks are widely used to approximate unknown functions in control. A common neural network architecture uses a single hidden layer (i.e., a shallow network), in which the input parameters are fixed in advance and only the output parameters are trained. The typical formal analysis asserts that if output parameters exist to approximate the unknown function with sufficient accuracy, then the desired control performance can be achieved. A long-standing theoretical gap was that no conditions existed to guarantee that, for the fixed input parameters, the required accuracy could be obtained by training the output parameters. Our recent work has partially closed this gap by demonstrating that if input parameters are chosen randomly, then for any sufficiently smooth function, with high probability there are output parameters resulting in $O((1/m)^{1/2})$ approximation errors, where $m$ is the number of neurons. However, some applications, notably continuous-time value function approximation, require that the network approximate both the unknown function and its gradient with sufficient accuracy. In this paper, we show that randomly generated input parameters and trained output parameters result in gradient errors of $O((\log(m)/m)^{1/2})$, and additionally, we improve the constants from our prior work. We show how to apply the result to policy evaluation problems.
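
The architecture described above (random, fixed input parameters and trained output parameters) can be sketched in a few lines of NumPy for a scalar input. This is a minimal illustration under stated assumptions, not the paper's construction: the sampling law (`alpha` uniform on {-1, +1}, `t` uniform on an interval), the target `sin(3x)`, and the use of ordinary least squares to pick the output parameters are all choices made here for illustration. The paper's distribution $P$ and its importance-sampling argument are not reproduced.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def random_features(x, alpha, t):
    """Hidden activations relu(alpha_i * x - t_i) and their derivatives in x."""
    z = np.outer(x, alpha) - t           # shape (len(x), m)
    return relu(z), (z > 0) * alpha      # ReLU derivative: step function times alpha_i


def fit_and_evaluate(m, rng, radius=1.0, n_grid=2000):
    # Assumed sampling law (hypothetical, not the paper's P).
    alpha = rng.choice([-1.0, 1.0], size=m)
    t = rng.uniform(-radius, radius, size=m)

    x = np.linspace(-radius, radius, n_grid)
    f, df = np.sin(3 * x), 3 * np.cos(3 * x)   # illustrative smooth target and its gradient

    phi, dphi = random_features(x, alpha, t)
    # Output layer: affine term a*x + b plus one coefficient per neuron.
    design = np.column_stack([x, np.ones_like(x), phi])
    # Only the output parameters are trained; here by least squares on function values.
    coef, *_ = np.linalg.lstsq(design, f, rcond=None)

    # Derivative of the fitted network: d/dx of each design column.
    grad_design = np.column_stack([np.ones_like(x), np.zeros_like(x), dphi])
    f_err = np.max(np.abs(design @ coef - f))
    g_err = np.max(np.abs(grad_design @ coef - df))
    return f_err, g_err


rng = np.random.default_rng(0)
for m in (10, 50, 250):
    f_err, g_err = fit_and_evaluate(m, rng)
    print(f"m={m:>4}  max function error={f_err:.4f}  max gradient error={g_err:.4f}")
```

The errors are measured on the same grid used for fitting, and the gradient is never fitted directly, so the printed numbers only indicate a trend and are not a check of the paper's bounds.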

Paper Structure (10 sections, 4 theorems, 50 equations, 3 figures)

Introduction
Function and Gradient Approximation
Background
A Function and Gradient Approximation Result
Technical Lemmas
Proof of Theorem 1
Application to Policy Evaluation
General Theory
Numerical Example
Conclusion

Key Result

Theorem 1

Let the smoothness and positivity assumptions hold. Let $(\boldsymbol{\alpha}_1,\mathbf{t}_1),\ldots,(\boldsymbol{\alpha}_m,\mathbf{t}_m)$ be a collection of independent, identically distributed samples from $P$. There is a vector $a\in\mathbb{R}^n$ and a number $b\in \mathbb{R}$ such that for all [...] satisfies both of the following inequalities simultaneously with probability at least $1-\delta$.

Figures (3)

Figure 1: $V_\phi$ and its modified version.
Figure 2: Function Approximation Error
Figure 3: Gradient Approximation Error

Theorems & Definitions (7)

Theorem 1
Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
