Random ReLU Neural Networks as Non-Gaussian Processes

Rahul Parhi, Pakshal Bohra, Ayoub El Biari, Mehrsa Pourya, Michael Unser

TL;DR

This work models shallow ReLU networks with randomly initialized parameters as generalized stochastic processes and proves that they are well-defined, non-Gaussian objects with an explicit characteristic functional. The model uses a Poisson-type random measure for the activation thresholds, so the width is random and controlled by a rate $\lambda$; the ReLU network itself is obtained through a whitening operator that relates impulsive white noise to a continuous piecewise-linear (CPwL) process. The paper establishes isotropy and wide-sense self-similarity with Hurst exponent $H=\tfrac{3}{2}$ and derives a simple closed-form autocovariance, with non-asymptotic properties tied to the Poisson width. Asymptotic analysis shows convergence to Gaussian limits when $\mathbf{P}_V$ is Gaussian and, depending on the scaling, potentially to non-Gaussian limits when $\mathbf{P}_V$ is a symmetric $\alpha$-stable law. These results both extend classical wide-network Gaussian limits and show that wide networks can converge to non-Gaussian processes, offering a new perspective on the stochastic structure of neural networks and suggesting directions for deeper architectures.
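
For reference, wide-sense self-similarity can be written as a scaling law on second-order statistics. The relation below is the standard definition, stated here as an assumption about the paper's convention; the symbols $s$, $a$, $\mathbf{x}$, and $\mathbf{y}$ are introduced only for illustration, and the process $s$ is assumed to have zero mean.

```latex
\mathbb{E}\big[s(a\mathbf{x})\,s(a\mathbf{y})\big]
  = a^{2H}\,\mathbb{E}\big[s(\mathbf{x})\,s(\mathbf{y})\big]
  = a^{3}\,\mathbb{E}\big[s(\mathbf{x})\,s(\mathbf{y})\big],
\qquad a > 0,\quad H = \tfrac{3}{2}.
```

Under this reading, dilating the input by a factor of 2 multiplies the autocovariance by $2^3 = 8$.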

Abstract

We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove that these random neural networks are well-defined non-Gaussian processes. As a by-product, we demonstrate that these networks are solutions to stochastic differential equations driven by impulsive white noise (combinations of random Dirac measures). These processes are parameterized by the law of the weights and biases as well as the density of activation thresholds in each bounded region of the input domain. We prove that these processes are isotropic and wide-sense self-similar with Hurst exponent 3/2. We also derive a remarkably simple closed-form expression for their autocovariance function. Our results are fundamentally different from prior work in that we consider a non-asymptotic viewpoint: The number of neurons in each bounded region of the input domain (i.e., the width) is itself a random variable with a Poisson law with mean proportional to the density parameter. Finally, we show that, under suitable hypotheses, as the expected width tends to infinity, these processes can converge in law not only to Gaussian processes, but also to non-Gaussian processes depending on the law of the weights. Our asymptotic results provide a new take on several classical results (wide networks converge to Gaussian processes) as well as some new ones (wide networks can converge to non-Gaussian processes).
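
The random-width idea can be illustrated with a minimal one-dimensional sketch. It is not the paper's construction: it assumes that the ReLU kink locations form a homogeneous Poisson process on a bounded interval, that directions are drawn uniformly from $\{-1, +1\}$, and that output weights are i.i.d. standard Gaussian. Any normalization or correction terms the paper may use are omitted, and the names `sample_random_relu_network`, `evaluate`, `rate`, and `half_width` are hypothetical.

```python
import random


def sample_random_relu_network(rate, half_width, rng):
    """Draw one random shallow ReLU network on [-half_width, half_width].

    Kink locations are generated as a homogeneous Poisson process by summing
    exponential gaps, so the number of neurons (the width) is a Poisson
    random variable with mean rate * 2 * half_width.
    """
    neurons = []
    position = -half_width + rng.expovariate(rate)
    while position <= half_width:
        direction = rng.choice((-1.0, 1.0))  # unit "weight vector" in 1D (assumed)
        amplitude = rng.gauss(0.0, 1.0)      # output weight, assumed Gaussian
        neurons.append((amplitude, direction, position))
        position += rng.expovariate(rate)
    return neurons


def evaluate(neurons, x):
    """Evaluate the network: sum of amplitude * ReLU(direction * (x - kink))."""
    return sum(v * max(0.0, w * (x - t)) for v, w, t in neurons)


if __name__ == "__main__":
    rng = random.Random(0)
    network = sample_random_relu_network(rate=5.0, half_width=1.0, rng=rng)
    print("width of this draw:", len(network))
    print("value at x = 0.25:", evaluate(network, 0.25))
```

Each call returns a network of a different width, and averaging the width over many draws should approach `rate * 2 * half_width`, which mirrors the statement that the expected width is proportional to the density parameter.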

Paper Structure

This paper contains 18 sections, 8 theorems, 74 equations, and 4 figures.

Key Result

Proposition 4

The operator $\mathscr{R}$ continuously maps $\mathcal{S}(\mathbb{R}^d)$ into $\mathcal{S}(\mathbb{S}^{d-1} \times \mathbb{R})$. Moreover, [...] on $\mathcal{S}(\mathbb{R}^d)$. The underlying operators are the Laplacian $\Delta = \sum_{n=1}^d \partial_{x_n}^2$ [...]. (Non-integer powers of $(-\Delta)$ and $(-\partial_t^2)$ are understood in the Fourier domain.)

Figures (4)

  • Figure 1: $\mathbf{P}_V$ is Gaussian.
  • Figure 2: $\mathbf{P}_V$ is symmetric ($\alpha = 1.25$)-stable.
  • Figure 3: $\mathbf{P}_V$ is Gaussian.
  • Figure 4: $\mathbf{P}_V$ is symmetric ($\alpha = 1.25$)-stable.

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 4 (LudwigRadon, GelfandIntegralGeometry, HelgasonIntegralGeometry)
  • Definition 5
  • Proposition 6
  • Proposition 7
  • Remark 8
  • Theorem 9
  • Theorem 10
  • ...and 3 more