Distributionally robust approximation property of neural networks

Mihriban Ceylan; David J. Prömel

TL;DR

The paper addresses distributional uncertainty in neural network function approximation by proving universal approximation theorems in Orlicz spaces, together with a distributionally robust universal approximation theorem (UAT) over weakly compact families of finite Borel measures. It shows that networks are dense in the Orlicz heart $M^\varphi(\mu)$, with density measured in the gauge norm $N_{\varphi,\mu}$, for architectures including networks with bounded activations, ReLU networks, networks with non-polynomial activations, and functional-input networks. The robust UAT states that for a weakly compact set $\mathcal{M}$ with associated Young pair $(\varphi_{\mathcal{M}},\psi_{\mathcal{M}})$, a target $f$, and any $\varepsilon>0$, there exists a network $\eta$ such that $\sup_{\nu\in\mathcal{M}} \|f-\eta\|_{L^1(\nu)}<\varepsilon$, under various activation regimes. By linking Orlicz-space universality to distributional robustness, the work extends classical $L^p$ results to settings with distributional shift and non-standard growth conditions, and provides a rigorous basis for distributionally robust learning and for related PDE contexts.
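
For orientation, the objects named in the TL;DR can be written out using the standard textbook definitions. This is a sketch under stated assumptions, not the paper's own formulation: the symbol $\Omega$ for the underlying domain, the reference measure $\mu$ with densities $g_\nu = d\nu/d\mu$, and the bound $C$ are all introduced here for illustration, and the paper's precise setup may differ.

$$
N_{\varphi,\mu}(f) = \inf\Big\{ k>0 : \int_\Omega \varphi\big(|f|/k\big)\, d\mu \le 1 \Big\},
\qquad
M^\varphi(\mu) = \Big\{ f : \int_\Omega \varphi\big(c\,|f|\big)\, d\mu < \infty \ \text{for all } c>0 \Big\}.
$$

For a complementary Young pair $(\varphi,\psi)$, Young's inequality $ab \le \varphi(a)+\psi(b)$ gives the Hölder-type bound $\int_\Omega |uv|\, d\mu \le 2\, N_{\varphi,\mu}(u)\, N_{\psi,\mu}(v)$. One way a density result in the gauge norm can yield a uniform $L^1$ estimate, assuming every $\nu\in\mathcal{M}$ has a density $g_\nu$ with $\sup_{\nu\in\mathcal{M}} N_{\psi,\mu}(g_\nu) \le C < \infty$, is then

$$
\sup_{\nu\in\mathcal{M}} \|f-\eta\|_{L^1(\nu)}
= \sup_{\nu\in\mathcal{M}} \int_\Omega |f-\eta|\, g_\nu \, d\mu
\le 2\,C\, N_{\varphi,\mu}(f-\eta).
$$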

Abstract

The universal approximation property uniformly with respect to weakly compact families of measures is established for several classes of neural networks. To that end, we prove that these neural networks are dense in Orlicz spaces, thereby extending classical universal approximation theorems even beyond the traditional $L^p$-setting. The covered classes of neural networks include widely used architectures like feedforward neural networks with non-polynomial activation functions, deep narrow networks with ReLU activation functions and functional input neural networks.
