Regularity and tailored regularization of Deep Neural Networks, with application to parametric PDEs in uncertainty quantification

Alexander Keller, Frances Y. Kuo, Dirk Nuyens, Ian H. Sloan

TL;DR

The paper develops a rigorous regularity theory for deep neural surrogates of high-dimensional, smooth target maps, introducing a periodic DNN variant and a tailored lattice training approach. It proves explicit mixed-derivative bounds that depend on network parameters and activation choices, and shows that, with derivative-matching restrictions, the generalization error decays as ${\mathcal E}_G \le {\tt tol} + {\mathcal O}(N^{-r/2})$ with $r=1/p^*$, independent of dimension. The work applies the theory to parametric elliptic PDEs in uncertainty quantification and demonstrates through numerical experiments that tailored regularization improves performance, especially when using lattice (QMC) training points. The results provide a principled, dimension-robust framework for building accurate DNN surrogates in high-dimensional, physics-informed settings, with broader relevance to QMC-based training and polynomial- or sparse-grid-like approximations.
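As a purely illustrative reading of this rate (the value $p^* = 1/2$ is an assumption chosen for the arithmetic, not a value taken from the paper):

$$p^* = \tfrac{1}{2} \;\Rightarrow\; r = \frac{1}{p^*} = 2 \;\Rightarrow\; {\mathcal E}_G \le {\tt tol} + {\mathcal O}(N^{-r/2}) = {\tt tol} + {\mathcal O}(N^{-1}).$$

Under that assumption, doubling $N$ would roughly halve the bound on the $N$-dependent term, until the training tolerance ${\tt tol}$ dominates.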

Abstract

In this paper we consider Deep Neural Networks (DNNs) with a smooth activation function as surrogates for high-dimensional functions that are somewhat smooth but costly to evaluate. We consider standard (non-periodic) DNNs and propose a new model of periodic DNNs, which are especially suited to a class of periodic target functions when Quasi-Monte Carlo lattice points are used as training points. The primary contribution of this paper is the derivation of explicit bounds for all mixed derivatives of DNNs with respect to their input parameters. The bounds depend on the neural network parameters as well as the choice of activation function, with explicit constants. These bounds are fully general and remain independent of both the target function and the training data. By imposing restrictions on the network parameters to match the regularity features of the target functions, we prove that DNNs with $N$ tailor-constructed lattice training points can achieve the generalization error (or $L_2$ approximation error) bound ${\tt tol} + \mathcal{O}(N^{-r/2})$, where ${\tt tol}\in (0,1)$ is the tolerance achieved by the training error in practice, and $r = 1/p^*$, with $p^*$ being the "summability exponent" of a sequence that characterises the decay of the input variables in the target functions, and with the implied constant independent of the dimensionality of the input data. We apply our analysis to popular models of parametric elliptic PDEs in uncertainty quantification. In our numerical experiments, we restrict the network parameters during training by adding tailored regularization terms, and we show that, for an algebraic equation mimicking the parametric PDE problems, the DNNs trained with tailored regularization perform significantly better.
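The lattice training points mentioned in the abstract can be pictured with a minimal sketch of a randomly shifted rank-1 lattice. Everything named here is an assumption made for illustration: the function name `shifted_lattice_points`, the toy generating vector `z`, the dimension, and the seed are hypothetical and are not the tailor-constructed choices analysed in the paper.

```python
import random


def shifted_lattice_points(n, z, shift):
    """Return the n points {(i * z / n + shift) mod 1}, i = 0, ..., n-1.

    n     : number of points
    z     : integer generating vector, one entry per input dimension
    shift : real shift vector with entries in [0, 1)
    """
    return [
        [(((i * z_j) % n) / n + delta_j) % 1.0 for z_j, delta_j in zip(z, shift)]
        for i in range(n)
    ]


# Hypothetical values for illustration only; this generating vector is
# NOT the tailor-constructed one from the paper.
n = 32                  # N = 2^5 points
z = [1, 7, 11, 13]      # toy generating vector, entries coprime to n
rng = random.Random(0)  # fixed seed so the sketch is reproducible
shift = [rng.random() for _ in z]

points = shifted_lattice_points(n, z, shift)
```

Each row of `points` is one training input in the unit cube. Because every entry of `z` is coprime to `n` in this toy choice, each one-dimensional projection of the point set hits `n` distinct equally spaced values (up to the shift).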

Paper Structure

This paper contains 24 sections, 8 theorems, 64 equations, 4 figures.

Key Result

Theorem 2.1

Let the sequences $(\beta_j)_{j\ge 1}$, $(R_\ell)_{\ell\ge 1}$, $(A_n)_{n\ge 1}$ be defined as in (eq:beta), (eq:R), (eq:sigma), respectively. For any depth $\ell\ge 1$, any component $1\le p\le d_{\ell+1}$ of $G_\theta^{[\ell]}({\boldsymbol{y}})$, and any multiindex ${\boldsymbol{\nu}}\in{\mathcal{I}}$ [...] In both cases the sequence $\Gamma_n^{[\ell]}$ is defined recursively by [...]

Figures (4)

  • Figure 1: Values of ${\mathcal{E}}_T$ ($\circ$ green circles), $\widetilde{{\mathcal{E}}}_G$ ($\bullet$ red dots), and $|\widetilde{{\mathcal{E}}}_G - {\mathcal{E}}_T|$ ($\blacksquare$ blue squares) as $N$ increases, with $L = 3$, $N_{\rm obs} = 1$, $d_\ell = 32$, $s = 50$.
  • Figure 2: Values of ${\mathcal{E}}_T$ ($\circ$ green circles), $\widetilde{{\mathcal{E}}}_G$ ($\bullet$ red dots), and $|\widetilde{{\mathcal{E}}}_G - {\mathcal{E}}_T|$ ($\blacksquare$ blue squares) as $N$ increases, with $L = 12$, $N_{\rm obs} = 1$, $d_\ell = 30$, $s = 50$.
  • Figure 3: Values of $\log(\beta_j)$ ($\bullet$ blue dots) and $\log(b_j/L)$ (black line) for $j=1,\ldots,s$ for one random initialization and one random shift, with $L = 3$, $N_{\rm obs} = 1$, $d_\ell = 32$, $s = 50$, $N=2^5$.
  • Figure 4: Values of $\log(\beta_j)$ ($\bullet$ blue dots) and $\log(b_j/L)$ (black line) for $j=1,\ldots,s$ for one random initialization and one random shift, with $L = 12$, $N_{\rm obs} = 1$, $d_\ell = 30$, $s = 50$, $N=2^5$.

Theorems & Definitions (8)

  • Theorem 2.1: Regularity bounds for DNNs
  • Theorem 2.2: Regularity bounds for DNNs with $A_n$ of the form (eq:common)
  • Theorem 3.1: Norm bounds for DNNs
  • Theorem 3.2: Match regularity of DNNs to observables
  • Theorem 3.3: Tailored lattice training points
  • Lemma 6.1
  • Lemma 6.2
  • Lemma 6.3