Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression

Guohao Shen, Yuling Jiao, Yuanyuan Lin, Jian Huang

TL;DR

This work develops a rigorous theory for differentiable neural networks with RePU activations, showing that network derivatives admit efficient Mixed RePU representations and establishing complexity and approximation bounds that jointly address functions and their derivatives. It studies a deep score matching estimator (DSME) and introduces penalized deep isotonic regression (PDIR), providing non-asymptotic excess risk bounds for both under $C^s$-smooth targets, a minimax-optimal rate for PDIR, robustness of PDIR when the monotonicity assumption fails, and improved rates when the data lie near low-dimensional manifolds. A central advance is the simultaneous approximation of $C^s$ functions and their derivatives, enabled by explicit RePU architectures and polynomial-representation techniques, along with a manifold-aware analysis that mitigates the curse of dimensionality. The results have practical implications for high-dimensional derivative-based estimation tasks, diffusion-based generative modeling, and shape-constrained regression, offering theoretically grounded, scalable tools for score estimation and isotonic regression in complex settings.

Abstract

We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by mixed-RePU-activated networks and derive upper bounds for the complexity of the function class of derivatives of RePU networks. We establish error bounds for simultaneously approximating $C^s$ smooth functions and their derivatives using RePU-activated deep neural networks. Furthermore, we derive improved approximation error bounds when the data have an approximate low-dimensional support, demonstrating the ability of RePU networks to mitigate the curse of dimensionality. To illustrate the usefulness of our results, we consider a deep score matching estimator (DSME) and propose a penalized deep isotonic regression (PDIR) using RePU networks. We establish non-asymptotic excess risk bounds for DSME and PDIR under the assumption that the target functions belong to a class of $C^s$ smooth functions. We also show that PDIR achieves the minimax optimal convergence rate and is robust in the sense that it is consistent with vanishing penalty parameters even when the monotonicity assumption is not satisfied. Furthermore, if the data distribution is supported on an approximate low-dimensional manifold, we show that DSME and PDIR can mitigate the curse of dimensionality.
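
The claim that derivatives of RePU networks are again RePU-type networks can be illustrated on a toy case. The sketch below assumes the standard definition $\sigma_p(x)=\max(0,x)^p$ with integer $p\ge 2$ and a one-hidden-layer network; the names `repu`, `net`, and `net_grad` are illustrative and not taken from the paper. By the chain rule, the gradient of such a network is a network with activation $\sigma_{p-1}$, and the code checks this closed form against central finite differences.

```python
import numpy as np

def repu(x, p):
    """Rectified power unit sigma_p(x) = max(0, x)**p (assumes integer p >= 1)."""
    return np.maximum(x, 0.0) ** p

def net(x, W, b, a, p):
    """One-hidden-layer RePU network f(x) = a . sigma_p(W x + b)."""
    return a @ repu(W @ x + b, p)

def net_grad(x, W, b, a, p):
    """Gradient of `net` with respect to x, written with sigma_{p-1} (assumes p >= 2).

    Chain rule: df/dx_j = sum_k a_k * p * W[k, j] * sigma_{p-1}(W x + b)_k.
    """
    return (a * p * repu(W @ x + b, p - 1)) @ W

# Hypothetical sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d, width, p = 3, 8, 3
W = rng.normal(size=(width, d))
b = rng.normal(size=width)
a = rng.normal(size=width)
x = rng.normal(size=d)

# Central finite differences as an independent check of the closed form.
eps = 1e-6
fd = np.array([
    (net(x + eps * e, W, b, a, p) - net(x - eps * e, W, b, a, p)) / (2 * eps)
    for e in np.eye(d)
])
max_abs_err = float(np.max(np.abs(fd - net_grad(x, W, b, a, p))))
```

Here `max_abs_err` should be small, since $\sigma_3$ is twice continuously differentiable. For deeper networks the same chain-rule argument produces layers that mix several RePU orders, which is the situation the mixed-activated representation addresses.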

Paper Structure (29 sections, 23 theorems, 171 equations, 19 figures, 4 tables)

This paper contains 29 sections, 23 theorems, 171 equations, 19 figures, 4 tables.

Key Result

Theorem 1

Let $\mathcal{F}:=\mathcal{F}_{\mathcal{D},\mathcal{W}, \mathcal{U},\mathcal{S},\mathcal{B},\mathcal{B}^\prime}$ be a class of RePU $\sigma_p$ activated neural networks $f:\mathcal{X}\to\mathbb{R}$ with depth (number of hidden layers) $\mathcal{D}$, width (maximum width of hidden layers) $\mathcal{W}$

Figures (19)

  • Figure 1: Examples of PDIR estimates. In all figures, the data points are depicted as grey dots, the underlying regression functions are plotted as solid black curves, and PDIR estimates with different levels of penalty parameter $\lambda$ are plotted as colored curves. In the top two figures, data are generated from models with monotonic regression functions. In the bottom left figure, the target function is a constant. In the bottom right figure, the model is misspecified, in which the underlying regression function is not monotonic. Small values of $\lambda$ can lead to non-monotonic estimated functions.
  • Figure S2: Univariate data generation models. The target functions are depicted by solid curves in blue and instance samples with size $n=64$ are depicted as black dots.
  • Figure S3: An instance of the estimated curves for the "Linear", "Exp", "Step" and "Constant" models when sample size $n=64$. The training data is depicted as grey dots. The target functions are depicted as dashed curves in black, and the estimated functions are represented by solid curves with different colors.
  • Figure S4: Heatmaps for the target function $f_0$, the observed training data, and its deep isotonic regression and isotonic least squares estimate (isotonic LSE) under model (a) when $d=2$ and $n=64$.
  • Figure S5: 3D surface plots for the target function $f_0$, the observed training data, and its deep isotonic regression and isotonic least squares estimate (isotonic LSE) under model (a) when $d=2$ and $n=64$.
  • ...and 14 more figures
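
The role of the penalty parameter $\lambda$ in Figure 1 can be sketched with a toy objective. The form below is an assumption made for illustration, not the paper's exact criterion: a least squares fit plus $\lambda$ times the average of $\rho(\partial f/\partial x_j)$ with $\rho(t)=\max(-t,0)$, so that only negative partial derivatives are penalized. The function name `pdir_objective` and the scalar `lam` are hypothetical.

```python
import numpy as np

def pdir_objective(y, f_vals, df_vals, lam):
    """Least squares fit plus a penalty on negative partial derivatives.

    y:       (n,) responses
    f_vals:  (n,) fitted values f(x_i)
    df_vals: (n, d) partial derivatives of f evaluated at x_i
    lam:     nonnegative penalty level (assumed scalar form)
    """
    fit = np.mean((y - f_vals) ** 2)
    # rho(t) = max(-t, 0): zero where f is nondecreasing, positive otherwise.
    penalty = np.mean(np.sum(np.maximum(-df_vals, 0.0), axis=1))
    return fit + lam * penalty

x = np.linspace(0.0, 1.0, 50)
y = x.copy()  # noiseless monotone target, for illustration only

# A monotone candidate f(x) = x has derivative 1 everywhere: no penalty.
mono = pdir_objective(y, x, np.ones((50, 1)), lam=10.0)

# A non-monotone candidate f(x) = sin(3x) has negative slope for 3x > pi/2.
f_wig, df_wig = np.sin(3 * x), (3 * np.cos(3 * x))[:, None]
extra = pdir_objective(y, f_wig, df_wig, lam=10.0) - pdir_objective(y, f_wig, df_wig, lam=0.0)
```

In this sketch `mono` is exactly zero and `extra` is strictly positive, so a larger `lam` pushes the minimizer toward monotone fits, while a small `lam` tolerates the non-monotonic estimates mentioned in the caption of Figure 1.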

Theorems & Definitions (36)

  • Theorem 1: Neural networks for partial derivatives
  • Lemma 2: Pseudo dimension of Mixed RePUs multilayer perceptrons
  • Theorem 3: Representation of Polynomials by RePU networks
  • Definition 4: Multivariate differentiable class $C^s$
  • Theorem 5
  • Remark 6
  • Theorem 8: Improved approximation results
  • Lemma 10
  • Remark 11
  • Remark 12
  • ...and 26 more