Semi-Supervised Deep Sobolev Regression: Estimation and Variable Selection by ReQU Neural Network

Zhao Ding; Chenguang Duan; Yuling Jiao; Jerry Zhijian Yang

TL;DR

This paper introduces SDORE, a semi-supervised deep Sobolev regressor based on ReQU networks that penalizes the gradient norm to jointly estimate a regression function and its gradient. By using unlabeled data to approximate the Sobolev penalty, the method achieves minimax-optimal $L^{2}$ convergence for the function and provides convergence guarantees for the plug-in gradient estimator, even under domain shift. The authors establish oracle inequalities and convergence rates for both DORE (the supervised deep Sobolev regressor) and its semi-supervised variant SDORE, showing the advantage of unlabeled data and deriving guidance on choosing the regularization parameter and network size. The framework is extended to nonparametric variable selection via derivative-based sparsity, with theoretical guarantees and extensive numerical validation. Overall, the work advances the theory of neural network-based Sobolev regression, clarifies when unlabeled data helps, and demonstrates practical benefits in high-dimensional nonparametric settings.

Abstract

We propose SDORE, a Semi-supervised Deep Sobolev Regressor, for the nonparametric estimation of the underlying regression function and its gradient. SDORE employs deep ReQU neural networks to minimize the empirical risk with gradient norm regularization, where the regularization term can be approximated using unlabeled data. Our study includes a thorough analysis of the convergence rates of SDORE in the $L^{2}$-norm, showing that they are minimax optimal. Further, we establish a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift. These theoretical findings provide guidance for selecting the regularization parameter and the size of the neural network, and demonstrate the provable advantage of leveraging unlabeled data in semi-supervised learning. To the best of our knowledge, SDORE is the first provable neural network-based approach that simultaneously estimates the regression function and its gradient, with diverse applications such as nonparametric variable selection. The effectiveness of SDORE is validated through extensive numerical simulations.
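The objective described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes a one-hidden-layer ReQU network (the paper uses deep ReQU networks), squared loss on the labeled pairs, and a penalty equal to the squared Euclidean gradient norm averaged over unlabeled inputs. The names `requ`, `net`, `net_grad`, and `sdore_objective` are illustrative. Because ReQU, $\sigma(t)=\max(t,0)^{2}$, is continuously differentiable, the input gradient of the network has a closed form via the chain rule.

```python
import numpy as np

def requ(t):
    # ReQU activation: sigma(t) = max(t, 0)^2
    return np.maximum(t, 0.0) ** 2

def requ_prime(t):
    # Derivative of ReQU: sigma'(t) = 2 * max(t, 0), continuous everywhere
    return 2.0 * np.maximum(t, 0.0)

def net(params, x):
    # Hypothetical one-hidden-layer ReQU network f(x) = a^T sigma(W x + b) + c.
    # x: (n, d) inputs; returns (n,) outputs.
    W, b, a, c = params
    return requ(x @ W.T + b) @ a + c

def net_grad(params, x):
    # Input gradient by the chain rule: grad f(x) = W^T (a * sigma'(W x + b)).
    # Returns an (n, d) array, one gradient per row of x.
    W, b, a, _ = params
    return (requ_prime(x @ W.T + b) * a) @ W

def sdore_objective(params, x_lab, y_lab, z_unlab, lam):
    # Empirical squared loss on the labeled pairs ...
    fit = np.mean((net(params, x_lab) - y_lab) ** 2)
    # ... plus lam times the mean squared gradient norm over unlabeled inputs.
    penalty = np.mean(np.sum(net_grad(params, z_unlab) ** 2, axis=1))
    return fit + lam * penalty

# Illustrative usage on synthetic data (hypothetical sizes and target).
rng = np.random.default_rng(0)
d, h = 3, 8
params = (rng.normal(size=(h, d)), rng.normal(size=h), rng.normal(size=h), 0.0)
x_lab = rng.uniform(-1.0, 1.0, size=(20, d))
y_lab = x_lab[:, 0] ** 2 + 0.1 * rng.normal(size=20)
z_unlab = rng.uniform(-1.0, 1.0, size=(200, d))
loss = sdore_objective(params, x_lab, y_lab, z_unlab, lam=0.1)
```

In practice the network parameters would be trained by minimizing this objective with an automatic-differentiation framework; the sketch only evaluates it, which is enough to show that the unlabeled inputs enter through the gradient penalty alone.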

Paper Structure (36 sections, 20 theorems, 166 equations, 4 figures, 2 tables)

Introduction
Contributions
Main Results Overview
Preliminaries and notations
Organization
Related Work
Nonparametric Derivative Estimation
Local Polynomial Regression
Smoothing Splines
Kernel Ridge Regression
Nonparametric Regression using Deep Neural Network
Nonparametric Variable Selection
Semi-Supervised Learning
Deep Sobolev Regression
Deep Sobolev regressor
...and 21 more sections

Key Result

Lemma 3.1

Suppose the bounded density ratio assumption holds and $f_{0}\in L^{2}(\mu_{X})$. Then the gradient-regularized population risk has a unique minimizer $f^{\lambda}$ in $H^{1}(\nu_{X})$. Furthermore, $f^{\lambda}\in H^{2}(\nu_{X})$.
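For orientation, one natural reading of the regularized population risk in Lemma 3.1 is shown below, next to a semi-supervised empirical counterpart. It assumes squared loss and a squared gradient penalty taken under the marginal $\nu_X$; the paper's exact formulation may differ. The symbols $\widehat{f}_{\lambda}$, $\mathcal{F}$ (a ReQU network class), $n$ labeled pairs $(X_i, Y_i)$, and $m$ unlabeled inputs $Z_j \sim \nu_X$ are introduced here for illustration only.

```latex
f^{\lambda} = \operatorname*{arg\,min}_{f \in H^{1}(\nu_X)}
  \Big\{ \| f - f_{0} \|_{L^{2}(\mu_X)}^{2} + \lambda \, \| \nabla f \|_{L^{2}(\nu_X)}^{2} \Big\},
\qquad
\widehat{f}_{\lambda} \in \operatorname*{arg\,min}_{f \in \mathcal{F}}
  \Big\{ \frac{1}{n} \sum_{i=1}^{n} \big( f(X_i) - Y_i \big)^{2}
  + \frac{\lambda}{m} \sum_{j=1}^{m} \| \nabla f(Z_j) \|_{2}^{2} \Big\}
```

Under squared loss, $\mathbb{E}[(f(X)-Y)^{2}]$ differs from $\| f - f_{0} \|_{L^{2}(\mu_X)}^{2}$ only by the noise variance, so the two ways of writing the fitting term have the same minimizer.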

Figures (4)

Figure 1: Numerical results for the one-dimensional example. (left) Scatter plot of noisy observations (the paired data used for supervised learning), with line plots of the ground-truth regression function and the predictions of least-squares (LS) regression and SDORE. (right) The ground-truth derivative and its estimates by LS and SDORE.
Figure 2: Numerical results for the variable selection example. Empirical mean square of the partial derivatives of the regression function $f_{0}$ (which depends only on $x_1$ to $x_4$), estimated by least-squares fitting (LS, left) and SDORE (right). The dashed line is the 75% quantile threshold for variable selection. The mean selection error (SE) of the estimated derivative function and the root mean squared prediction error (PE) of the estimated regression function are also reported for each method.
Figure 3: Effect of the regularization on function fitting for the toy problem $f_0(x) = x_1^2$. (a) Landscape of the target function and its partial derivatives; training samples are shown as black dots. (b) Least-squares estimate. (c) DORE estimate. (d) SDORE estimate.
Figure 4: (left) Empirical mean square of the partial derivatives estimated by least-squares regression (LS) and SDORE on a variable selection problem in $\mathbb{R}^{10}$ where $f_0$ depends on $x_1$ to $x_4$. The dashed line is the 75% quantile threshold for variable selection. (center) Mean variable selection error of the estimated derivative function on the test set. (right) Root mean squared prediction error of the estimated regression function on the test set.
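Figures 2 and 4 rank variables by the empirical mean square of each estimated partial derivative. The sketch below computes that score from an array of gradient estimates and keeps the variables whose score exceeds a threshold. It is hypothetical: this summary does not say what the 75% quantile is computed over, so the threshold `tau` is left as a plain input, and the exact gradient of a made-up target $f_0(x)=x_1^2+x_2^2+x_3^2+x_4^2$ in $\mathbb{R}^{10}$ stands in for the gradients a fitted SDORE model would supply. The function names are illustrative.

```python
import numpy as np

def derivative_importance(grads):
    # Empirical mean square of each partial derivative.
    # grads: (m, d) array of gradient estimates at m sample points -> (d,) scores.
    return np.mean(np.asarray(grads) ** 2, axis=0)

def select_variables(grads, tau):
    # 0-based indices of variables whose importance score exceeds tau.
    return np.flatnonzero(derivative_importance(grads) > tau)

# Toy check: exact gradient of the hypothetical target x1^2 + ... + x4^2 in R^10.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(1000, 10))
grads = np.zeros_like(x)
grads[:, :4] = 2.0 * x[:, :4]  # d/dx_k of x_k^2 is 2 x_k; other partials are 0
selected = select_variables(grads, tau=0.1)  # indices of x1..x4
```

A derivative-based score like this is model-agnostic: any estimator that supplies gradients (SDORE, or LS with numerical differentiation) can be plugged in, which is how the figures compare the two methods.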

Theorems & Definitions (48)

Definition 1: Continuous functions space
Definition 2: Deep ReQU neural network
Definition 3: Empirical covering number
Lemma 3.1: Existence and uniqueness of population risk minimizer
Lemma 4.1: Oracle inequality
Lemma 4.2: Approximation with gradient constraints
Theorem 4.3: Convergence rates
Lemma 5.1
Lemma 5.2: Oracle inequality
Lemma 5.3: Approximation in $H^{1}$-norm
...and 38 more
