Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression

Dongya Wu; Xin Li

TL;DR

The paper tackles parameter unidentifiability in deep neural networks used for high-dimensional sparse regression by shifting focus to nonparametric estimation of the partial derivatives with respect to the inputs. It proves model convergence under an $\ell_1$-norm constraint and a smooth activation, with rate $\mathcal{O}\left(\frac{\sqrt{\log(P)\mathbb{E}\|x\|_{\infty}^{2}}}{\sqrt{n}}\right)$, and shows that the partial derivatives converge at the slower rate $\mathcal{O}(n^{-1/4})$, a result obtained by bounding the norm and the divergence of the partial derivatives. These theoretical guarantees, together with experiments, support the interpretability of deep nets through variable importance derived from derivatives. The work suggests that reliable nonparametric information about input effects can be extracted even when parameter estimation is hampered by unidentifiability, offering a path toward nonlinear variable selection in high dimensions.
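As a rough numerical illustration of the two rates quoted above, the sketch below evaluates them with every constant set to 1. The function names, the choice $\mathbb{E}\|x\|_{\infty}^{2} = 1$, and the target error 0.1 are assumptions made for illustration only; they do not come from the paper, and the printed numbers indicate scaling, not actual error levels.

```python
import math

def model_rate(n, P, x_inf_sq=1.0):
    """sqrt(log(P) * E||x||_inf^2) / sqrt(n), with constants dropped (assumption)."""
    return math.sqrt(math.log(P) * x_inf_sq) / math.sqrt(n)

def derivative_rate(n):
    """n^(-1/4) scaling for the partial derivatives, constants dropped (assumption)."""
    return n ** -0.25

def samples_for_model_error(eps, P, x_inf_sq=1.0):
    """Smallest n with model_rate(n, P) <= eps under the same simplification."""
    return math.ceil(math.log(P) * x_inf_sq / eps ** 2)

if __name__ == "__main__":
    # P grows by a factor of 1000, yet the required n only roughly doubles.
    for P in (10**3, 10**6):
        n = samples_for_model_error(0.1, P)
        print(P, n, round(model_rate(n, P), 4), round(derivative_rate(n), 4))
```

Because the dependence on P is logarithmic, the required sample size grows slowly with the number of parameters, while for the same n the derivative rate remains noticeably larger than the model rate.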

Abstract

Generalization theory has been established for sparse deep neural networks in the high-dimensional regime. Beyond generalization, parameter estimation is also important, since it is crucial for variable selection and for the interpretability of deep neural networks. Current theoretical studies of parameter estimation focus mainly on two-layer neural networks, because the convergence of parameter estimation relies heavily on the regularity of the Hessian matrix, and the Hessian matrix of deep neural networks is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose to conduct nonparametric estimation of partial derivatives with respect to inputs. We first show that model convergence of sparse deep neural networks is guaranteed, in the sense that the sample complexity grows only with the logarithm of the number of parameters or the input dimension when the $\ell_{1}$-norm of the parameters is well constrained. Then, by bounding the norm and the divergence of partial derivatives, we establish that the convergence rate of nonparametric estimation of partial derivatives scales as $\mathcal{O}(n^{-1/4})$, which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study is the first to combine nonparametric estimation with parametric sparse deep neural networks. As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results point to a promising future for the interpretability of deep neural networks.
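To make the idea of partial derivatives with respect to inputs concrete, here is a minimal pure-Python sketch. It is not the paper's estimator: the weights are fixed by hand instead of being fitted under an $\ell_1$ constraint, tanh stands in for a generic smooth activation, and measuring variable importance by the empirical $L^{2}$ norm of each partial derivative is an assumption. All function names are hypothetical.

```python
import math
import random

def forward_with_input_grad(weights, x):
    """Return f(x) and the gradient of f with respect to the input x.

    weights: list of matrices (lists of rows). tanh follows every layer
    except the last, which must have a single output unit.
    """
    d = len(x)
    h = list(x)
    # jac[i][j] = d h_i / d x_j; starts as the identity matrix.
    jac = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for depth, W in enumerate(weights):
        z = [sum(w * hk for w, hk in zip(row, h)) for row in W]
        zjac = [[sum(row[k] * jac[k][j] for k in range(len(h)))
                 for j in range(d)] for row in W]
        if depth < len(weights) - 1:
            h = [math.tanh(v) for v in z]
            # Chain rule: tanh'(z) = 1 - tanh(z)^2.
            jac = [[(1.0 - hi * hi) * g for g in grow]
                   for hi, grow in zip(h, zjac)]
        else:
            h, jac = z, zjac
    return h[0], jac[0]

def l1_norm(weights):
    """Sum of absolute values of all weights (the constrained quantity)."""
    return sum(abs(w) for W in weights for row in W for w in row)

def variable_importance(weights, xs):
    """Empirical L2 norm of each partial derivative over the sample xs."""
    d = len(xs[0])
    sq = [0.0] * d
    for x in xs:
        _, g = forward_with_input_grad(weights, x)
        for j in range(d):
            sq[j] += g[j] ** 2
    return [math.sqrt(s / len(xs)) for s in sq]

def toy_weights():
    """Hand-picked sparse weights: only inputs 0 and 1 feed the network."""
    first = [[0.5, -0.3, 0.0, 0.0, 0.0],
             [0.2, 0.4, 0.0, 0.0, 0.0],
             [-0.1, 0.6, 0.0, 0.0, 0.0]]
    last = [[0.7, -0.5, 0.3]]
    return [first, last]

if __name__ == "__main__":
    random.seed(0)
    xs = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(50)]
    W = toy_weights()
    print("l1 norm of parameters:", l1_norm(W))
    print("importance per input:", variable_importance(W, xs))
```

In this toy network the inputs with no connection to the first layer receive zero importance, which is the kind of nonlinear variable selection signal the abstract refers to. In practice an automatic differentiation library would replace the hand-written chain rule.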

Paper Structure (12 sections, 8 theorems, 81 equations, 1 figure)

Introduction
Problem setup
Nonparametric regression problems
Sparse deep neural networks
Nonparametric estimation of partial derivatives
Main results
Convergence of the model
Convergence of partial derivatives
Experiments
Conclusion
Proof of Section "Convergence of the model"
Proof of Section "Convergence of partial derivatives"

Key Result

lemma 3.5: Oracle inequality

Let $\hat{f}$ be the estimated model obtained from eq-mixed-sparse. Then under Assumption bounded-assumption, it follows that

Figures (1)

Figure 1: $L^{2}$ error of model prediction and derivative estimation.

Theorems & Definitions (20)

definition 3.1: Rademacher complexity
definition 3.2: Covering number
lemma 3.5: Oracle inequality
lemma 3.6: Lipschitz property with respect to parameters
lemma 3.7: Rademacher complexity of sparse deep neural networks
theorem 3.8: Convergence of the model
remark 3.9
lemma 3.10: Bounded norm of partial derivatives
lemma 3.11: Bounded divergence of partial derivatives
remark 3.12
...and 10 more
