Neural Networks Perform Sufficient Dimension Reduction

Shuntuo Xu, Zhou Yu

TL;DR

This work establishes a rigorous link between neural networks and sufficient dimension reduction (SDR) in regression. It shows that, under rank regularization in the first layer, the network learns a projection $B_0^{\top}x$ that captures the central mean subspace, i.e., $\Pi_{\mathcal{T}(f^*)}=\Pi_{B_0}$ at the population level. It proves both population-level unbiasedness (Theorem 1) and sample-level consistency (Theorem 2): the estimator converges as the sample size $n$ grows, given suitably scaled network depth and width, without strong distributional assumptions on $x$. The authors validate the theory through simulations and a real-data study on Seoul weather data, where the neural network-based estimator performs competitively with or better than classical SDR methods. They also discuss extensions toward the central subspace via kernel-based strategies, suggesting that neural networks may be applicable to SDR beyond the mean subspace.

Abstract

This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularizations. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency of the neural network-based estimator for the central mean subspace, underscoring the suitability of neural networks in addressing SDR-related challenges. Numerical experiments further validate our theoretical findings, and highlight the underlying capability of neural networks to facilitate SDR compared to the existing methods. Additionally, we discuss an extension to unravel the central subspace, broadening the scope of our investigation.
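
To make the role of the first layer concrete, the following minimal sketch illustrates why a rank constraint on the first-layer weights turns the network into a function of a low-dimensional projection of $x$. It is an illustration under stated assumptions, not the authors' implementation: the factorization `W1 = U @ V.T`, the sizes `p`, `r`, `width`, and the one-hidden-layer ReLU architecture are all hypothetical choices, and the section above does not specify how the paper imposes its rank regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, width = 10, 2, 16  # hypothetical: input dim, rank bound, hidden width

# Hypothetical rank constraint: factor the first-layer weights as U @ V.T,
# so W1 has shape (p, width) and rank at most r.
U = rng.standard_normal((p, r))
V = rng.standard_normal((width, r))
W1 = U @ V.T
b1 = rng.standard_normal(width)
w2 = rng.standard_normal(width)


def network(x):
    """One-hidden-layer ReLU network with a rank-constrained first layer."""
    hidden = np.maximum(x @ W1 + b1, 0.0)
    return hidden @ w2


# Orthogonal projection onto span(U); assumes U has full column rank.
Pi_U = U @ np.linalg.solve(U.T @ U, U.T)

# Perturb the inputs only in directions orthogonal to span(U).
x = rng.standard_normal((5, p))
x_perturbed = x + rng.standard_normal((5, p)) @ (np.eye(p) - Pi_U)

rank_W1 = np.linalg.matrix_rank(W1)
max_output_gap = np.max(np.abs(network(x) - network(x_perturbed)))
```

Because `x @ W1` depends on `x` only through `x @ U`, moving `x` in directions orthogonal to the column space of `U` leaves the output unchanged up to floating-point error. In this toy setting, `span(U)` plays the role of the subspace that the first layer identifies.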

Paper Structure

This paper contains 10 sections, 5 theorems, 36 equations, 2 figures, 2 tables.

Key Result

Theorem 1

Suppose that the smoothness and sharpness assumptions on $f_0$ hold. Let $f^*=\mathrm{argmin}_{f\in\mathcal{F}_{\mathcal{L}, \mathcal{M}, \mathcal{S}, \mathcal{R}}}\mathbb{E}[y-f(x)]^2$. Then $\Pi_{\mathcal{T}(f^*)}=\Pi_{B_0}$, provided that $\mathcal{R}$ is sufficiently large and $\mathcal{L}$ and $\mathcal{M}$ tend to infinity.

Figures (2)

  • Figure 1: Absolute cosine similarity between (i) $B_0$ and its projection on $\Pi_{W_{11}}$ (dotted line with triangle markers), and (ii) $B_0$ and the leading eigenvector of $W_1W_1^{\top}$ (solid line with square markers).
  • Figure 2: Boxplots of $\|\pi_{\hat{B}}-\pi_{B_0}\|_F$ over 100 replicates for different methods. Each panel compares six SDR methods; NN denotes the neural network-based method.
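
The two quantities named in the figure captions, the absolute cosine similarity and the Frobenius distance between projection matrices, can be computed as in the sketch below. The function names `projection`, `subspace_distance`, and `abs_cosine` are hypothetical helpers introduced for illustration. The code assumes each basis matrix has full column rank and is not taken from the authors' implementation.

```python
import numpy as np


def projection(B):
    """Orthogonal projection matrix onto the column space of B.

    Assumes B has full column rank; a 1-D array is treated as one column.
    """
    B = np.asarray(B, dtype=float)
    if B.ndim == 1:
        B = B[:, None]
    Q, _ = np.linalg.qr(B)  # reduced QR: columns of Q form an orthonormal basis
    return Q @ Q.T


def subspace_distance(B_hat, B0):
    """Frobenius norm of the difference between the two projection matrices."""
    return np.linalg.norm(projection(B_hat) - projection(B0), ord="fro")


def abs_cosine(u, v):
    """Absolute cosine similarity between two nonzero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


# Usage: two bases related by an invertible matrix span the same subspace.
B0 = np.eye(3)[:, :2]
A = np.array([[2.0, 1.0], [0.0, 3.0]])
same_span_distance = subspace_distance(B0 @ A, B0)
```

Comparing projection matrices rather than the bases themselves makes the metric invariant to the choice of basis, which matters because SDR identifies a subspace, not a particular matrix $B_0$. In the usage example, `same_span_distance` is zero up to floating-point error.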

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • proof : Proof of Theorem 1
  • Lemma 2
  • Lemma 3
  • proof : Proof of the excess risk lemma
  • proof : Proof of Theorem 2