Neural Networks Perform Sufficient Dimension Reduction
Shuntuo Xu, Zhou Yu
TL;DR
This work establishes a rigorous link between neural networks and sufficient dimension reduction (SDR) in regression. It shows that, under rank regularization of the first layer, the network learns a projection $B_0^{\top}x$ that captures the central mean subspace, i.e., $\Pi_{\mathcal{T}(f^*)}=\Pi_{B_0}$ at the population level. The paper proves both population-level unbiasedness (Theorem 1) and sample-level consistency (Theorem 2): the estimator converges as $n$ grows when the network depth and width are allowed to scale, and no strong distributional assumptions on $x$ are required. The authors validate the theory through simulations and a real-data study on Seoul weather data, where the neural network estimator performs competitively with, or better than, classical SDR methods. They also discuss an extension to the central subspace via kernel-based strategies, suggesting that neural networks may be applicable to SDR beyond the mean subspace.
Abstract
This paper investigates the connection between neural networks and sufficient dimension reduction (SDR), demonstrating that neural networks inherently perform SDR in regression tasks under appropriate rank regularization. Specifically, the weights in the first layer span the central mean subspace. We establish the statistical consistency of the neural network-based estimator for the central mean subspace, underscoring the suitability of neural networks for SDR problems. Numerical experiments further validate our theoretical findings and highlight the capability of neural networks to perform SDR relative to existing methods. Additionally, we discuss an extension that recovers the central subspace, broadening the scope of our investigation.
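The claim that the first-layer weights "span" the central mean subspace is a statement about subspaces, not about a particular basis, so it is natural to compare estimates through projection matrices. The following is a minimal sketch of that idea and is not taken from the paper: the helper names `projection` and `subspace_distance`, the dimensions, and the random matrices are all hypothetical, and the Frobenius distance between projections is one common choice of metric that may differ from the one the authors use.

```python
import numpy as np

def projection(B):
    """Orthogonal projection onto the column space of B (p x d, full column rank)."""
    Q, _ = np.linalg.qr(B)  # reduced QR: Q is an orthonormal basis of span(B)
    return Q @ Q.T

def subspace_distance(B_hat, B_0):
    """Frobenius norm of the difference between the two projection matrices."""
    return np.linalg.norm(projection(B_hat) - projection(B_0), ord="fro")

rng = np.random.default_rng(0)
p, d = 10, 2                              # hypothetical ambient and reduced dimensions
B_0 = rng.standard_normal((p, d))         # hypothetical basis of the target subspace
A = np.array([[2.0, 1.0], [0.0, 3.0]])    # invertible d x d change of basis

# B_0 @ A spans the same subspace as B_0, so the distance is numerically zero.
same_span = subspace_distance(B_0 @ A, B_0)

# An unrelated random basis spans a different subspace, so the distance is positive.
other = subspace_distance(rng.standard_normal((p, d)), B_0)

print(f"same span: {same_span:.2e}, different span: {other:.3f}")
```

Because the projection is invariant to any invertible change of basis, a first-layer weight matrix only needs to match $B_0$ up to such a transformation for the distance to vanish.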
