Non-identifiability distinguishes Neural Networks among Parametric Models
Sourav Chatterjee, Timothy Sudijono
TL;DR
This paper studies how feedforward neural networks differ from traditional parametric models in regression by comparing population-level mean-squared error whenever $X$ carries information about $Y$, that is, whenever $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$. It proves that, under mild architectural assumptions, the predictor $\hat f$ learned by a feedforward neural network satisfies $\mathbb{E}[(\hat f(X)-Y)^2] < \mathrm{Var}(Y)$, i.e., the network always learns a nontrivial relationship when one exists. In contrast, for reasonable smooth parametric models satisfying local and global identifiability conditions, there exists a nontrivial pair $(X,Y)$ on which the model collapses to the constant predictor $\mathbb{E}[Y]$. The paper also provides a partial converse and concrete examples contrasting logistic regression with neural networks to illustrate the role of identifiability. Collectively, the results suggest that non-identifiability is what distinguishes neural networks among smooth parametric models.
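The condition on $(X,Y)$ can be unpacked with the law of total variance, a standard identity spelled out here for convenience and assuming $Y$ has finite variance:

$$\mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y|X)] + \mathrm{Var}(\mathbb{E}[Y|X]).$$

Hence $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$ holds exactly when $\mathrm{Var}(\mathbb{E}[Y|X]) > 0$, that is, when the regression function $\mathbb{E}[Y|X]$ is not almost surely constant. Since the constant predictor $\mathbb{E}[Y]$ has mean-squared error $\mathbb{E}[(\mathbb{E}[Y]-Y)^2] = \mathrm{Var}(Y)$, a predictor with error strictly below $\mathrm{Var}(Y)$ is one that strictly improves on the best constant.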
Abstract
One of the enduring problems surrounding neural networks is to identify the factors that differentiate them from traditional statistical models. We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level, for regression tasks. Firstly, we prove that for any pair of random variables $(X,Y)$, neural networks always learn a nontrivial relationship between $X$ and $Y$, if one exists. Secondly, we prove that for reasonable smooth parametric models, under local and global identifiability conditions, there exists a nontrivial $(X,Y)$ pair for which the parametric model learns the constant predictor $\mathbb{E}[Y]$. Together, our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.
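The contrast between collapsing to $\mathbb{E}[Y]$ and learning a nontrivial relationship can be seen in a small numerical sketch. The setup below is an illustrative assumption, not an example taken from the paper: $X$ is uniform on a symmetric grid in $[-1,1]$, $Y = X^2$, the parametric model is a plain linear fit, and the "network" is a one-hidden-layer ReLU model with two hidden units whose inner weights are fixed at $+1$ and $-1$, so that only the outer weights are fit by least squares.

```python
import numpy as np

# Toy population (assumed for illustration): X uniform on a symmetric grid,
# Y = X^2, so Y depends on X but is uncorrelated with it.
x = np.linspace(-1.0, 1.0, 20001)
y = x ** 2
var_y = np.var(y)  # mean-squared error of the constant predictor E[Y]

# Linear model a + b*x, fit by least squares over the whole population.
A = np.column_stack([np.ones_like(x), x])
coef_lin, *_ = np.linalg.lstsq(A, y, rcond=None)
mse_lin = np.mean((A @ coef_lin - y) ** 2)

# Tiny ReLU model c + v1*relu(x) + v2*relu(-x) with fixed inner weights;
# only the outer weights (c, v1, v2) are fit.
def relu(t):
    return np.maximum(t, 0.0)

H = np.column_stack([np.ones_like(x), relu(x), relu(-x)])
coef_nn, *_ = np.linalg.lstsq(H, y, rcond=None)
mse_nn = np.mean((H @ coef_nn - y) ** 2)

print(f"Var(Y)            = {var_y:.5f}")
print(f"linear model MSE  = {mse_lin:.5f} (slope = {coef_lin[1]:.2e})")
print(f"ReLU model MSE    = {mse_nn:.5f}")
```

In this toy population the fitted slope is zero by symmetry, so the linear model reduces to the constant predictor and its error equals $\mathrm{Var}(Y)$, while the ReLU model can represent a multiple of $|x|$ and achieves a strictly smaller error. This only illustrates the phenomenon; the paper's results concern general pairs $(X,Y)$ and full minimization over network parameters.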
