Non-identifiability distinguishes Neural Networks among Parametric Models

Sourav Chatterjee; Timothy Sudijono

TL;DR

This paper addresses how neural networks differ from traditional parametric models in regression by analyzing population-level mean-squared error under the condition $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$. It proves that feedforward neural networks with mild architectural assumptions satisfy $\mathbb{E}[(\hat f(X)-Y)^2] < \mathrm{Var}(Y)$, i.e., they always learn a nontrivial relationship when one exists, while many smooth parametric models can collapse to the constant predictor $\mathbb{E}[Y]$ under local and strong identifiability. It also provides a partial converse and concrete examples contrasting logistic regression with neural nets to illustrate identifiability issues. Collectively, the results explain why neural networks can exploit data structure beyond identifiable parametric families and highlight non-identifiability as a distinguishing feature.
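The condition and the conclusion above can be unpacked with the law of total variance. This is a standard identity, written here as a reading aid rather than a step taken from the paper, and it assumes $Y$ has a finite second moment:

$$
\mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y|X)] + \mathrm{Var}(\mathbb{E}[Y|X]),
\qquad
\mathbb{E}[(\mathbb{E}[Y]-Y)^2] = \mathrm{Var}(Y).
$$

The first identity shows that $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$ holds exactly when $\mathrm{Var}(\mathbb{E}[Y|X]) > 0$, that is, when the regression function $\mathbb{E}[Y|X]$ is not almost surely constant. The second shows that $\mathrm{Var}(Y)$ is the mean-squared error of the constant predictor $\mathbb{E}[Y]$, so $\mathbb{E}[(\hat f(X)-Y)^2] < \mathrm{Var}(Y)$ says that $\hat f$ strictly beats the best constant.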

Abstract

One of the enduring problems surrounding neural networks is to identify the factors that differentiate them from traditional statistical models. We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level, for regression tasks. Firstly, we prove that for any pair of random variables $(X,Y)$, neural networks always learn a nontrivial relationship between $X$ and $Y$, if one exists. Secondly, we prove that for reasonable smooth parametric models, under local and global identifiability conditions, there exists a nontrivial $(X,Y)$ pair for which the parametric model learns the constant predictor $\mathbb{E}[Y]$. Together, our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.
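The contrast described in the abstract can be illustrated numerically with a toy example. The sketch below is an assumption-laden illustration, not the paper's construction: it uses a linear model as a stand-in for an identifiable parametric family (the paper's own examples involve logistic regression), a large sample as a proxy for the population, and two hand-picked ReLU features with a least-squares output layer in place of a trained network. The pair $X \sim \mathrm{Uniform}(-1,1)$, $Y = X^2$ is chosen because $X$ and $Y$ are uncorrelated even though $Y$ is a function of $X$.

```python
import numpy as np

# Toy pair (X, Y): Y = X^2 with X symmetric about 0.
# E[Y|X] = X^2 is non-constant, so E[Var(Y|X)] = 0 < Var(Y).
rng = np.random.default_rng(0)
n = 200_000  # large sample used as a proxy for the population
x = rng.uniform(-1.0, 1.0, size=n)
y = x**2

var_y = y.var()  # MSE of the constant predictor E[Y]

# Identifiable stand-in model: a + b*x, fit by least squares.
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
mse_linear = np.mean((A @ coef - y) ** 2)

# Hypothetical one-hidden-layer ReLU model with two fixed hidden units,
# relu(x) and relu(-x); only the output layer is fit here.
H = np.column_stack([np.ones(n), np.maximum(x, 0.0), np.maximum(-x, 0.0)])
w, *_ = np.linalg.lstsq(H, y, rcond=None)
mse_relu = np.mean((H @ w - y) ** 2)

print(f"Var(Y)       = {var_y:.4f}")
print(f"linear slope = {coef[1]:.4f}")
print(f"linear MSE   = {mse_linear:.4f}")
print(f"ReLU MSE     = {mse_relu:.4f}")
```

In this example the fitted slope is close to zero, so the linear model essentially returns the constant predictor and its error stays at about $\mathrm{Var}(Y)$, while the two-unit ReLU model already achieves a much smaller error. Any network class containing these two units can do at least as well at its population minimizer.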
