Non-identifiability distinguishes Neural Networks among Parametric Models

Sourav Chatterjee, Timothy Sudijono

TL;DR

This paper addresses how neural networks differ from traditional parametric models in regression by analyzing population-level mean-squared error under the condition $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$. It proves that feedforward neural networks with mild architectural assumptions satisfy $\mathbb{E}[(\hat f(X)-Y)^2] < \mathrm{Var}(Y)$, i.e., they always learn a nontrivial relationship when one exists, while many smooth parametric models can collapse to the constant predictor $\mathbb{E}[Y]$ under local and strong identifiability. It also provides a partial converse and concrete examples contrasting logistic regression with neural nets to illustrate identifiability issues. Collectively, the results explain why neural networks can exploit data structure beyond identifiable parametric families and highlight non-identifiability as a distinguishing feature.
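
The condition and the conclusion above can both be read off a standard decomposition of mean-squared error. This is a textbook identity, stated here under the assumption that $\mathbb{E}[Y^2] < \infty$ and $\mathbb{E}[f(X)^2] < \infty$; it is not a reproduction of the paper's argument. For any predictor $f$,

$$\mathbb{E}[(f(X)-Y)^2] = \mathbb{E}[\mathrm{Var}(Y|X)] + \mathbb{E}\big[(f(X)-\mathbb{E}[Y|X])^2\big].$$

Taking $f \equiv \mathbb{E}[Y]$ gives the law of total variance,

$$\mathrm{Var}(Y) = \mathbb{E}[\mathrm{Var}(Y|X)] + \mathrm{Var}(\mathbb{E}[Y|X]).$$

So $\mathbb{E}[\mathrm{Var}(Y|X)] < \mathrm{Var}(Y)$ holds exactly when $\mathrm{Var}(\mathbb{E}[Y|X]) > 0$, that is, when the regression function is not constant. Since the constant predictor $\mathbb{E}[Y]$ has mean-squared error exactly $\mathrm{Var}(Y)$, the inequality $\mathbb{E}[(\hat f(X)-Y)^2] < \mathrm{Var}(Y)$ says that $\hat f$ strictly beats the best constant.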

Abstract

One of the enduring problems surrounding neural networks is to identify the factors that differentiate them from traditional statistical models. We prove a pair of results which distinguish feedforward neural networks among parametric models at the population level, for regression tasks. Firstly, we prove that for any pair of random variables $(X,Y)$, neural networks always learn a nontrivial relationship between $X$ and $Y$, if one exists. Secondly, we prove that for reasonable smooth parametric models, under local and global identifiability conditions, there exists a nontrivial $(X,Y)$ pair for which the parametric model learns the constant predictor $\mathbb{E}[Y]$. Together, our results suggest that a lack of identifiability distinguishes neural networks among the class of smooth parametric models.

Paper Structure

This paper contains 10 sections, 7 theorems, 68 equations.

Key Result

Proposition 1.1

Let $X \sim \textup{MVN}(0,I_p)$. Then there exists a random variable $Y$ such that the best-fitting logistic model under square loss is the constant prediction $\mathbb{E}[Y]$. Explicitly, $Y$ is a random variable such that $\mathbb{P}(Y=1|X=x) = \frac{1}{2}+\epsilon g_0(x)$ for sufficiently small $\epsilon$, where $g_0$ is a bounded even function. On the other hand, for any random ...
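
The contrast in Proposition 1.1 can be checked numerically in a small case. The sketch below is illustrative only and rests on several assumptions that are not from the paper: dimension $p = 1$, the choice $g_0(x) = \cos(x)$, the value $\epsilon = 0.05$, a finite grid search over the logistic parameters `a` and `b`, and a hand-built ReLU network with two hidden units. Population expectations are approximated by a Riemann sum against the standard normal density. Risks are reported as excess risk $\mathbb{E}[(f(X) - \mathbb{P}(Y=1|X))^2]$, which differs from the full mean-squared error only by the term $\mathbb{E}[\mathrm{Var}(Y|X)]$ common to all predictors.

```python
import numpy as np

# Quadrature grid for X ~ N(0, 1); the weights are normalized to sum to 1.
x = np.linspace(-8.0, 8.0, 4001)
w = np.exp(-0.5 * x**2)
w /= w.sum()

def expect(values):
    """Approximate E[values(X)] for X ~ N(0, 1)."""
    return float(np.sum(w * values))

eps = 0.05                       # assumed "sufficiently small" epsilon
m = 0.5 + eps * np.cos(x)        # P(Y=1 | X=x) with assumed g0 = cos (bounded, even)
mean_y = expect(m)               # E[Y]

def excess_risk(pred):
    """E[(pred(X) - P(Y=1|X))^2], i.e. MSE minus E[Var(Y|X)]."""
    return expect((pred - m) ** 2)

# Baseline: the constant predictor E[Y].
const_risk = excess_risk(np.full_like(x, mean_y))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Logistic model sigmoid(a + b*x), fitted by grid search under square loss.
logit_risk, best_a, best_b = min(
    (excess_risk(sigmoid(a + b * x)), a, b)
    for a in np.linspace(-1.0, 1.0, 81)
    for b in np.linspace(-3.0, 3.0, 61)
)

# Two-hidden-unit ReLU network c + d * (relu(x) + relu(-x)) = c + d * |x|.
# With the hidden weights fixed, the best (c, d) is a least-squares fit.
h = np.maximum(x, 0.0) + np.maximum(-x, 0.0)
h_centered = h - expect(h)
d = expect(h_centered * (m - mean_y)) / expect(h_centered ** 2)
c = mean_y - d * expect(h)
relu_risk = excess_risk(c + d * h)

print(f"constant predictor : {const_risk:.3e}")
print(f"best logistic fit  : {logit_risk:.3e} at a={best_a:.3f}, b={best_b:.3f}")
print(f"two-unit ReLU net  : {relu_risk:.3e}")
```

Under these assumptions the grid search selects slope `b = 0`, so the logistic fit is itself a constant and does not improve on $\mathbb{E}[Y]$, while the small ReLU network picks up the even structure in $g_0$ and attains a strictly lower risk. The hidden weights of the network are fixed by hand rather than trained, which keeps the example deterministic.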

Theorems & Definitions (13)

  • Proposition 1.1: Logistic model vs. neural network model
  • Theorem 2.1
  • Lemma 2.1
  • Lemma 2.2: Approximation capabilities of neural networks
  • Theorem 2.2
  • Corollary 2.1: Linear Regression
  • Corollary 2.2: Logistic Regression
  • Proof of Lemma 2.1
  • Proof of Lemma 2.2
  • Proof of Theorem 2.1
  • ...and 3 more