Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization

Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai

TL;DR

The paper addresses data-efficient recovery of target functions by overparameterized deep neural networks. It introduces Local Linear Recovery (LLR) and the concept of optimistic sample size to quantify the best-possible data requirements, connecting recoverability to the model's tangent-space rank. Using Embedding Principles and critical mappings, it derives upper bounds on optimistic sample sizes for general DNNs, and provides exact results for two-layer tanh networks and related CNN architectures, showing that recovery can occur with far fewer samples than the total number of parameters. The work clarifies how architecture and width influence data efficiency, sets a foundation for stronger recovery guarantees, and suggests directions for extending these concepts to deeper networks.

Abstract

Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning. To advance understanding in this area, we introduce a concept we term "local linear recovery" (LLR), a weaker form of target function recovery that renders the problem more amenable to theoretical analysis. In the sense of LLR, we prove that functions expressible by narrower DNNs are guaranteed to be recoverable from fewer samples than model parameters. Specifically, we establish upper limits on the optimistic sample sizes, defined as the smallest sample size necessary to guarantee LLR, for functions in the space of a given DNN. Furthermore, we prove that these upper bounds are achieved in the case of two-layer tanh neural networks. Our research lays a solid groundwork for future investigations into the recovery capabilities of DNNs in overparameterized scenarios.

Paper Structure

This paper contains 14 sections, 22 theorems, 77 equations, 4 figures, and 3 tables.

Key Result

Proposition 1

Suppose we have a differentiable model $f_{\bm{\theta}}(\cdot)$ with $M$ parameters. For any target function $f^*\in \mathcal{F}$, if it has an $n$-sample LLR-guarantee, then it has an $n'$-sample LLR-guarantee for any $n'\geqslant n$.

Figures (4)

  • Figure 1: (Figure 2 in zhang2021embedding) Illustration of one-step splitting embedding. The black neuron in the left network is split into the blue and purple neurons in the right network. The red (green) output weight of the black neuron in the left network is split into two red (green) weights in the right network with ratios $\alpha$ and $(1-\alpha)$, respectively.
  • Figure 2: Illustration of architectures from fully-connected NN to CNN for comparison.
  • Figure 3: Average test error (color) for NNs of different architectures (ordinate) and sample sizes (abscissa) in fitting the target function Eq. \ref{eq:NN_target}. The yellow dashed line in each row indicates the model rank of the target in the corresponding NN. (a) Two-layer $1$-kernel tanh-CNN vs. two-layer $1$-kernel tanh-CNN without weight sharing vs. two-layer width-$3$ fully-connected tanh-NN; these NNs are referred to as $1\textnormal{x}$ for each architecture in (b-d). (b) Two-layer $N$-kernel tanh-CNN, (c) two-layer $N$-kernel tanh-CNN without weight sharing, and (d) two-layer width-$3N$ fully-connected tanh-NN, labeled $N\textnormal{x}$ for $N=1,3,10,34,100$. In all experiments, network parameters are initialized from a normal distribution with mean $0$ and variance $10^{-20}$ and trained by full-batch gradient descent with a fine-tuned learning rate. For both the training and test datasets, inputs are drawn from the standard normal distribution and outputs are obtained from the target function; the training set size varies while the test set size is fixed at $1000$. The learning rate in each setup is fine-tuned within $[0.05, 0.5]$ for better generalization performance. (A minimal sketch of this training setup follows the figure list.)
  • Figure 4: Schematic overview of our theoretical results and interconnections.
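The Figure 3 caption describes the experimental setup in enough detail to outline it in code. Below is a minimal sketch of that setup in PyTorch, assuming a fully-connected two-layer tanh student; the teacher `target_fn` is a hypothetical stand-in for the paper's target Eq. \ref{eq:NN_target}, and the width, learning rate, and number of gradient-descent steps are illustrative placeholders rather than the paper's tuned values.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in teacher (NOT the paper's target Eq. (eq:NN_target)):
# a small two-layer tanh network with fixed weights.
def target_fn(x):
    W = torch.tensor([[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]])  # 3 hidden units
    a = torch.tensor([1.0, -1.0, 0.5])
    return torch.tanh(x @ W.T) @ a

d_in, width, n_train, n_test = 2, 30, 40, 1000  # width and sizes are placeholders

# Inputs drawn from the standard normal distribution; outputs from the target.
x_train, x_test = torch.randn(n_train, d_in), torch.randn(n_test, d_in)
y_train, y_test = target_fn(x_train), target_fn(x_test)

# Two-layer fully-connected tanh student, initialized near zero
# (normal with mean 0 and variance 1e-20, i.e. std 1e-10, as in the caption).
model = torch.nn.Sequential(
    torch.nn.Linear(d_in, width),
    torch.nn.Tanh(),
    torch.nn.Linear(width, 1),
)
with torch.no_grad():
    for p in model.parameters():
        p.normal_(0.0, 1e-10)

# Full-batch gradient descent; the learning rate and step count are guesses
# (the paper tunes the rate within [0.05, 0.5]).
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(20000):
    opt.zero_grad()
    loss = ((model(x_train).squeeze(-1) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

test_mse = ((model(x_test).squeeze(-1) - y_test) ** 2).mean()
print(f"train MSE {loss.item():.3e}, test MSE {test_mse.item():.3e}")
```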

Theorems & Definitions (54)

  • Definition 1: target set
  • Definition 2: local linear recovery (LLR) guarantee
  • Remark 3: LLR-guarantee vs. LLR-guarantee a.e.
  • Proposition 1
  • Proof
  • Definition 4: optimistic sample size
  • Definition 5: model rank
  • Definition 6: empirical tangent matrix and empirical model rank (see the sketch after this list)
  • Lemma 7: LLR condition
  • Proof
  • ...and 44 more
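The items above are listed by name only. As a concrete reading of Definition 6, the sketch below computes what the paper appears to call the empirical tangent matrix, assuming it is the $n\times M$ matrix whose $i$-th row is the parameter gradient $\nabla_{\bm{\theta}} f_{\bm{\theta}}(x_i)$ at the sample points, with the empirical model rank given by its matrix rank (consistent with the tangent-space-rank connection stated in the TL;DR). The helper name `empirical_tangent_matrix` and the toy network are illustrative, not taken from the paper.

```python
import torch

def empirical_tangent_matrix(model, xs):
    """Stack the gradients of f_theta(x_i) w.r.t. all parameters into an
    n-by-M matrix (one row per sample); its rank is the empirical model rank."""
    params = list(model.parameters())
    rows = []
    for x in xs:
        out = model(x.unsqueeze(0)).squeeze()          # scalar output f_theta(x_i)
        grads = torch.autograd.grad(out, params)       # gradient w.r.t. each parameter tensor
        rows.append(torch.cat([g.flatten() for g in grads]))
    return torch.stack(rows)

# Example: a toy two-layer tanh network evaluated at random sample points.
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(2, 5, bias=False),
    torch.nn.Tanh(),
    torch.nn.Linear(5, 1, bias=False),
)
xs = torch.randn(20, 2)
T = empirical_tangent_matrix(model, xs)
print("empirical tangent matrix shape:", tuple(T.shape))             # (n, M)
print("empirical model rank:", torch.linalg.matrix_rank(T).item())   # <= min(n, M)
```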