Finite Samples for Shallow Neural Networks

Yu Xia, Zhiqiang Xu

TL;DR

This work studies when a two-layer (shallow) neural network can be uniquely identified from finitely many samples of its function values, focusing on irreducible networks with ReLU and analytic activations. For ReLU, it gives necessary and sufficient conditions for a network to be reducible and proves that no finite sample set can identify every irreducible ReLU network with a fixed neuron count; it also gives a sampling strategy that distinguishes one given irreducible ReLU network from others with the same neuron count. In contrast, for analytic activations (logistic sigmoid and tanh), the paper proves a constructive finite-sample identifiability result: there exists a finite deterministic set of points on which any two distinct irreducible shallow analytic networks can be told apart, with an explicit sample-size bound. The results show how the choice of activation affects identifiability under limited sampling and provide concrete sampling constructions for exact network identification.

Abstract

This paper investigates the ability of finite samples to identify two-layer irreducible shallow networks with various nonlinear activation functions, including rectified linear units (ReLU) and analytic functions such as the logistic sigmoid and hyperbolic tangent. An "irreducible" network is one whose function cannot be represented by another network with fewer neurons. For ReLU activation functions, we first establish necessary and sufficient conditions for determining the irreducibility of a network. Subsequently, we prove a negative result: finite samples are insufficient for definitive identification of any irreducible ReLU shallow network. Nevertheless, we demonstrate that for a given irreducible network, one can construct a finite set of sampling points that can distinguish it from other networks with the same neuron count. Conversely, for logistic sigmoid and hyperbolic tangent activation functions, we provide a positive result. We construct finite samples that enable the recovery of two-layer irreducible shallow analytic networks. To the best of our knowledge, this is the first study to investigate the exact identification of two-layer irreducible networks using finite sample function values. Our findings provide insights into the comparative performance of networks with different activation functions under limited sampling conditions.
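
The negative ReLU result can be illustrated with a small one-dimensional sketch. This is not the paper's construction; it is a toy example under stated assumptions: the networks have a single neuron of the form x -> relu(x - b), the sample set `samples` is arbitrary, and the names `one_neuron`, `b1`, and `b2` are hypothetical. Placing both kinks to the right of every sample makes the two networks vanish on all samples, so the samples cannot tell them apart even though the functions differ.

```python
def relu(t):
    """Rectified linear unit."""
    return max(t, 0.0)


def one_neuron(b):
    """Return the 1-D single-neuron network x -> relu(x - b) (assumed form)."""
    return lambda x: relu(x - b)


# Any fixed finite sample set (values chosen arbitrarily for illustration).
samples = [-2.0, -0.5, 0.0, 1.0, 3.0]

# Put both kinks strictly to the right of every sample point.
b1 = max(samples) + 1.0
b2 = max(samples) + 2.0
f, g = one_neuron(b1), one_neuron(b2)

# Both networks are zero on every sample, so they agree there ...
agree_on_samples = all(f(x) == g(x) for x in samples)
# ... yet they are different functions once x passes both kinks.
differ_elsewhere = f(b2 + 1.0) != g(b2 + 1.0)

print(agree_on_samples, differ_elsewhere)
```

Enlarging `samples` does not help: the same shift of `b1` and `b2` past the new maximum works again, which is the intuition behind needing sample points tailored to a given network.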

Paper Structure

This paper contains 18 sections, 11 theorems, 93 equations, 1 figure.

Key Result

Theorem 2.1

Let $f_{\mathcal{N}}$ be an admissible shallow ReLU network in the form of (eqn: fN_new). Then $f_{\mathcal{N}}$ is reducible if and only if one of the following three conditions is satisfied: For conditions (i) and (ii), the index $i_k$ is defined as:

Figures (1)

  • Figure 1: Uniquely determining the parameters ${\mathcal{N}}$ from values of $f_{\mathcal{N}}$ at specific sampling points. Here $f_{\mathcal{N}}({\boldsymbol x})=\sigma(\langle {\boldsymbol a}_1,{\boldsymbol x} \rangle+b_1)+\sigma(\langle {\boldsymbol a}_2,{\boldsymbol x} \rangle+b_2)$, where ${\boldsymbol a}_1=(1,1)^T$, $b_1=0$, ${\boldsymbol a}_2=(1,-1)^T$, $b_2=0$, $m=2$, and $d=2$. Red points: sampling points from the proof of Theorem \ref{th:samplingc}, used to determine the values of $f_{\mathcal{N}}$ along the lines ${\mathcal{L}}_j$, $j=1,\ldots,4$. Blue points: intersections of ${\mathcal{L}}_j$ with ${\mathcal{H}}({\boldsymbol a}_1,b_1)$ and ${\mathcal{H}}({\boldsymbol a}_2,b_2)$, derived from the values of $f_{\mathcal{N}}$ on ${\mathcal{L}}_j$, $j=1,\ldots,4$. These intersections uniquely determine ${\mathcal{H}}({\boldsymbol a}_1, b_1)$, ${\mathcal{H}}({\boldsymbol a}_2, b_2)$, and consequently ${\mathcal{N}}$.
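
The idea in Figure 1, locating where a line crosses each hyperplane from function values alone, can be sketched for the network in the caption. This is a rough illustration, not the sampling set from the paper's proof. It assumes that $\sigma$ is ReLU, scans a uniform grid along one hypothetical line (the horizontal line $x_2=1$), and assumes the grid nodes happen to coincide with the kinks; the helper name `kinks_on_line` is made up for this example. A nonzero second difference of the sampled values marks a change of slope, which is a crossing of some ${\mathcal{H}}({\boldsymbol a}_k,b_k)$.

```python
def relu(t):
    """Rectified linear unit (assumed to be the sigma in Figure 1)."""
    return max(t, 0.0)


# Parameters from the Figure 1 caption: a_1=(1,1), a_2=(1,-1), b_1=b_2=0.
A = [(1.0, 1.0), (1.0, -1.0)]
B = [0.0, 0.0]


def f(x):
    """Evaluate f_N(x) = sum_k relu(<a_k, x> + b_k) for x in R^2."""
    return sum(relu(a[0] * x[0] + a[1] * x[1] + b) for a, b in zip(A, B))


def kinks_on_line(p, v, ts):
    """Return grid parameters t where t -> f(p + t*v) changes slope.

    Assumes ts is a uniform grid and that kinks fall on grid nodes.
    """
    vals = [f((p[0] + t * v[0], p[1] + t * v[1])) for t in ts]
    kinks = []
    for i in range(1, len(ts) - 1):
        # Second difference is zero wherever the restriction is locally affine.
        second_diff = vals[i - 1] - 2.0 * vals[i] + vals[i + 1]
        if abs(second_diff) > 1e-9:
            kinks.append(ts[i])
    return kinks


# Hypothetical line: points (t, 1) for t on the integer grid -3..3.
ts = [float(t) for t in range(-3, 4)]
kinks = kinks_on_line(p=(0.0, 1.0), v=(1.0, 0.0), ts=ts)
print(kinks)  # → [-1.0, 1.0]
```

The two detected points, $(-1,1)$ and $(1,1)$, lie on $x_1+x_2=0$ and $x_1-x_2=0$ respectively. Repeating the scan on further lines yields more points on each hyperplane, which is the role the blue points play in the figure.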

Theorems & Definitions (34)

  • Definition 1.1
  • Definition 2.1: Admissible shallow ReLU network
  • Theorem 2.1
  • proof
  • Remark 2.1
  • Theorem 2.2
  • proof
  • Definition 2.2
  • Theorem 2.3
  • proof
  • ...and 24 more