Linear Independence of Generalized Neurons and Related Functions

Leyang Zhang

Abstract

The linear independence of neurons plays a significant role in the theoretical analysis of neural networks. Specifically, given neurons $H_1, ..., H_n: \mathbb{R}^N \times \mathbb{R}^d \to \mathbb{R}$, we are interested in the following question: when are $H_1(\theta_1, \cdot), ..., H_n(\theta_n, \cdot)$ linearly independent as the parameters $\theta_1, ..., \theta_n$ of these functions vary over $\mathbb{R}^N$? Previous works give a complete characterization for two-layer neurons without bias, for generic smooth activation functions. In this paper, we study the problem for neurons with arbitrary layers and widths, giving a simple but complete characterization for generic analytic activation functions.
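As a rough illustration of the question (not a method from the paper), the sketch below checks linear independence numerically for two-layer neurons without bias, $x \mapsto \tanh(\langle w, x\rangle)$. The helper names `neuron_matrix` and `numerical_rank`, the choice of tanh, the sampling range, and the tolerance are all assumptions made for this example. Each neuron is evaluated at random sample points, and the rank of the resulting matrix is computed: full column rank on the samples certifies independence (up to floating-point error), while a rank deficit only suggests dependence.

```python
import math
import random


def neuron_matrix(weights, points, activation=math.tanh):
    """Rows are sample points x, columns are neurons x -> activation(<w, x>)."""
    return [
        [activation(sum(wi * xi for wi, xi in zip(w, x))) for w in weights]
        for x in points
    ]


def numerical_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (tol is a heuristic)."""
    rows = [list(r) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # column is numerically in the span of earlier columns
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


rng = random.Random(0)
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(20)]

# tanh is odd, so the neurons with weights w and -w are linearly dependent.
dependent = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
independent = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(numerical_rank(neuron_matrix(dependent, points)))
print(numerical_rank(neuron_matrix(independent, points)))
```

Since $\tanh(-t) = -\tanh(t)$, the first family spans only a two-dimensional space even though all three weight vectors are distinct, which is why a characterization has to account for the activation function as well as the parameters.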

Paper Structure (13 sections, 31 theorems, 114 equations, 4 figures)

Key Result

Lemma 3.1

Fix $m \in \mathbb{N}$ and let $w_1, ..., w_m \in \mathbb{R}^d$ be distinct. Then there is a $v \in \partial B(0,1) \subseteq \mathbb{R}^d$ such that $\langle w_1, v\rangle, ..., \langle w_m, v\rangle$ are distinct. Moreover, if $w_k, w_j$ are multiples of one another, then for any $v \in \partial B(0,1)$ ...
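The first claim of the lemma can be explored numerically. The sketch below is an illustration only and is not the paper's proof: it assumes that a direction drawn uniformly at random from the unit sphere separates the inner products, which holds for all directions outside finitely many hyperplanes. The function names, the tolerance, and the retry limit are hypothetical choices for this example.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction on the unit sphere by normalizing a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def inner(w, v):
    """Euclidean inner product <w, v>."""
    return sum(wi * vi for wi, vi in zip(w, v))


def find_separating_direction(ws, rng, tol=1e-12, max_tries=100):
    """Search for a unit vector v such that all <w_k, v> are pairwise distinct."""
    d = len(ws[0])
    for _ in range(max_tries):
        v = random_unit_vector(d, rng)
        projections = sorted(inner(w, v) for w in ws)
        # After sorting, distinctness reduces to gaps between neighbours.
        if all(b - a > tol for a, b in zip(projections, projections[1:])):
            return v
    raise RuntimeError("no separating direction found; are the inputs distinct?")


# Distinct vectors in R^3, some of them multiples of one another.
ws = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0)]
v = find_separating_direction(ws, random.Random(0))
print([round(inner(w, v), 4) for w in ws])
```

This reduces a question about $m$ vectors in $\mathbb{R}^d$ to one about $m$ distinct real numbers, which matches the "dimension reduction" name the paper gives the lemma.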

Figures (4)

  • Figure 1: Overview and structure of this paper.
  • Figure 2: Illustration of example (a): how to construct the function sequence $\{f_n\}_{n=1}^\infty$ for $\rho(x) = e^x$.
  • Figure 3: Construction of an analytic function $\tilde{\sigma}$ that approximates the Tanh activation on an interval around 0, following Proposition \ref{Prop function conca and S-order approx} (a). Here we use $\sigma(x) = e^{x^2}$ and $\zeta_4$ defined as in Corollary \ref{Cor Analytic bump function}, with "base function" $f(x) = e^{|x|}$.
  • Figure 4: Construction of an analytic function $\tilde{\sigma}$ that approximates the Tanh activation globally on $\mathbb{R}$, following Proposition \ref{Prop function conca and S-order approx} (b). In the construction, we use $\sigma$ constructed as in Figure \ref{Figure Good Analytic Function for Tanh}. $\zeta$ is a scaling of an analytic bump function $\zeta_5$ defined in Corollary \ref{Cor Analytic bump function} with "base function" $f(x) = e^{|x|}$. Precisely, each function takes the form $\sigma(x) = \zeta_5(\alpha x) [\zeta_4(x) \tanh(x) + (1-\zeta_4(x))\tanh(x)] + (1 - \zeta_5(\alpha x)) e^{x^2}$, where $\alpha = 1.1, 1.3, 1.5, 2$ for the pink, yellow, orange, and green curves, respectively. The approximation is almost indistinguishable from Tanh when $\alpha = 2$.

Theorems & Definitions (81)

  • Definition 2.1: Generalized neurons and generalized NN
  • Remark 2.1
  • Definition 2.2: fully-connected neural network
  • Remark 2.2
  • Definition 2.3: Function asymptotics
  • Definition 2.4: Hyper-polynomial growth
  • Definition 2.5: Hyper-exponential growth
  • Definition 2.6: Ordered growth
  • Lemma 3.1: dimension reduction
  • Proposition 3.1: functions of ordered growth are linearly independent
  • ...and 71 more