Exact capacity of the \emph{wide} hidden layer treelike neural networks with generic activations

Mihailo Stojnic

TL;DR

The work addresses the exact memory capacity of wide hidden-layer treelike neural networks with generic hidden activations by applying the fully lifted Random Duality Theory (fl RDT) framework. It develops a general mathematical formalism that connects network memorization to a free-energy-like optimization and then computes the capacity in the $d\rightarrow\infty$ limit for four activations (ReLU, quadratic, erf, tanh) through a multi-level lifting procedure. The lifting converges remarkably fast: by the third level the relative improvement is already no more than about 0.1%. For each activation, the first- and second-level results exactly match the replica theory predictions. The practical impact lies in explicit, closed-form capacity characterizations that need substantially less numerical effort, enabling precise capacity assessments for wide treelike networks and guiding the theoretical understanding of memory limits in such architectures.

Abstract

Recent progress in studying \emph{treelike committee machines} (TCM) neural networks (NN) in \cite{Stojnictcmspnncaprdt23,Stojnictcmspnncapliftedrdt23,Stojnictcmspnncapdiffactrdt23} showed that the Random Duality Theory (RDT) and its \emph{partially lifted} (pl RDT) variant are powerful tools that can be used for very precise network capacity analysis. Here, we consider \emph{wide} hidden layer networks and uncover that certain aspects of the numerical difficulties faced in \cite{Stojnictcmspnncapdiffactrdt23} miraculously disappear. In particular, we employ the recently developed \emph{fully lifted} (fl) RDT to characterize the capacity of \emph{wide} ($d\rightarrow \infty$) TCM nets. We obtain explicit, closed form capacity characterizations for a very generic class of hidden layer activations. While the utilized approach significantly lowers the amount of needed numerical evaluations, the ultimate usefulness and success of fl RDT still require a solid portion of residual numerical work. To get concrete capacity values, we take four well-known activation examples: \emph{\textbf{ReLU}}, \textbf{\emph{quadratic}}, \textbf{\emph{erf}}, and \textbf{\emph{tanh}}. After successfully conducting the residual numerical work for all of them, we uncover that the whole lifting mechanism exhibits a remarkably rapid convergence, with relative improvements no better than $\sim 0.1\%$ happening already on the third level of lifting. As a convenient bonus, we also uncover that the capacity characterizations obtained on the first and second level of lifting precisely match those obtained through the statistical physics replica theory methods in \cite{ZavPeh21} for the generic and in \cite{BalMalZech19} for the ReLU activations.

Paper Structure (34 sections, 6 theorems, 278 equations, 2 figures, 10 tables)

This paper contains 34 sections, 6 theorems, 278 equations, 2 figures, 10 tables.

Introduction
Architecture of the wide hidden layer generically activated NNs
Assumptions related to architecture and data
Contextualization within relevant prior work
Contributions
Mathematical formalism of network functioning
Connecting network functioning and (partially reciprocal) free energy
Network memorization through the prism of sfl RDT
Basics of sfl RDT
Fitting memorization into the sfl RDT machinery
Practical utilization and numerical evaluations
ReLU activations
$r=1$ -- first level of lifting
$r=2$ -- second level of lifting
$r=3$ -- third level of lifting
...and 19 more sections

Key Result

Lemma 1

(\cite{Stojnictcmspnncapdiffactrdt23}; algebraic optimization representation) Assume a 1-hidden layer TCM with architecture $A([n,d,1];{\bf f}^{(2)})$. Any given data set $\left ({\bf x}^{(0,k)},1\right )_{k=1:m}$ can not be properly memorized by the network if $\dots$, where $\dots$ and $X\triangleq \begin{bmatrix} {\bf x}^{(0,1)} & {\bf x}^{(0,2)} & \dots & {\bf x}^{(0,m)} \end{bmatrix}^T$.
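
To make the notion of memorization in the lemma concrete, the following minimal Python sketch evaluates a treelike 1-hidden layer net on a data set whose labels are all $1$. It is an illustration under stated assumptions, not the paper's formulation: the names `tcm_hidden`, `memorizes`, and `ACTIVATIONS` are hypothetical, each of the $d$ hidden neurons is assumed to see its own disjoint block of $n/d$ inputs (the treelike structure), and the output rule (sum of the hidden activations compared against a `threshold` parameter) is assumed here, since this excerpt does not state it.

```python
import math

# The four hidden-layer activations named in the abstract.
ACTIVATIONS = {
    "relu": lambda z: max(z, 0.0),
    "quadratic": lambda z: z * z,
    "erf": math.erf,
    "tanh": math.tanh,
}

def tcm_hidden(W, x, f):
    """Hidden-layer outputs of a treelike net.

    W is a list of d weight vectors, each of length n/d; neuron i only
    sees the i-th disjoint block of the input x (assumed treelike wiring).
    """
    d = len(W)
    delta = len(x) // d  # inputs per hidden neuron
    out = []
    for i, w in enumerate(W):
        block = x[i * delta:(i + 1) * delta]
        out.append(f(sum(wj * xj for wj, xj in zip(w, block))))
    return out

def memorizes(W, X, f, threshold=0.0):
    """True if every data vector in X is mapped to label 1.

    Assumed output rule (not taken from the source): label 1 means the
    sum of the hidden activations exceeds `threshold`.
    """
    return all(sum(tcm_hidden(W, x, f)) > threshold for x in X)
```

With `W = [[1.0, 1.0], [1.0, 1.0]]` and `x = [1.0, 2.0, -3.0, -4.0]`, the two pre-activations are $3$ and $-7$, so the choice of activation alone decides whether this single pattern is stored under the assumed output rule.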

Figures (2)

Figure 1: Memory capacity -- treelike nets with $d\rightarrow\infty$ hidden layer neurons; different activations
Figure 2: Memory capacity -- treelike nets with $d\rightarrow\infty$ hidden layer neurons; different activations

Theorems & Definitions (12)

Lemma 1
proof
Theorem 1
proof
Corollary 1
proof
Corollary 2
proof
Theorem 2
proof
...and 2 more
