Unified Description of Learning Dynamics in the Soft Committee Machine from Finite to Ultra-Wide Regimes

Assem Afanah; Bernd Rosenow

TL;DR

This work analyzes the soft committee machine (SCM) with ReLU activation in a student–teacher setting using an annealed statistical-mechanics framework. By introducing aggregated order parameters $(\tilde{Q}, \tilde{R}, \tilde{r})$, the authors derive a unified description that remains valid from the conventional regime $K \ll N$ to the ultra-wide regime $K \ge N$. The dataset size is encoded in the control parameter $α$, and the ratio of teacher hidden units to input units is $γ = M/N$. A central result is a second-order phase transition at $α_c \approx 2π$ for $γ \ll 1$. Finite $γ$ erases the sharp transition and instead yields a smooth decrease of the generalization error $ε_g$, which in the high-data limit scales as $ε_g \propto 1/α$ independently of $K$ and $γ$. The framework unifies known results for ReLU SCMs, shows universal behavior in the high-data limit, and suggests extensions to other activations and to quenched analyses, highlighting how network dimensions shape learning dynamics in shallow networks.
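
To illustrate how a generalization error follows from macroscopic order parameters, here is a minimal sketch. It uses the standard per-unit overlaps $Q_{ij} = w_i \cdot w_j / N$, $R_{im} = w_i \cdot B_m / N$ and $T_{mn} = B_m \cdot B_n / N$ rather than the paper's aggregated $(\tilde{Q}, \tilde{R}, \tilde{r})$, whose precise definitions are not given in this summary. Several conventions here are assumptions: Gaussian inputs, an unnormalized committee output (no $1/\sqrt{K}$ prefactor), and $ε_g = \tfrac{1}{2}\langle(σ - τ)^2\rangle$. The paper's normalization may differ. For ReLU, each pairwise average has the closed form of the degree-1 arc-cosine kernel. The helper names `relu_corr` and `generalization_error` are hypothetical.

```python
import numpy as np


def relu_corr(q_a, q_b, c):
    """Return E[relu(u) * relu(v)] for zero-mean jointly Gaussian (u, v).

    q_a = Var(u), q_b = Var(v), c = Cov(u, v). This is the degree-1
    arc-cosine kernel: sqrt(q_a q_b) / (2 pi) * (sin t + (pi - t) cos t),
    with cos t = c / sqrt(q_a q_b). Assumes q_a, q_b > 0.
    """
    s = np.sqrt(q_a * q_b)
    cos_t = np.clip(c / s, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    t = np.arccos(cos_t)
    return s / (2.0 * np.pi) * (np.sin(t) + (np.pi - t) * cos_t)


def generalization_error(Q, R, T):
    """eps_g = 1/2 E[(sum_i relu(h_i) - sum_m relu(b_m))^2] from order parameters.

    Q: (K, K) student-student overlaps w_i.w_j / N
    R: (K, M) student-teacher overlaps w_i.B_m / N
    T: (M, M) teacher-teacher overlaps B_m.B_n / N
    Hypothetical helper: per-unit overlaps, not the paper's aggregated ones.
    """
    dq, dt = np.diag(Q), np.diag(T)
    ss = relu_corr(dq[:, None], dq[None, :], Q).sum()  # student-student terms
    tt = relu_corr(dt[:, None], dt[None, :], T).sum()  # teacher-teacher terms
    st = relu_corr(dq[:, None], dt[None, :], R).sum()  # student-teacher cross terms
    return 0.5 * (ss + tt - 2.0 * st)


if __name__ == "__main__":
    K = M = 3
    T = np.eye(M)  # orthonormal teacher vectors
    # Specialized state: student unit i aligned with teacher unit i.
    print("specialized:", generalization_error(np.eye(K), np.eye(K, M), T))
    # Permutation-symmetric (unspecialized) state: every student vector is the
    # normalized sum of the teacher vectors, so Q_ij = 1 and R_im = 1/sqrt(3).
    Q_sym = np.ones((K, K))
    R_sym = np.full((K, M), 1.0 / np.sqrt(M))
    print("unspecialized:", generalization_error(Q_sym, R_sym, T))
```

In this sketch the specialized configuration returns zero error, while the permutation-symmetric configuration leaves a finite error. This is only the static error as a function of the overlaps. It is not the annealed free energy or the learning curve $ε_g(α)$ analyzed in the paper.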

Abstract

We study the learning dynamics of the soft committee machine (SCM) with Rectified Linear Unit (ReLU) activation using a statistical-mechanics approach within the annealed approximation. The SCM consists of a student network with $N$ input units and $K$ hidden units trained to reproduce the output of a teacher network with $M$ hidden units. We introduce a reduced set of macroscopic order parameters that yields a unified description valid from the conventional regime $K \ll N$ to the ultra-wide limit $K \ge N$. The control parameter $α$, proportional to the ratio of training samples to adjustable weights, serves as an effective measure of dataset size. For small $γ= M/N$, we recover a continuous phase transition at $α_{c} \approx 2π$ from an unspecialized, permutation-symmetric state to a specialized state in which student units align with the teacher. For finite $γ$, the transition disappears and the generalization error decreases smoothly with dataset size, reaching a low plateau when $γ=1$. In the asymptotic limit $α\to \infty$, the error scales as $\varepsilon_{g} \propto 1/α$, independent of $γ$ and $K$. The results highlight the central role of network dimensions in SCM learning and provide a framework extendable to other activations and quenched analyses.
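
The microscopic student–teacher setup described above can be sketched as a Monte Carlo estimate of $ε_g$ over Gaussian inputs. The sketch makes several assumptions. Teacher vectors $B_m$ are drawn i.i.d. Gaussian, so that $B_m \cdot B_n / N \approx δ_{mn}$ for large $N$. The committee output is $\sum_k \mathrm{ReLU}(w_k \cdot x / \sqrt{N})$ without a $1/\sqrt{K}$ prefactor. The helper names `scm_output` and `estimate_eps_g` are hypothetical. The student matrix may have any number of rows, so an ultra-wide student with $K \ge N$ fits the same code.

```python
import numpy as np


def scm_output(W, X):
    """Soft committee machine output sum_k relu(w_k . x / sqrt(N)).

    W: (units, N) weight matrix; X: (samples, N) inputs. Returns (samples,).
    """
    N = X.shape[1]
    fields = X @ W.T / np.sqrt(N)  # (samples, units) pre-activations
    return np.maximum(fields, 0.0).sum(axis=1)


def estimate_eps_g(W_student, B_teacher, n_samples=20_000, seed=0):
    """Monte Carlo estimate of eps_g = 1/2 E[(sigma(x) - tau(x))^2], x ~ N(0, I_N)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, B_teacher.shape[1]))
    diff = scm_output(W_student, X) - scm_output(B_teacher, X)
    return 0.5 * np.mean(diff ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N, gamma = 200, 0.05
    M = int(round(gamma * N))  # teacher width from gamma = M / N
    B = rng.standard_normal((M, N))  # teacher vectors with B_m.B_m / N close to 1
    # Specialized student (K = M): each student unit copies one teacher unit.
    print("specialized:", estimate_eps_g(B.copy(), B))
    # Unspecialized, permutation-symmetric student: every unit equals the
    # normalized sum of teacher vectors, so all overlaps R_km are roughly equal.
    mean_dir = B.sum(axis=0) / np.sqrt(M)
    W_sym = np.tile(mean_dir, (M, 1))
    print("unspecialized:", estimate_eps_g(W_sym, B))
```

Sampling avoids any closed-form kernel and works for arbitrary $K$, $M$ and $N$. Its cost grows with `n_samples * N`, which is why the sample count is kept modest here.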

Figures (6)