Noise-induced degeneration in online learning
Yuzuru Sato, Daiji Tsutsui, Akio Fujiwara
TL;DR
This work analyzes plateau phenomena in online SGD for a minimal three-layer perceptron (the Fukumizu-Amari model) through the lens of random dynamical systems. It shows that, with a finite dataset, SGD trajectories are globally attracted to degenerate subspaces and can exhibit noise-induced degeneration, which further confines the dynamics to multiply degenerated manifolds; the local attraction is governed by a two-dimensional map. A key finding is that an optimal noise level exists that minimizes the escape time from the degenerated subspace, in contrast to traditional Kramers-type escape pictures. The results suggest that the interaction between degeneration and noise is fundamental to online learning dynamics and may help explain generalization and training behavior in larger, deeper networks.
Abstract
To elucidate the plateau phenomena caused by vanishing gradients, we analyse the stability of stochastic gradient descent near degenerated subspaces in a multi-layer perceptron. For stochastic gradient descent in the Fukumizu-Amari model, which is the minimal multi-layer perceptron showing non-trivial plateau phenomena, we show that (1) attracting regions exist in multiply degenerated subspaces, (2) a strong plateau phenomenon emerges as a noise-induced synchronisation, which is not observed in deterministic gradient descent, and (3) an optimal fluctuation exists that minimises the escape time from the degenerated subspace. The noise-induced degeneration observed herein is expected to be found in a broad class of machine learning via neural networks.
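
As a rough illustration of what a degenerated subspace means for online SGD, the following Python sketch trains a toy network with two tanh hidden units, one sample at a time, on a small finite dataset. The model form f(x) = v1 tanh(w1 x) + v2 tanh(w2 x), the teacher parameters, the learning rate, and all function names are assumptions chosen for illustration and are not taken from the paper. The sketch shows only that the subspace where both hidden units coincide is invariant under the update rule; it does not reproduce the noise-induced attraction or the escape-time results.

```python
import numpy as np

def sgd_step(w, v, x, y, lr):
    """One online SGD step on the squared error 0.5 * (f(x) - y)**2 for
    the assumed toy model f(x) = v[0]*tanh(w[0]*x) + v[1]*tanh(w[1]*x)."""
    h = np.tanh(w * x)                      # hidden activations
    err = np.sum(v * h) - y                 # prediction error on this sample
    grad_v = err * h                        # gradient w.r.t. output weights
    grad_w = err * v * (1.0 - h**2) * x     # gradient w.r.t. input weights
    return w - lr * grad_w, v - lr * grad_v

def run_online_sgd(w0, v0, data_x, data_y, lr=0.05, steps=2000, seed=0):
    """Online SGD: each step uses one sample drawn at random from the finite dataset."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    v = np.array(v0, dtype=float)
    for _ in range(steps):
        i = rng.integers(len(data_x))
        w, v = sgd_step(w, v, data_x[i], data_y[i], lr)
    return w, v

# Finite dataset generated by a hypothetical teacher network (illustrative values).
data_rng = np.random.default_rng(1)
data_x = data_rng.normal(size=20)
data_y = 1.0 * np.tanh(2.0 * data_x) - 0.5 * np.tanh(0.5 * data_x)

# Start with identical hidden units: the trajectory stays in the subspace w1 = w2, v1 = v2.
w_sym, v_sym = run_online_sgd([0.3, 0.3], [0.2, 0.2], data_x, data_y)

# Start from a generic point for comparison.
w_gen, v_gen = run_online_sgd([0.3, -0.4], [0.2, 0.5], data_x, data_y)

print("symmetric start:", w_sym, v_sym)
print("generic start:  ", w_gen, v_gen)
```

When the two hidden units start identical, they receive identical gradients at every step, so the network behaves like a single-unit model for the whole run. This is the sense in which the subspace is degenerate.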
