Network Dynamics-Based Framework for Understanding Deep Neural Networks
Yuchen Lin, Yong Zhang, Sihan Feng, Hong Zhao
TL;DR
This work proposes a dynamical-systems framework for deep learning centered on two neuron-level transformation modes: order-preserving transformations (OPT) and non-order-preserving transformations (NPT). It introduces the Rank Probability Distribution (RPD) and the Linear Substitution Map (L-Map) to quantify layer-wise nonlinearity and linearization, and defines attraction basins in both sample and weight spaces to assess robustness and stability. Through analyses of shallow networks and deeper DNNs, the study links the OPT/NPT balance and basin dynamics to learning phases, depth and width effects, and phenomena such as grokking, and shows that batch normalization (BN) and training strategies strongly shape phase transitions. The framework offers actionable insights for architecture design, initialization schemes, and training protocols aimed at improving generalization and stability in deep learning systems.
Abstract
Advancements in artificial intelligence call for a deeper understanding of the fundamental mechanisms underlying deep learning. In this work, we propose a theoretical framework to analyze learning dynamics through the lens of dynamical systems theory. We redefine the notions of linearity and nonlinearity in neural networks by introducing two fundamental transformation units at the neuron level: order-preserving transformations and non-order-preserving transformations. Different transformation modes lead to distinct collective behaviors in weight vector organization, different modes of information extraction, and the emergence of qualitatively different learning phases. Transitions between these phases may occur during training, accounting for key phenomena such as grokking. To further characterize generalization and structural stability, we introduce the concept of attraction basins in both sample and weight spaces. The distribution of neurons with different transformation modes across layers, along with the structural characteristics of the two types of attraction basins, forms a set of core metrics for analyzing the performance of learning models. Hyperparameters such as depth, width, learning rate, and batch size act as control variables for fine-tuning these metrics. Our framework not only sheds light on the intrinsic advantages of deep learning, but also provides a novel perspective for optimizing network architectures and training strategies.
