New Evidence of the Two-Phase Learning Dynamics of Neural Networks
Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan
TL;DR
The paper investigates how deep neural networks learn by introducing an interval-wise analysis framework that compares network states across training time windows. It uncovers two phenomena that expose the two-phase nature of training: the Chaos Effect, where a small parameter perturbation injected before the transition point can cause large divergence, and the Cone Effect, where after the transition the functional trajectory is confined to a narrow cone in function space even though the empirical Neural Tangent Kernel (eNTK) keeps evolving. The study runs CIFAR-10 experiments with VGG-16 and ResNet-20 and characterizes the dynamics with metrics such as parameter dissimilarity, kernel distance, loss barrier, and disagreement rate. These findings offer a structural, dynamical view of learning in which late training remains nonlinear yet constrained, with practical implications for training strategies and directions for future theoretical analysis.
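The summary above names the metrics without defining them. As a rough sketch, three of them could be computed as follows. The definitions used here are the ones common in the related literature (kernel distance as one minus the cosine similarity of two Gram matrices, disagreement rate as the fraction of differing predictions, loss barrier as the largest excess of the loss along a linear interpolation path) and are assumptions, not necessarily the exact definitions used in the paper. All function names are hypothetical.

```python
import numpy as np

def kernel_distance(k1, k2):
    """One minus the cosine similarity of two kernel (Gram) matrices.

    Assumed definition: the Frobenius inner product normalized by the
    Frobenius norms, so rescaling a kernel leaves the distance unchanged.
    """
    inner = np.sum(k1 * k2)
    return 1.0 - inner / (np.linalg.norm(k1) * np.linalg.norm(k2))

def disagreement_rate(logits_a, logits_b):
    """Fraction of inputs on which two models predict different classes."""
    pred_a = np.argmax(logits_a, axis=1)
    pred_b = np.argmax(logits_b, axis=1)
    return float(np.mean(pred_a != pred_b))

def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest gap between the loss on the straight line from theta_a to
    theta_b and the linear interpolation of the two endpoint losses.

    loss_fn is any callable mapping a flat parameter vector to a scalar loss.
    """
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    gaps = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        theta = (1.0 - alpha) * theta_a + alpha * theta_b
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        gaps.append(loss_fn(theta) - baseline)
    return float(max(gaps))
```

In an interval-wise analysis, each function would be applied to a pair of network states taken at the two ends of a training time window.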
Abstract
Understanding how deep neural networks learn remains a fundamental challenge in modern machine learning. A growing body of evidence suggests that training dynamics undergo a distinct phase transition, yet our understanding of this transition is still incomplete. In this paper, we introduce an interval-wise perspective that compares network states across a time window, revealing two new phenomena that illuminate the two-phase nature of deep learning. (i) The Chaos Effect. By injecting an imperceptibly small parameter perturbation at various stages, we show that the response of the network to the perturbation exhibits a transition from chaotic to stable, suggesting there is an early critical period where the network is highly sensitive to initial conditions. (ii) The Cone Effect. Tracking the evolution of the empirical Neural Tangent Kernel (eNTK), we find that after this transition point the model's functional trajectory is confined to a narrow cone-shaped subset: while the kernel continues to change, it gets trapped into a tight angular region. Together, these effects provide a structural, dynamical view of how deep networks transition from sensitive exploration to stable refinement during training.
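The perturbation experiment behind the Chaos Effect can be illustrated with a minimal sketch of the protocol: train to a fork step, copy the network, add a tiny perturbation to the copy, continue training both copies on identical mini-batches, and measure how far apart they end up. Everything below is an assumption made for illustration: a toy two-layer tanh network on synthetic regression data stands in for the VGG-16 and ResNet-20 CIFAR-10 setup, the final divergence is measured as a plain L2 distance, and all names and hyperparameters are hypothetical. A model this small is not expected to reproduce the reported effect.

```python
import numpy as np

def init_params(rng, d_in=5, d_hidden=16):
    """Random initial weights for a toy two-layer tanh network."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden,)),
    }

def sgd_step(params, x, y, lr):
    """One SGD step on the loss 0.5 * mean((f(x) - y)^2)."""
    h = np.tanh(x @ params["W1"])                      # hidden activations
    err = h @ params["w2"] - y                         # residuals
    grad_w2 = h.T @ err / len(y)
    grad_pre = np.outer(err, params["w2"]) * (1.0 - h ** 2)
    grad_W1 = x.T @ grad_pre / len(y)
    return {"W1": params["W1"] - lr * grad_W1,
            "w2": params["w2"] - lr * grad_w2}

def flatten(params):
    """Concatenate all parameters into one vector."""
    return np.concatenate([p.ravel() for p in params.values()])

def fork_and_compare(t_fork, total_steps=200, eps=1e-3, lr=0.05, seed=0):
    """Perturb a copy of the network at step t_fork and return the L2
    distance between the two copies after total_steps of training."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(256, 5))
    y = np.sin(x.sum(axis=1))
    # Fix the mini-batch order up front so both copies see the same data.
    batches = rng.integers(0, len(y), size=(total_steps, 32))
    params = init_params(rng)
    for k in range(t_fork):
        params = sgd_step(params, x[batches[k]], y[batches[k]], lr)
    # Fork: add a random perturbation of norm eps to each parameter array.
    twin = {}
    for name, value in params.items():
        direction = rng.normal(size=value.shape)
        twin[name] = value + eps * direction / np.linalg.norm(direction)
    for k in range(t_fork, total_steps):
        idx = batches[k]
        params = sgd_step(params, x[idx], y[idx], lr)
        twin = sgd_step(twin, x[idx], y[idx], lr)
    return float(np.linalg.norm(flatten(params) - flatten(twin)))
```

Calling `fork_and_compare(t)` for several fork steps `t` and comparing the returned distances mirrors the idea of injecting the perturbation "at various stages". Because the mini-batch order is fixed, the perturbation is the only source of divergence between the two copies.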
