On the Cone Effect in the Learning Dynamics

Zhanpeng Zhou; Yongyi Yang; Jie Ren; Mahito Sugiyama; Junchi Yan

TL;DR

The paper investigates how the empirical Neural Tangent Kernel (eNTK) evolves during real-world neural network training, identifying a two-phase learning pattern: Phase I with strong nonlinear dynamics (the rich regime) and Phase II with continued but constrained evolution (the cone effect) that yields advantages over fully linearized training. By formalizing training via gradient flow and introducing metrics like the kernel distance $S(\boldsymbol{\theta}, \boldsymbol{\theta}')$ and kernel velocity $v(t)$, the authors empirically demonstrate that Phase II exhibits a constrained evolution of the eNTK, forming a cone-like trajectory in the kernel space. They show non-linear benefits of the cone effect through switching experiments that start with standard training and then switch to linearized training, achieving better performance than entirely lazy training. The work suggests that the cone effect is not universal and calls for identifying factors that govern its emergence, highlighting implications for understanding feature learning and designing training protocols beyond the lazy regime.

Abstract

Understanding the learning dynamics of neural networks is a central topic in the deep learning community. In this paper, we take an empirical perspective to study the learning dynamics of neural networks in real-world settings. Specifically, we investigate the evolution process of the empirical Neural Tangent Kernel (eNTK) during training. Our key findings reveal a two-phase learning process: i) in Phase I, the eNTK evolves significantly, signaling the rich regime, and ii) in Phase II, the eNTK keeps evolving but is constrained in a narrow space, a phenomenon we term the cone effect. This two-phase framework builds on the hypothesis proposed by Fort et al. (2020), but we uniquely identify the cone effect in Phase II, demonstrating its significant performance advantages over fully linearized training.
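
To make the kernel distance and kernel velocity named in the TL;DR concrete, here is a minimal sketch in plain Python. It rests on assumptions that this summary does not confirm: the kernel distance is taken to be one minus the cosine similarity between two eNTK matrices (the form used by Fort et al. (2020)), the kernel velocity is approximated by a finite difference of that distance over a step `dt`, and the two-parameter model `f(x) = a * tanh(w * x)` is a hypothetical toy chosen only so that the Jacobian can be written by hand.

```python
import math

def jacobian(theta, xs):
    """Gradients of the toy model f(x; w, a) = a * tanh(w * x) w.r.t. (w, a).

    The model is a hypothetical stand-in, not the networks used in the paper.
    """
    w, a = theta
    rows = []
    for x in xs:
        t = math.tanh(w * x)
        rows.append((a * x * (1.0 - t * t), t))  # (df/dw, df/da)
    return rows

def entk(theta, xs):
    """Empirical NTK on inputs xs: K[i][j] = <grad f(x_i), grad f(x_j)>."""
    J = jacobian(theta, xs)
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in J] for gi in J]

def frob_inner(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def kernel_distance(theta_a, theta_b, xs):
    """Assumed form of S(theta, theta'): 1 - cosine similarity of the eNTKs."""
    Ka, Kb = entk(theta_a, xs), entk(theta_b, xs)
    cos = frob_inner(Ka, Kb) / math.sqrt(frob_inner(Ka, Ka) * frob_inner(Kb, Kb))
    return 1.0 - cos

def kernel_velocity(theta_t, theta_next, xs, dt):
    """Assumed finite-difference form of v(t): S(theta_t, theta_{t+dt}) / dt."""
    return kernel_distance(theta_t, theta_next, xs) / dt

xs = [-1.0, 0.5, 2.0]
theta_0 = (0.5, 1.0)
theta_1 = (0.8, 1.3)
d_same = kernel_distance(theta_0, theta_0, xs)
d_diff = kernel_distance(theta_0, theta_1, xs)
v = kernel_velocity(theta_0, theta_1, xs, dt=0.1)
```

Under this assumed definition the distance ignores the overall scale of the kernel and depends only on its direction in kernel space, which is why a trajectory with small but persistent distances can be pictured as staying inside a narrow cone.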
