Quadratic models for understanding catapult dynamics of neural networks
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
TL;DR
This paper introduces Neural Quadratic Models (NQMs), second-order Taylor approximations of neural networks, to study optimization and generalization beyond the infinite-width linear regime. It proves that NQMs can exhibit catapult dynamics under large learning rates, analyzes the training dynamics for both a single training example and multiple examples, and connects these behaviors to general quadratic models and to wide neural networks. Empirically, NQMs show the same generalization improvements that neural networks show in the catapult regime, across various architectures and datasets, and they outperform the linear NTK baseline in this regime. The work suggests that quadratic models provide a tractable, informative lens for understanding finite-width neural networks, and it motivates future study of their induced kernels and their potential for representation learning.
Abstract
While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the "catapult phase" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for analyzing neural networks.
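To make the catapult phase concrete, the following is a minimal sketch, not the paper's experimental setup. It assumes the two-layer linear toy model f(u, v) = u·v / sqrt(m) trained on a single example with target 0, which is exactly quadratic in its parameters. Expanding one gradient-descent step on the loss f²/2 gives a closed recursion in the output f and the tangent kernel λ = (|u|² + |v|²)/m, so only these two scalars need to be tracked. The function name, the width m = 1000, and the initial values f = 1, λ = 2 are illustrative assumptions.

```python
def train_toy_quadratic(eta, steps=100, width=1000, f0=1.0, lam0=2.0):
    """Gradient descent on L = f^2 / 2 for the toy model f(u, v) = u.v / sqrt(width).

    Tracks only the output f and the tangent kernel lam = (|u|^2 + |v|^2) / width.
    All default values are illustrative assumptions, not values from the paper.
    """
    f, lam = f0, lam0
    losses = [0.5 * f * f]
    kernels = [lam]
    for _ in range(steps):
        f_sq_over_m = f * f / width
        # Simultaneous update: both right-hand sides use the old f and lam.
        f, lam = (
            f * (1.0 - eta * lam + eta ** 2 * f_sq_over_m),
            lam + eta * f_sq_over_m * (eta * lam - 4.0),
        )
        losses.append(0.5 * f * f)
        kernels.append(lam)
    return losses, kernels


# Small learning rate (eta < 2 / lam0): loss decays monotonically, kernel barely moves.
small_losses, small_kernels = train_toy_quadratic(eta=0.4)

# Large learning rate (2 / lam0 < eta < 4 / lam0): loss first spikes, then converges,
# and the kernel ends below 2 / eta.
large_losses, large_kernels = train_toy_quadratic(eta=1.5)

print("small lr: peak loss", max(small_losses), "final kernel", small_kernels[-1])
print("large lr: peak loss", max(large_losses), "final kernel", large_kernels[-1])
```

With the small learning rate the run stays in the linear regime, where the kernel is nearly constant. With the large learning rate the loss grows by orders of magnitude before training converges, and the kernel shrinks until the learning rate is stable again. A purely linear model has a fixed kernel and would simply diverge at this learning rate.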
