Gradients are Not All You Need
Luke Metz, C. Daniel Freeman, Samuel S. Schoenholz, Tal Kachman
TL;DR
This work analyzes why differentiating through iterative dynamical systems can fail when the dynamics are chaotic. It traces these failures to the spectrum of the recurrent Jacobian and demonstrates gradient explosion across neural, physical, and learning-to-learn domains. It surveys a range of remedies, from designing well-behaved systems and employing proxy objectives to truncation, gradient clipping, and ergodic-system methods, and also highlights black-box gradient alternatives. The key contribution is a spectrum-based diagnostic plus a practical toolbox for mitigating gradient pathologies in differentiable simulations, with empirical demonstrations in physics, meta-learning, and molecular dynamics. The findings urge practitioners to assess the Jacobian spectrum before applying end-to-end differentiation and to adopt alternative gradient strategies when chaos dominates.
Abstract
Differentiable programming techniques are widely used in the community and are responsible for the machine learning renaissance of the past several decades. While these methods are powerful, they have limits. In this short report, we discuss a common chaos-based failure mode that appears in a variety of differentiable circumstances, ranging from recurrent neural networks and numerical physics simulation to training learned optimizers. We trace this failure to the spectrum of the Jacobian of the system under study, and provide criteria for when a practitioner might expect this failure to spoil their differentiation-based optimization algorithms.
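The connection between the Jacobian and gradient growth can be illustrated with a minimal sketch. It is not taken from the paper: the logistic map, the starting point 0.2, the step count 50, and the parameter values r = 2.5 and r = 4.0 are all assumptions chosen only for illustration. For a one-dimensional map the Jacobian is a scalar, so the gradient of the final state with respect to the initial state is simply the product of the per-step derivatives. That product shrinks when the derivatives have magnitude below one and grows exponentially in the chaotic regime.

```python
# Illustrative sketch (assumed example, not from the paper): gradient of an
# unrolled logistic map x_{t+1} = r * x_t * (1 - x_t) w.r.t. the initial state.

def logistic_step(x, r):
    """One step of the logistic map."""
    return r * x * (1.0 - x)

def logistic_step_jacobian(x, r):
    """Derivative of one step w.r.t. x (a 1x1 Jacobian)."""
    return r * (1.0 - 2.0 * x)

def unrolled_gradient(x0, r, num_steps):
    """Return (x_T, d x_T / d x_0) via the chain rule over num_steps steps."""
    x, grad = x0, 1.0
    for _ in range(num_steps):
        grad *= logistic_step_jacobian(x, r)  # accumulate per-step Jacobians
        x = logistic_step(x, r)
    return x, grad

# r = 2.5: trajectory settles at a stable fixed point, so the gradient vanishes.
stable_x, stable_grad = unrolled_gradient(0.2, 2.5, 50)
# r = 4.0: chaotic regime, so the gradient magnitude grows exponentially.
chaotic_x, chaotic_grad = unrolled_gradient(0.2, 4.0, 50)

print(f"r=2.5: |d x_T / d x_0| = {abs(stable_grad):.3e}")
print(f"r=4.0: |d x_T / d x_0| = {abs(chaotic_grad):.3e}")
```

In higher-dimensional systems the same product involves Jacobian matrices, and the magnitude of their eigenvalues plays the role that the scalar derivative plays here.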
