Gradient Descent as Loss Landscape Navigation: a Normative Framework for Deriving Learning Rules
John J. Vastola, Samuel J. Gershman, Kanaka Rajan
TL;DR
This work introduces a normative, continuous-time framework that treats learning rules as optimal-control policies navigating partially observable loss landscapes. By varying planning horizon, ambient geometry, and belief updating, it unifies gradient descent, momentum, natural gradient descent, Adam, and continual-learning strategies as special cases of a single objective. The approach clarifies how geometry and observability shape learning dynamics, and provides principled grounds for deriving or comparing learning rules beyond empirical tuning. It also connects these ideas to physics and biology, suggesting broader implications for designing adaptive algorithms and interpreting brain-inspired plasticity under realistic constraints.
Abstract
Learning rules (prescriptions for updating model parameters to improve performance) are typically assumed rather than derived. Why do some learning rules work better than others, and under what assumptions can a given rule be considered optimal? We propose a theoretical framework that casts learning rules as policies for navigating (partially observable) loss landscapes, and identifies optimal rules as solutions to an associated optimal control problem. A range of well-known rules emerges naturally within this framework under different assumptions: gradient descent from short-horizon optimization, momentum from longer-horizon planning, natural gradients from accounting for parameter space geometry, non-gradient rules from partial controllability, and adaptive optimizers like Adam from online Bayesian inference of loss landscape shape. We further show that continual learning strategies like weight resetting can be understood as optimal responses to task uncertainty. By unifying these phenomena under a single objective, our framework clarifies the computational structure of learning and offers a principled foundation for designing adaptive algorithms.
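For readers less familiar with the two simplest rules named above, the following is a minimal sketch of plain gradient descent and heavy-ball momentum in their standard textbook forms. It does not reproduce the paper's optimal-control derivation. The toy quadratic loss, the curvature values, the step size `lr`, the momentum coefficient `beta`, and all function names are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: textbook update rules on a toy quadratic loss.
# Nothing here is taken from the paper; all names and constants are hypothetical.

def loss(theta, curvatures):
    """Toy loss L(theta) = 0.5 * sum_i c_i * theta_i**2."""
    return 0.5 * sum(c * t * t for c, t in zip(curvatures, theta))

def grad(theta, curvatures):
    """Gradient of the toy loss: dL/dtheta_i = c_i * theta_i."""
    return [c * t for c, t in zip(curvatures, theta)]

def gradient_descent(theta, curvatures, lr=0.03, steps=100):
    """Plain gradient descent: theta <- theta - lr * grad."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta, curvatures)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

def momentum(theta, curvatures, lr=0.03, beta=0.9, steps=100):
    """Heavy-ball momentum: v <- beta * v - lr * grad; theta <- theta + v."""
    theta = list(theta)
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta, curvatures)
        velocity = [beta * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta

# An ill-conditioned toy landscape: one shallow and one steep direction.
CURVATURES = [1.0, 25.0]
THETA0 = [1.0, 1.0]

initial_loss = loss(THETA0, CURVATURES)
gd_loss = loss(gradient_descent(THETA0, CURVATURES), CURVATURES)
mom_loss = loss(momentum(THETA0, CURVATURES), CURVATURES)

print(f"initial loss:          {initial_loss:.6f}")
print(f"gradient descent loss: {gd_loss:.6f}")
print(f"momentum loss:         {mom_loss:.6f}")
```

The only structural difference between the two rules is the `velocity` state: gradient descent reacts to the current gradient alone, while momentum carries information across steps. This is one informal way to read the abstract's contrast between short-horizon and longer-horizon rules.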
