Learning Rate Schedules in the Presence of Distribution Shift
Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah
TL;DR
This work studies how to set learning rate schedules for SGD when the data distribution shifts over time. It introduces a dynamic-regret framework and an SDE-based analysis of online linear regression that characterizes the optimal schedule, then extends the approach to general convex losses, with upper and lower regret bounds that differ only by constants, and to non-convex losses, with a schedule that minimizes an upper bound on the expected regret. These bounds scale with the amount of distribution shift and the gradient noise. The results show that optimal learning rates typically increase with shift, and experiments on high-dimensional regression and a flow cytometry streaming task, including neural-network examples, illustrate the proposed schedules. The findings offer practical guidance for online learning systems in non-stationary environments, connecting online optimization, stochastic calculus, and control-theoretic planning to design shift-aware training strategies.
Abstract
We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds for the regret that differ only by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
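The qualitative claim that larger shift favors larger learning rates can be illustrated with a small simulation. The sketch below is not the schedule derived in the paper: it assumes a toy setting in which the ground-truth weights follow a Gaussian random walk, features are isotropic Gaussian, and SGD uses a constant step size chosen from a small grid. The function names, the drift model, the grid of rates, and the use of cumulative excess risk as a stand-in for regret are all illustrative assumptions.

```python
import numpy as np


def cumulative_excess_risk(lr, drift, steps=5000, dim=5, noise=0.5, seed=0):
    """Constant-step SGD on a drifting linear model; returns summed excess risk.

    Assumed toy setting (not the paper's): the target weights take a Gaussian
    random-walk step of scale `drift` each round, and features are N(0, I).
    """
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(dim)  # ground-truth weights, drifting over time
    theta = np.zeros(dim)       # SGD iterate
    total = 0.0
    for _ in range(steps):
        # Distribution shift: the optimum moves before the next sample arrives.
        theta_star = theta_star + drift * rng.standard_normal(dim)
        # With isotropic features, the expected excess squared loss of theta
        # relative to the current optimum is 0.5 * ||theta - theta_star||^2.
        total += 0.5 * float(np.sum((theta - theta_star) ** 2))
        # Draw one noisy sample and take an SGD step on 0.5 * (x @ theta - y)^2.
        x = rng.standard_normal(dim)
        y = x @ theta_star + noise * rng.standard_normal()
        theta = theta - lr * (x @ theta - y) * x
    return total


def best_constant_rate(drift, rates=(0.003, 0.01, 0.03, 0.1)):
    """Return the grid learning rate with the smallest cumulative excess risk."""
    return min(rates, key=lambda lr: cumulative_excess_risk(lr, drift))


if __name__ == "__main__":
    for drift in (0.001, 0.05):
        print(f"drift={drift}: best constant rate = {best_constant_rate(drift)}")
```

In this toy model a small step size averages out label noise but tracks the moving optimum slowly, while a large step size tracks quickly at the cost of more variance, so the best rate on the grid grows with the drift scale. This is consistent with the intuition stated in the abstract, though it says nothing about the time-varying schedules the paper analyzes.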
