Learning Rate Scheduling with Matrix Factorization for Private Training
Nikita P. Kalinin, Joel Daniel Andersson
TL;DR
This work addresses differentially private training with learning rate schedules. It develops general upper and lower bounds on MaxSE and MeanSE for a broad class of schedules and introduces a memory-efficient, learning-rate-aware Toeplitz factorization. Theoretical results show optimal or improved error rates for exponential decay, with multi-epoch extensions via banded inverses, and experiments on CIFAR-10 and IMDB show accuracy gains over baseline prefix-sum factorizations. The findings combine practical learning rate schedules with correlated noise through tailored factorizations, enabling higher utility under strict privacy constraints.
Abstract
We study differentially private model training with stochastic gradient descent under learning rate scheduling and correlated noise. Although correlated noise, in particular via matrix factorizations, has been shown to improve accuracy, prior theoretical work focused primarily on the prefix-sum workload. That workload assumes a constant learning rate, whereas in practice learning rate schedules are widely used to accelerate training and improve convergence. We close this gap by deriving general upper and lower bounds for a broad class of learning rate schedules in both single- and multi-epoch settings. Building on these results, we propose a learning-rate-aware factorization that achieves improvements over prefix-sum factorizations under both MaxSE and MeanSE error metrics. Our theoretical analysis yields memory-efficient constructions suitable for practical deployment, and experiments on CIFAR-10 and IMDB datasets confirm that schedule-aware factorizations improve accuracy in private training.
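To make the contrast between the prefix-sum workload and a schedule-dependent workload concrete, the sketch below assumes the usual matrix-factorization setup: a lower-triangular workload A with A[t, s] = lr[s] for s <= t (which reduces to the prefix-sum matrix when the learning rate is constant), a factorization A = BC, and single-participation error metrics MeanSE = ||B||_F / sqrt(n) * max column norm of C and MaxSE = max row norm of B * max column norm of C. These definitions, the helper names, and the 0.8 decay factor are illustrative assumptions rather than the paper's construction, and the trivial factorization B = A, C = I only serves as a reference point.

```python
import numpy as np

def lr_workload(lrs):
    """Lower-triangular workload A with A[t, s] = lrs[s] for s <= t (assumed form)."""
    lrs = np.asarray(lrs, dtype=float)
    n = lrs.shape[0]
    return np.tril(np.ones((n, n))) * lrs[np.newaxis, :]

def sensitivity(C):
    """Largest column 2-norm of C (single-participation assumption)."""
    return np.linalg.norm(C, axis=0).max()

def mean_se(B, C):
    """Assumed MeanSE: ||B||_F / sqrt(n) times the sensitivity of C."""
    n = B.shape[0]
    return np.linalg.norm(B, "fro") / np.sqrt(n) * sensitivity(C)

def max_se(B, C):
    """Assumed MaxSE: largest row 2-norm of B times the sensitivity of C."""
    return np.linalg.norm(B, axis=1).max() * sensitivity(C)

n = 8
const = np.ones(n)            # constant learning rate: prefix-sum workload
decay = 0.8 ** np.arange(n)   # hypothetical exponential decay schedule

A_const = lr_workload(const)
A_decay = lr_workload(decay)

# Row t of A @ g is the accumulated, learning-rate-scaled update after step t.
rng = np.random.default_rng(0)
g = rng.normal(size=n)
assert np.allclose(A_decay @ g, np.cumsum(decay * g))

# Trivial factorization B = A, C = I as a reference point for both workloads.
I = np.eye(n)
for name, A in [("constant", A_const), ("exp-decay", A_decay)]:
    print(name, "MeanSE:", mean_se(A, I), "MaxSE:", max_se(A, I))
```

Because the decayed workload has smaller entries in later columns, a factorization tuned to the prefix-sum matrix is not tuned to it, which is the mismatch the abstract describes.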
