Temporal-Difference Variational Continual Learning
Luckeciano C. Melo, Alessandro Abate, Yarin Gal
TL;DR
This work tackles catastrophic forgetting in Bayesian continual learning (CL) by addressing the posterior-approximation errors that compound across the recursive updates of Variational Continual Learning (VCL). It introduces Temporal-Difference Variational Continual Learning (TD-VCL), a family of objectives that bootstraps each posterior update from multiple past posterior estimates, and draws a principled link to Temporal-Difference (TD) methods. The framework includes two concrete instantiations, n-Step KL Regularization and TD($\lambda$)-VCL, which span a spectrum from vanilla VCL to multi-step KL regularization and balance plasticity against memory stability. Empirical results on challenging CL benchmarks show that the TD-VCL objectives outperform strong Bayesian baselines, with robustness to boundary assumptions and favorable per-task performance, which points to practical value for uncertainty-aware continual learning systems. Overall, the approach unites variational inference and TD bootstrapping to yield scalable continual learners that resist forgetting in non-stationary environments.
Abstract
Machine Learning models in real-world applications must continuously learn new tasks to adapt to shifts in the data-generating distribution. Yet, for Continual Learning (CL), models often struggle to balance learning new tasks (plasticity) with retaining previous knowledge (memory stability). Consequently, they are susceptible to Catastrophic Forgetting, which degrades performance and undermines the reliability of deployed systems. In the Bayesian CL literature, variational methods tackle this challenge by employing a learning objective that recursively updates the posterior distribution while constraining it to stay close to its previous estimate. Nonetheless, we argue that these methods may be ineffective due to compounding approximation errors over successive recursions. To mitigate this, we propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations, preventing individual errors from dominating future posterior updates and compounding over time. We reveal insightful connections between these objectives and Temporal-Difference methods, a popular learning mechanism in Reinforcement Learning and Neuroscience. Experiments on challenging CL benchmarks show that our approach effectively mitigates Catastrophic Forgetting, outperforming strong Variational CL methods.
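The idea of regularizing toward several previous posterior estimates, rather than only the most recent one, can be illustrated with a minimal sketch. It assumes mean-field (diagonal) Gaussian posteriors and shows only the KL regularization term, omitting the likelihood terms of the full objective. The function names and the `weights` argument are hypothetical and are not taken from the paper; the paper's objectives define their own specific weighting.

```python
import math
from typing import Sequence, Tuple

# A diagonal Gaussian is represented as (means, standard deviations).
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussians(q: Gaussian, p: Gaussian) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    (mu_q, sd_q), (mu_p, sd_p) = q, p
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sd_q, mu_p, sd_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def multi_posterior_kl(
    q: Gaussian, past: Sequence[Gaussian], weights: Sequence[float]
) -> float:
    """Weighted sum of KL(q || q_past) over several stored posterior estimates.

    `weights` is an illustrative assumption: putting all mass on the most
    recent posterior recovers a single-step, VCL-style regularizer, while
    spreading mass over older posteriors lets them all constrain the update.
    """
    if len(past) != len(weights):
        raise ValueError("need exactly one weight per stored posterior")
    return sum(w * kl_diag_gaussians(q, p) for w, p in zip(weights, past))


# Candidate posterior and two stored estimates (most recent first).
q = ([0.0], [1.0])
past = [([0.0], [1.0]), ([1.0], [1.0])]

single_step = multi_posterior_kl(q, past, [1.0, 0.0])  # only the latest estimate
two_step = multi_posterior_kl(q, past, [0.5, 0.5])     # both estimates, equal weight
```

In this toy case the candidate matches the most recent estimate exactly, so the single-step penalty is zero, while the two-step penalty is positive because the older estimate still constrains the update. In this sketch, that is how an error in any single posterior estimate carries less weight in the next update.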
