Generalisation Guarantees for Continual Learning with Orthogonal Gradient Descent
Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama
TL;DR
This work develops a theoretical framework for Continual Learning in the Neural Tangent Kernel (NTK) regime and analyses Orthogonal Gradient Descent (OGD) within it. It derives a closed-form, recursive kernel regression description of the model across tasks, establishes linear convergence under suitable learning-rate conditions, and proves a no-forgetting property for OGD with infinite memory. The authors also provide generalisation bounds showing that task similarity, measured through the NTK, governs learning performance, and they demonstrate that NTK variation can limit protection against forgetting, which motivates the OGD+ variant. Empirical results on standard benchmarks support the theory: over-parameterisation and controlled NTK drift improve robustness to forgetting, while curriculum-like task sequences can influence generalisation through task similarity. Overall, the paper links Continual Learning, kernel methods, and curriculum design to offer provable learning guarantees and practical insights for managing forgetting in sequential task settings.
Abstract
In Continual Learning settings, deep neural networks are prone to Catastrophic Forgetting. Orthogonal Gradient Descent (OGD) was proposed to tackle this challenge. However, no theoretical guarantees have been proven for it yet. We present a theoretical framework to study Continual Learning algorithms in the Neural Tangent Kernel (NTK) regime. This framework comprises a closed-form expression of the model through tasks, and proxies for Transfer Learning, generalisation, and task similarity. In this framework, we prove that OGD is robust to Catastrophic Forgetting, then derive the first generalisation bound for SGD and OGD for Continual Learning. Finally, we study the limits of this framework in practice for OGD and highlight the importance of the Neural Tangent Kernel variation for Continual Learning with OGD.
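To make the central operation concrete, the following is a minimal sketch of the projection step that gives OGD its name: the gradient for the current task is stripped of its components along gradient directions stored from earlier tasks, so that the update is orthogonal to them. This is an illustration under stated assumptions, not the authors' implementation. The helper names (`dot`, `orthonormalise`, `project_orthogonal`), the three-dimensional toy vectors, and the choice of Gram-Schmidt orthonormalisation are all hypothetical. In practice the memory would hold gradients of the network with respect to its parameters on samples from previous tasks.

```python
# Toy sketch of an OGD-style projection in pure Python (no external libraries).

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalise(vectors, eps=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_orthogonal(grad, basis):
    """Return grad minus its projection onto span(basis)."""
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g

# Hypothetical stored gradient directions from an earlier task (the "memory").
memory = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormalise(memory)

# Hypothetical gradient computed on the current task.
new_task_grad = [0.5, -2.0, 3.0]
update_dir = project_orthogonal(new_task_grad, basis)
print(update_dir)  # [0.0, 0.0, 3.0]
```

In this toy case the stored directions span the first two coordinates, so only the third component of the new gradient survives. A step along `update_dir` leaves the inner products with the stored directions unchanged to first order, which is the intuition behind the no-forgetting result when the memory covers all previous-task directions.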
