On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients
Satish Kumar Keshri, Nazreen Shah, Ranjitha Prasad
TL;DR
This paper tackles continual federated learning (CFL) under streaming, non-stationary data by proposing C-FLAG, a replay-memory-based CFL method that combines edge-based gradient updates on memory with aggregated gradients on the current data, computed via Incrementally Aggregated Gradients (IAG). It proves convergence to a stationary point in the non-convex setting at a sublinear rate of $O(1/\sqrt{T})$ over $T$ communication rounds, and it introduces adaptive learning rates that minimize catastrophic forgetting, quantified by a forgetting term $\Gamma(t)$. Empirically, C-FLAG achieves higher average accuracy and lower forgetting than state-of-the-art baselines on task-incremental and class-incremental benchmarks across IID and non-IID partitions, with ablations showing robustness to memory size and client heterogeneity. The method preserves privacy by keeping memory on-device and exchanging only gradient information, though it incurs additional communication for gradient statistics. Overall, the work provides a theoretically grounded, scalable strategy for privacy-preserving CFL with strong forgetting mitigation and practical adaptive-rate mechanisms.
Abstract
The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data. The primary challenge of a CFL system is to overcome global catastrophic forgetting, wherein the accuracy of the global model on old tasks declines as it is trained on new tasks. In this work, we propose Continual Federated Learning with Aggregated Gradients (C-FLAG), a novel replay-memory-based federated strategy consisting of edge-based gradient updates on memory and aggregated gradients on the current data. We provide a convergence analysis of the C-FLAG approach that addresses forgetting and bias while converging at a rate of $O(1/\sqrt{T})$ over $T$ communication rounds. We formulate an optimization sub-problem that minimizes catastrophic forgetting, translating CFL into an iterative algorithm with adaptive learning rates that ensure seamless learning across tasks. We empirically show that C-FLAG outperforms several state-of-the-art baselines in both task-incremental and class-incremental settings with respect to metrics such as accuracy and forgetting.
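To make the combination of a memory gradient and an incrementally aggregated gradient concrete, the following is a minimal toy sketch and not the C-FLAG algorithm itself. It assumes quadratic losses, a single global step per round, cyclic client refresh, and fixed step sizes `alpha` and `beta` (the paper uses adaptive rates). All names here (`IAGServer`, `run_rounds`, `quadratic_grad`, the target vectors) are hypothetical and chosen only for illustration.

```python
"""Toy sketch: replay-memory gradient plus an incrementally aggregated gradient (IAG)."""


def quadratic_grad(w, target):
    # Gradient of 0.5 * ||w - target||^2 with respect to w.
    return [wi - ti for wi, ti in zip(w, target)]


def mean_vectors(vectors):
    # Coordinate-wise mean of a list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class IAGServer:
    """Keeps the last gradient reported by each client and their running mean."""

    def __init__(self, num_clients, dim):
        self.num_clients = num_clients
        self.table = [[0.0] * dim for _ in range(num_clients)]
        self.aggregate = [0.0] * dim

    def refresh(self, client_id, new_grad):
        # Incremental update: swap one client's stale gradient for its new one
        # without recomputing the mean over all clients.
        old_grad = self.table[client_id]
        self.aggregate = [
            agg + (new - old) / self.num_clients
            for agg, new, old in zip(self.aggregate, new_grad, old_grad)
        ]
        self.table[client_id] = list(new_grad)


def run_rounds(memory_targets, current_targets, rounds=300, alpha=0.1, beta=0.1):
    # memory_targets[i]: stands in for client i's replay memory (past tasks).
    # current_targets[i]: stands in for client i's current-task data.
    num_clients = len(memory_targets)
    dim = len(memory_targets[0])
    w = [0.0] * dim
    server = IAGServer(num_clients, dim)
    for t in range(rounds):
        # Assumption: one client per round refreshes its current-data gradient.
        client_id = t % num_clients
        server.refresh(client_id, quadratic_grad(w, current_targets[client_id]))
        # Fresh gradient on memory, averaged over clients (a simplification).
        memory_grad = mean_vectors([quadratic_grad(w, m) for m in memory_targets])
        # Combined step: memory gradient plus (possibly stale) aggregated gradient.
        w = [
            wi - alpha * mg - beta * ag
            for wi, mg, ag in zip(w, memory_grad, server.aggregate)
        ]
    return w, server


MEMORY_TARGETS = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
CURRENT_TARGETS = [[3.0, 1.0], [4.0, 2.0], [5.0, 3.0]]
w_final, server = run_rounds(MEMORY_TARGETS, CURRENT_TARGETS)
print(w_final)
```

In this toy setting with `alpha == beta`, the iterate settles at the midpoint between the mean memory target and the mean current target, which illustrates how the memory term counteracts drift toward the current task alone. The server-side `refresh` is the IAG idea in isolation: only one table entry changes per round, so the aggregate may contain gradients evaluated at slightly older models.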
