On the Convergence of Continual Federated Learning Using Incrementally Aggregated Gradients

Satish Kumar Keshri, Nazreen Shah, Ranjitha Prasad

TL;DR

This paper tackles continual federated learning (CFL) under streaming, non-stationary data by proposing C-FLAG, a replay-memory CFL method that combines edge-based gradient updates on memory with aggregated gradients on current data via Incrementally Aggregated Gradients (IAG). It proves non-convex convergence to a stationary point at a sublinear rate $O\left(\tfrac{1}{\sqrt{T}}\right)$ and introduces adaptive learning rates to minimize catastrophic forgetting through a forgetting term $\Gamma(t)$. Empirically, C-FLAG achieves superior average accuracy and reduced forgetting compared to state-of-the-art baselines on task-incremental and class-incremental benchmarks across IID and non-IID partitions, with ablations showing robustness to memory size and client heterogeneity. The method preserves privacy by keeping memory on-device and only exchanging gradient information, though it incurs additional communication for gradient statistics. Overall, the work provides a theoretically grounded, scalable strategy for privacy-preserving CFL with strong forgetting mitigation and practical adaptive-rate mechanisms.
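As a rough reading of the sublinear rate, suppose the bound on the stationarity measure has the form $C/\sqrt{T}$. The symbol $\mathcal{E}(T)$ for that measure and the constant $C$ are placeholders introduced here for illustration; neither is given in this summary.

```latex
\mathcal{E}(T) \le \frac{C}{\sqrt{T}}
\quad\Longrightarrow\quad
\mathcal{E}(T) \le \varepsilon \ \text{ whenever } \ T \ge \frac{C^{2}}{\varepsilon^{2}},
\qquad
\frac{C}{\sqrt{4T}} = \frac{1}{2}\cdot\frac{C}{\sqrt{T}} .
```

Under this assumed form, halving the target $\varepsilon$ requires roughly four times as many communication rounds.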

Abstract

The holy grail of machine learning is to enable Continual Federated Learning (CFL) to enhance the efficiency, privacy, and scalability of AI systems while learning from streaming data. The primary challenge of a CFL system is to overcome global catastrophic forgetting, wherein the accuracy of the global model trained on new tasks declines on the old tasks. In this work, we propose Continual Federated Learning with Aggregated Gradients (C-FLAG), a novel replay-memory based federated strategy consisting of edge-based gradient updates on memory and aggregated gradients on the current data. We provide convergence analysis of the C-FLAG approach which addresses forgetting and bias while converging at a rate of $O(1/\sqrt{T})$ over $T$ communication rounds. We formulate an optimization sub-problem that minimizes catastrophic forgetting, translating CFL into an iterative algorithm with adaptive learning rates that ensure seamless learning across tasks. We empirically show that C-FLAG outperforms several state-of-the-art baselines on both task and class-incremental settings with respect to metrics such as accuracy and forgetting.
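The round structure described above (one gradient step on replay memory, several local steps on current data, and server-side aggregation) can be sketched on a toy problem. This is a minimal illustration and not the authors' algorithm: the least-squares loss, the fixed step sizes `alpha` and `beta` (the paper uses adaptive rates), the averaging of local gradients, and every function name below are assumptions made for the example.

```python
import numpy as np


def grad_least_squares(x, A, b):
    """Gradient of 0.5 * mean((A @ x - b) ** 2) with respect to x."""
    return A.T @ (A @ x - b) / len(b)


def client_update(x_global, memory, current, local_steps, local_lr):
    """Return (memory_grad, aggregated_current_grad) for one client and one round."""
    A_mem, b_mem = memory
    A_cur, b_cur = current
    # Single gradient evaluation on the replay memory at the global model.
    memory_grad = grad_least_squares(x_global, A_mem, b_mem)
    # E local steps on the current data; gradients are accumulated along the path.
    x_local = x_global.copy()
    grad_sum = np.zeros_like(x_global)
    for _ in range(local_steps):
        g = grad_least_squares(x_local, A_cur, b_cur)
        grad_sum += g
        x_local -= local_lr * g
    return memory_grad, grad_sum / local_steps


def server_round(x_global, clients, alpha, beta, local_steps=5, local_lr=0.05):
    """Average client gradients, then take one memory step (alpha) and one
    aggregated current-data step (beta). Fixed rates are an assumption here."""
    results = [
        client_update(x_global, mem, cur, local_steps, local_lr)
        for mem, cur in clients
    ]
    mem_grads, cur_grads = zip(*results)
    return (
        x_global
        - alpha * np.mean(mem_grads, axis=0)
        - beta * np.mean(cur_grads, axis=0)
    )


def make_toy_clients(num_clients=4, dim=5, n=20, seed=0):
    """Hypothetical data: memory and current tasks share one true parameter."""
    rng = np.random.default_rng(seed)
    x_true = rng.normal(size=dim)
    clients = []
    for _ in range(num_clients):
        A_mem = rng.normal(size=(n, dim))
        A_cur = rng.normal(size=(n, dim))
        clients.append(((A_mem, A_mem @ x_true), (A_cur, A_cur @ x_true)))
    return clients, x_true
```

In this toy setup the memory and current tasks share the same optimum, so it shows only the mechanics of a round, not the trade-off between learning and forgetting. Only gradient vectors leave the clients; the memory and current data stay local, which matches the privacy property described in the TL;DR.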

Paper Structure

This paper contains 21 sections, 18 theorems, 111 equations, 7 figures, 8 tables, 3 algorithms.

Key Result

Lemma 1

Suppose that the $L$-smoothness and bounded-bias assumptions hold, $\alpha_{t} < \frac{2}{L(1+m)}$, and $m\in\mathbb{R}^{+}$. For the sequence $\{\mathbf{x}_{t}\}_{t=1}^{T}$ generated by Algorithm A, we have [...] where $B(t)$ is the overfitting term defined as [...]. Further, $\Gamma(t)$ is the forgetting term defined as [...].
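The step-size hypothesis of Lemma 1, $\alpha_{t} < \frac{2}{L(1+m)}$, is easy to check numerically. The helper names below are hypothetical, and requiring $\alpha_t > 0$ is an added assumption that the lemma statement above does not spell out.

```python
def max_step_size(L, m):
    """Upper bound 2 / (L * (1 + m)) on alpha_t from the lemma's hypothesis."""
    if L <= 0 or m <= 0:
        raise ValueError("L and m must be positive")
    return 2.0 / (L * (1.0 + m))


def satisfies_step_condition(alpha_t, L, m):
    """True if alpha_t is positive and strictly below the bound."""
    return 0 < alpha_t < max_step_size(L, m)
```

For example, with $L = 10$ and $m = 1$ the bound is $2/(10 \cdot 2) = 0.1$, so $\alpha_t = 0.05$ qualifies while $\alpha_t = 0.1$ does not, because the inequality is strict.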

Figures (7)

  • Figure 1: (Left) Illustration of C-FLAG: Initialised at the optimal point of the previous tasks $\mathbf{x}^*_\mathcal{P} = \mathbf{x}_0$, at the $t$-th iteration, the $i$-th client takes $E$ local steps towards its local optimal regions (pink regions). To balance learning and forgetting, C-FLAG takes a single step towards local memory and $E$ steps on the local current data. The global aggregated model moves towards a common global minimum $\mathbf{x}^*_{\mathcal{P}\cup\mathcal{C}}$. (Right) Real-time surveillance where a subset of previous tasks is stored in memory until $T=0$. Data arriving thereafter is the current task $\mathcal{C}^i$.
  • Figure 2: Average accuracy across tasks for IID splits of Split-CIFAR10 (Left) and Split-CIFAR100 (Right).
  • Figure 3: Varying heterogeneity for C-FLAG, EWC-FL and FedTrk techniques on Split-CIFAR10 (Top) and Split-CIFAR100 (Bottom).
  • Figure 4: Evolution of $\Gamma(t)$ against progressing tasks on the Split-CIFAR10 dataset for varying heterogeneity.
  • Figure 5: Varying clients (left) and varying memory sample size (right) for C-FLAG on non-IID partitions of the Split-CIFAR10 and Split-CIFAR100 datasets.
  • ...and 2 more figures

Theorems & Definitions (29)

  • Lemma 1
  • Theorem 2
  • Lemma 3
  • Lemma 4
  • Theorem 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • proof
  • Lemma 9
  • ...and 19 more