Buffer-based Gradient Projection for Continual Federated Learning

Shenghong Dai; Jy-yong Sohn; Yicong Chen; S M Iftekharul Alam; Ravikumar Balakrishnan; Suman Banerjee; Nageen Himayat; Kangwook Lee

TL;DR

This work tackles continual Federated Learning (CFL) under non-IID data and without explicit task boundaries. It introduces Fed-A-GEM, a buffer-based gradient projection method that uses a globally aggregated buffer gradient to constrain local updates via a projection step, mitigating catastrophic forgetting while remaining compatible with existing CFL techniques. Empirical results across image and text benchmarks show consistent accuracy gains and reduced forgetting, including notable improvements on CIFAR-100 task-incremental learning, Tiny-ImageNet, and YahooQA, with modest computational overhead and flexible communication strategies. The approach enables boundary-free, privacy-conscious continual learning in federated settings and offers practical benefits for real-world streaming data scenarios, while suggesting directions for tighter buffers and stronger privacy guarantees in the future.

Abstract

Continual Federated Learning (CFL) is essential for enabling real-world applications where multiple decentralized clients adaptively learn from continuous data streams. A significant challenge in CFL is mitigating catastrophic forgetting, where models lose previously acquired knowledge when learning new information. Existing approaches often face difficulties due to the constraints of device storage capacities and the heterogeneous nature of data distributions among clients. While some CFL algorithms have addressed these challenges, they frequently rely on unrealistic assumptions about the availability of task boundaries (i.e., knowing when new tasks begin). To address these limitations, we introduce Fed-A-GEM, a federated adaptation of the A-GEM method (Chaudhry et al., 2019), which employs a buffer-based gradient projection approach. Fed-A-GEM alleviates catastrophic forgetting by leveraging local buffer samples and aggregated buffer gradients, thus preserving knowledge across multiple clients. Our method is combined with existing CFL techniques, enhancing their performance in the CFL context. Our experiments on standard benchmarks show consistent performance improvements across diverse scenarios. For example, in a task-incremental learning scenario using the CIFAR-100 dataset, our method can increase the accuracy by up to 27%. Our code is available at https://github.com/shenghongdai/Fed-A-GEM.
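The abstract mentions "aggregated buffer gradients" without stating the aggregation rule. A minimal sketch of one possible rule follows, assuming the server takes an unweighted coordinate-wise mean of the gradients that clients compute on their local buffers. The function name `aggregate_buffer_gradients` and the choice of a plain mean are illustrative assumptions, not details confirmed by this summary.

```python
from typing import List


def aggregate_buffer_gradients(client_grads: List[List[float]]) -> List[float]:
    """Average per-client buffer gradients coordinate-wise.

    Assumption: an unweighted mean. The paper may use a different rule
    (for example, weighting clients by buffer size).
    """
    if not client_grads:
        raise ValueError("need at least one client gradient")
    dim = len(client_grads[0])
    if any(len(g) != dim for g in client_grads):
        raise ValueError("all gradients must have the same length")
    n = len(client_grads)
    # Mean of coordinate i across all clients.
    return [sum(g[i] for g in client_grads) / n for i in range(dim)]


if __name__ == "__main__":
    # Two hypothetical clients, each reporting a 2-dimensional buffer gradient.
    print(aggregate_buffer_gradients([[1.0, 2.0], [3.0, 4.0]]))
```

The resulting vector would play the role of the reference gradient that constrains each client's local update.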

Paper Structure (38 sections, 5 equations, 8 figures, 19 tables, 6 algorithms)

Introduction
Related Work
Continual Learning (CL)
Regularization-based methods
Architecture-based methods
Replay-based methods
General Continual Learning (GCL)
Federated Learning (FL)
Continual Federated Learning (CFL)
Preliminaries
Fed-A-GEM
Experiments
Image Classification
Settings
Overall Results
...and 23 more sections

Figures (8)

Figure 1: Challenge of Catastrophic Forgetting in Continual Federated Learning. As a motivating example, this figure shows different driving scenarios encountered by clients in an autonomous vehicle network. Each client faces diverse, dynamic environments, causing vision detection models to forget previous knowledge when learning new tasks. This issue is aggravated by the lack of task boundaries, limited buffer sizes, and non-IID data distribution across clients.
Figure 2: Illustration of the gradient projection in Eq. \ref{proj}. If the angle between the gradient update $g$ and the global buffer gradient $g_\text{ref}$ (used as a reference) is larger than $90^{\circ}$, we project $g$ to minimize the interference and update only along the direction $\tilde{g}$, which is orthogonal to $g_\text{ref}$.
Figure 3: Change in accuracy (%) for task 1 upon completion of subsequent tasks for different buffer sizes, under S-CIFAR100 (Task-IL) setup. Fed-A-GEM with a larger buffer size ($B$) more effectively mitigates forgetting of task 1.
Figure 4: Evaluating accuracy ($\uparrow$) and forgetting ($\downarrow$) in multiple datasets with and without Fed-A-GEM using a buffer size of 200. The solid lines indicate the results obtained with our method, while the dotted lines represent the results obtained without our method. The results show a significant improvement in accuracy as well as reduced forgetting for all settings.
Figure 5: Class-IL Accuracy (%) of current task for FL and FedGP on the sequential-CIFAR100
...and 3 more figures
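The projection described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, assuming the standard A-GEM rule (Chaudhry et al., 2019): when the inner product of $g$ and $g_\text{ref}$ is negative (angle larger than $90^{\circ}$), the component of $g$ along $g_\text{ref}$ is subtracted, leaving a vector orthogonal to $g_\text{ref}$. The function name `project_gradient` and the flat-list representation of gradients are assumptions made for the example.

```python
from typing import List


def project_gradient(g: List[float], g_ref: List[float]) -> List[float]:
    """Return g unchanged if it does not conflict with g_ref; otherwise
    remove its component along g_ref (A-GEM-style projection)."""
    if len(g) != len(g_ref):
        raise ValueError("g and g_ref must have the same length")
    dot = sum(a * b for a, b in zip(g, g_ref))
    if dot >= 0:
        # Angle is at most 90 degrees: no interference, keep the update.
        return list(g)
    # dot < 0 implies g_ref is nonzero, so the division is safe.
    ref_sq = sum(b * b for b in g_ref)
    scale = dot / ref_sq
    return [a - scale * b for a, b in zip(g, g_ref)]


if __name__ == "__main__":
    g_tilde = project_gradient([1.0, -1.0], [0.0, 1.0])
    print(g_tilde)  # the projected update has zero inner product with g_ref
```

After projection, the inner product of the returned vector with `g_ref` is zero up to floating-point error, which matches the caption's statement that the update is orthogonal to the reference gradient.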
