Task-Agnostic Federated Continual Learning via Replay-Free Gradient Projection
Seohyeon Cha, Huancheng Chen, Haris Vikalo
TL;DR
FedProTIP tackles federated continual learning under task-agnostic inference by combining gradient projection with a memory of subspace bases and a lightweight task-identity predictor. Each client performs projected gradient descent (PGD) so that its updates do not erase past-task features, extracts compact local core bases via randomized SVD, and contributes them to a global feature subspace that guides future updates. At inference time, the model uses subspace relevance to predict the task a sample belongs to and routes its outputs accordingly, enabling dynamic head selection without replay, generative models, or labeled task IDs. Across CIFAR100, DomainNet, and ImageNet-R, FedProTIP achieves higher average accuracy and lower forgetting than state-of-the-art FCL methods while markedly improving communication and computation efficiency, making it practical for privacy-conscious, heterogeneous FL deployments.
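As a rough illustration of the two client-side steps named above, the following NumPy sketch extracts an orthonormal basis from a feature matrix with a basic randomized SVD and then removes the component of a gradient that lies in the span of that basis. It is not the authors' implementation: the function names, the oversampling parameter, and the layout (features stored as columns, gradients of a linear layer projected on their input side) are assumptions made for this example.

```python
import numpy as np

def randomized_range_basis(features, rank, oversample=5, seed=0):
    """Approximate the top-`rank` left singular vectors of `features` (d x n).

    Hypothetical helper: a plain randomized SVD (random sketch, QR, small SVD).
    Assumes rank + oversample <= min(d, n).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[1]
    # Sketch the column space with a Gaussian test matrix.
    sketch = features @ rng.standard_normal((n, rank + oversample))
    q, _ = np.linalg.qr(sketch)  # orthonormal columns covering the sketch
    # SVD of the small projected matrix, then map back to the original space.
    u_small, _, _ = np.linalg.svd(q.T @ features, full_matrices=False)
    return (q @ u_small)[:, :rank]

def project_out(grad, basis):
    """Project `grad` (d_out x d_in) onto the orthogonal complement of
    span(basis), where `basis` is d_in x k with orthonormal columns."""
    return grad - grad @ basis @ basis.T

# Toy usage: features of an "old task" live in a 3-dimensional subspace.
rng = np.random.default_rng(1)
old_dirs, _ = np.linalg.qr(rng.standard_normal((16, 3)))
old_features = old_dirs @ rng.standard_normal((3, 40))
basis = randomized_range_basis(old_features, rank=3)

new_grad = rng.standard_normal((4, 16))
safe_grad = project_out(new_grad, basis)
# The projected update no longer acts on old-task feature directions.
print(np.abs(safe_grad @ basis).max())
```

In this toy setting, a weight update along `safe_grad` leaves the layer's response to the old-task features unchanged, which is the sense in which projection limits interference.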
Abstract
Federated continual learning (FCL) enables distributed client devices to learn from streaming data across diverse and evolving tasks. Catastrophic forgetting, a major challenge in continual learning, is exacerbated in decentralized settings by data heterogeneity, constrained communication, and privacy concerns. We propose Federated gradient Projection-based Continual Learning with Task Identity Prediction (FedProTIP), a novel FCL framework that mitigates forgetting by projecting client updates onto the orthogonal complement of the subspace spanned by previously learned representations of the global model. This projection reduces interference with earlier tasks and preserves performance across the task sequence. To further address the challenge of task-agnostic inference, we incorporate a lightweight mechanism that leverages core bases from prior tasks to predict task identity and dynamically adjust the global model's outputs. Extensive experiments across standard FCL benchmarks demonstrate that FedProTIP significantly outperforms state-of-the-art methods in average accuracy, particularly in settings where task identities are not known a priori.
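One way to picture the task-identity mechanism described above is the sketch below, which scores each task by how much of a sample's feature energy falls inside that task's stored basis and then restricts the logits to the predicted task's classes. The scoring rule, the equal-sized contiguous class blocks, and the names `predict_task` and `route_logits` are assumptions for illustration; the abstract does not specify how relevance is computed or how outputs are adjusted.

```python
import numpy as np

def predict_task(feature, task_bases):
    """Return (task_id, scores), where each score is the fraction of the
    feature's squared norm captured by that task's orthonormal basis (d x k)."""
    energy = float(feature @ feature) + 1e-12  # guard against a zero vector
    scores = [float(np.sum((b.T @ feature) ** 2)) / energy for b in task_bases]
    return int(np.argmax(scores)), scores

def route_logits(logits, task_id, classes_per_task):
    """Keep only the logits of the predicted task's classes; mask the rest.

    Assumes float logits and that task t owns the contiguous class range
    [t * classes_per_task, (t + 1) * classes_per_task).
    """
    masked = np.full_like(logits, -np.inf)
    lo = task_id * classes_per_task
    masked[lo:lo + classes_per_task] = logits[lo:lo + classes_per_task]
    return masked

# Toy usage: three tasks, each with a 2-dimensional basis in a 6-dimensional space.
eye = np.eye(6)
bases = [eye[:, 0:2], eye[:, 2:4], eye[:, 4:6]]
feature = np.array([0.1, 0.0, 2.0, 1.0, 0.0, 0.1])  # mostly in task 1's subspace
logits = np.array([5.0, 1.0, 0.5, 2.0, 9.0, 0.0])

task_id, scores = predict_task(feature, bases)
prediction = int(np.argmax(route_logits(logits, task_id, classes_per_task=2)))
print(task_id, prediction)
```

Masking to the predicted task means a large logit from another task's head (index 4 in the toy example) cannot override the routed prediction, at the cost of making the final answer depend on the task prediction being right.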
