Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor
Hao Yu, Xin Yang, Le Zhang, Hanlin Gu, Tianrui Li, Lixin Fan, Qiang Yang
TL;DR
This work tackles spatial-temporal data heterogeneity in Federated Continual Learning (FCL), which induces both parameter-forgetting and output-forgetting as inputs shift across clients and tasks. It introduces Federated Tail Anchor (FedTA), a method that combines a frozen pre-trained Vision Transformer with learnable Tail Anchors, plus three components (Input Enhancement, Selective Input Knowledge Fusion, and Best Global Prototype Selection) to stabilize feature representations and class positions across time and space. In experiments on CIFAR-100 and ImageNet-R with multiple clients and tasks, FedTA mitigates forgetting better than the baselines and preserves the relative geometry of features, while showing favorable privacy and efficiency characteristics due to reduced replay reliance. The approach offers a scalable, privacy-conscious pathway for robust FCL in dynamic, heterogeneous environments, with practical implications for edge devices and real-world continual learning deployments.
Abstract
Federated continual learning (FCL) allows each client to continually update its knowledge from task streams, enhancing the applicability of federated learning in real-world scenarios. However, FCL must address not only spatial data heterogeneity between clients but also temporal data heterogeneity between tasks. In this paper, empirical experiments demonstrate that such input-level heterogeneity significantly affects the model's internal parameters and outputs, leading to severe spatial-temporal catastrophic forgetting of local and previous knowledge. To this end, we propose Federated Tail Anchor (FedTA), which mixes a trainable Tail Anchor with the frozen output features to adjust their position in the feature space, thereby overcoming parameter-forgetting and output-forgetting. Three novel components are also included: Input Enhancement, which improves the performance of pre-trained models on downstream tasks; Selective Input Knowledge Fusion, which fuses heterogeneous local knowledge on the server; and Best Global Prototype Selection, which finds the best anchor point for each class in the feature space. Extensive experiments demonstrate that FedTA not only outperforms existing FCL methods but also effectively preserves the relative positions of features.
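To make the core idea concrete, the following is a minimal sketch of how a trainable anchor could shift a frozen feature toward a class prototype. The abstract does not specify the mixing operator or the training loss, so the convex combination, the squared-distance objective, and every name here (`mix_with_tail_anchor`, `anchor_step`, `nearest_prototype`, `alpha`, `lr`) are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def mix_with_tail_anchor(frozen_features, tail_anchor, alpha=0.5):
    """Assumed mixing rule: convex combination of frozen features and an anchor.

    frozen_features: (n, d) outputs of a frozen backbone (never updated).
    tail_anchor:     (d,) trainable vector shared by the n features.
    """
    return (1.0 - alpha) * frozen_features + alpha * tail_anchor


def anchor_step(frozen_features, tail_anchor, prototype, alpha=0.5, lr=0.5):
    """One gradient step on the anchor only (assumed squared-distance loss).

    Loss: mean over features of ||mixed - prototype||^2.
    Gradient w.r.t. the anchor: 2 * alpha * mean(mixed - prototype).
    """
    mixed = mix_with_tail_anchor(frozen_features, tail_anchor, alpha)
    grad = 2.0 * alpha * (mixed - prototype).mean(axis=0)
    return tail_anchor - lr * grad


def nearest_prototype(features, prototypes):
    """Assign each feature to the index of the closest prototype."""
    diffs = features[:, None, :] - prototypes[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


if __name__ == "__main__":
    features = np.array([[1.0, 0.0]])              # frozen backbone output
    prototypes = np.array([[0.0, 1.0], [1.0, 0.0]])  # one prototype per class
    anchor = np.zeros(2)
    # Train only the anchor so the mixed feature moves toward class 0.
    for _ in range(50):
        anchor = anchor_step(features, anchor, prototypes[0])
    mixed = mix_with_tail_anchor(features, anchor)
    print(nearest_prototype(mixed, prototypes))
```

In this sketch the backbone output never changes; only the anchor is updated, which is the property the abstract relies on to avoid parameter-forgetting.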
