Federated Class-Incremental Learning with New-Class Augmented Self-Distillation
Zhiyuan Wu, Tianliu He, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Xuefeng Jiang
TL;DR
The paper tackles catastrophic forgetting in federated class-incremental learning (FCIL) where data volume and class diversity grow over time. It proposes FedCLASS, a method that augments historical old-class logits with current new-class predictions to form a scale-aware self-distillation target, optimized via a joint loss $J^k_{Aug}=J_{CE}^k + \beta J_{KD-Aug}^k$. The authors provide a theoretical framework with assumptions and a theorem showing the augmented distillation aligns with conditional-probability modeling for old and new classes, establishing soundness. Empirically, FedCLASS achieves superior global accuracy and lower forgetting rates across multiple datasets and task settings, outperforming FedAvg and several FCIL baselines. This work enables more robust, privacy-preserving knowledge transfer in FL under evolving class distributions and memory constraints, with potential extensions to larger task horizons and long-term forgetting mitigation.
Abstract
Federated Learning (FL) enables collaborative model training among participants while guaranteeing the privacy of raw data. Mainstream FL methodologies overlook the dynamic nature of real-world data, particularly its tendency to grow in volume and diversify in classes over time. As a result, FL methods suffer from catastrophic forgetting, in which trained models discard previously learned information when they assimilate new data. To address this challenge, we propose a novel Federated Class-Incremental Learning (FCIL) method, named Federated Class-Incremental Learning with New-Class Augmented Self-Distillation (FedCLASS). The core of FedCLASS is to enrich the class scores of historical models with the new-class scores predicted by current models and to use the combined knowledge for self-distillation, enabling a more complete and precise knowledge transfer from historical models to current models. Theoretical analysis shows that FedCLASS is well founded: it treats the old-class scores predicted by historical models as conditional probabilities in the absence of new classes, and the new-class scores predicted by current models as the conditional probabilities of class scores derived from historical models. Empirical experiments demonstrate the superiority of FedCLASS over four baseline algorithms in reducing the average forgetting rate and boosting global accuracy.
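The following is a minimal sketch of the idea described in the abstract above, not the authors' exact formulation. It assumes one simple probabilistic reading: the historical model's old-class distribution is treated as conditional on "not a new class" and rescaled by the probability mass the current model leaves for old classes, while the new-class probabilities are taken directly from the current model. The function names (`augmented_target`, `joint_loss`), the KL-divergence form of the distillation term, and the default `beta` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def augmented_target(hist_old_logits, cur_logits):
    """Build a distillation target over old + new classes (assumed form).

    hist_old_logits: historical-model logits, old classes only.
    cur_logits: current-model logits, old classes first, then new classes.
    """
    n_old = len(hist_old_logits)
    p_hist = softmax(hist_old_logits)  # read as P(class | not a new class)
    q_cur = softmax(cur_logits)
    new_probs = q_cur[n_old:]          # new-class scores from the current model
    old_mass = 1.0 - sum(new_probs)    # mass the current model leaves for old classes
    return [old_mass * p for p in p_hist] + new_probs


def cross_entropy(probs, label):
    """Cross-entropy of a predicted distribution against an integer label."""
    return -math.log(probs[label])


def kl_divergence(target, pred):
    """KL(target || pred), skipping zero-probability target entries."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)


def joint_loss(hist_old_logits, cur_logits, label, beta=1.0):
    """Sketch of J_Aug = J_CE + beta * J_KD-Aug for a single sample.

    In real training the target would be held constant (no gradient).
    """
    q_cur = softmax(cur_logits)
    target = augmented_target(hist_old_logits, cur_logits)
    return cross_entropy(q_cur, label) + beta * kl_divergence(target, q_cur)
```

Under this assumed construction the target always sums to one, and the distillation term vanishes when the current model's old-class logits match the historical model's, so only disagreement on old classes is penalized.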
