Accurate Forgetting for Heterogeneous Federated Continual Learning

Abudukelimu Wuerkaixi, Sen Cui, Jingfeng Zhang, Kunda Yan, Bo Han, Gang Niu, Lei Fang, Changshui Zhang, Masashi Sugiyama

TL;DR

This paper tackles federated continual learning (FCL) under highly heterogeneous, potentially unrelated task streams by introducing accurate forgetting, a concept that embraces selective forgetting of biased past knowledge. The proposed AF-FCL method uses a global normalizing flow to perform feature-space generative replay, paired with knowledge distillation to stabilize representations and a correlation-based mechanism to weight past knowledge by its relevance to the current task. By quantifying feature credibility via latent-space densities, AF-FCL suppresses harmful memories and reuses beneficial past information, achieving superior accuracy and lower forgetting across diverse benchmarks, including EMNIST variants, CIFAR100, and ImageNet-Subset. The work demonstrates practical implications for robust, privacy-preserving collaborative learning when client data are non-IID and task distributions diverge, showing that targeted forgetting can surpass traditional memorization-focused approaches.

Abstract

Recent years have witnessed a burgeoning interest in federated learning (FL). However, the contexts in which clients engage in sequential learning remain under-explored. Bridging FL and continual learning (CL) gives rise to a challenging practical problem: federated continual learning (FCL). Existing research in FCL primarily focuses on mitigating the catastrophic forgetting issue of continual learning while collaborating with other clients. We argue that the forgetting phenomena are not invariably detrimental. In this paper, we consider a more practical and challenging FCL setting characterized by potentially unrelated or even antagonistic data/tasks across different clients. In the FL scenario, statistical heterogeneity and data noise among clients may exhibit spurious correlations which result in biased feature learning. While existing CL strategies focus on a complete utilization of previous knowledge, we found that forgetting biased information is beneficial in our study. Therefore, we propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method, AF-FCL, which selectively utilizes previous knowledge in federated networks. We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge. Comprehensive experiments affirm the superiority of our method over baselines.

Paper Structure

This paper contains 36 sections, 12 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Illustration of the FCL problem. Multiple hospitals within a federated learning network engage in the sequential acquisition of disease prediction tasks. The global memory bank, a crucial tool for the classifier in mitigating catastrophic forgetting, may possess biased features arising from statistical heterogeneity. Notably, the overall performance of the classifier could suffer degradation without strategic forgetting (the experimental verification is in Sec. 4).
  • Figure 2: Illustration of the EMNIST-noisy dataset and results. (a) The first several tasks in Client 2 exhibit label noise. (b) The average accuracy of each method is plotted against an increasing number of malicious clients. The baseline methods are shown with dash-dotted lines, while our method is shown with a solid line.
  • Figure 3: Diagram of training the classifier locally with our method. The training objective consists of three components: (i) $\mathcal{L}_{ce}^g$, the objective for training on features generated by the global NF model, together with the estimated probabilities derived from it; (ii) $\mathcal{L}_{ce}^x$, the objective for training on the original data; (iii) $\mathcal{L}_{KD}$, the objective for knowledge distillation within the feature space.
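
The three-part objective in the Figure 3 caption can be illustrated with a minimal sketch. This is not the paper's implementation: the exact form of each term is not given in this section, so the sketch assumes that the replay term is a cross-entropy weighted by per-feature credibility scores, that the distillation term is a mean squared error between current and previous features, and that the terms are combined by a weighted sum. All function names and the coefficients `lam_g` and `lam_kd` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample given its integer class label."""
    return -math.log(softmax(logits)[label])


def mean_ce(logits_batch, labels):
    """Stand-in for L_ce^x: plain mean cross-entropy on original data."""
    losses = [cross_entropy(l, y) for l, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def weighted_replay_ce(logits_batch, labels, weights):
    """Stand-in for L_ce^g: cross-entropy on generated features.

    `weights` are assumed credibility scores (e.g. derived from the flow
    model's estimated probabilities); low-weight features contribute less.
    The normalisation by the weight sum is an assumption of this sketch.
    """
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    weighted = sum(
        w * cross_entropy(l, y)
        for l, y, w in zip(logits_batch, labels, weights)
    )
    return weighted / total_w


def feature_kd(student_feats, teacher_feats):
    """Stand-in for L_KD: mean squared error in feature space (assumed form)."""
    sq, n = 0.0, 0
    for s_vec, t_vec in zip(student_feats, teacher_feats):
        for s, t in zip(s_vec, t_vec):
            sq += (s - t) ** 2
            n += 1
    return sq / n


def local_objective(real_logits, real_labels,
                    gen_logits, gen_labels, gen_weights,
                    student_feats, teacher_feats,
                    lam_g=1.0, lam_kd=1.0):
    """Weighted sum of the three terms; lam_g and lam_kd are hypothetical."""
    l_x = mean_ce(real_logits, real_labels)
    l_g = weighted_replay_ce(gen_logits, gen_labels, gen_weights)
    l_kd = feature_kd(student_feats, teacher_feats)
    return l_x + lam_g * l_g + lam_kd * l_kd
```

In this sketch, setting a generated feature's weight to zero removes it from the replay term entirely, which is one simple way to express the idea of forgetting low-credibility memories while still replaying the rest.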