Data-Free Continual Learning of Server Models in Model-Heterogeneous Cloud-Device Collaboration

Xiao Zhang, Zengzhe Chen, Yuan Yuan, Yifei Zou, Fuzhen Zhuang, Wenyu Jiao, Yuke Wang, Dongxiao Yu

TL;DR

FedDCL tackles data-free continual learning in model-heterogeneous federated settings by extracting class-specific prototypes from frozen diffusion models. It enables data-free synthetic data augmentation, exemplar-free replay, and data-free dynamic knowledge transfer between heterogeneous clients and a server through a multi-teacher distillation scheme. The framework jointly trains and preserves knowledge across sequential tasks, outperforming baselines in both cumulative accuracy and forgetting across grayscale and RGB datasets. This approach bridges privacy, heterogeneity, and continual learning challenges, showing practical potential for dynamic cloud-device collaboration.

Abstract

The rise of cloud-device collaborative computing has enabled intelligent services to be delivered across distributed edge devices while leveraging centralized cloud resources. In this paradigm, federated learning (FL) has become a key enabler for privacy-preserving model training without transferring raw data from edge devices to the cloud. However, with the continuous emergence of new data and increasing model diversity, traditional federated learning faces significant challenges, including the inherent issues of data heterogeneity, model heterogeneity, and catastrophic forgetting, along with the new challenge of knowledge misalignment. In this study, we introduce FedDCL, a novel framework designed to enable data-free continual learning of the server model in a model-heterogeneous federated setting. We leverage pre-trained diffusion models to extract lightweight class-specific prototypes, which confer a threefold data-free advantage, enabling: (1) generation of synthetic data for the current task to augment training and counteract non-IID data distributions; (2) exemplar-free generative replay for retaining knowledge from previous tasks; and (3) data-free dynamic knowledge transfer from heterogeneous devices to the cloud server. Experimental results on various datasets demonstrate the effectiveness of FedDCL, showcasing its potential to enhance the generalizability and practical applicability of federated cloud-device collaboration in dynamic settings.
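
To make the idea of transferring knowledge from several heterogeneous models to one server model more concrete, the following is a minimal sketch of a generic multi-teacher distillation loss evaluated on a batch of synthetic samples. It is not the paper's actual objective, which this summary does not state: the function names, the uniform teacher weighting, and the temperature value are all assumptions chosen for illustration. Because only output logits are compared, the teachers may have different architectures as long as they share the same label space.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          weights=None, temperature=2.0):
    """KL(ensemble of teachers || student), averaged over the batch.

    Hypothetical helper: uniform weights and temperature=2.0 are
    illustrative defaults, not values taken from the paper.
    """
    n_teachers = len(teacher_logits_list)
    if weights is None:
        weights = np.full(n_teachers, 1.0 / n_teachers)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()

    # Weighted average of the teachers' softened class probabilities.
    target = sum(w * softmax(t, temperature)
                 for w, t in zip(weights, teacher_logits_list))
    student = softmax(student_logits, temperature)

    eps = 1e-12  # avoids log(0)
    kl = np.sum(target * (np.log(target + eps) - np.log(student + eps)),
                axis=-1)
    return float(kl.mean())


# Toy usage: 4 synthetic samples, 10 classes, 3 teachers with random outputs.
rng = np.random.default_rng(0)
teachers = [rng.normal(size=(4, 10)) for _ in range(3)]
student = rng.normal(size=(4, 10))
loss = multi_teacher_kd_loss(student, teachers)
```

In a setting like the one described above, the teachers would be the client models (and, for replay, earlier server models), and the inputs would be the synthetic samples rather than any real client data.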

Paper Structure

This paper contains 21 sections, 13 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of the three main challenges in heterogeneous federated continual learning: (1) Catastrophic forgetting, where sequential task learning overwrites past knowledge and degrades earlier task performance; (2) Model heterogeneity, where diverse device architectures hinder direct parameter aggregation; and (3) Knowledge misalignment, where public datasets, due to their static nature, fail to align with evolving task domains and so cannot transfer knowledge dynamically.
  • Figure 2: The performance of a student model distilling knowledge from a teacher on MNIST using different public datasets. The red line marks the teacher's accuracy, which can be seen as an upper bound.
  • Figure 3: The framework of FedDCL: ① Federated Prototype Extraction. Clients extract class prototypes using a frozen pre-trained diffusion model. These prototypes enable the generation of synthetic data aligned with the current task’s knowledge, as well as the replay of previously learned knowledge. ② Augmented Local Training. On the client side, synthetic data that conform to the current task’s data distribution and preserve knowledge from previous tasks are mixed with local real data to perform local training. ③ Collaborative Training & Feedback. On the server side, knowledge is transferred from clients’ heterogeneous models and the server’s past-task models to the current global model via the synthetic dataset; simultaneously, feedback is sent back to refine the local client models.
  • Figure 4: Per-task accuracy and average accuracy comparison of various methods. The results are evaluated after sequentially learning all tasks.
  • Figure 5: Comparison of model parameter sizes
  • ...and 2 more figures