Masked Autoencoders are Parameter-Efficient Federated Continual Learners

Yuchen He, Xiangfeng Wang

TL;DR

Experimental results demonstrate that pMAE achieves performance comparable to existing prompt-based methods and can enhance their effectiveness, particularly when using self-supervised pre-trained transformers as the backbone.

Abstract

Federated learning is a distributed learning paradigm in which a central server aggregates updates from multiple clients' local models, so the server can learn without clients uploading their private data, which preserves data privacy. While existing federated learning methods are primarily designed for static data, real-world applications often require clients to learn new categories over time. This challenge necessitates the integration of continual learning techniques, leading to federated continual learning (FCL). To address both catastrophic forgetting and non-IID issues, we propose using masked autoencoders (MAEs) as parameter-efficient federated continual learners, a method we call pMAE. On the client side, pMAE learns a reconstructive prompt through image reconstruction with an MAE. On the server side, it reconstructs images from the uploaded restore information to capture the data distribution across previous tasks and different clients, and uses these reconstructed images to fine-tune the discriminative prompt and classifier parameters for classification, thereby alleviating catastrophic forgetting and non-IID issues on a global scale. Experimental results demonstrate that pMAE achieves performance comparable to existing prompt-based methods and can enhance their effectiveness, particularly when using self-supervised pre-trained transformers as the backbone. Code is available at: https://github.com/ycheoo/pMAE.
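
The abstract says that the server aggregates updates from the clients' local models but does not state the aggregation rule. As an illustration only, the sketch below assumes a sample-weighted average in the style of FedAvg applied to lightweight parameters such as prompts. The function name `aggregate_prompts`, the parameter key `"prompt"`, and the use of plain Python lists are hypothetical choices for this example and are not taken from the pMAE code.

```python
from typing import Dict, List, Sequence


def aggregate_prompts(
    client_params: Sequence[Dict[str, List[float]]],
    client_sizes: Sequence[int],
) -> Dict[str, List[float]]:
    """Sample-weighted average of per-client parameter vectors.

    Assumption: FedAvg-style weighting by local dataset size. The source
    does not confirm that pMAE aggregates this way.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one dataset size per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Dict[str, List[float]] = {}
    for name in client_params[0]:
        # Start from zeros with the same length as the first client's vector.
        acc = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total  # clients with more data contribute more
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        aggregated[name] = acc
    return aggregated


# Hypothetical usage: two clients holding 1 and 3 local samples.
clients = [{"prompt": [1.0, 2.0]}, {"prompt": [3.0, 6.0]}]
global_params = aggregate_prompts(clients, client_sizes=[1, 3])
```

Only the small prompt and classifier vectors would pass through such a function, which is what keeps the communication cost low in a parameter-efficient setting.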

Paper Structure

This paper contains 17 sections, 6 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of federated continual learning (FCL), wherein clients sequentially learn over $T$ tasks. Each client continuously updates its local model with class-imbalanced private data specific to each task, and transfers the selected parameters to the server for aggregation. The aggregated global model needs to maintain discriminability across all observed classes within the task set.
  • Figure 2: Overview of the proposed pMAE framework. 1) Client: Extract labeled restore information using Eq. \ref{eq:encoder} and optimize the prompts and classifier parameters of the local model with $\mathcal{L}_{client}$. 2) Server: Generate reconstructed images using Eq. \ref{eq:decoder} and fine-tune the discriminative prompt and classifier parameters of the aggregated global model with $\mathcal{L}_{server}$. 3) Lightweight prompts, classifier parameters, and labeled restore information are transmitted between the clients and the server, while no additional data is stored on each client. 4) Images are reconstructed into tensors and subsequently used for fine-tuning, ensuring that no real images are stored on the server.
  • Figure 3: Uncurated random samples of CUB-200 images, using an MAE trained on ImageNet-100 with a frozen pre-trained encoder. For each quadruplet, we show the masked image, the Sup-based MAE reconstruction, the iBOT-based MAE reconstruction, and the ground truth. The masking ratio is 75%.
  • Figure 4: Uncurated random samples of ImageNet-R images, using an MAE trained on ImageNet-100 with a frozen pre-trained encoder. For each quadruplet, we show the masked image, the Sup-based MAE reconstruction, the iBOT-based MAE reconstruction, and the ground truth. The masking ratio is 75%.
  • Figure 5: Accuracy curves on 20-task CUB-200.
  • ...and 5 more figures
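
The captions of Figures 3 and 4 report a masking ratio of 75%. As a minimal sketch of what that ratio means for an MAE input, the example below randomly splits patch indices into a visible set and a masked set. The patch count of 196 (a 224×224 image cut into 16×16 patches) is an assumption made for illustration, and the function name `random_patch_mask` is hypothetical; neither is taken from the pMAE code.

```python
import random
from typing import List, Optional, Tuple


def random_patch_mask(
    num_patches: int,
    mask_ratio: float = 0.75,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices into (visible, masked) lists.

    With mask_ratio=0.75, one quarter of the patches stay visible to the
    encoder and the remaining three quarters must be reconstructed.
    """
    if num_patches <= 0:
        raise ValueError("num_patches must be positive")
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")

    rng = random.Random(seed)  # local generator, so global state is untouched
    order = list(range(num_patches))
    rng.shuffle(order)

    num_keep = int(num_patches * (1.0 - mask_ratio))
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


# Assumed setting: 196 patches, 75% masked, so 49 visible and 147 masked.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
```

Each patch index lands in exactly one of the two lists, so the masked image and the reconstruction target in each quadruplet together cover the full image.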