Continual Adaptation of Vision Transformers for Federated Learning

Shaunak Halbe; James Seale Smith; Junjiao Tian; Zsolt Kira

TL;DR

Continual Federated Learning (CFL) addresses learning new classes over time from non-IID, privacy-sensitive clients. The paper introduces HePCo, a prompt-based CFL method that keeps Vision Transformer backbones frozen on clients and communicates only prompts and classifier heads, while the server performs data-free latent-space generation and distillation to consolidate knowledge across heterogeneous clients. Across CIFAR-100, ImageNet-R, and DomainNet, HePCo achieves up to 7% absolute gains in final average accuracy $A_N$ and reduces forgetting $F_N$, with substantially lower communication costs than baselines. The approach preserves privacy by not sharing training data or full models and scales to longer task sequences, though it imposes server-side computation and prompts remain less interpretable.

Abstract

In this paper, we focus on the important yet understudied problem of Continual Federated Learning (CFL), where a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The complexity of this problem is compounded by challenges from both the Continual and Federated Learning perspectives. Specifically, models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing attempts at this problem tend to impose large overheads on clients and communication channels or require access to stored data, which renders them unsuitable for real-world use due to privacy concerns. We attempt to tackle forgetting and heterogeneity while minimizing overhead costs and without requiring access to any stored data. We study this problem in the context of Vision Transformers and explore parameter-efficient approaches to adapt to dynamic distributions while minimizing forgetting. We achieve this by leveraging a prompting-based approach (such that only prompts and classifier heads have to be communicated) and proposing a novel and lightweight generation and distillation scheme to consolidate client models at the server. We formulate this problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets like ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by as much as 7% while significantly reducing communication and client-level computation costs. Code is available at https://github.com/shaunak27/hepco-fed.
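
The communication saving described in the abstract comes from uploading only prompts and classifier heads while the backbone stays frozen on the client. The following is a minimal sketch of that idea, not the authors' implementation: the parameter names, the `prompt.` and `head.` prefixes, the toy sizes, and the helpers `select_payload` and `payload_fraction` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical name prefixes for the parameters a client uploads.
# Real models will name their prompt and classifier parameters differently.
COMMUNICATED_PREFIXES = ("prompt.", "head.")


def select_payload(state: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only the entries a client would upload (prompts and classifier head)."""
    return {
        name: values
        for name, values in state.items()
        if name.startswith(COMMUNICATED_PREFIXES)
    }


def payload_fraction(state: Dict[str, List[float]]) -> float:
    """Fraction of all parameters that is actually communicated."""
    total = sum(len(values) for values in state.values())
    sent = sum(len(values) for values in select_payload(state).values())
    return sent / total


# Toy client state with made-up sizes; the backbone entry stands in for
# the frozen Vision Transformer weights, which are never sent.
client_state = {
    "backbone.block0.weight": [0.0] * 1000,
    "prompt.keys": [0.0] * 20,
    "head.weight": [0.0] * 30,
}

payload = select_payload(client_state)
print(sorted(payload))
print(payload_fraction(client_state))
```

In this toy setup the uploaded payload is a small fraction of the full parameter count, which is the intuition behind the reduced communication cost; the actual ratio depends on the real prompt and head sizes, which are not stated here.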

Paper Structure (17 sections, 5 equations, 3 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Problem Formulation
Background: L2P
Method
Client Side: Decomposed Prompting
Server Side: Latent Generation
Server Side: Latent Space Knowledge Distillation
Experiments
Main Results
Additional Analysis
Ablation Studies
Overhead Cost Analysis
Conclusion
Discussion
...and 2 more sections

Figures (3)

Figure 1: In Continual Federated Learning (CFL), clients learn from unique, continual data. We propose a prompt-based CFL approach, paired with a lightweight generation and distillation scheme, to consolidate client models at the server in a communication-efficient manner.
Figure 2: Latent generation and distillation with underlying decomposed prompting scheme.
Figure 3: Comparison of the methods under different category ratios.
