FAA-CLIP: Federated Adversarial Adaptation of CLIP

Yihang Wu; Ahmad Chaddad; Christian Desrosiers; Tareef Daqqaq; Reem Kateb

TL;DR

FAA-CLIP addresses the challenge of deploying vision-language models in federated learning. It freezes the CLIP backbone, trains a light-weight feature adaptation module (FAM) on each client, and uses a domain adaptation module to mitigate distribution shifts between clients. Because CLIP stays frozen and the global server aggregates only the FAM parameters, both communication and computation costs are greatly reduced. Across six natural and medical imaging datasets, FAA-CLIP consistently outperforms strong FL baselines and improves calibration and ROC-AUC in medical tasks. This work shows that domain-adversarial adaptation combined with compact adapters can unlock CLIP's potential for privacy-preserving learning on heterogeneous healthcare and vision data. The code is publicly available.
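Because only the FAM parameters travel to the server, the server step reduces to averaging a small set of adapter weights. The sketch below shows that step under two assumptions not stated in the summary: FedAvg-style weighting by each client's local sample count, and a hypothetical layout in which each client's FAM update is a dict mapping parameter names to flat lists of floats. The function name `aggregate_fam` is illustrative, not the authors' API.

```python
from typing import Dict, List

# Hypothetical layout of a client's FAM update: parameter name -> flat weights.
FamParams = Dict[str, List[float]]


def aggregate_fam(client_params: List[FamParams], client_sizes: List[int]) -> FamParams:
    """Weighted average of adapter (FAM) parameters only.

    The frozen CLIP backbone never leaves the clients, so only these small
    tensors are uploaded and averaged. Weighting by local sample count is an
    assumption borrowed from FedAvg, not necessarily the paper's exact rule.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    global_params: FamParams = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, n_k in zip(client_params, client_sizes):
            weight = n_k / total  # this client's share of all training samples
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


clients = [{"fam.weight": [1.0, 2.0]}, {"fam.weight": [3.0, 4.0]}]
print(aggregate_fam(clients, [1, 3]))  # {'fam.weight': [2.5, 3.5]}
```

The server then broadcasts the averaged adapter back to every client, which plugs it into its own frozen copy of CLIP; the backbone weights are never part of the exchange.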

Abstract

Despite the remarkable performance of vision-language models (VLMs) such as Contrastive Language-Image Pre-training (CLIP), their large size is a considerable obstacle to their use in federated learning (FL) systems, where the parameters of local client models must be transferred to a global server for aggregation. Another challenge in FL is the heterogeneity of data across clients, which can hurt the generalization of the learned model. In addition, VLMs pre-trained on natural images generalize poorly to medical datasets, suggesting a domain gap. To address these issues, we introduce a novel method for the Federated Adversarial Adaptation (FAA) of CLIP. Our method, named FAA-CLIP, handles the large communication costs of CLIP by using a light-weight feature adaptation module (FAM) for aggregation, which adapts this VLM to each client's data while greatly reducing the number of parameters to transfer. Because CLIP is kept frozen and only the FAM parameters are updated, our method is also computationally efficient. Unlike existing approaches, FAA-CLIP directly addresses domain shifts across clients via a domain adaptation (DA) module. This module employs a domain classifier to predict whether a given sample comes from the local client or the global server, allowing the model to learn domain-invariant representations. Extensive experiments on six datasets containing both natural and medical images demonstrate that FAA-CLIP generalizes well on both natural and medical datasets compared to recent FL approaches. Our code is available at https://github.com/AIPMLab/FAA-CLIP.
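The DA module pits a domain classifier (local vs. global) against the adapted features. A common way to implement such adversarial training is a gradient reversal layer, as in DANN; we assume that mechanism here, and the paper's actual formulation may differ. To stay runnable with the standard library alone, the sketch uses scalar stand-ins with hand-derived gradients: a one-weight adapter `adapter` producing the feature `f = adapter * x`, and a hypothetical logistic domain classifier with parameters `clf_w` and `clf_b`. All names are illustrative.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(x: float, d: int, adapter: float, clf_w: float, clf_b: float) -> float:
    """Binary cross-entropy of the domain classifier (d=1: local, d=0: global)."""
    p = sigmoid(clf_w * (adapter * x) + clf_b)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def reverse_grad(grad: float, lam: float) -> float:
    """Backward pass of a gradient reversal layer (its forward pass is the identity)."""
    return -lam * grad


def dann_step(x: float, d: int, adapter: float, clf_w: float, clf_b: float,
              lam: float = 1.0, lr: float = 0.1) -> Tuple[float, float, float, float]:
    """One adversarial update on a single sample; returns new params and the pre-step loss."""
    f = adapter * x                      # adapted feature; GRL passes it through unchanged
    p = sigmoid(clf_w * f + clf_b)       # predicted probability that the sample is local
    loss = domain_loss(x, d, adapter, clf_w, clf_b)
    dlogit = p - d                       # d(BCE)/d(logit) for a sigmoid output
    g_w, g_b = dlogit * f, dlogit        # classifier gradients
    g_f = dlogit * clf_w                 # gradient arriving at the feature
    g_adapter = reverse_grad(g_f, lam) * x  # chain rule through f = adapter * x, sign flipped
    new_w = clf_w - lr * g_w             # classifier descends the domain loss
    new_b = clf_b - lr * g_b
    new_adapter = adapter - lr * g_adapter  # reversed sign: adapter ascends the domain loss
    return new_adapter, new_w, new_b, loss


adapter, w, b, loss = dann_step(x=1.0, d=1, adapter=1.0, clf_w=0.5, clf_b=0.0)
print(domain_loss(1.0, 1, 1.0, w, b) < loss)          # classifier got better: True
print(domain_loss(1.0, 1, adapter, 0.5, 0.0) > loss)  # features got harder to classify: True
```

A single backward pass thus drives the two parts in opposite directions: the classifier learns to separate the domains, while the reversed gradient nudges the adapter toward features the classifier cannot separate. Intuitively, repeating this pushes the adapted representation toward domain invariance.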
