FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen, J. Pablo Munoz, Ali Jannesari
TL;DR
This work tackles privacy and scalability in vision-language models by proposing FLoRA, a federated fine-tuning framework that applies Low-Rank Adaptation (LoRA) adapters to CLIP. By updating only the text-encoder LoRA parameters and aggregating with FedAvg-like server updates, FLoRA achieves substantial communication and memory savings while maintaining or improving accuracy across IID and non-IID settings. Extensive experiments across a wide range of datasets, including few-shot and pathological non-IID scenarios, demonstrate that FLoRA outperforms traditional FL baselines and offers robust, data-efficient performance. The approach delivers practical benefits for privacy-preserving, distributed multimodal learning with significantly reduced training time and bandwidth requirements.
Abstract
In the rapidly evolving field of artificial intelligence, multimodal models that integrate vision and language, i.e., vision-language models (VLMs), have become pivotal for many applications, ranging from image captioning to multimodal search engines. Among these models, the Contrastive Language-Image Pre-training (CLIP) model has demonstrated remarkable performance in capturing nuanced relationships between text and images. However, conventional training of such models often requires aggregating vast datasets in a central location, posing significant privacy and data governance challenges. To address these concerns, this paper proposes a novel approach that combines Federated Learning with parameter-efficient adapters, namely Low-Rank Adaptation (LoRA), to train VLMs. This methodology preserves data privacy by training models across decentralized data sources and ensures model adaptability and efficiency through LoRA's parameter-efficient fine-tuning. Our approach speeds up training by up to 34.72 times and uses 2.47 times less memory than full fine-tuning.
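To make the aggregation idea concrete, here is a minimal sketch of a FedAvg-style server step over LoRA factors. It is an illustration under stated assumptions, not the paper's implementation: it assumes each client returns a low-rank pair (A, B) together with its local sample count, and that the server averages each factor element-wise with sample-count weights. The names `fedavg_lora`, `weighted_average`, and `lora_param_count`, the dictionary layout, and the toy shapes are all hypothetical. Averaging A and B separately is also not identical to averaging the products BA, so the server rule actually used may differ.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Adapter = Dict[str, Matrix]  # hypothetical layout: {"A": r x d_in, "B": d_out x r}


def weighted_average(mats: List[Matrix], weights: List[float]) -> Matrix:
    """Element-wise weighted mean of equally shaped matrices."""
    total = sum(weights)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(w * m[i][j] for m, w in zip(mats, weights)) / total for j in range(cols)]
        for i in range(rows)
    ]


def fedavg_lora(updates: List[Tuple[Adapter, int]]) -> Adapter:
    """Aggregate client adapters, weighting each by its local sample count."""
    weights = [float(n) for _, n in updates]
    return {
        key: weighted_average([adapter[key] for adapter, _ in updates], weights)
        for key in ("A", "B")
    }


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in one adapter pair (B: d_out x r, A: r x d_in)."""
    return rank * (d_in + d_out)


# Toy round with two clients and a rank-1 adapter on a 2 x 2 weight (assumed sizes).
client_1 = ({"A": [[1.0, 2.0]], "B": [[0.0], [4.0]]}, 30)
client_2 = ({"A": [[3.0, 6.0]], "B": [[2.0], [0.0]]}, 10)
global_adapter = fedavg_lora([client_1, client_2])
```

Only the adapter factors travel between clients and server in this sketch, which is where the communication savings come from: for an assumed 512 x 512 weight and rank 2, `lora_param_count(512, 512, 2)` gives 2,048 values per layer instead of 262,144 for the full matrix.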
