FedVLM: Scalable Personalized Vision-Language Models through Federated Learning

Arkajyoti Mitra; Afia Anjum; Paul Agbaje; Mert Pesé; Habeeb Olufowobi

TL;DR

The paper addresses privacy-preserving, efficient fine-tuning of large vision-language models (VLMs) in federated settings where client data is non-iid. It introduces FedVLM, a federated LoRA-based fine-tuning framework, and a personalized LoRA variant (pLoRA) that shares only the B matrix globally while each client learns its own A_p matrix for local adaptation. In experiments on the RLAIF-V dataset, FedVLM with pLoRA outperforms standard LoRA and other FL-based baselines, improving client-specific performance by 24.5% in non-iid settings and converging faster than centralized training. The approach targets scalable, privacy-preserving, personalized VLM deployment on edge devices and in distributed environments, and may extend to other VLM architectures and FL strategies.
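The split between a globally shared B matrix and a client-specific A_p matrix can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function names, the zero/Gaussian initialization (borrowed from standard LoRA), the W0 + B A_p convention, and the size-weighted averaging rule (borrowed from FedAvg) are all assumptions made for illustration. Local training is omitted.

```python
import numpy as np

def init_client(d_out, d_in, rank, rng):
    """Create one client's adapter (hypothetical helper).

    Assumption: standard LoRA init, with A drawn from a small Gaussian
    and B set to zero so the adapter starts as a no-op.
    """
    return {
        "A": rng.normal(0.0, 0.01, size=(rank, d_in)),  # client-specific, never shared
        "B": np.zeros((d_out, rank)),                   # shared through the server
    }

def aggregate_B(clients, weights):
    """Server step: weighted average of the clients' B matrices only.

    Assumption: FedAvg-style weighting (e.g. by local dataset size).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * c["B"] for wi, c in zip(w, clients))

def broadcast_B(clients, B_global):
    """Send the aggregated B back to every client; each A stays local."""
    for c in clients:
        c["B"] = B_global.copy()

def adapted_weight(W0, client, scale=1.0):
    """Effective weight on one client: frozen W0 plus the low-rank update B @ A."""
    return W0 + scale * client["B"] @ client["A"]
```

In one communication round under these assumptions, each client would update its own A and B locally, the server would call `aggregate_B` and `broadcast_B`, and every client would then compute `adapted_weight` with the common B and its private A_p. Only B crosses the network, so the per-round payload is `d_out * rank` values per adapted layer.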

Abstract

Vision-language models (VLMs) demonstrate impressive zero-shot and few-shot learning capabilities, making them essential for several downstream tasks. However, fine-tuning these models at scale remains challenging, particularly in federated environments where data is decentralized and non-iid across clients. Existing parameter-efficient tuning methods like LoRA (Low-Rank Adaptation) reduce computational overhead but struggle with heterogeneous client data, leading to suboptimal generalization. To address these challenges, we propose FedVLM, a federated LoRA fine-tuning framework that enables decentralized adaptation of VLMs while preserving model privacy and reducing reliance on centralized training. To further tackle data heterogeneity, we introduce personalized LoRA (pLoRA), which dynamically adapts LoRA parameters to each client's unique data distribution, significantly improving local adaptation while maintaining global model aggregation. Experiments on the RLAIF-V dataset show that pLoRA improves client-specific performance by 24.5% over standard LoRA, demonstrating superior adaptation in non-iid settings. FedVLM provides a scalable and efficient solution for fine-tuning VLMs in federated settings, advancing personalized adaptation in distributed learning scenarios.
