DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models
Nastaran Saadati, Zhanhong Jiang, Joshua R. Waite, Shreyan Ganguly, Aditya Balu, Chinmay Hegde, Soumik Sarkar
TL;DR
The paper tackles decentralized fine-tuning of foundation models with Low-Rank Adaptation (LoRA) by addressing two core challenges: the lack of smoothness guarantees for low-rank adapters and consensus interference in decentralized LoRA. It introduces DLoRA, whose refined smoothness-based analysis yields an $O(1/\sqrt{T})$ convergence rate and exposes a dependence on the adapter rank, and DeCAF, which applies Truncated Singular Value Decomposition (TSVD) to resolve consensus interference while preserving the convergence rate. A parameter-efficient variant, DLoRA-FA, further reduces communication and computation by freezing one low-rank factor, trading some accuracy for efficiency. Empirical evaluations on vision-language model and large language model tasks show that DeCAF often outperforms local fine-tuning and rivals FedAvg on both IID and non-IID data, and that it is robust to heterogeneous data distributions. Overall, the work provides a principled, scalable framework for decentralized, parameter-efficient fine-tuning of large pretrained models, with theoretical guarantees and practical gains.
Abstract
Low-Rank Adaptation (LoRA) has emerged as one of the most effective and computationally tractable approaches for fine-tuning Vision-Language Models (VLMs) and Large Language Models (LLMs). LoRA freezes the pre-trained model weights and injects trainable low-rank matrices, which makes it possible to adapt these foundation models efficiently, even on edge devices. However, LoRA in decentralized settings remains underexplored, particularly its theoretical underpinnings, because of the lack of a smoothness guarantee and because of model consensus interference (defined formally below). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm that integrates DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference. Theoretical analysis shows that the TSVD approximation error is bounded and that the consensus difference between DLoRA and DeCAF vanishes as the rank increases, so DeCAF attains the same convergence rate. Extensive experiments across vision and language tasks demonstrate that our algorithms outperform local training and rival federated learning under both IID and non-IID data distributions.
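To give some intuition for consensus interference and the role of TSVD, the following is a minimal NumPy sketch, not the paper's algorithm. It assumes that interference can be pictured as the gap between averaging the low-rank factors separately and averaging their products across agents. The dimensions, the uniform averaging, the random factors, and the helper name `tsvd_factorize` are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n_agents = 8, 6, 2, 4  # illustrative sizes, not from the paper

# Hypothetical per-agent LoRA factors; agent i's weight update is B_i @ A_i.
Bs = [rng.standard_normal((d, r)) for _ in range(n_agents)]
As = [rng.standard_normal((r, k)) for _ in range(n_agents)]

# Consensus on each factor separately, then multiply.
product_of_averages = np.mean(Bs, axis=0) @ np.mean(As, axis=0)

# Consensus on the full updates (generally has rank greater than r).
average_of_products = np.mean([B @ A for B, A in zip(Bs, As)], axis=0)

# Assumed picture of "interference": the two consensus results disagree.
interference = np.linalg.norm(average_of_products - product_of_averages)


def tsvd_factorize(W, rank):
    """Split W into rank-`rank` factors (B, A) using a truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    B = U[:, :rank] * sqrt_s           # shape (d, rank)
    A = sqrt_s[:, None] * Vt[:rank]    # shape (rank, k)
    return B, A


# Re-factorize the averaged update back into rank-r LoRA factors.
B_new, A_new = tsvd_factorize(average_of_products, r)
tsvd_error = np.linalg.norm(average_of_products - B_new @ A_new)

# The truncation error shrinks to zero as the rank grows to min(d, k).
B_full, A_full = tsvd_factorize(average_of_products, min(d, k))
full_rank_error = np.linalg.norm(average_of_products - B_full @ A_full)

print(f"interference={interference:.4f}, tsvd_error={tsvd_error:.4f}")
```

Because a truncated SVD gives the best rank-r approximation in the Frobenius norm (the Eckart-Young theorem), `tsvd_error` can never exceed `interference` in this toy setup, and the error disappears once the rank reaches `min(d, k)`. That is consistent with, but does not prove, the statement above that the gap vanishes as the rank increases.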
