DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models

Nastaran Saadati, Zhanhong Jiang, Joshua R. Waite, Shreyan Ganguly, Aditya Balu, Chinmay Hegde, Soumik Sarkar

TL;DR

The paper tackles decentralized fine-tuning of foundation models with Low-Rank Adaptation (LoRA) by addressing two core challenges: the lack of smoothness guarantees for low-rank adapters and consensus interference in decentralized LoRA. It introduces DLoRA, whose refined smoothness-based analysis yields an $O(1/\sqrt{T})$ convergence rate with rank-dependent behavior, and DeCAF, which applies Truncated Singular Value Decomposition (TSVD) to resolve consensus interference while preserving the convergence rate. A parameter-efficient variant, DLoRA-FA, further reduces communication and computation by freezing one low-rank factor, trading some accuracy for efficiency. Empirical evaluations on vision-language and large-language tasks show that DeCAF often outperforms local fine-tuning and rivals FedAvg on both IID and non-IID data, and is robust to heterogeneous data distributions. Overall, the work provides a principled, scalable framework for decentralized, parameter-efficient fine-tuning of large pretrained models, with theoretical guarantees and practical gains.

Abstract

Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). LoRA accomplishes this by freezing the pre-trained model weights and injecting trainable low-rank matrices, allowing for efficient learning of these foundation models even on edge devices. However, LoRA in decentralized settings remains underexplored, particularly its theoretical underpinnings, owing to the lack of a smoothness guarantee and to model consensus interference (defined formally below). This work improves the convergence rate of decentralized LoRA (DLoRA) to match the rate of decentralized SGD by ensuring gradient smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference. Theoretical analysis shows that TSVD's approximation error is bounded and that the consensus differences between DLoRA and DeCAF vanish as the rank increases, so DeCAF attains a matching convergence rate. Extensive experiments across vision/language tasks demonstrate that our algorithms outperform local training and rival federated learning under both IID and non-IID data distributions.
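
One way to read the consensus interference mentioned above is that averaging the factors $\mathbf{B}$ and $\mathbf{A}$ separately does not, in general, give the average of the products $\mathbf{B}\mathbf{A}$. The sketch below is only an illustration under that reading, not the paper's formal definition: the dimensions, the random factors, and the uniform averaging weights are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, n_agents = 6, 5, 2, 3  # hypothetical sizes, not taken from the paper

# Hypothetical per-agent LoRA factors: B is d x r, A is r x k.
Bs = [rng.standard_normal((d, r)) for _ in range(n_agents)]
As = [rng.standard_normal((r, k)) for _ in range(n_agents)]

# Average of the products B^j A^j (uniform weights are an assumption).
mean_of_products = sum(B @ A for B, A in zip(Bs, As)) / n_agents

# Product of the separately averaged factors.
product_of_means = (sum(Bs) / n_agents) @ (sum(As) / n_agents)

# The two generally differ; the gap is what this example calls interference.
interference = np.linalg.norm(mean_of_products - product_of_means)

# The averaged product can also have rank above r, so it is no longer
# representable by a single (B, A) pair of rank r without an approximation.
rank_of_mean = np.linalg.matrix_rank(mean_of_products)
rank_of_product = np.linalg.matrix_rank(product_of_means)
```

The rank growth in `mean_of_products` is what makes a rank-$r$ re-factorization step, such as truncated SVD, a natural fit after consensus on products.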

Paper Structure

This paper contains 24 sections, 15 theorems, 57 equations, 4 figures, 7 tables, 4 algorithms.

Key Result

Lemma 1

Let Assumptions assum_1, assum_2, assum_3 hold. Suppose that the parameter for an agent $\mathbf{W}$ satisfies LoRA, i.e., $\mathbf{W}=\mathbf{W}_0+\frac{\eta}{r}\mathbf{B}\mathbf{A}$. Then, for any given $\bm{w}, \bm{w}'$, we have the following relationship: $\|\nabla f^i(\bm{w},\mathbf{W}_0)-\nabl

Figures (4)

  • Figure 1: DeCAF with ring topology: At step $t$, agent $i$ exchanges $\mathbf{A}^i$ and $\mathbf{B}^i$ with neighbors, performs consensus on their products, applies truncated SVD, and updates using local data. Pre-trained weights $\mathbf{W}_0$ remain frozen.
  • Figure 2: Test accuracy on the Flowers dataset with CLIP: (a) IID and (b) non-IID topologies using DeCAF; (c) decentralized algorithms on non-IID ring topology (40 shots/class, rank 2).
  • Figure 3: Test accuracy on Flowers with CLIP: (a) DeCAF with varying ranks (IID, FC); (b) DeCAF vs. DLoRA accuracy gap; (c) DeCAF scalability with agent count (40 shots/class, rank 2).
  • Figure 4: Illustration of DLoRA-FA with two agents: at communication step $t$, agents $i$ and $j$ share only their low-rank matrix $\mathbf{B}$, and each performs consensus on the matrices received from its neighbors together with its own. Once consensus is done, each agent uses its local data to update the low-rank matrix $\mathbf{B}$. During this process, the pre-trained weights $\mathbf{W}_0$ and the low-rank matrix $\mathbf{A}_0$ are frozen. Unlike DeCAF, DLoRA-FA involves no matrix factorization, as shown in Theorem \ref{model_interference_theo}.
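
The communication step described in the Figure 1 caption (consensus on the products, then truncated SVD) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names, the equal-weight ring mixing matrix, and the choice to split the singular values evenly between the two factors are assumptions, and the local gradient update and the LoRA scaling factor are left out.

```python
import numpy as np

def truncated_svd_factors(M, r):
    """Return (B, A) with B @ A equal to the best rank-r approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    root = np.sqrt(s[:r])
    # Splitting singular values evenly between factors is an assumption.
    B = U[:, :r] * root          # shape d x r
    A = root[:, None] * Vt[:r]   # shape r x k
    return B, A

def ring_mixing(n):
    """Hypothetical ring mixing matrix: self and two neighbors, weight 1/3 each."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def decaf_consensus_step(Bs, As, mixing, r):
    """Consensus on the products B^j A^j, then re-factorize each agent's result."""
    n = len(Bs)
    products = [B @ A for B, A in zip(Bs, As)]
    new_Bs, new_As = [], []
    for i in range(n):
        # Weighted average of neighbor products (zero weight for non-neighbors).
        mixed = sum(mixing[i, j] * products[j] for j in range(n))
        B, A = truncated_svd_factors(mixed, r)
        new_Bs.append(B)
        new_As.append(A)
    return new_Bs, new_As

# Usage with made-up sizes: 4 agents on a ring, rank 2.
rng = np.random.default_rng(1)
n, d, k, r = 4, 6, 5, 2
Bs = [rng.standard_normal((d, r)) for _ in range(n)]
As = [rng.standard_normal((r, k)) for _ in range(n)]
W = ring_mixing(n)
new_Bs, new_As = decaf_consensus_step(Bs, As, W, r)
```

After this step each agent would run its local update on the new factors while $\mathbf{W}_0$ stays frozen; in a DLoRA-FA-style variant only $\mathbf{B}$ would be mixed and no factorization would be needed.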

Theorems & Definitions (27)

  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Definition 1
  • Proposition 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • proof
  • proof
  • ...and 17 more