Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation

Grigory Malinovsky, Umberto Michieli, Hasan Abed Al Kader Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtárik

TL;DR

This paper tackles convergence gaps in Low-Rank Adaptation (LoRA) methods for fine-tuning large pre-trained models. It introduces RAC-LoRA, a randomized asymmetric chain of LoRA blocks in which one matrix per block is sampled at random and frozen while the other is trained, preserving the low-rank structure and enabling provable convergence to the full-parameter fine-tuning (FPFT) solution. The authors develop a theory showing convergence under gradient descent and stochastic methods, with rates tied to $\lambda^H_{\text{min}}$, the smallest eigenvalue of an expected projection matrix, and strengthen the results to linear convergence under the Polyak-Łojasiewicz condition. They further adapt RAC-LoRA to federated learning (Fed-RAC-LoRA) and demonstrate compatibility with non-convex neural nets and large language models, achieving competitive or superior performance with fewer trainable parameters. Together, these results provide a principled, scalable pathway for reliable, parameter-efficient fine-tuning across centralized and distributed settings.
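To make the quantity $\lambda^H_{\text{min}}$ concrete, the sketch below estimates it by Monte Carlo for one particular sampling scheme. It assumes, for illustration only, that the frozen factor $B \in \mathbb{R}^{m \times r}$ has i.i.d. standard Gaussian entries and that the projection matrix is $H = B (B^\top B)^{\dagger} B^\top$; the function names and the sizes `m`, `r` are hypothetical and not taken from the paper. For Gaussian $B$, rotational invariance gives $\mathbb{E}[H] = \frac{r}{m} I$, so the estimate should be close to $r/m$.

```python
import numpy as np

def projector(B):
    # Orthogonal projector onto the column space of B: H = B (B^T B)^+ B^T
    return B @ np.linalg.pinv(B.T @ B) @ B.T

def estimate_lambda_min(m, r, num_samples=10000, seed=0):
    # Monte Carlo estimate of the smallest eigenvalue of E[H]
    # under the ASSUMED Gaussian sampling of the frozen factor B.
    rng = np.random.default_rng(seed)
    H_sum = np.zeros((m, m))
    for _ in range(num_samples):
        B = rng.standard_normal((m, r))   # sampled once, then kept fixed
        H_sum += projector(B)
    H_mean = H_sum / num_samples
    # eigvalsh returns eigenvalues in ascending order for symmetric input
    return np.linalg.eigvalsh(H_mean)[0]

m, r = 8, 2                               # illustrative sizes (assumption)
lam_min = estimate_lambda_min(m, r)
print(f"estimated lambda_min = {lam_min:.3f}, r/m = {r / m:.3f}")
```

Under this assumed scheme, a larger rank $r$ relative to $m$ yields a larger $\lambda^H_{\text{min}}$, which matches the intuition that each block then covers more of the full-parameter gradient.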

Abstract

Fine-tuning has become a popular approach to adapting large foundational models to specific tasks. As the size of models and datasets grows, parameter-efficient fine-tuning techniques are increasingly important. One of the most widely used methods is Low-Rank Adaptation (LoRA), with adaptation update expressed as the product of two low-rank matrices. While LoRA was shown to possess strong performance in fine-tuning, it often under-performs when compared to full-parameter fine-tuning (FPFT). Although many variants of LoRA have been extensively studied empirically, their theoretical optimization analysis is heavily under-explored. The starting point of our work is a demonstration that LoRA and its two extensions, Asymmetric LoRA and Chain of LoRA, indeed encounter convergence issues. To address these issues, we propose Randomized Asymmetric Chain of LoRA (RAC-LoRA) -- a general optimization framework that rigorously analyzes the convergence rates of LoRA-based methods. Our approach inherits the empirical benefits of LoRA-style heuristics, but introduces several small but important algorithmic modifications which turn it into a provably convergent method. Our framework serves as a bridge between FPFT and low-rank adaptation. We provide provable guarantees of convergence to the same solution as FPFT, along with the rate of convergence. Additionally, we present a convergence analysis for smooth, non-convex loss functions, covering gradient descent, stochastic gradient descent, and federated learning settings. Our theoretical findings are supported by experimental results.
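For reference, the low-rank update mentioned in the abstract can be written out as follows. The notation ($W^0$, $B$, $A$, $\alpha$, $r$, $m$, $n$) follows common LoRA conventions and is an assumption of this sketch rather than a quotation from the paper.

```latex
% Standard LoRA parametrization (notation assumed, not quoted from the paper)
W = W^0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{m \times r},\quad
A \in \mathbb{R}^{r \times n},\quad
r \ll \min(m, n).

% Trainable parameters: r(m + n) for LoRA versus mn for FPFT.
% Worked example with m = n = 4096 and r = 8:
%   FPFT: 4096 \cdot 4096 = 16{,}777{,}216
%   LoRA: 8 \cdot (4096 + 4096) = 65{,}536 \approx 0.39\% \text{ of FPFT}

% Chain of LoRA: merge each trained block into the weights and restart
W^{t+1} = W^t + \frac{\alpha}{r} B^t A^t .
```

In the asymmetric variant only one of the two factors is trained, which reduces the trainable count per block further (to $rn$ when $B$ is frozen).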

Paper Structure

This paper contains 22 sections, 2 theorems, 22 equations, 1 figure, 7 tables, 1 algorithm.

Key Result

Theorem 5.3

Suppose that the $L$-smoothness assumption (asm:L-smooth) and Assumption (asm:lambda) hold. Suppose that a stepsize $\gamma > 0$ is chosen such that $\gamma \leq \frac{1}{L}$. We choose the output of the method $\widetilde{W}^T$ uniformly at random from $W^0, W^1, \ldots, W^{T-1}$. Then, the iterate $\widetilde{W}^T$ of RAC-L
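The two ingredients named in the theorem statement, a stepsize $\gamma \leq \frac{1}{L}$ and an output drawn uniformly at random from $W^0, \ldots, W^{T-1}$, can be illustrated with a minimal sketch. It assumes that one block with a frozen Gaussian factor $B$ reduces to the projected gradient step $W \leftarrow W - \gamma H \nabla f(W)$ with $H = B (B^\top B)^{\dagger} B^\top$; this update form, the toy quadratic objective, and all names below are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def projector(B):
    # Orthogonal projector onto the column space of B
    return B @ np.linalg.pinv(B.T @ B) @ B.T

def rac_lora_gd(grad, W0, r, gamma, T, seed=0):
    # Hypothetical chain: each block samples a fresh frozen factor B
    # and applies one projected gradient step (assumed update form).
    rng = np.random.default_rng(seed)
    m = W0.shape[0]
    W = W0.copy()
    iterates = []
    for _ in range(T):
        iterates.append(W.copy())             # store W^t for t = 0, ..., T-1
        B = rng.standard_normal((m, r))
        W = W - gamma * projector(B) @ grad(W)
    pick = int(rng.integers(T))               # uniform index in {0, ..., T-1}
    return iterates[pick], pick, W

# Toy objective f(W) = 0.5 * ||W - W_star||_F^2, which is L-smooth with L = 1.
m, n, r, T = 8, 5, 2, 200
W_star = np.random.default_rng(1).standard_normal((m, n))
f = lambda W: 0.5 * np.sum((W - W_star) ** 2)
grad = lambda W: W - W_star
L = 1.0

W0 = np.zeros((m, n))
W_tilde, pick, W_last = rac_lora_gd(grad, W0, r, gamma=1.0 / L, T=T)
print(f"picked iterate {pick}, f(W_tilde) = {f(W_tilde):.3e}, f(W_last) = {f(W_last):.3e}")
```

The toy problem is convex, so it only demonstrates the mechanics. The randomly selected iterate is a standard device in non-convex analysis, where guarantees typically hold on average over the trajectory rather than for the last iterate.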

Figures (1)

  • Figure 1: Convergence of LoRA, AsymmLoRA, Chain of LoRA (COLA), and RAC-LoRA for the problem in Equation (eq:counter).

Theorems & Definitions (4)

  • Definition 4.1: Left Matrix Sampling
  • Definition 4.2: Right Matrix Sampling
  • Theorem 5.3
  • Theorem 5.5