Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation
Grigory Malinovsky, Umberto Michieli, Hasan Abed Al Kader Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtárik
TL;DR
This paper tackles convergence gaps in Low-Rank Adaptation (LoRA) methods for fine-tuning large pre-trained models. It introduces RAC-LoRA, a randomized asymmetric chain of LoRA where one matrix per block is fixed randomly while the other is trainable, preserving the low-rank structure and enabling provable convergence to the full-parameter fine-tuning (FPFT) solution. The authors develop a theory showing convergence under gradient descent and stochastic methods, with rates tied to $\lambda^H_{\text{min}}$, the smallest eigenvalue of an expected projection matrix, and extend results under the Polyak-Łojasiewicz condition to linear convergence. They further adapt RAC-LoRA to federated learning (Fed-RAC-LoRA) and demonstrate compatibility with non-convex neural nets and large language models, achieving competitive or superior performance with reduced trainable parameters. Together, these results provide a principled, scalable pathway for reliable, parameter-efficient fine-tuning across centralized and distributed settings.
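To make the chained structure concrete, here is a minimal sketch on a toy least-squares problem. It is an illustration under stated assumptions, not the authors' implementation: the function name `rac_lora_sketch`, the choice of a frozen factor `B` with orthonormal columns (obtained by QR of a Gaussian matrix), the single gradient step per block, and the omission of any LoRA scaling factor are all choices made here for simplicity.

```python
import numpy as np


def rac_lora_sketch(X, Y, W0, rank=2, blocks=300, lr=0.5, seed=0):
    """Toy chain of LoRA blocks for the loss 0.5/n * ||X W - Y||_F^2.

    Hypothetical helper: in each block one factor (B) is sampled and
    frozen, the other (A) is trained, and the product is merged into W.
    """
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    W = W0.copy()
    for _ in range(blocks):
        # Assumption: sample the frozen factor with orthonormal columns,
        # so that B @ B.T is an orthogonal projection of rank `rank`.
        B, _ = np.linalg.qr(rng.standard_normal((d_in, rank)))
        # The trainable factor starts at zero, so the block starts at W.
        A = np.zeros((rank, W.shape[1]))
        # Gradient of the loss with respect to the full weight matrix.
        grad_W = X.T @ (X @ (W + B @ A) - Y) / n
        # Chain rule: the gradient with respect to A is B.T @ grad_W.
        # Assumption: one gradient step on A per block.
        A -= lr * (B.T @ grad_W)
        # Merge the low-rank block into the weights and start a new one.
        W = W + B @ A
    return W


# Small synthetic problem; the step size 1/L uses the largest Hessian eigenvalue.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6))
W_true = rng.standard_normal((6, 3))
Y = X @ W_true
L = np.linalg.eigvalsh(X.T @ X / len(X)).max()
W_hat = rac_lora_sketch(X, Y, np.zeros((6, 3)), lr=1.0 / L)
rel_err = np.linalg.norm(W_hat - W_true) / np.linalg.norm(W_true)
```

In this sketch each merged block amounts to `W <- W - lr * B @ B.T @ grad_W`, a gradient step projected onto the column space of the sampled `B`. Averaging `B @ B.T` over the sampling distribution gives an expected projection matrix, which is how a quantity like $\lambda^H_{\text{min}}$ can enter a convergence rate.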
Abstract
Fine-tuning has become a popular approach to adapting large foundational models to specific tasks. As the size of models and datasets grows, parameter-efficient fine-tuning techniques are increasingly important. One of the most widely used methods is Low-Rank Adaptation (LoRA), in which the adaptation update is expressed as the product of two low-rank matrices. While LoRA has been shown to perform strongly in fine-tuning, it often underperforms full-parameter fine-tuning (FPFT). Although many variants of LoRA have been extensively studied empirically, their theoretical optimization analysis is heavily under-explored. The starting point of our work is a demonstration that LoRA and its two extensions, Asymmetric LoRA and Chain of LoRA, indeed encounter convergence issues. To address these issues, we propose Randomized Asymmetric Chain of LoRA (RAC-LoRA), a general optimization framework that rigorously analyzes the convergence rates of LoRA-based methods. Our approach inherits the empirical benefits of LoRA-style heuristics, but introduces several small but important algorithmic modifications which turn it into a provably convergent method. Our framework serves as a bridge between FPFT and low-rank adaptation. We provide provable guarantees of convergence to the same solution as FPFT, along with the rate of convergence. Additionally, we present a convergence analysis for smooth, non-convex loss functions, covering gradient descent, stochastic gradient descent, and federated learning settings. Our theoretical findings are supported by experimental results.
