Understanding the Learning Dynamics of LoRA: A Gradient Flow Perspective on Low-Rank Adaptation in Matrix Factorization
Ziqing Xu, Hancheng Min, Lachlan Ewen MacDonald, Jinqi Luo, Salma Tarmoun, Enrique Mallada, Rene Vidal
TL;DR
This work analyzes the gradient-flow dynamics of LoRA-based fine-tuning for matrix factorization (MF) and reveals a two-phase learning process: an alignment phase, in which the LoRA singular directions align with the fine-tuning target, followed by a local convergence phase with linear decay. It proves that smaller initialization scales yield a final error closer to optimal, and it introduces a spectral initialization that enables convergence to arbitrary precision. The theory accounts for both the misalignment between the pre-trained and fine-tuning tasks and the coupling with the fixed pre-trained weights, and it is corroborated by MF and image-classification experiments. The findings suggest that initialization scale and spectral design crucially influence both optimization and generalization, with practical implications for efficient, accurate fine-tuning of large pre-trained models.
Abstract
Despite the empirical success of Low-Rank Adaptation (LoRA) in fine-tuning pre-trained models, there is little theoretical understanding of how first-order methods with carefully crafted initialization adapt models to new tasks. In this work, we take the first step towards bridging this gap by theoretically analyzing the learning dynamics of LoRA for matrix factorization (MF) under gradient flow (GF), emphasizing the crucial role of initialization. For small initialization, we theoretically show that GF converges to a neighborhood of the optimal solution, with smaller initialization leading to lower final error. Our analysis shows that the final error is affected by the misalignment between the singular spaces of the pre-trained model and the target matrix, and that reducing the initialization scale improves alignment. To address this misalignment, we propose a spectral initialization for LoRA in MF and theoretically prove that GF with small spectral initialization converges to the fine-tuning task with arbitrary precision. Numerical experiments on MF and image classification validate our findings.
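
To make the kind of dynamics studied here concrete, the following is a minimal sketch, not the paper's setup. It assumes a simplified objective 0.5 * ||W0 + B A - Y||_F^2 with a single frozen matrix W0, a target Y that differs from W0 by a rank-1 perturbation, and both adapter factors drawn from a small Gaussian initialization. Gradient flow is approximated by gradient descent with a small step size. The function name, dimensions, step size, and initialization scale are all illustrative assumptions.

```python
import numpy as np

def lora_mf_gradient_descent(W0, Y, rank, init_scale, lr=0.05, steps=2000, seed=0):
    """Small-step gradient descent on 0.5 * ||W0 + B @ A - Y||_F^2.

    W0 stays frozen; only the low-rank factors B and A are updated.
    This is an assumed toy objective, not the formulation analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = W0.shape
    # Small random initialization of both factors (an assumption of this sketch).
    B = init_scale * rng.standard_normal((m, rank))
    A = init_scale * rng.standard_normal((rank, n))
    losses = []
    for _ in range(steps):
        R = W0 + B @ A - Y                 # residual of the adapted model
        losses.append(0.5 * np.sum(R ** 2))
        grad_B = R @ A.T                   # gradient of the loss w.r.t. B
        grad_A = B.T @ R                   # gradient of the loss w.r.t. A
        B, A = B - lr * grad_B, A - lr * grad_A
    R = W0 + B @ A - Y
    losses.append(0.5 * np.sum(R ** 2))    # loss after the final update
    return B, A, np.array(losses)

# Toy problem: the target differs from the frozen weights by a rank-1 term.
rng = np.random.default_rng(1)
W0 = rng.standard_normal((10, 10))
u = rng.standard_normal(10)
u /= np.linalg.norm(u)
v = rng.standard_normal(10)
v /= np.linalg.norm(v)
Y = W0 + 2.0 * np.outer(u, v)

B, A, losses = lora_mf_gradient_descent(W0, Y, rank=1, init_scale=1e-3)
print(f"initial loss: {losses[0]:.3e}, final loss: {losses[-1]:.3e}")
```

Plotting `losses` on a logarithmic axis for several values of `init_scale` is one way to look for a slow initial stage followed by fast decay in this toy problem. Because the pre-trained weights here are a single frozen matrix with no misalignment, the sketch does not reproduce the residual error or the spectral initialization discussed in the abstract.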
