Rank-Accuracy Trade-off for LoRA: A Gradient-Flow Analysis

Michael Rushka, Diego Klabjan

TL;DR

This paper analyzes the rank-accuracy trade-off in LoRA fine-tuning through a gradient-flow framework for the low-rank update $BA$ to the pretrained weights $W_0$. It first derives gradient-flow ODEs that take the same form whether $A$ and $B$ are updated simultaneously or sequentially, then applies them to two objectives: a trace-squared loss and a Frobenius-norm low-rank approximation loss. The main results show zero final loss for the trace-squared objective under LoRA and a relative approximation error that scales as $O(r^{-1/2})$ under standard spectral initialization. For general low-rank approximation, the optimal rank-$r$ solution aligns with the top $r$ singular values of $W_0$, in line with the Eckart-Young-Mirsky theorem. Together, these findings show how the rank and the spectral properties of $W_0$ determine LoRA's accuracy, giving a principled basis for parameter-efficient fine-tuning of large models and a starting point for extensions to more complex losses and initialization schemes.
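The Eckart-Young-Mirsky statement mentioned above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the matrix `W0` is a random hypothetical stand-in for pretrained weights, and the helper names `best_rank_r` and `relative_error` are introduced here purely for illustration. It computes the best rank-$r$ approximation by truncated SVD and reports the relative Frobenius error, which by the theorem depends only on the discarded singular values.

```python
import numpy as np

def best_rank_r(W, r):
    """Best rank-r approximation of W in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top-r singular triplets.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def relative_error(W, r):
    """Relative Frobenius error of the best rank-r approximation of W."""
    return np.linalg.norm(W - best_rank_r(W, r)) / np.linalg.norm(W)

# Assumption: a small random matrix stands in for the pretrained weights W_0.
rng = np.random.default_rng(0)
W0 = rng.standard_normal((8, 6))

singular_values = np.linalg.svd(W0, compute_uv=False)
errors = [relative_error(W0, r) for r in range(1, 7)]

for r, err in enumerate(errors, start=1):
    print(f"rank {r}: relative error {err:.4f}")
```

The printed errors shrink as $r$ grows and reach numerical zero once $r$ equals the full rank of the matrix. How quickly they shrink depends on the singular-value spectrum of the chosen matrix, so this random example should not be read as reproducing the scaling result stated in the summary.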

Abstract

Previous empirical studies have shown that LoRA achieves accuracy comparable to full-parameter methods on downstream fine-tuning tasks, even for rank-1 updates. By contrast, the theoretical underpinnings of the dependence of LoRA's accuracy on update rank remain relatively unexplored. In this work, we compare the accuracy of rank-r LoRA updates against full-parameter updates for fine-tuning tasks from a dynamical systems perspective. We perform gradient flow analysis in both full-rank and low-rank regimes to establish explicit relationships between rank and accuracy for two loss functions under LoRA. While gradient flow equations for LoRA are presented in prior work, we rigorously derive their form and show that they are identical for simultaneous and sequential LoRA parameter updates. We then use the resulting dynamical system equations to obtain closed-form relationships between LoRA rank and accuracy for trace-squared and Frobenius-norm low-rank approximation loss functions.
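To make the simultaneous-versus-sequential claim concrete, here is a minimal NumPy sketch under several assumptions that are not taken from the paper: a hypothetical quadratic loss $\frac{1}{2}\|W_0 + BA - W_{\text{target}}\|_F^2$, small random matrices, the common LoRA initialization $B = 0$, and a sequential scheme that updates $A$ before $B$. Both schemes are explicit discretizations of gradient descent on the factors, so if they share the same gradient-flow limit, the gap between their results at a fixed time horizon should shrink as the step size shrinks.

```python
import numpy as np

def loss(W, W_target):
    """Hypothetical quadratic loss, used only for this illustration."""
    return 0.5 * np.linalg.norm(W - W_target) ** 2

def run_lora(W0, W_target, A, B, step, n_steps, sequential):
    """Gradient descent on the factors (B, A) of the update W0 + B @ A."""
    A, B = A.copy(), B.copy()
    for _ in range(n_steps):
        G = (W0 + B @ A) - W_target          # gradient of the loss w.r.t. W
        if sequential:
            A = A - step * B.T @ G           # update A first (assumed order)
            G = (W0 + B @ A) - W_target      # re-evaluate with the new A
            B = B - step * G @ A.T
        else:
            # Simultaneous: both updates use the same gradient G.
            A, B = A - step * B.T @ G, B - step * G @ A.T
    return A, B

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
W0 = rng.standard_normal((m, n))
W_target = W0 + rng.standard_normal((m, r)) @ rng.standard_normal((r, n)) / 2
A0 = 0.5 * rng.standard_normal((r, n))
B0 = np.zeros((m, r))

def gap(step, n_steps):
    """Distance between the two schemes' updates B @ A at time step * n_steps."""
    A_sim, B_sim = run_lora(W0, W_target, A0, B0, step, n_steps, False)
    A_seq, B_seq = run_lora(W0, W_target, A0, B0, step, n_steps, True)
    return np.linalg.norm(B_sim @ A_sim - B_seq @ A_seq)

gap_coarse = gap(0.01, 100)     # time horizon 1 with step 0.01
gap_fine = gap(0.001, 1000)     # same horizon with a 10x smaller step

initial_loss = loss(W0 + B0 @ A0, W_target)
A_fin, B_fin = run_lora(W0, W_target, A0, B0, 0.001, 1000, False)
final_loss = loss(W0 + B_fin @ A_fin, W_target)

print(f"gap at step 0.01:  {gap_coarse:.3e}")
print(f"gap at step 0.001: {gap_fine:.3e}")
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

This is only a numerical illustration on one toy problem; the paper's rigorous derivation concerns the continuous-time limit and its own stated assumptions.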

Paper Structure

This paper contains 22 sections, 9 theorems, 375 equations, 1 algorithm.

Key Result

Theorem 2.4

Consider an objective function $g: \Theta \to \mathbb{R}$ satisfying the main-body assumptions from uniform boundedness of iterates through Lipschitz smoothness, which is minimized via the LoRA gradient descent algorithm. During finetuning, the iterates produced by the LoRA gradient descent algorithm ...

Theorems & Definitions (19)

  • Theorem 2.4
  • Theorem 2.7
  • Theorem 2.8
  • Theorem 2.11
  • Remark 1.1: Product Norm on $\Theta$
  • Remark 1.3: Uniform Boundedness of Products
  • Remark 1.5: Existence of ODE Solution
  • Remark 1.6: Uniform Boundedness of Gradient for Finite Time
  • Lemma 1.8: Changes in $\theta(t)$ in Time are Bounded
  • Remark 3.2: Boundedness of Gradient for Trace Squared Loss
  • ...and 9 more