The Expressive Power of Low-Rank Adaptation

Yuchen Zeng; Kangwook Lee

TL;DR

This work provides the first theoretical analysis of LoRA's expressive power for frozen pretrained networks, establishing explicit rank thresholds for exact adaptation in fully connected nets and Transformer architectures. It shows that, for FNNs, LoRA can match a target function when the per-adapter rank meets a threshold linked to network depth and width, and that for Transformer blocks, updating attention weights with LoRA suffices under a rank near half the embedding size. The paper also introduces uniform and general model-partition strategies to reduce the required rank, derives approximation bounds when the rank is below threshold, and contrasts LoRA with final-layer tuning. Empirical experiments on synthetic and real data validate the constructions and illustrate practical implications for designing LoRA adapters, including the impact of model proximity and biases on expressive power. Overall, the results illuminate why LoRA can be so effective in practice and provide a theoretical foundation for adapter design choices.

Abstract

Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method that leverages low-rank adaptation of weight matrices, has emerged as a prevalent technique for fine-tuning pre-trained models such as large language models and diffusion models. Despite its huge success in practice, the theoretical underpinnings of LoRA have largely remained unexplored. This paper takes the first step to bridge this gap by theoretically analyzing the expressive power of LoRA. We prove that, for fully connected neural networks, LoRA can adapt any model $f$ to accurately represent any smaller target model $\overline{f}$ if LoRA-rank $\geq(\text{width of }f) \times \frac{\text{depth of }\overline{f}}{\text{depth of }f}$. We also quantify the approximation error when LoRA-rank is lower than the threshold. For Transformer networks, we show any model can be adapted to a target model of the same size with rank-$(\frac{\text{embedding size}}{2})$ LoRA adapters.
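The rank threshold stated above can be made concrete with a small sketch. The code below is an illustration only, not the paper's construction: it evaluates the abstract's threshold for hypothetical widths and depths (rounding up to an integer is an assumption), and then checks the simplest single linear layer case, where the best rank-r adapter is the truncated SVD of the gap between the target and frozen weights. The function names, the matrix size, and the rank-3 gap are all assumptions chosen for the demonstration.

```python
import math
import numpy as np


def fnn_rank_threshold(width_f, depth_f, depth_target):
    """Threshold from the abstract: width(f) * depth(target) / depth(f).

    Rounding up to an integer rank is an assumption of this sketch.
    """
    return math.ceil(width_f * depth_target / depth_f)


def best_rank_r_adapter(w_frozen, w_target, r):
    """Return factors (b, a) such that b @ a is the best rank-r approximation
    (in Frobenius norm) of the gap w_target - w_frozen, via truncated SVD."""
    gap = w_target - w_frozen
    u, s, vt = np.linalg.svd(gap, full_matrices=False)
    b = u[:, :r] * s[:r]  # shape (d, r): left factors scaled by singular values
    a = vt[:r, :]         # shape (r, d): right factors
    return b, a


rng = np.random.default_rng(0)
d = 8
w_frozen = rng.standard_normal((d, d))
# Hypothetical target: the frozen weights plus a rank-3 perturbation.
w_target = w_frozen + rng.standard_normal((d, 3)) @ rng.standard_normal((3, d))

# Frobenius-norm error of the adapted linear model for each adapter rank.
errors = {}
for r in range(1, 5):
    b, a = best_rank_r_adapter(w_frozen, w_target, r)
    errors[r] = float(np.linalg.norm(w_target - (w_frozen + b @ a)))

print(fnn_rank_threshold(width_f=16, depth_f=4, depth_target=2))
print(errors)
```

In this linear toy case the error drops to numerical zero once the adapter rank reaches the rank of the weight gap, and it is strictly positive below that rank. This mirrors, in a much simpler setting, the threshold behavior the abstract describes for FNNs.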

Paper Structure (89 sections, 31 theorems, 95 equations, 9 figures, 3 tables)

Introduction
Our Contributions.
Related Works
Expressive Power of Neural Networks
Expressive Power of Adaptation Methods
Notations
Warm up: Expressive Power of Linear Models with LoRA
Expressive Power of FNNs with LoRA
Problem Setting
One-Layer ReLU FNN Approximation
Multi-Layer ReLU FNN Approximation
Uniform Model Partition.
General Model Partition.
Comparison to Tuning Final Layers.
Expressive Power of Transformer Networks with LoRA
...and 74 more sections

Key Result

Theorem 1

Let $\overline{f}$ be a target FNN and $f_0$ be an arbitrary frozen FNN. Under mild conditions on ranks and network architectures, there exist low-rank adapters such that a low-rank adapted version of $f_

Figures (9)

Figure 1: Approximation error (measured by MSE) versus LoRA-rank.
Figure 2: An example of $I_1$ and $\overline{I}_1$ when $D = 2$.
Figure 3: Approximation error (measured by MSE) versus LoRA-rank on FNNs.
Figure 4: Log-scale MSE versus LoRA-rank on randomly initialized FNNs.
Figure 5: Approximation error (measured by MSE) versus LoRA-rank on TFNs.
...and 4 more figures

Theorems & Definitions (53)

Theorem 1: Informal
Theorem 2: Informal
Lemma 1
Proof: Proof Sketch
Lemma 2
Example 1
Lemma 3
Theorem 3
Corollary 4
Theorem 5
...and 43 more
