Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying

Adithya Renduchintala, Tugrul Konuk, Oleksii Kuchaiev

TL;DR

Tied-LoRA introduces weight tying across all layers and selective training to drastically reduce the number of trainable parameters in LoRA-like fine-tuning. By sharing the low-rank projections A and B and optionally freezing components, Tied-LoRA (TL) configurations, notably TL6, achieve performance close to, and sometimes surpassing, that of LoRA while using a fraction of the parameters, especially at higher ranks. The approach is evaluated across five diverse tasks using two base LMs (GPT-2B-001 and LLaMA2-7B), showing task-dependent optimal ranks and robust efficiency gains, with translation showing notable parameter reductions (as low as 12.5%). The work suggests that weight tying plus selective training is a promising direction for scalable, cost-effective customization of large language models, with future work extending to larger base models and other PEFT methods, including adapters and prefix-tuning.

Abstract

We introduce Tied-LoRA, a novel paradigm leveraging weight tying and selective training to enhance the parameter efficiency of Low-rank Adaptation (LoRA). Our exploration encompasses different plausible combinations of parameter training and freezing, coupled with weight tying, aimed at identifying the optimal trade-off between performance and the count of trainable parameters. Across $5$ diverse tasks and two foundational language models with different parameter counts, our experiments provide comprehensive insights into the inherent trade-offs between efficiency and performance. Our findings reveal a specific Tied-LoRA configuration that distinguishes itself by showcasing comparable performance to LoRA across multiple tasks while utilizing only a fraction of the parameters employed by the standard LoRA method, particularly at elevated ranks. This underscores the efficacy of Tied-LoRA in achieving impressive results with significantly reduced model complexity.

Paper Structure

This paper contains 26 sections, 3 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Schematic of our Tied-LoRA paradigm: the main low-rank matrices $A$ and $B$ are tied across all the layers of the base language model (tying is indicated by the chain symbol). We use gradient shading to indicate that these parameters can be either trained or frozen.
  • Figure 2: Plots showing the performance of the Tied-LoRA configurations averaged over tasks across all ranks. One pair of subfigures (one for the 2B and one for the 7B base model) displays all Tied-LoRA configurations, while a second pair displays the best Tied-LoRA configurations with LoRA and VeRA as baselines. The section with detailed plots contains plots for each task and base model.
  • Figure 3: Plots showing the performance of the Tied-LoRA configurations along with the baseline LoRA ($\mathbf{v}\mathbf{B}\mathbf{u}\mathbf{A}$, with $\mathbf{v}$ and $\mathbf{u}$ frozen) for $5$ diverse tasks at $4$ different values of the low-rank dimension. Note that we let the curves for TL3 ($\mathbf{v}\mathbf{B}_{\text{tied}}\mathbf{u}\mathbf{A}_{\text{tied}}$, with $\mathbf{v}$, $\mathbf{u}$, and $\mathbf{A}_{\text{tied}}$ frozen) and TL4 ($\mathbf{v}\mathbf{B}_{\text{tied}}\mathbf{u}\mathbf{A}_{\text{tied}}$, with $\mathbf{v}$ and $\mathbf{A}_{\text{tied}}$ frozen) go out of bounds to show details for the other curves.
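
The parameter savings implied by tying $A$ and $B$ across layers (Figure 1) can be made concrete with a rough count. The sketch below is illustrative only. It assumes that the vectors u and v from the Figure 3 notation are separate per-layer vectors of length r and d_out respectively, which the captions do not state. The function names and the dimensions in the usage lines are hypothetical and are not taken from the paper.

```python
def lora_trainable_params(d_in, d_out, rank, num_layers):
    """Standard LoRA: each layer owns its own A (rank x d_in) and B (d_out x rank)."""
    return num_layers * rank * (d_in + d_out)


def tied_lora_trainable_params(d_in, d_out, rank, num_layers,
                               train_A=True, train_B=True,
                               train_u=True, train_v=True):
    """Tied variant (assumed shapes): one A and one B shared by all layers,
    plus a per-layer vector u (length rank) and v (length d_out).
    Frozen components contribute no trainable parameters."""
    shared = rank * d_in * int(train_A) + rank * d_out * int(train_B)
    per_layer = rank * int(train_u) + d_out * int(train_v)
    return shared + num_layers * per_layer


# Hypothetical sizes, chosen only for illustration.
d_in, d_out, num_layers = 4096, 4096, 32

for rank in (2, 8, 32, 128):
    lora = lora_trainable_params(d_in, d_out, rank, num_layers)
    tied = tied_lora_trainable_params(d_in, d_out, rank, num_layers)
    print(f"rank={rank:4d}  LoRA={lora:>12,d}  tied={tied:>12,d}  ratio={tied / lora:.3f}")
```

Under these assumptions the LoRA count grows with the product of rank and number of layers, while the shared matrices in the tied count grow with rank alone, which is consistent with the observation that the savings are largest at higher ranks. Setting any of the `train_*` flags to `False` models a configuration in which that component is frozen.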