NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models

Chenlu Guo, Yuan Wu, Yi Chang

TL;DR

This work addresses the slow convergence of LoRA and the expensive SVD-based initialization used by variants such as PiSSA. It introduces StructuredLoRA (SLoRA), which inserts a small intermediate matrix between LoRA's two low-rank matrices, and NyströmLoRA (NLoRA), which initializes SLoRA with the Nyström method. A minimal option, IntermediateTune (IntTune), fine-tunes only the intermediate matrix of NLoRA. On GSM8K, SLoRA and NLoRA reach 56.48% and 57.70% accuracy with only 3.67 million additional trainable parameters, and IntTune improves average NLG performance over LoRA by 7.45% while using 1.25% of its parameters, along with reported savings in training time and memory. The methods are evaluated on five NLG and eight NLU tasks, including GSM8K and GLUE, and are aimed at adapting large language models in resource-constrained settings.

Abstract

Parameter-efficient fine-tuning (PEFT) is essential for adapting large language models (LLMs), with low-rank adaptation (LoRA) being the most popular approach. However, LoRA suffers from slow convergence, and some recent LoRA variants, such as PiSSA, primarily rely on Singular Value Decomposition (SVD) for initialization, leading to expensive computation. To mitigate these problems, we use the Nyström method, which follows a three-matrix manipulation. First, we introduce StructuredLoRA (SLoRA), which investigates adding a small intermediate matrix between the low-rank matrices A and B. Second, we propose NyströmLoRA (NLoRA), which leverages Nyström-based initialization for SLoRA to improve its effectiveness and efficiency. Finally, we propose IntermediateTune (IntTune), which explores fine-tuning exclusively on the intermediate matrix of NLoRA to further boost LLM efficiency. We evaluate our methods on five natural language generation (NLG) tasks and eight natural language understanding (NLU) tasks. On GSM8K, SLoRA and NLoRA achieve accuracies of 56.48% and 57.70%, surpassing LoRA by 33.52% and 36.41% in relative terms, with only 3.67 million additional trainable parameters. IntTune improves average NLG performance over LoRA by 7.45% while using only 1.25% of its parameters. These results demonstrate the efficiency and effectiveness of our approach in enhancing model performance with minimal parameter overhead.
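
To make the three-matrix idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the initialization procedure, so the sketch uses the textbook Nyström approximation of a general matrix (first r columns, pseudo-inverse of the top-left r x r block, first r rows). The names `nystrom_factors`, `trainable_params`, `left`, `mid`, and `right` are hypothetical.

```python
import numpy as np


def nystrom_factors(W, r):
    """Textbook Nystrom-style factors of a general matrix W (m x n).

    Assumption: the first r columns and rows are used as the sample;
    the paper may sample or scale differently.
    Returns (left, mid, right) with W approximately left @ mid @ right.
    """
    left = W[:, :r].copy()              # m x r: sampled columns
    mid = np.linalg.pinv(W[:r, :r])     # r x r: pseudo-inverse of the core block
    right = W[:r, :].copy()             # r x n: sampled rows
    return left, mid, right


def trainable_params(m, n, r, intermediate_only=False):
    """Parameter count of a three-matrix adapter for an m x n weight.

    With intermediate_only=True, only the r x r middle matrix is counted,
    which mirrors the idea of tuning the intermediate matrix alone.
    """
    if intermediate_only:
        return r * r
    return m * r + r * r + r * n


# A small rank-2 weight matrix built by hand so the result is easy to check.
U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
V = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
W = U @ V                               # shape (4, 3), rank 2

left, mid, right = nystrom_factors(W, r=2)

# For a rank-r matrix with an invertible core block, the product is exact.
print(np.allclose(left @ mid @ right, W))

# The intermediate matrix adds r * r parameters to the two outer factors.
print(trainable_params(4, 3, 2), trainable_params(4, 3, 2, intermediate_only=True))
```

The sketch needs only slicing and one small r x r pseudo-inverse rather than an SVD of the full weight matrix, which is the general cost argument for Nyström-style initialization. On real weight matrices the product is an approximation, not an exact reconstruction.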

Paper Structure

This paper contains 17 sections, 16 equations, 6 figures, 14 tables.

Figures (6)

  • Figure 1: The comparison among LoRA and our models
  • Figure 2: The comparison among Full Fine-tuning, LoRA, and SLoRA
  • Figure 3: The diagram of the Nyström-based initialization
  • Figure 4: Compare the performance of different ranks for NLoRA on NLG tasks
  • Figure 5: Comparison of GPU memory allocation and trainable parameters between IntTune and LoRA
  • ...and 1 more figure