NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

TL;DR

NoRA tackles the cost of fine-tuning large-scale pre-trained models with a nested, dual-layer Low-Rank Adaptation that freezes an outer LoRA layer and trains an inner LoRA layer, with initialization guided by Singular Value Decomposition (SVD). This structure leverages information already present in the pre-trained weights, enables precise task-specific adjustments, and reduces the number of trainable parameters. The approach is analyzed theoretically and validated on large language models, vision-language models, and image-generation tasks, where NoRA consistently outperforms LoRA and several of its variants while achieving substantial parameter savings. The work highlights NoRA's potential to make fine-tuning more practical for multimodal and large-scale systems. It does incur some SVD-related computational overhead, and it leaves room for future enhancements such as AutoML and distillation.
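The nested structure can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the frozen outer LoRA factors come from the top-$r$ singular triplets of the pre-trained weight $W_0$, split as $U_r\Sigma_r^{1/2}$ and $\Sigma_r^{1/2}V_r^\top$. It further assumes that a trainable inner LoRA of rank $r'$ sits between these factors, so that $\Delta W = B_{\text{out}} B_{\text{in}} A_{\text{in}} A_{\text{out}}$. Finally, it assumes that $B_{\text{in}}$ starts at zero, so the adapted layer initially reproduces $W_0$. The names `svd_outer_factors` and `NestedLoRASketch` are hypothetical.

```python
import numpy as np


def svd_outer_factors(w0: np.ndarray, r: int) -> tuple[np.ndarray, np.ndarray]:
    """Build frozen outer factors from the top-r singular triplets of w0 (assumed init)."""
    u, s, vt = np.linalg.svd(w0, full_matrices=False)
    sqrt_s = np.sqrt(s[:r])
    b_out = u[:, :r] * sqrt_s          # (d, r): column j scaled by sqrt(sigma_j)
    a_out = sqrt_s[:, None] * vt[:r]   # (r, k): row j scaled by sqrt(sigma_j)
    return b_out, a_out


class NestedLoRASketch:
    """Hypothetical nested adapter: delta_W = B_out @ (B_in @ A_in) @ A_out."""

    def __init__(self, w0: np.ndarray, r: int, r_inner: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w0 = w0                                        # frozen pre-trained weight
        self.b_out, self.a_out = svd_outer_factors(w0, r)   # frozen outer LoRA
        # Trainable inner LoRA of rank r_inner; B_in = 0 makes delta_W = 0 at init.
        self.a_in = rng.normal(scale=1.0 / np.sqrt(r), size=(r_inner, r))
        self.b_in = np.zeros((r, r_inner))

    def delta_w(self) -> np.ndarray:
        return self.b_out @ (self.b_in @ self.a_in) @ self.a_out

    def forward(self, x: np.ndarray) -> np.ndarray:
        # x: (k,) or (k, batch). Multiply right-to-left so delta_W is never materialized.
        return self.w0 @ x + self.b_out @ (self.b_in @ (self.a_in @ (self.a_out @ x)))

    def trainable_params(self) -> int:
        return self.a_in.size + self.b_in.size


rng = np.random.default_rng(42)
w0 = rng.normal(size=(64, 48))
layer = NestedLoRASketch(w0, r=8, r_inner=2)
x = rng.normal(size=48)
print(np.allclose(layer.forward(x), w0 @ x))  # True: delta_W is zero at init
print(layer.trainable_params())               # 32 (= 2 * 8 * 2)
```

Under this layout the update stays inside the top-$r$ singular subspace of $W_0$ and has rank at most $r'$. Whether NoRA places the inner LoRA exactly this way should be checked against the paper's own PyTorch listing.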

Abstract

In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends Low-Rank Adaptation (LoRA). Vanilla LoRA does not inherit knowledge from the pre-trained weights and still requires fine-tuning a large number of parameters. To address these issues, NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), leveraging knowledge in the original weight matrix while reducing the number of tunable parameters. Specifically, NoRA freezes the outer LoRA weights and trains an inner LoRA, providing finer control over model optimization. This allows the model to adapt precisely to specific tasks while keeping a compact parameter space. Evaluations on commonsense reasoning with large language models, fine-tuning of vision-language models, and subject-driven generation demonstrate NoRA's superiority over LoRA and its variants. Code will be released upon acceptance.
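The following arithmetic makes the reduction in tunable parameters concrete for a single $d \times k$ projection. It assumes that the frozen outer factors have shapes $d \times r$ and $r \times k$ and that only an inner pair of shapes $r \times r'$ and $r' \times r$ is trained. The sizes and ranks are hypothetical, the counts are not results reported in the paper, and the one-off SVD cost is not included. The function names are illustrative.

```python
def full_params(d: int, k: int) -> int:
    """Parameters in a dense d x k weight (full fine-tuning)."""
    return d * k


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Vanilla LoRA trains B (d x r) and A (r x k)."""
    return r * (d + k)


def nested_trainable_params(r: int, r_inner: int) -> int:
    """Assumed nested layout: only B_in (r x r') and A_in (r' x r) are trained;
    the outer d x r and r x k factors stay frozen."""
    return 2 * r * r_inner


d, k = 4096, 4096    # hypothetical projection shape
r, r_inner = 64, 16  # hypothetical outer and inner ranks
print(full_params(d, k))                    # 16777216
print(lora_trainable_params(d, k, r))       # 524288
print(nested_trainable_params(r, r_inner))  # 2048
```

Under this assumption the trainable count depends only on $r$ and $r'$, not on $d$ or $k$. The frozen outer factors still have to be stored, so memory for the adapter as a whole is not reduced by the same ratio.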

Paper Structure (11 sections, 6 equations, 3 figures, 2 tables, 1 algorithm)

Figures (3)

  • Figure 1: Comparison of training with LoRA and NoRA. Blue modules are frozen during training, and orange modules are updated. Here, $r$ denotes the outer rank and $r'$ the inner rank.
  • Figure 2: Comparison of generated images from LoRA and NoRA on the subject-driven generation task.
  • Figure : PyTorch code for NoRA
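The frozen/updated split described in the Figure 1 caption can be illustrated with a single gradient step in NumPy. This minimal sketch rests on assumptions about the layout and is not the paper's PyTorch code. The pre-trained weight and the SVD-initialized outer factors are held fixed, and only the inner factors $B_{\text{in}}$ and $A_{\text{in}}$ receive gradients, which are derived by hand with the chain rule. The helper `loss_and_grads`, the regression targets, and the step size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, r_inner, batch = 16, 12, 4, 2, 32

# Frozen ("blue") parts: pre-trained weight and SVD-initialized outer factors.
w0 = rng.normal(size=(d, k))
u, s, vt = np.linalg.svd(w0, full_matrices=False)
b_out = u[:, :r] * np.sqrt(s[:r])
a_out = np.sqrt(s[:r])[:, None] * vt[:r]

# Trainable ("orange") parts: inner factors of rank r_inner (assumed init).
a_in = rng.normal(scale=1.0 / np.sqrt(r), size=(r_inner, r))
b_in = np.zeros((r, r_inner))

x = rng.normal(size=(k, batch))       # hypothetical input batch
target = rng.normal(size=(d, batch))  # hypothetical regression targets


def loss_and_grads(b_inner, a_inner):
    """Squared-error loss and its gradients w.r.t. the inner factors only."""
    delta_w = b_out @ (b_inner @ a_inner) @ a_out
    err = (w0 + delta_w) @ x - target            # dL/dy for L = 0.5 * ||err||^2
    loss = 0.5 * np.sum(err ** 2)
    g_delta = err @ x.T                          # dL/d(delta_W), shape (d, k)
    g_mid = b_out.T @ g_delta @ a_out.T          # dL/dM for M = B_in @ A_in, (r, r)
    return loss, g_mid @ a_inner.T, b_inner.T @ g_mid  # grads for B_in, A_in


lr = 1e-5
frozen_before = [w0.copy(), b_out.copy(), a_out.copy()]
loss0, g_b_in, g_a_in = loss_and_grads(b_in, a_in)
b_in = b_in - lr * g_b_in  # only the inner factors are updated
a_in = a_in - lr * g_a_in  # this gradient is zero on the first step since B_in = 0
loss1, _, _ = loss_and_grads(b_in, a_in)
print(f"loss before: {loss0:.3f}, after: {loss1:.3f}")
```

As with zero-initialized LoRA, only $B_{\text{in}}$ moves on the first step. $A_{\text{in}}$ starts receiving gradient once $B_{\text{in}}$ is nonzero. In a PyTorch version, the same split would normally come from disabling gradients on the outer factors.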