One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning

Arnav Chavan, Zhuang Liu, Deepak Gupta, Eric Xing, Zhiqiang Shen

TL;DR

GLoRA (Generalized LoRA) is a One-for-All PEFT framework that jointly tunes weights and activations through a unified formulation with trainable support tensors. A structural re-parameterization merges the adapters into the original layers, so inference incurs no extra cost, while an evolutionary search over a large per-layer supernet yields task-specific adapters without manual hyperparameter tuning. Across VTAB-1K, large language model, few-shot, and domain-generalization benchmarks, GLoRA consistently surpasses prior PEFT methods with fewer trainable parameters. The work demonstrates strong cross-domain applicability and practical efficiency for resource-constrained deployment, supported by an analysis of layer-wise adaptation and a VC-dimension argument for capacity expansion.

Abstract

We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tuning tasks. Enhancing Low-Rank Adaptation (LoRA), GLoRA employs a generalized prompt module to optimize pre-trained model weights and adjust intermediate activations, providing more flexibility and capability across diverse tasks and datasets. Moreover, GLoRA facilitates efficient parameter adaptation by employing a scalable, modular, layer-wise structure search that learns an individual adapter for each layer. Originating from a unified mathematical formulation, GLoRA exhibits strong transfer learning, few-shot learning, and domain generalization abilities, as it adapts to new tasks through not only weights but also additional dimensions such as activations. Comprehensive experiments demonstrate that GLoRA outperforms all previous methods on natural, specialized, and structured vision benchmarks, achieving superior accuracy with fewer parameters and computations. Applied to LLaMA-1 and LLaMA-2, the proposed method also shows considerable improvements over the original LoRA in the language domain. Furthermore, our structural re-parameterization design ensures that GLoRA incurs no extra inference cost, rendering it a practical solution for resource-limited applications. Code and models are available at: https://github.com/Arnav0400/ViT-Slim/tree/master/GLoRA.
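
The claim that structural re-parameterization adds no inference cost can be illustrated with a small sketch. It is not the authors' implementation. It assumes a simplified adapter for one linear layer: a scalar weight scale `a`, a low-rank weight shift `B_left @ B_right`, a scalar bias scale `d`, and a bias shift `e`. These names are hypothetical, and the paper's full formulation includes further support tensors that are omitted here. The sketch checks that the training-time forward pass with separate adapter branches matches a single merged linear layer.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(P, Q):
    """Multiply matrix P (n x k) by matrix Q (k x m)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]

def adapted_forward(W0, b0, x, a, B_left, B_right, d, e):
    """Training-time path: frozen layer plus separate adapter branches."""
    base = matvec(W0, x)                          # frozen W0 x
    low_rank = matvec(B_left, matvec(B_right, x)) # (B_left @ B_right) x
    return [
        base[i] + a * base[i] + low_rank[i]       # weight scale and shift
        + b0[i] + d * b0[i] + e[i]                # bias scale and shift
        for i in range(len(b0))
    ]

def merge(W0, b0, a, B_left, B_right, d, e):
    """Fold all adapter terms into one weight matrix and one bias vector."""
    B = matmul(B_left, B_right)
    W = [[(1 + a) * W0[i][j] + B[i][j] for j in range(len(W0[0]))]
         for i in range(len(W0))]
    b = [(1 + d) * b0[i] + e[i] for i in range(len(b0))]
    return W, b

# Toy dimensions (assumed for illustration): 4 inputs, 3 outputs, rank 2.
random.seed(0)
n_in, n_out, rank = 4, 3, 2
rand_matrix = lambda r, c: [[random.uniform(-1, 1) for _ in range(c)] for _ in range(r)]
W0, b0 = rand_matrix(n_out, n_in), [random.uniform(-1, 1) for _ in range(n_out)]
B_left, B_right = rand_matrix(n_out, rank), rand_matrix(rank, n_in)
a, d = 0.1, -0.2
e = [random.uniform(-1, 1) for _ in range(n_out)]
x = [random.uniform(-1, 1) for _ in range(n_in)]

y_train = adapted_forward(W0, b0, x, a, B_left, B_right, d, e)
W_merged, b_merged = merge(W0, b0, a, B_left, B_right, d, e)
y_merged = [wx + b for wx, b in zip(matvec(W_merged, x), b_merged)]

# Largest absolute difference between the two paths.
max_err = max(abs(p - q) for p, q in zip(y_train, y_merged))
```

After merging, the layer has the same shape as the original, so inference runs one matrix-vector product and one bias addition, exactly as before adaptation.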

Paper Structure (23 sections, 1 theorem, 15 equations, 6 figures, 7 tables)

This paper contains 23 sections, 1 theorem, 15 equations, 6 figures, 7 tables.

Key Result

Theorem 1

Suppose $\mathbf d_\mathrm{vc}(\mathcal{H})$ is the VC dimension of any finite hypothesis $\mathcal{H}$. If $\mathcal{H}_\mathrm{i} \subseteq \mathcal{H}_\mathrm{uni}$,

Figures (6)

  • Figure 1: Schematic representation of a linear layer adapted with GLoRA.
  • Figure 2: Results on few-shot learning datasets. The baseline methods are Adapter, LoRA, VPT, and NOAH. GLoRA consistently performs better across five datasets and a varying number of training examples per class. More comparisons are provided in the appendix.
  • Figure 3: Distribution of GLoRA (0.86M) parameters across layer types on VTAB-1K. Q-K-V and Projection are linear layers in the MHSA module; FC1 and FC2 are linear layers in the MLP module.
  • Figure 4: Layerwise configuration of support tensors in GLoRA (0.86M) on the VTAB-1K dataset.
  • Figure 5: t-SNE (van der Maaten and Hinton, 2008) visualization of features from the SVHN dataset.
  • ...and 1 more figures

Theorems & Definitions (1)

  • Theorem 1