SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

Xiangyu Chen; Jing Liu; Ye Wang; Pu Perry Wang; Matthew Brand; Guanghui Wang; Toshiaki Koike-Akino

TL;DR

SuperLoRA is a generalized framework that unifies and extends different LoRA variants, each of which can be realized under a particular hyper-parameter setting. It demonstrates superior performance on transfer learning tasks, especially in extremely few-parameter regimes.

Abstract

Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, each of which can be realized under a particular hyper-parameter setting. By introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers greater flexibility than other LoRA variants and demonstrates superior performance on transfer learning tasks, especially in extremely few-parameter regimes.
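
For orientation, the plain LoRA baseline that the abstract refers to can be sketched as below. This is only an illustration of the standard low-rank update ΔW = BA and its parameter count, not the SuperLoRA method itself; the function names, matrix sizes, and initialization scale are assumptions chosen for the example.

```python
# Minimal sketch of a plain LoRA update (NOT the SuperLoRA algorithm).
# All names and sizes below are illustrative assumptions.
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A):
    """Return the low-rank update dW = B @ A, with B: d_out x r and A: r x d_in."""
    return matmul(B, A)


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of one LoRA adapter: rank * (d_out + d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 8, 6, 2
rng = random.Random(0)

# Standard LoRA initialization: B is zero, so the update starts at zero.
B = [[0.0] * rank for _ in range(d_out)]
A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]

dW = lora_delta(B, A)                      # shape: d_out x d_in
full_params = d_out * d_in                 # parameters of a dense update
low_rank_params = lora_param_count(d_out, d_in, rank)
print(full_params, low_rank_params)        # prints: 48 28
```

The saving grows with matrix size: a dense update needs d_out * d_in parameters, while the low-rank factors need only rank * (d_out + d_in).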

Paper Structure (39 sections, 5 equations, 30 figures, 1 table)

Introduction
Related Work
Methodology
Low-Rank Adaptation (LoRA)
SuperLoRA
Empirical Experiments
Classification transfer task
Settings:
Results:
Image generation transfer task
Settings:
Grouping effect:
Reshaping effect:
LoKr vs. LoNKr:
LoRTA:
...and 24 more sections

Figures (30)

Figure 1: Schematic of SuperLoRA to fine-tune multi-layer attention modules at once with vectorizing, grouping, projection, folding, and factorization.
Figure 2: Hyperparameters and notation.
Figure 3: Required number of parameters.
Figure 4: Overview of (a) LoRA; (b) LoKr; (c) LoNKr (weight-wise version, ours).
Figure 5: Classification on CIFAR100 dataset with SuperLoRA.
...and 25 more figures
