
AROMA: Autonomous Rank-one Matrix Adaptation

Hao Nan Sheng, Zhi-yong Wang, Mingrui Yang, Hing Cheung So

TL;DR

AROMA tackles the prespecified-rank bottleneck of LoRA by introducing adaptive rank growth via a dual-loop architecture that autonomously determines the number of rank-one updates. The method expresses the incremental weight update as $\Delta W = \sum_{p=1}^{P} \bm{b}_p \bm{a}_p$, with an inner loop optimizing each rank-one component and an outer loop deciding the total count of subspaces. The training employs a Check & Merge & Reinit & Reset protocol to maintain subspace independence and promote exploration while keeping trainable parameters minimal. Empirical results on RoBERTa-base/GLUE and LLaMA3-8B/Commonsense170K show superior accuracy with far fewer trainable parameters than LoRA and AdaLoRA, along with favorable time efficiency. These findings position adaptive rank-growth PEFT as a scalable, effective approach with potential extensions to multimodal tasks and continual learning setups.
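Because each term $\bm{b}_p \bm{a}_p$ is the outer product of a $d \times 1$ column and a $1 \times k$ row, the sum equals the LoRA-style product $\bm{B}\bm{A}$ with $\bm{B} = [\bm{b}_1, \dots, \bm{b}_P]$ and $\bm{A}$ stacking the rows $\bm{a}_p$, so its rank is at most $P$. The minimal NumPy sketch below (all helper names are hypothetical, not from the AROMA code) checks this and compares trainable-parameter counts for a $768 \times 768$ projection like those in RoBERTa-base: LoRA with $r=8$ trains $8(768+768) = 12288$ parameters jointly, whereas training only the current rank-one pair, with earlier pairs merged and frozen, needs $768 + 768 = 1536$ at a time.

```python
import numpy as np


def compose_delta_w(b_list, a_list):
    """Delta W = sum_p b_p @ a_p, with each b_p of shape (d, 1) and a_p of shape (1, k)."""
    return sum(b @ a for b, a in zip(b_list, a_list))


def as_lora_factors(b_list, a_list):
    """Stack the pairs into LoRA-style factors B (d x P) and A (P x k)."""
    return np.hstack(b_list), np.vstack(a_list)


def trainable_params(d, k, rank, one_pair_at_a_time):
    """Trainable parameters for one d x k module.

    Joint LoRA training updates all `rank` pairs at once; training only the
    current rank-one pair (earlier pairs merged and frozen) updates d + k.
    """
    return (d + k) if one_pair_at_a_time else rank * (d + k)


rng = np.random.default_rng(0)
bs = [rng.normal(size=(6, 1)) for _ in range(3)]
as_ = [rng.normal(size=(1, 5)) for _ in range(3)]
B, A = as_lora_factors(bs, as_)
assert np.allclose(compose_delta_w(bs, as_), B @ A)  # same update as B @ A
assert np.linalg.matrix_rank(B @ A) <= 3             # rank is at most P = 3
print(trainable_params(768, 768, 8, False), trainable_params(768, 768, 8, True))  # 12288 1536
```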

Abstract

As large language models continue to grow in size, parameter-efficient fine-tuning (PEFT) has become increasingly crucial. While low-rank adaptation (LoRA) offers a solution through low-rank updates, its static rank allocation may yield suboptimal results. Adaptive low-rank adaptation (AdaLoRA) improves this with dynamic allocation but remains sensitive to initial and target rank configurations. We introduce AROMA, a framework that automatically constructs layer-specific updates by iteratively building up rank-one components with very few trainable parameters that gradually diminish to zero. Unlike existing methods that employ rank reduction mechanisms, AROMA introduces a dual-loop architecture for rank growth. The inner loop extracts information from each rank-one subspace, while the outer loop determines the number of rank-one subspaces, i.e., the optimal rank. We reset optimizer states to maintain subspace independence. AROMA significantly reduces parameters compared to LoRA and AdaLoRA while achieving superior performance on natural language understanding and commonsense reasoning tasks, offering new insights into adaptive PEFT. The code is available at https://github.com/ShuDun23/AROMA.
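The claim that trainable parameters gradually diminish to zero follows from per-module stopping: once a module's outer loop stops, it no longer contributes a trainable rank-one pair. The toy count below is only an illustration under stated assumptions: it considers just the two modules whose stop steps are reported in Figure 1 (a hypothetical restriction), treats each as a $768 \times 768$ RoBERTa-base projection training one pair of $768 + 768 = 1536$ parameters until it stops, and uses a hypothetical helper name.

```python
def active_trainable_params(step, modules):
    """Count trainable parameters at `step`.

    `modules` holds (d, k, stop_step) tuples. A module that is still growing
    trains one rank-one pair (d + k parameters); a stopped module trains none.
    """
    return sum(d + k for d, k, stop_step in modules if step < stop_step)


# Stop steps taken from Figure 1; restricting to these two modules is hypothetical.
MODULES = [
    (768, 768, 2000),  # layer.0.attention.output.dense
    (768, 768, 1600),  # layer.9.attention.self.value
]

for step in (0, 1800, 2400):
    print(step, active_trainable_params(step, MODULES))  # 0 3072 / 1800 1536 / 2400 0
```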

Paper Structure

This paper contains 25 sections, 9 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Number of trainable parameters, total rank, rank of a specific layer, and evaluation accuracy versus training step for LoRA$_{r=8}$, AdaLoRA$_{r=8}$, and AROMA (ours), fine-tuning RoBERTa-base on the MRPC task. For AROMA, training of "layer.0.attention.output.dense" and "layer.9.attention.self.value" terminates automatically at 2000 and 1600 steps, respectively, while the overall training stops automatically at 2400 steps.
  • Figure 2: Workflow of AROMA. For each module, AROMA trains rank-one matrices sequentially with a dual-loop architecture. In the inner loop, a rank-one LoRA, $\bm{b}\bm{a}$, is updated, and its convergence is assessed by the inner stopping criterion. Before proceeding to the next outer-loop step, outer convergence is checked with the outer stopping criterion. If it has not converged, the computed rank-one components are merged and frozen, and new $\bm{b}$ and $\bm{a}$ are initialized for training with the learning rate and optimizer states reset. For simplicity, the inner loop is drawn with length $T_{\mathrm{in}}$; in practice, its length is determined by both $T_{\mathrm{in}}$ and the inner stopping criterion.
  • Figure 3: Resultant rank and effective rank distributions for RoBERTa-base fine-tuned on the MRPC task by AdaLoRA$_{r=8}$ and AROMA, respectively. The $x$-axis represents the hidden layer index, while the $y$-axis refers to the weight matrix fine-tuned in each layer. The total rank is shown by the red outer circle, whereas the effective rank is indicated by the blue inner circle. Results on the RTE task are provided in the Appendix (Figure 5).
  • Figure 4: Cosine similarity for AROMA without Reset versus AROMA with Reset on the layer.10.attention.output.dense layer, for RoBERTa-base on the MRPC task.
  • Figure 5: Resultant rank and effective rank distributions for RoBERTa-base fine-tuned on the RTE task by AdaLoRA$_{r=8}$ and AROMA, respectively. The $x$-axis represents the hidden layer index, while the $y$-axis refers to the weight matrix fine-tuned in each layer. The total rank is shown by the red outer circle, whereas the effective rank is indicated by the blue inner circle.
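The dual loop described in the Figure 2 caption can be illustrated with a small NumPy sketch. It is not the authors' implementation (their code is in the linked repository) and rests on several assumptions: as a toy stand-in for the task loss, the module's update is fit to a known low-rank target under squared Frobenius loss; heavy-ball momentum SGD stands in for the real optimizer, so zeroing its velocity plays the role of the optimizer-state Reset; the inner rule (gradient norm below `grad_tol`, or at most `t_in` steps) and the outer rule (the new term's loss reduction falls below `outer_tol` of the initial loss, in which case that term is dropped) are hypothetical choices; and the names `train_rank_one` and `aroma_sketch` are likewise hypothetical.

```python
import numpy as np


def train_rank_one(residual, rng, lr=0.02, momentum=0.9, t_in=1000, grad_tol=1e-6):
    """Inner loop (sketch): fit a single rank-one term b @ a to `residual`.

    Minimizes 0.5 * ||residual - b @ a||_F^2 with heavy-ball momentum. Every call
    starts from fresh b, a and a zeroed velocity, mimicking Reinit & Reset.
    """
    d, k = residual.shape
    b = np.zeros((d, 1))                                 # b starts at zero, as in LoRA
    a = rng.normal(scale=1.0 / np.sqrt(k), size=(1, k))  # a starts random
    vb, va = np.zeros_like(b), np.zeros_like(a)          # optimizer state, reset per call
    for _ in range(t_in):                                # at most T_in inner steps
        err = b @ a - residual                           # dLoss/d(b @ a)
        gb, ga = err @ a.T, b.T @ err                    # chain rule through b @ a
        if np.sqrt(np.sum(gb ** 2) + np.sum(ga ** 2)) < grad_tol:
            break                                        # hypothetical inner stopping rule
        vb = momentum * vb + gb
        va = momentum * va + ga
        b = b - lr * vb
        a = a - lr * va
    return b, a


def aroma_sketch(target, max_rank=8, outer_tol=1e-3, seed=0):
    """Outer loop (sketch): grow Delta W one rank-one term at a time.

    Returns the merged update and the number of rank-one terms kept.
    """
    rng = np.random.default_rng(seed)
    merged = np.zeros_like(target)                       # frozen sum of merged terms
    base_loss = 0.5 * np.sum(target ** 2)
    rank = 0
    if base_loss == 0:
        return merged, rank                              # nothing to fit
    while rank < max_rank:
        residual = target - merged
        b, a = train_rank_one(residual, rng)             # inner loop on a new subspace
        gain = 0.5 * np.sum(residual ** 2) - 0.5 * np.sum((residual - b @ a) ** 2)
        if gain / base_loss < outer_tol:                 # hypothetical outer stopping rule
            break                                        # drop the negligible new term
        merged = merged + b @ a                          # Merge & freeze
        rank += 1
    return merged, rank
```

On a synthetic target of rank 3, this sketch is expected to keep three rank-one terms and then stop on its own, mirroring how the outer loop decides the rank. In AROMA itself the gradients come from the task loss through the network rather than from a known target matrix, so the greedy residual fitting here is only an analogy.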