AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping

Haonan Dong, Wenhao Zhu, Guojie Song, Liang Wang

TL;DR

AuroRA addresses LoRA's low-rank bottleneck by inserting an Adaptive Nonlinear Layer (ANL) between the two low-rank projections, forming an MLP-like update that needs a smaller rank while being more expressive. The layer combines a fixed tanh-based nonlinearity with a learnable spline-based component, yielding a nonlinearity $\sigma$ that improves approximation while keeping gradients bounded and parameter overhead modest. Theoretical results show strictly lower approximation error than linear LoRA at the same rank, together with stable training dynamics, and are supported by analyses of parameter and compute costs. Empirically, AuroRA matches or surpasses full fine-tuning with only a small fraction of LoRA's parameters and outperforms other PEFT methods across NLP and CV tasks, including on large models and in text-to-image generation.

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method validated across NLP and CV domains. However, LoRA faces an inherent low-rank bottleneck: narrowing its performance gap with full fine-tuning requires increasing the rank of its parameter matrix, resulting in significant parameter overhead. Recent linear LoRA variants have attempted to enhance expressiveness by introducing additional linear mappings; however, their composition remains inherently linear and fails to fundamentally improve LoRA's representational capacity. To address this limitation, we propose AuroRA, which incorporates an Adaptive Nonlinear Layer (ANL) between two linear projectors to capture fixed and learnable nonlinearities. This combination forms an MLP-like structure with a compressed rank, enabling flexible and precise approximation of diverse target functions while theoretically guaranteeing lower approximation errors and bounded gradients. Extensive experiments on 22 datasets and 6 pretrained models demonstrate that AuroRA (I) matches or surpasses full fine-tuning performance with only 6.18% to 25% of LoRA's parameters, (II) outperforms competitive PEFT methods by up to 10.88% in both NLP and CV tasks, and (III) exhibits robust performance across various rank configurations.
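The structure described above (two linear projectors with a nonlinear layer in between) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `NonlinearLowRankUpdate`, the additive combination of the fixed and learnable parts, the piecewise-linear interpolation standing in for the spline, the knot range, and the zero initialisation of `B` are all hypothetical choices made for this example.

```python
import numpy as np


class NonlinearLowRankUpdate:
    """Sketch of an update x -> B @ sigma(A @ x) with an adaptive sigma.

    Assumptions (not taken from the paper): sigma is the sum of a fixed tanh
    and a learnable piecewise-linear function defined on a fixed knot grid.
    """

    def __init__(self, d_in, d_out, rank, num_knots=8, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection to the small hidden dimension.
        self.A = rng.normal(scale=0.02, size=(rank, d_in))
        # Up-projection; zero-initialised here so the update starts at zero
        # (a common LoRA convention, assumed for this sketch).
        self.B = np.zeros((d_out, rank))
        # Knot positions and learnable values of the piecewise-linear part.
        self.knots = np.linspace(-3.0, 3.0, num_knots)
        self.coef = np.zeros(num_knots)

    def sigma(self, h):
        fixed = np.tanh(h)  # fixed nonlinearity, values in [-1, 1]
        # np.interp clamps inputs outside the knot range to the end values,
        # so the learnable part is bounded by max(|coef|).
        learnable = np.interp(h, self.knots, self.coef)
        return fixed + learnable

    def __call__(self, x):
        return self.B @ self.sigma(self.A @ x)


layer = NonlinearLowRankUpdate(d_in=16, d_out=12, rank=2)
x = np.ones(16)
delta = layer(x)  # contribution that would be added to the frozen layer's output
```

Because both parts of this `sigma` are bounded, its output magnitude never exceeds `1 + max(|coef|)`, which gives some intuition for why a design of this kind can keep activations and gradients under control.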

Paper Structure

This paper contains 47 sections, 7 theorems, 34 equations, 6 figures, 13 tables, 1 algorithm.

Key Result

Proposition 2.1

Let $M \in \mathbb{R}^{d_{\mathrm{out}}\times d_{\mathrm{in}}}$ with $\mathrm{rank}(M) > r$. Define $\varepsilon_r(M)$ as the best linear rank-$r$ approximation error of $M$ (Definition D.1), that is, the smallest error achievable by any matrix of rank at most $r$. Then $\varepsilon_r(M) > 0$, and for our proposed update, in which the adaptive nonlinear layer $\sigma$ sits between the two low-rank projectors $A$ and $B$, there exists a parameter set $(A^*, B^*, \sigma^*)$ whose approximation error is strictly below the linear rank-$r$ limit $\varepsilon_r(M)$, using the same rank $r$.
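To make the linear rank-$r$ limit $\varepsilon_r(M)$ concrete, the sketch below computes it numerically. It assumes the error is measured in the Frobenius norm, which the text here does not state. Under that assumption, the Eckart-Young theorem gives $\varepsilon_r(M) = \sqrt{\sum_{i>r} s_i^2}$, where $s_i$ are the singular values of $M$, and the truncated SVD attains it. The helper names `best_rank_r_error` and `truncated_svd` are introduced only for this example.

```python
import numpy as np


def best_rank_r_error(M, r):
    """Frobenius error of the best rank-r approximation of M (Eckart-Young)."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(s[r:] ** 2)))


def truncated_svd(M, r):
    """Rank-r matrix built from the top-r singular triplets of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


rng = np.random.default_rng(0)
M = rng.normal(size=(6, 5))  # a generic matrix, full rank with probability 1
r = 2

eps_r = best_rank_r_error(M, r)
M_r = truncated_svd(M, r)
residual = np.linalg.norm(M - M_r)  # Frobenius norm is the default for matrices
```

Here `residual` coincides with `eps_r` up to floating-point error, and `eps_r` is positive because `M` has rank greater than `r`. No purely linear rank-`r` update can go below this floor, which is the limit the proposition says the nonlinear update beats.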

Figures (6)

  • Figure 1: The trade-off between parameter count and performance for various fine-tuning methods on NLP (left) and CV (right) tasks. (Left) In NLU, RoBERTa-Base is fine-tuned on CoLA, with LoRA ranks $r \in \{2, 3, 4, 6, 8\}$. (Right) In image classification, ViT-Base is fine-tuned on DTD, with LoRA ranks $r \in \{2, 4, 6, 8, 12, 16\}$.
  • Figure 2: We evaluate LoRA, MoSLoRA, and our AuroRA on the DTD and RESISC45 datasets, using ViT-Base with rank $r=2$. (Upper) We record $\Delta W$ at epochs $0, 1, \ldots, 9$ and visualize these $\Delta W$ with PCA. We observe that AuroRA explores a broader region of the parameter space. (Lower) We present the accuracy results on both datasets.
  • Figure 3: A general comparison of LoRA and our AuroRA. (Left) In LoRA, matrices $\mathbf{A}$ and $\mathbf{B}$ act as two linear projectors, forming a two-layer linear mapping with hidden dimension $r$. (Right) Our AuroRA extends LoRA by incorporating an adaptive nonlinear layer comprising fixed and learnable nonlinearities, forming an MLP-like structure with significantly reduced hidden dimension $\widetilde{r}$ ($\widetilde{r} \ll r$).
  • Figure 4: Results of LoRA and AuroRA in the subject-driven image generation task. AuroRA aligns better with the prompt.
  • Figure 5: Performance comparison of different methods with varying ranks. We use LLaMA 3-8B as the pretrained model and fine-tune it with AuroRA, MoSLoRA, and LoRA on the HellaSwag, WinoGrande, ARC-e, and ARC-c datasets, with ranks $r \in \{2,4,8,16\}$.
  • ...and 1 more figure

Theorems & Definitions (14)

  • Proposition 2.1: Lower Approximation Error
  • Proposition 2.2: Gradient Boundedness
  • Definition D.1: Best Linear Rank-$r$ Error
  • Definition D.3: Nonlinear Low-Rank Update
  • Lemma D.4: Piecewise Polynomial Approximation
  • Lemma D.5: Combining Fixed and Learnable Nonlinearities
  • Proof of Lemma D.5
  • Proof
  • Lemma E.1
  • Proof
  • ...and 4 more