AuroRA: Breaking Low-Rank Bottleneck of LoRA with Nonlinear Mapping
Haonan Dong, Wenhao Zhu, Guojie Song, Liang Wang
TL;DR
AuroRA addresses the low-rank bottleneck of LoRA by inserting an Adaptive Nonlinear Layer between the two low-rank projections, forming an MLP-like update that compresses rank while enhancing expressiveness. The layer combines a fixed tanh-based nonlinearity with a learnable spline-based component, yielding an activation σ that improves approximation while keeping gradients bounded and parameter overhead modest. Theoretical results show strictly lower approximation error than linear LoRA at the same rank and stable training dynamics, supported by analyses of parameter and compute costs. Empirically, AuroRA matches or surpasses full fine-tuning with only a small fraction of LoRA's parameters and outperforms other PEFT methods across NLP and CV tasks, including on large models and in text-to-image generation.
Abstract
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method validated across NLP and CV domains. However, LoRA faces an inherent low-rank bottleneck: narrowing its performance gap with full fine-tuning requires increasing the rank of its parameter matrix, resulting in significant parameter overhead. Recent linear LoRA variants have attempted to enhance expressiveness by introducing additional linear mappings; however, their composition remains inherently linear and fails to fundamentally improve LoRA's representational capacity. To address this limitation, we propose AuroRA, which incorporates an Adaptive Nonlinear Layer (ANL) between two linear projectors to capture fixed and learnable nonlinearities. This combination forms an MLP-like structure with a compressed rank, enabling flexible and precise approximation of diverse target functions while theoretically guaranteeing lower approximation errors and bounded gradients. Extensive experiments on 22 datasets and 6 pretrained models demonstrate that AuroRA (I) matches or surpasses full fine-tuning performance with only 6.18% to 25% of LoRA's parameters, (II) outperforms competitive PEFT methods by up to 10.88% in both NLP and CV tasks, and (III) exhibits robust performance across various rank configurations.
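The structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the dimensions, the initialization scales, the names `anl`, `lora_update`, and `nonlinear_update`, and the use of a piecewise-linear interpolant (`np.interp`) as a stand-in for the learnable spline are all assumptions made for illustration. The sketch shows two points from the abstract: inserting an extra linear map between the projectors leaves the update at rank at most r, whereas placing a tanh-plus-spline nonlinearity between them yields an update that is no longer a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 16, 12, 2  # illustrative sizes (assumed, not from the paper)

A = rng.normal(size=(r, d_in)) * 0.1   # down-projection, as in LoRA
B = rng.normal(size=(d_out, r)) * 0.1  # up-projection, as in LoRA

# Hypothetical learnable component: one piecewise-linear curve per hidden
# unit, defined by values at fixed knots. This is a simple stand-in for a
# spline, not the parameterization used by AuroRA.
knots = np.linspace(-2.0, 2.0, 9)
spline_vals = rng.normal(size=(r, knots.size)) * 0.1

def anl(z):
    """Sketch of an adaptive nonlinear layer on z of shape (batch, r):
    fixed tanh plus a learnable piecewise-linear term."""
    fixed = np.tanh(z)
    learned = np.stack(
        [np.interp(z[:, j], knots, spline_vals[j]) for j in range(z.shape[1])],
        axis=1,
    )
    return fixed + learned

def lora_update(x):
    """Plain linear LoRA update: x -> x A^T B^T."""
    return x @ A.T @ B.T

def nonlinear_update(x):
    """MLP-like update: x -> anl(x A^T) B^T."""
    return anl(x @ A.T) @ B.T

# A linear variant that inserts an extra r x r matrix M is still rank <= r.
M = rng.normal(size=(r, r))
linear_variant_rank = np.linalg.matrix_rank(B @ M @ A)

x1 = rng.normal(size=(4, d_in))
x2 = rng.normal(size=(4, d_in))
# Additivity holds for the linear update but not for the nonlinear one.
linear_gap = np.abs(lora_update(x1 + x2) - lora_update(x1) - lora_update(x2)).max()
nonlinear_gap = np.abs(
    nonlinear_update(x1 + x2) - nonlinear_update(x1) - nonlinear_update(x2)
).max()
```

In this sketch the nonlinear layer adds only `r * knots.size` parameters on top of `A` and `B`, and its output is bounded because tanh is bounded and `np.interp` clamps to the end values outside the knot range.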
