AILoRA: Function-Aware Asymmetric Initialization for Low-Rank Adaptation of Large Language Models

Xiaoshuang Ji, Zhendong Zhao, Xiaoyan Gu, Xiaojun Chen, Xin Zhao, Zeyao Liu

TL;DR

AILoRA tackles the high cost of full finetuning by introducing a function-aware asymmetric initialization for LoRA modules. It leverages singular value decomposition on pretrained projection matrices to inject the dominant components of $W^Q$ and the minor components of $W^V$ into the respective LoRA updates, while freezing residual components to preserve pretrained knowledge. By aligning the initialization with the distinct roles of $W^Q$ (semantic guidance) and $W^V$ (token-level features), AILoRA achieves improved downstream adaptation and faster convergence across diverse architectures and tasks. Empirical results on NLU and NLG benchmarks demonstrate consistent gains over baselines and robust performance under varying rank and matrix-selection settings, indicating practical benefits for parameter-efficient finetuning of large language models.

Abstract

Parameter-efficient finetuning (PEFT) aims to mitigate the substantial computational and memory overhead involved in adapting large-scale pretrained models to diverse downstream tasks. Among numerous PEFT strategies, Low-Rank Adaptation (LoRA) has emerged as one of the most widely adopted approaches due to its robust empirical performance and low implementation complexity. In practical deployment, LoRA is typically applied to the $W^Q$ and $W^V$ projection matrices of self-attention modules, enabling an effective trade-off between model performance and parameter efficiency. While LoRA has achieved considerable empirical success, it still encounters challenges such as suboptimal performance and slow convergence. To address these limitations, we introduce AILoRA, a novel parameter-efficient method that incorporates function-aware asymmetric low-rank priors. Our empirical analysis reveals that the projection matrices $W^Q$ and $W^V$ in the self-attention mechanism exhibit distinct parameter characteristics, stemming from their functional differences. Specifically, $W^Q$ captures task-specific semantic space knowledge essential for computing attention distributions, making its parameters highly sensitive to downstream task variations. In contrast, $W^V$ encodes token-level feature representations that tend to remain stable across tasks and layers. Leveraging these insights, AILoRA performs a function-aware initialization by injecting the principal components of $W^Q$ to retain task-adaptive capacity, and the minor components of $W^V$ to preserve generalizable feature representations. This asymmetric initialization strategy enables LoRA modules to better capture the specialized roles of attention parameters, thereby enhancing both finetuning performance and convergence efficiency.
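The asymmetric initialization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper name `svd_split`, the LoRA factor names `B` and `A`, the toy matrix sizes, and the choice to split each singular value evenly (as a square root) between the two factors are all assumptions made for illustration. The sketch only shows the idea stated in the abstract: the trainable low-rank factors of $W^Q$ start from its largest singular components, those of $W^V$ start from its smallest, and the remainder of each matrix is kept as a frozen residual.

```python
import numpy as np


def svd_split(W, r, part):
    """Split W into a frozen residual plus a rank-r product B @ A.

    part="principal" takes the r largest singular triplets,
    part="minor" takes the r smallest. (Hypothetical helper.)
    """
    if part not in ("principal", "minor"):
        raise ValueError("part must be 'principal' or 'minor'")
    # Reduced SVD; NumPy returns singular values in descending order.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    if not 0 < r <= len(S):
        raise ValueError("r must be between 1 and min(W.shape)")
    idx = slice(0, r) if part == "principal" else slice(len(S) - r, len(S))
    # Assumption: share each singular value equally between B and A.
    root = np.sqrt(S[idx])
    B = U[:, idx] * root            # shape (d_out, r), trainable
    A = root[:, None] * Vt[idx, :]  # shape (r, d_in), trainable
    W_res = W - B @ A               # frozen residual
    return W_res, B, A


# Toy stand-ins for pretrained projection matrices (random, not real weights).
rng = np.random.default_rng(0)
d, r = 16, 4
W_q = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

# W^Q: LoRA factors from principal components; W^V: from minor components.
Wq_res, Bq, Aq = svd_split(W_q, r, "principal")
Wv_res, Bv, Av = svd_split(W_v, r, "minor")

# At initialization the adapted layer reproduces the pretrained weights.
print(np.allclose(Wq_res + Bq @ Aq, W_q), np.allclose(Wv_res + Bv @ Av, W_v))
```

Because the residual and the low-rank product sum back to the original matrix, the model's output is unchanged at the start of finetuning in this sketch; only the subspace that subsequent gradient updates act on differs between $W^Q$ and $W^V$.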

Paper Structure

This paper contains 18 sections, 6 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Comparative analysis of the $W^Q$ and $W^V$ projection matrices in the self-attention mechanism. Panels (a) and (b) visualize the $W^Q$ and $W^V$ matrices across all layers of RoBERTa-large (24 encoder layers) before and after fine-tuning on the CoLA dataset, using t-SNE for dimensionality reduction. Each point represents a projection matrix from a specific layer. Panels (c) and (d) report the Frobenius norms of the weight updates $\Delta W^Q$ and $\Delta W^V$ after fine-tuning on the CoLA and SST-2 datasets, respectively.
  • Figure 2: AILoRA first performs SVD on the $W^Q$ and $W^V$ matrices. For the $W^Q$ matrices, the principal components are used to initialize the LoRA modules while the remaining components are kept frozen. In contrast, the LoRA modules of $W^V$ are initialized using the minor components, with the remaining components fixed.
  • Figure 3: The training loss and accuracy over the epochs of AILoRA and baselines.
  • Figure 4: Experiments on function-aware enhancement of $W^Q$ and $W^V$.
  • Figure 5: Comparison between AILoRA and baselines across various ranks.