LoR2C : Low-Rank Residual Connection Adaptation for Parameter-Efficient Fine-Tuning

Jiancheng Zhao, Xingda Yu, Yuxiang Zhang, Zhen Yang

TL;DR

LoR2C introduces a low-rank residual connection adaptation for parameter-efficient fine-tuning, combining residual pathways with low-rank matrices so that layer updates are captured by $W=BA$ with $r\ll d$, reducing tunable parameters and alleviating gradient vanishing. It further offers ShareLoR2C, MergeLoR2C, and InjectLoR2C to trade off parameter count and performance via parameter sharing, dynamic merging, and rank-aware injections, guided by the Shape of Feature Space (SFS) metric. Across GLUE with RoBERTa-base and instruction-tuning with LLAMA2-7B, LoR2C variants achieve competitive or superior results using far fewer trainable parameters than full fine-tuning or many PEFT baselines, including strong performance on BBH and HEval benchmarks. The work demonstrates practical gains in efficiency and gradient propagation for Transformer fine-tuning, highlighting a versatile direction for deploying large models under resource constraints, while noting added complexity and the need for broader scalability validation.

Abstract

In recent years, pretrained large language models have demonstrated outstanding performance across various natural language processing tasks. However, full-parameter fine-tuning methods require adjusting all model parameters, leading to immense computational resource demands. Although parameter-efficient fine-tuning methods like LoRA have significantly reduced the number of trainable parameters, they still suffer from gradient vanishing and leave room for further parameter reduction. To address these issues, this paper proposes a novel parameter-efficient fine-tuning method called LoR2C (Low-Rank Residual Connection Adaptation). LoR2C introduces residual connections with low-rank matrices within the model layers, which not only reduces the number of fine-tuning parameters but also effectively alleviates the gradient vanishing problem. Additionally, this paper presents three optimization variants of LoR2C: ShareLoR2C, MergeLoR2C, and InjectLoR2C. These variants further improve parameter efficiency and model performance through parameter sharing, module merging, and injection mechanisms, respectively. Experimental results on multiple natural language understanding and natural language generation tasks demonstrate that LoR2C and its optimized variants significantly reduce parameter overhead while maintaining or even improving performance, outperforming existing mainstream parameter-efficient fine-tuning methods. Our code is publicly available at https://github.com/Oblivioniss/LoR2C.
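The core idea, a low-rank residual path $W=BA$ added alongside a frozen layer, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `matvec`, `lor2c_forward`, and `trainable_params` are hypothetical, the shapes ($A$ as $r \times d$, $B$ as $d \times r$) and the choice to add the residual to the layer output are assumptions, and the toy "frozen layer" is a stand-in for a Transformer layer.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lor2c_forward(layer: Callable[[Vector], Vector], x: Vector,
                  A: Matrix, B: Matrix) -> Vector:
    """Frozen layer output plus a low-rank residual B(Ax).

    Assumption: A is r x d and B is d x r, so W = BA is d x d with rank <= r.
    """
    low_rank = matvec(B, matvec(A, x))  # project down to r, then back up to d
    return [h + res for h, res in zip(layer(x), low_rank)]


def trainable_params(d: int, r: int) -> int:
    """Parameters in A (r x d) plus B (d x r)."""
    return 2 * d * r


# Toy example with d = 4, r = 1 (hypothetical values).
frozen_layer = lambda v: [2.0 * t for t in v]  # stand-in for a frozen layer
A = [[1.0, 0.0, 0.0, 0.0]]                     # r x d
B = [[0.5], [0.0], [0.0], [0.0]]               # d x r
x = [1.0, 2.0, 3.0, 4.0]
y = lor2c_forward(frozen_layer, x, A, B)

# With d = 768 and an assumed rank r = 8, the residual path trains
# 2 * 768 * 8 parameters instead of the 768 * 768 of a dense W.
dense = 768 * 768
low_rank_count = trainable_params(768, 8)
```

In this sketch the gradient reaches $A$ and $B$ through the added residual path without passing through the frozen layer, which is the intuition behind the claim that the residual connection eases gradient propagation.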

Paper Structure

This paper contains 19 sections, 10 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Explanation of LoR2C: (a) The figure illustrates the differences in positioning between LoR2C, LoRA, and Adapter. LoR2C introduces an additional residual connection within the Transformer layers to enhance gradient propagation, effectively mitigating the gradient vanishing problem. (b) The figure provides a detailed explanation of the low-rank decomposition process of matrix $W$ in the LoR2C module, where $W=BA$. By decomposing $W$ into two low-rank matrices $A$ and $B$, LoR2C further reduces the parameter count.
  • Figure 2: Heatmap of the top 50 singular values of the $W$ matrix, averaged across all 12 layers during training on the MRPC dataset. The vertical axis represents the index of singular values sorted in descending order, and the horizontal axis represents the training epochs (up to 50). The dimension of the $W$ matrix is $\text{in\_features} \times 128$ (where $\text{in\_features} = 768$). This experiment aims to explore whether the $W$ matrix exhibits low-rank properties during training. The heatmap shows that only the first few singular values have significant magnitudes, while the majority diminish rapidly, indicating that the $W$ matrix retains low-rank characteristics. Training settings include a batch size of 64, a maximum sequence length of 256, and a learning rate of $4 \times 10^{-4}$.
  • Figure 3: An illustration of the architecture of ShareLoR2C. The figure demonstrates how the matrix $A$ is shared across all layers, while each layer retains its own independent matrix $B_t$ (e.g., $B_1, B_2, B_3$).
  • Figure 4: This figure illustrates the MergeLoR2C mechanism. In the initial structure (leftmost panel), each layer contains a LoR2C module ($\text{LoR}^2\text{C}_1, \text{LoR}^2\text{C}_2, \dots, \text{LoR}^2\text{C}_6$). The SFS metric is used to evaluate the information content of each module (second panel). Layers with the lowest combined information scores are identified, and their corresponding LoR2C modules are merged into a single module (third panel). This process is repeated multiple times, progressively reducing the number of LoR2C modules. The final architecture (rightmost panel) incorporates merged modules ($\text{LoR}^2\text{C}_1', \text{LoR}^2\text{C}_2', \text{LoR}^2\text{C}_3'$) across multiple layers, along with residual connections to preserve gradient flow and ensure parameter efficiency.
  • Figure 5: This figure illustrates the InjectLoR2C mechanism. In the initial structure (leftmost panel), each layer contains a LoR2C module ($\text{LoR}^2\text{C}_1, \text{LoR}^2\text{C}_2, \dots, \text{LoR}^2\text{C}_6$). The SFS metric is used to evaluate the information content of each module (second panel). Based on the metric, the module with the lowest information score is identified, and its corresponding LoR2C module is replaced by a lower-rank LoRA module (third panel). This injection process is repeated multiple times, progressively reducing the rank of selected modules. The final architecture (rightmost panel) integrates the injected LoRA modules into the Transformer layers, where LoRA modules are applied to the query ($W_Q$) and value ($W_V$) weight matrices.
  • ...and 2 more figures
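
The parameter saving from the sharing scheme in Figure 3 (one $A$ shared across all layers, one $B_t$ per layer) can be checked with simple arithmetic. The sketch below is an illustration only: the function names are hypothetical, the shapes ($A$ as $r \times d$, each $B_t$ as $d \times r$) are assumptions, and the rank $r = 8$ is an arbitrary example value. The 12 layers and $d = 768$ follow the setting described in the Figure 2 caption.

```python
def lor2c_params(num_layers: int, d: int, r: int) -> int:
    """Every layer keeps its own A (r x d) and its own B (d x r)."""
    return num_layers * (r * d + d * r)


def share_lor2c_params(num_layers: int, d: int, r: int) -> int:
    """One A (r x d) shared by all layers, plus one B_t (d x r) per layer."""
    return r * d + num_layers * (d * r)


# 12 layers and d = 768 as in the Figure 2 setting; r = 8 is an assumed rank.
num_layers, d, r = 12, 768, 8
independent = lor2c_params(num_layers, d, r)
shared = share_lor2c_params(num_layers, d, r)
saving = 1 - shared / independent  # fraction of trainable parameters removed
```

Under these assumptions, sharing $A$ removes a little under half of the trainable parameters, and the saving approaches one half as the number of layers grows.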