MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning

Jiancheng Zhao, Xingda Yu, Zhen Yang

TL;DR

The paper tackles the inefficiency of full fine-tuning for large language models and the rigidity of fixed-rank LoRA. It introduces MSPLoRA, a multi-scale pyramid LoRA that partitions updates into global, mid-level, and layer-specific components with rank decay to decouple information across hierarchical levels. Through extensive experiments on GLUE and instruction-following benchmarks, MSPLoRA achieves stronger performance with far fewer trainable parameters, validated by SVD and redundancy analyses. The approach provides a scalable, efficient protocol for parameter-efficient fine-tuning in large transformers, supported by ablations and analysis.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) has become an essential approach for adapting large-scale pre-trained models while reducing computational costs. Among PEFT methods, LoRA significantly reduces trainable parameters by decomposing weight updates into low-rank matrices. However, traditional LoRA applies a fixed rank across all layers, failing to account for the varying complexity of hierarchical information, which leads to inefficient adaptation and redundancy. To address this, we propose MSPLoRA (Multi-Scale Pyramid LoRA), which introduces Global Shared LoRA, Mid-Level Shared LoRA, and Layer-Specific LoRA to capture global patterns, mid-level features, and fine-grained information, respectively. This hierarchical structure reduces inter-layer redundancy while maintaining strong adaptation capability. Experiments on various NLP tasks demonstrate that MSPLoRA achieves more efficient adaptation and better performance while significantly reducing the number of trainable parameters. Furthermore, additional analyses based on Singular Value Decomposition validate its information decoupling ability, highlighting MSPLoRA as a scalable and effective optimization strategy for parameter-efficient fine-tuning in large language models. Our code is available at https://github.com/Oblivioniss/MSPLoRA.
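
To make the parameter-saving argument concrete, the following is a minimal arithmetic sketch, not the authors' implementation. It assumes that every adapted weight is a square d x d matrix, that mid-level components are shared across contiguous groups of equal size, and that each layer receives one global, one mid-level, and one layer-specific low-rank pair. The function names, the grouping scheme, and all numeric values (12 layers, d = 768, ranks 8/4/2, groups of 4) are hypothetical choices for illustration and are not taken from the paper.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters in one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def standard_lora_total(num_layers, d, rank):
    """Fixed-rank LoRA: every layer gets its own pair of the same rank."""
    return num_layers * lora_params(d, d, rank)


def pyramid_lora_total(num_layers, d, group_size, r_global, r_mid, r_layer):
    """Pyramid layout (assumed): one global pair, one pair per group, one pair per layer."""
    if num_layers % group_size != 0:
        raise ValueError("num_layers must be divisible by group_size")
    num_groups = num_layers // group_size
    return (
        lora_params(d, d, r_global)                # shared by all layers
        + num_groups * lora_params(d, d, r_mid)    # shared within each group
        + num_layers * lora_params(d, d, r_layer)  # private to each layer
    )


# Hypothetical configuration, chosen only to illustrate the arithmetic.
baseline = standard_lora_total(num_layers=12, d=768, rank=8)
pyramid = pyramid_lora_total(
    num_layers=12, d=768, group_size=4, r_global=8, r_mid=4, r_layer=2
)
print(baseline, pyramid)  # 147456 67584
```

Under these assumed settings the pyramid layout uses fewer than half the trainable parameters of the fixed-rank baseline, because the largest rank is paid for once rather than once per layer.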

Paper Structure

This paper contains 25 sections, 10 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The multi-scale pyramid structure of MSPLoRA. The pretrained weights remain frozen, and the LoRA components are divided into global shared LoRA, mid-level shared LoRA, and layer-specific LoRA, which learn global patterns, mid-level features, and fine-grained information, respectively. The structure on the right shows that the rank of the LoRA components decreases gradually from large scale to small scale, enabling more targeted information modeling at each hierarchical level while reducing parameter redundancy and improving fine-tuning efficiency.
  • Figure 2: Illustration of the multi-scale pyramid structure of MSPLoRA and its application to the Transformer hierarchy.
  • Figure 3: This figure shows the SVD heatmaps of different LoRA components in MSPLoRA. The x-axis denotes training epochs, and the y-axis represents singular value dimensions, with color indicating average magnitude. The global LoRA (left) has the highest singular value intensity and active dimensions, followed by the mid-level shared LoRA (middle), while the layer-specific LoRA (right) has the lowest. This supports the effectiveness of the proposed rank settings in our multi-scale pyramid design.
  • Figure 4: KL divergence difference heatmap between MSPLoRA and standard LoRA, measuring the divergence of singular value spectra across layer pairs. Each cell $(i,j)$ indicates how much more (or less) MSPLoRA separates the layer-specific LoRA components of layers $i$ and $j$ than standard LoRA does. Positive values, especially along the adjacent-layer bands, suggest reduced redundancy and enhanced layer-specific modeling in MSPLoRA.
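
The kind of comparison described in the Figure 4 caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: it assumes each singular value spectrum is normalized into a probability distribution, that the plain (asymmetric) KL divergence is used, and that all update matrices share one shape so their spectra have equal length. The function names and the random low-rank matrices are hypothetical stand-ins for trained LoRA updates.

```python
import numpy as np


def spectrum_distribution(delta_w, eps=1e-12):
    """Normalize the singular values of an update matrix so they sum to 1."""
    s = np.linalg.svd(delta_w, compute_uv=False)
    s = s + eps  # avoid zeros so the logarithm below is always defined
    return s / s.sum()


def spectrum_kl(delta_a, delta_b):
    """KL(p || q) between the normalized spectra of two same-shaped matrices."""
    p = spectrum_distribution(delta_a)
    q = spectrum_distribution(delta_b)
    return float(np.sum(p * np.log(p / q)))


def pairwise_kl(deltas):
    """Matrix whose (i, j) entry is the spectrum KL between layers i and j."""
    n = len(deltas)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = spectrum_kl(deltas[i], deltas[j])
    return out


def kl_difference(msp_deltas, lora_deltas):
    """Positive entries: the first method separates that layer pair more."""
    return pairwise_kl(msp_deltas) - pairwise_kl(lora_deltas)


# Random low-rank matrices standing in for per-layer updates (hypothetical data).
rng = np.random.default_rng(0)


def random_update(d, r):
    return rng.normal(size=(d, r)) @ rng.normal(size=(r, d))


msp_deltas = [random_update(16, 2) for _ in range(4)]
lora_deltas = [random_update(16, 8) for _ in range(4)]
diff = kl_difference(msp_deltas, lora_deltas)
print(diff.shape)  # (4, 4)
```

The diagonal of `diff` is zero by construction, since each layer is compared with itself in both methods. With real adapters, `msp_deltas` would hold the layer-specific updates of MSPLoRA and `lora_deltas` the per-layer updates of standard LoRA; the exact normalization and divergence direction used in the paper may differ from this sketch.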