MiSS: Revisiting the Trade-off in LoRA with an Efficient Shard-Sharing Structure

Jiale Kang; Qingyu Yin

TL;DR

This work addresses the slow convergence of LoRA by proposing MiSS, a shard-sharing PEFT method that trains a single small matrix and expands it into a large low-rank weight update. The efficient variant, MiSS$^e$, uses input aggregation to further reduce memory and FLOPs and to enable scalable serving. Across NLP and vision benchmarks, MiSS achieves superior or competitive accuracy while substantially lowering training cost and memory footprint, which places it on a favorable Pareto frontier relative to existing PEFT methods. The study also includes gradient-norm analyses and a detailed comparison of memory usage and efficiency, positioning MiSS as a practical, general-purpose PEFT method.
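The TL;DR describes two mechanisms: expanding one small matrix into a full-size low-rank update, and (in MiSS$^e$) aggregating the input so that the full update never has to be built. The NumPy sketch below illustrates both under an assumption that is ours and not the paper's stated definition. The weight is stored as $W \in \mathbb{R}^{d_{in} \times d_{out}}$ (inputs multiply from the left), $\boldsymbol{D} \in \mathbb{R}^{r \times d_{out}}$, and $\Delta W$ stacks $d_{in}/r$ identical copies of $\boldsymbol{D}$, one per input shard. The function names are hypothetical.

```python
import numpy as np


def expand_shared_shard(D: np.ndarray, d_in: int) -> np.ndarray:
    """Tile the shared (r, d_out) matrix D into a (d_in, d_out) update.

    Assumption (not from the paper): every r-row shard of W receives the same D.
    """
    r = D.shape[0]
    if d_in % r != 0:
        raise ValueError("d_in must be divisible by the shard size r")
    return np.tile(D, (d_in // r, 1))


def forward_explicit(x: np.ndarray, W: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Materialize the full update, then apply it: x @ (W + delta_W)."""
    delta_W = expand_shared_shard(D, W.shape[0])
    return x @ (W + delta_W)


def forward_aggregated(x: np.ndarray, W: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Input-aggregation form: sum the length-r input chunks, then multiply by D.

    The (d_in, d_out) update is never built; the extra cost per token is
    roughly d_in additions plus one (r x d_out) product.
    """
    r = D.shape[0]
    chunks = x.reshape(*x.shape[:-1], x.shape[-1] // r, r)  # (..., d_in // r, r)
    return x @ W + chunks.sum(axis=-2) @ D


rng = np.random.default_rng(0)
d_in, d_out, r, tokens = 64, 48, 8, 5
W = rng.standard_normal((d_in, d_out))
D = rng.standard_normal((r, d_out))
x = rng.standard_normal((tokens, d_in))

delta_W = expand_shared_shard(D, d_in)
print("update shape:", delta_W.shape, "rank:", np.linalg.matrix_rank(delta_W))
print("paths agree:", np.allclose(forward_explicit(x, W, D), forward_aggregated(x, W, D)))
```

Because every shard receives the same $\boldsymbol{D}$, $x\,\Delta W = \big(\sum_i x_i\big)\boldsymbol{D}$, where $x_i$ is the $i$-th length-$r$ chunk of $x$. The aggregated path therefore replaces a $d_{in} \times d_{out}$ update with a sum over input chunks followed by an $r \times d_{out}$ product, and the tiled update can never exceed rank $r$. The exact shard layout used by MiSS and MiSS$^e$ may differ from this sketch.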

Abstract

Low-Rank Adaptation (LoRA) is a widely adopted technique for parameter-efficient fine-tuning, but its slow convergence has spurred the development of numerous variants. However, existing methods often fail to improve performance, memory footprint, and computational efficiency simultaneously. To address this challenge, we revisit the causes of LoRA's slow convergence. Building on this analysis, we propose Matrix Shard Sharing (MiSS), which updates shards of the original weight matrix using a single shared trainable matrix $\boldsymbol{D}$, initialized to zeros. To further ensure computational efficiency, a low memory footprint, and scalable serving, we introduce an efficient variant, MiSS$^e$. Both theoretical analysis and empirical results show that our method reduces optimization complexity without compromising performance, thereby achieving a more favorable trade-off among performance, memory, and efficiency. Furthermore, we conduct a comprehensive comparative analysis of various PEFT methods, evaluating their memory usage, initialization overhead, and computational efficiency. By mapping the Pareto frontier across these dimensions, we show that MiSS occupies a favorable position, effectively capturing the advantages of prior approaches.
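The abstract ties the motivation to LoRA's slow convergence and notes that $\boldsymbol{D}$ starts at zero. The toy least-squares sketch below shows what each method's trainable parameters receive at the first step, when the weight update is still zero. With the common LoRA initialization (one factor random, the other zero), the random factor gets an exactly zero gradient and the zero factor sees the weight gradient $G$ only through a random $r$-dimensional projection. A shared $\boldsymbol{D}$ instead receives the sum of the $r$-row shards of $G$, assuming (our assumption, not the paper's stated definition) that $\Delta W$ tiles $\boldsymbol{D}$ along the input dimension. The shapes, loss, and variable names are illustrative, and this does not reproduce the paper's gradient-norm measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, n = 64, 48, 8, 32

# Toy least-squares objective: loss(delta_W) = 0.5 * ||x @ (W + delta_W) - y||^2.
x = rng.standard_normal((n, d_in))
y = rng.standard_normal((n, d_out))
W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)


def loss(delta_W: np.ndarray) -> float:
    residual = x @ (W + delta_W) - y
    return 0.5 * float(np.sum(residual ** 2))


# Gradient w.r.t. the full weight update, evaluated at delta_W = 0.
G = x.T @ (x @ W - y)  # shape (d_in, d_out)

# LoRA-style factors in the x @ A @ B convention: A random, B zero, so A @ B = 0.
A = rng.standard_normal((d_in, r)) / np.sqrt(d_in)
B = np.zeros((r, d_out))
grad_A = G @ B.T  # identically zero while B == 0
grad_B = A.T @ G  # G is seen only through a random r-dimensional projection

# Shared shard matrix under the assumed tiling: delta_W = np.tile(D, (d_in // r, 1)), D = 0.
D = np.zeros((r, d_out))
grad_D = G.reshape(d_in // r, r, d_out).sum(axis=0)  # sum of the r-row shards of G

print("trainable params  LoRA:", A.size + B.size, " shared D:", D.size)
print("initial grad norms  A:", np.linalg.norm(grad_A),
      " B:", np.linalg.norm(grad_B), " D:", np.linalg.norm(grad_D))

# Central-difference check of one entry of grad_D (exact up to rounding for a quadratic loss).
eps = 1e-4
E = np.zeros_like(D)
E[0, 0] = eps
fd = (loss(np.tile(E, (d_in // r, 1))) - loss(np.tile(-E, (d_in // r, 1)))) / (2 * eps)
print("grad_D[0, 0] analytic vs numeric:", grad_D[0, 0], fd)
```

Under this tiling assumption, $\partial L / \partial \boldsymbol{D} = \sum_i G_i$, where $G_i$ is the $i$-th block of $r$ rows of $G$, so every entry of $G$ contributes to the first update of $\boldsymbol{D}$. Which gradient is larger in practice depends on the data, the initialization scale, and $r$; Figure 1 reports the paper's measured comparison.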

Paper Structure

This paper contains 36 sections, 5 equations, 6 figures, 12 tables, and 1 algorithm.

Introduction
Gradient Norm Analysis.
Efficient Implementation
Preliminaries and Related Works
Low-Rank Adaptation (LoRA).
Improvements of LoRA.
No Free Lunch: Balancing Between Adaptability and Efficiency
Empirically Benchmarking the Adaptability of LoRA Variants
Experimental Setup.
Results.
Efficiency Analysis of LoRA Variants
Metrics.
Results.
MiSS: Shard Sharing for the Performance and Efficiency Tradeoff
Method Overview
(21 further sections not listed.)

Figures (6)

Figure 1: Comparison of initial gradient norms across different training methods and the effect of rank. Results are shown for LLaMA2-7B and Qwen3-4B on the Math and Code datasets.
Figure 2: No Free Lunch Experiment. Left. Training loss curves of all methods. Middle. Initialization time versus parameter count. Right. Training time versus parameter count.
Figure 3: Left. Structural diagram of $\Delta \boldsymbol{W}$ in LoRA and MiSS. Right. PyTorch-style pseudocode illustrating the implementation of MiSS.
Figure 4: Pareto front of MiSS compared with other PEFT methods. Three additional methods are selected as baselines for the trade-off between memory and performance.
Figure 5: Loss curves of LLaMA2-7B fine-tuned on MetaMathQA using LoRA and MiSS. (a) Loss vs. tokens. (b) Loss vs. training time.
(1 further figure not listed.)
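Figure 4 and the abstract place MiSS on a Pareto front. The sketch below shows the standard non-dominance test behind such a plot for two objectives: lower memory and higher score. The paper also compares initialization overhead and computational efficiency, which would add further objectives to the same test. The method names and numbers are placeholders, not results from the paper.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    memory_gb: float  # lower is better
    score: float      # higher is better


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if it is no worse on both axes and strictly better on at least one."""
    no_worse = p.memory_gb <= q.memory_gb and p.score >= q.score
    strictly_better = p.memory_gb < q.memory_gb or p.score > q.score
    return no_worse and strictly_better


def pareto_front(points: list[Point]) -> list[Point]:
    """Return the non-dominated points, sorted by memory."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.memory_gb)


# Placeholder numbers for illustration only; they are not measurements from the paper.
candidates = [
    Point("method_a", 20.0, 60.0),
    Point("method_b", 18.0, 58.0),
    Point("method_c", 22.0, 59.0),  # dominated by method_a
    Point("method_d", 16.0, 55.0),
    Point("method_e", 19.0, 57.0),  # dominated by method_b
]

for p in pareto_front(candidates):
    print(f"{p.name}: {p.memory_gb} GB, score {p.score}")
```

With these placeholder values the front is method_d, method_b, method_a. Each of method_c and method_e is matched or beaten on both axes by another point. To include more objectives, extend `dominates` so that it checks each additional axis in the same way.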
