Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

Yuzhu Mao; Siqi Ping; Zihao Zhao; Yang Liu; Wenbo Ding

TL;DR

This work investigates the intrinsic dimension $r$ of LoRA updates and demonstrates that increasing $r$ improves generalization under a fixed trainable-parameter budget. It introduces RM-LoRA, which combines an orthogonality-promoting regularizer with gradient masking to encourage higher intrinsic rank while controlling updates, underpinned by theory linking LoRA approximation error to the $r$-th singular value of weight discrepancies. Empirically, RM-LoRA outperforms original LoRA and state‑of‑the‑art variants on open-source vision and language datasets, achieving stronger generalization with the same or smaller parameter budgets. The results suggest that focusing on intrinsic-dimension exploration can significantly enhance the practicality of parameter-efficient fine-tuning for large pre-trained models, especially on mobile and edge devices.

Abstract

Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA method faces challenges of suboptimal performance and overfitting. This paper investigates the intrinsic dimension of the matrix updates approximated by the LoRA method and reveals the performance benefits of increasing this intrinsic dimension. By employing regularization and a gradient masking method that encourages higher intrinsic dimension, the proposed method, termed Regularized and Masked LoRA (RM-LoRA), achieves superior generalization performance with the same or lower trainable parameter budget compared to the original LoRA and its latest variants across various open-source vision and language datasets.
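The combination of a regularizer and gradient masking described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the penalty $\|\mathbf{A}\mathbf{A}^\top - \mathbf{I}\|_F^2$, the squared-error stand-in for the task loss, the fixed `keep` mask over rank components, and all function names are hypothetical choices made here, since the abstract does not specify them.

```python
import numpy as np

def orth_penalty(a):
    """Assumed regularizer: squared Frobenius distance of A A^T from the identity."""
    g = a @ a.T - np.eye(a.shape[0])
    return np.sum(g ** 2)

def orth_penalty_grad(a):
    """Gradient of orth_penalty with respect to A, which is 4 (A A^T - I) A."""
    return 4.0 * (a @ a.T - np.eye(a.shape[0])) @ a

def masked_step(a, b, grad_a, grad_b, keep, lr):
    """Update only the rank components flagged in `keep`; the rest stay frozen this step."""
    a_new = a - lr * grad_a * keep[:, None]   # rows of A are rank components
    b_new = b - lr * grad_b * keep[None, :]   # columns of B are rank components
    return a_new, b_new

rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 4
w = rng.standard_normal((d_out, d_in))         # frozen pre-trained weight
w_target = rng.standard_normal((d_out, d_in))  # hypothetical target for the toy loss
a = 0.1 * rng.standard_normal((r, d_in))
# LoRA usually initializes B to zero; small random values are used here so that
# both factors receive a nonzero gradient in a one-step demo.
b = 0.1 * rng.standard_normal((d_out, r))

# Toy task loss 0.5 * ||W + B A - W_target||_F^2 and its gradients.
err = w + b @ a - w_target
lam = 0.1                                      # assumed regularization weight
grad_a = b.T @ err + lam * orth_penalty_grad(a)
grad_b = err @ a.T

keep = np.array([1.0, 0.0, 1.0, 0.0])          # assumed mask: update components 0 and 2
a_new, b_new = masked_step(a, b, grad_a, grad_b, keep, lr=0.01)
```

In this sketch the masked components (rows 1 and 3 of `a`, columns 1 and 3 of `b`) are left untouched by the step, while the penalty pushes the updated rows of `a` toward orthonormality. How the mask is chosen per step in RM-LoRA is not stated in the abstract.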

Paper Structure (13 sections, 1 theorem, 10 equations, 1 figure, 4 tables, 1 algorithm)

Introduction
Related Works
RM-LoRA Method
  Preliminary
  Influence of the Intrinsic Dimension of LoRA Adapter $\Delta \mathbf{W}$
  Regularization on LoRA Weights
  Gradient Masking for Partial Updates
Experiments
  Experimental Setup
  Image Classification
  Natural Language Understanding
  Question Answering
Conclusion

Key Result

Theorem 3.1

If $\sum_{l\in \mathcal{P}_i} R_l \geq \text{rank} (\bar{\mathbf{W}}_i - \prod_{l\in \mathcal{P}_i} \mathbf{W}_l)$ for all $i \in [\hat{L}]$, then there exist LoRA adapters $(\Delta \mathbf{W}_l)_{l=1}^L$ with $\text{rank}(\Delta \mathbf{W}_l) \leq R_l$ and biases $(\mathbf{\hat{b}}_l)_{l=1}^L$ such that …
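For a single linear layer, the rank condition in Theorem 3.1 reduces to a standard linear-algebra fact: an update $\Delta \mathbf{W}$ of rank at most $R$ can close the gap $\bar{\mathbf{W}} - \mathbf{W}$ exactly if and only if $R \geq \text{rank}(\bar{\mathbf{W}} - \mathbf{W})$. When the budget is too small, the Eckart–Young theorem gives the spectral-norm error of the best rank-$R$ update as the next singular value, $\sigma_{R+1}$, of the discrepancy. The sketch below checks this numerically. The matrices, the sizes, and the helper `best_rank_r_update` are hypothetical, and the singular-value indexing follows the convention used here, which may differ from the paper's.

```python
import numpy as np

def best_rank_r_update(w_frozen, w_target, r):
    """Best rank-r approximation (spectral norm) of the discrepancy w_target - w_frozen."""
    u, s, vt = np.linalg.svd(w_target - w_frozen, full_matrices=False)
    delta_w = (u[:, :r] * s[:r]) @ vt[:r, :]   # truncated SVD keeps the top-r components
    return delta_w, s

rng = np.random.default_rng(0)
d, true_rank = 8, 3
w_frozen = rng.standard_normal((d, d))
# Hypothetical target whose discrepancy from the frozen weight has rank 3.
w_target = w_frozen + rng.standard_normal((d, true_rank)) @ rng.standard_normal((true_rank, d))

errors = {}
for r in range(1, 6):
    delta_w, s = best_rank_r_update(w_frozen, w_target, r)
    # Spectral-norm gap that remains after adding the rank-r update.
    errors[r] = np.linalg.norm(w_target - (w_frozen + delta_w), ord=2)
    print(f"r={r}  residual={errors[r]:.3e}  sigma_(r+1)={s[r]:.3e}")
```

The residual should match $\sigma_{R+1}$ for every budget and drop to numerical zero once $R$ reaches the rank of the discrepancy (3 in this toy setup), which is the single-layer analogue of the theorem's condition.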

Figures (1)

Figure 1: Results with ViT model on CIFAR-100.

Theorems & Definitions (1)

Theorem 3.1: Theorem 6 in zeng2023expressive
