MLAE: Masked LoRA Experts for Visual Parameter-Efficient Fine-Tuning

Junjie Wang, Guangjing Yang, Wentao Chen, Huahui Yi, Xiaohu Wu, Zhouchen Lin, Qicheng Lao

TL;DR

MLAE tackles parameter redundancy in visual PEFT by decomposing LoRA updates into independent rank-1 experts and applying adaptive, expert-level masking to promote diverse, anisotropic learning. Using cellular decomposition and several masking regimes, of which stochastic masking with uniform dropout is the most prominent, MLAE reduces parameter similarity among experts and achieves state-of-the-art averages on VTAB-1k ($78.8\%$) and FGVC ($90.9\%$) with roughly half the trainable parameters. Analyses of attention maps and cosine similarity indicate that the experts learn more diverse and specialized features. The work offers practical improvements for efficient fine-tuning of vision transformers and provides code for reproducibility.

Abstract

In response to the challenges posed by the extensive parameter updates required for full fine-tuning of large-scale pre-trained models, parameter-efficient fine-tuning (PEFT) methods, exemplified by Low-Rank Adaptation (LoRA), have emerged. LoRA simplifies fine-tuning but can still suffer from redundancy in its low-rank matrices, and merely increasing their rank yields limited gains. To address these issues, a natural idea is to enhance the independence and diversity of the learning process for the low-rank matrices. We therefore propose Masked LoRA Experts (MLAE), an approach that applies the concept of masking to visual PEFT. Our method incorporates a cellular decomposition strategy that transforms a low-rank matrix into independent rank-1 submatrices, or "experts", thus enhancing independence. Additionally, we introduce a binary mask matrix that, following expert-level dropout strategies, selectively activates these experts during training to promote more diverse and anisotropic learning. Our investigations reveal that this selective activation not only improves performance but also fosters more diverse knowledge acquisition, with a marked decrease in parameter similarity among the MLAE experts, improving the quality of the model. MLAE achieves new state-of-the-art (SOTA) performance, with an average accuracy of 78.8% on the VTAB-1k benchmark and 90.9% on the FGVC benchmark, surpassing the previous SOTA result by an average of 0.8% on both benchmarks with approximately half the parameters. Our code is available at https://github.com/jie040109/MLAE.
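The decomposition and masking described in the abstract can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that). It only shows that a rank-r LoRA update BA equals the sum of r rank-1 "experts" b_i a_i^T, and that a binary mask selects which experts contribute. The function names, the toy dimensions, the `drop_prob` parameter, and the choice of dropping each expert independently with the same probability are all assumptions made for illustration.

```python
import random

def outer(b, a):
    """Rank-1 matrix b a^T from vectors b (length d_out) and a (length d_in)."""
    return [[bi * aj for aj in a] for bi in b]

def mat_add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def sample_mask(rank, drop_prob, rng):
    """Binary expert mask (assumption: each expert is dropped
    independently with the same probability drop_prob)."""
    return [0 if rng.random() < drop_prob else 1 for _ in range(rank)]

def masked_update(B_cols, A_rows, mask):
    """Weight update as the sum of active experts: sum_i mask[i] * b_i a_i^T."""
    d_out, d_in = len(B_cols[0]), len(A_rows[0])
    delta = zeros(d_out, d_in)
    for b, a, m in zip(B_cols, A_rows, mask):
        if m:  # a masked expert contributes nothing to this step
            delta = mat_add(delta, outer(b, a))
    return delta

# Toy sizes chosen for illustration only.
rng = random.Random(0)
rank, d_out, d_in = 4, 3, 5
B_cols = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(rank)]  # columns of B
A_rows = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]   # rows of A

# With every expert active, the sum of experts is the ordinary LoRA product BA.
full = masked_update(B_cols, A_rows, [1] * rank)

# With a sampled mask, only a subset of experts forms the update.
mask = sample_mask(rank, drop_prob=0.5, rng=rng)
delta_w = masked_update(B_cols, A_rows, mask)
```

In matrix form the masked update is B diag(m) A, so masking an expert zeroes one column of B together with the matching row of A. The sketch applies no rescaling to the surviving experts; whether and how the method rescales is not stated in the abstract.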

Paper Structure

This paper contains 21 sections, 5 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: Our proposed Masked LoRA Experts (MLAE) framework.
  • Figure 2: Different masking strategies.
  • Figure 2: Results on the FGVC benchmark. "Tuned/Total" denotes the fraction of trainable parameters.
  • Figure 3: Domain-wise average scores on the VTAB-1k benchmark. MLAE performs best in all three domains, especially the Structured domain, where it surpasses GLoRA by 3.2% under a similar budget.
  • Figure 4: Comparisons of average parameter similarity between MLAE and LoRA baselines.
  • ...and 7 more figures