Mixture-of-Subspaces in Low-Rank Adaptation

Taiqiang Wu, Jiahao Wang, Zhe Zhao, Ngai Wong

TL;DR

MoSLoRA introduces a learnable subspace mixer that fuses multiple low-rank LoRA subspaces, improving performance on language, vision-language, and diffusion-model tasks while keeping the adapter mergeable at inference and adding minimal parameter overhead. By reinterpreting LoRA as a subspace operation, viewed both as rank-1 subspaces and as composed subspaces, the method obtains richer representations without adding inference latency. Empirical results on commonsense reasoning, visual instruction tuning, and subject-driven generation show consistent gains over vanilla LoRA and several baselines, and the paper examines which initialization choices for the mixer work well. The work situates MoSLoRA within the parameter-efficient fine-tuning landscape, contrasting its fixed-subspace fusion with MoE approaches and highlighting practical benefits for multi-modal adaptation and robust fine-tuning.

Abstract

In this paper, we introduce a subspace-inspired Low-Rank Adaptation (LoRA) method, which is computationally efficient, easy to implement, and readily applicable to large language, multimodal, and diffusion models. Initially, we equivalently decompose the weights of LoRA into two subspaces, and find that simply mixing them can enhance performance. To study this phenomenon, we revisit it through a fine-grained subspace lens, showing that such a modification is equivalent to employing a fixed mixer to fuse the subspaces. To be more flexible, we jointly learn the mixer with the original LoRA weights, and term the method Mixture-of-Subspaces LoRA (MoSLoRA). MoSLoRA consistently outperforms LoRA on tasks in different modalities, including commonsense reasoning, visual instruction tuning, and subject-driven text-to-image generation, demonstrating its effectiveness and robustness. Code is available at https://github.com/wutaiqiang/MoSLoRA.
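
To make the mixer idea concrete, the sketch below places a small $r \times r$ matrix between the two LoRA factors. It is not the authors' implementation (see the linked repository for that). The shapes $A \in \mathbb{R}^{d_1 \times r}$, $B \in \mathbb{R}^{r \times d_2}$ and the row-vector convention $x(W_0 + A W B)$ are assumptions chosen to match the $(d_1+d_2+r)r$ parameter count, and every name (`matmul`, `moslora_delta`, `mixer`, and so on) is hypothetical. It illustrates two points: the update can be folded into the frozen weight, so inference needs no extra matrix products, and an identity mixer gives back the vanilla LoRA update.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]; illustrative values only."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))


def lora_delta(A, B):
    """Vanilla LoRA weight update: A @ B."""
    return matmul(A, B)


def moslora_delta(A, mixer, B):
    """Update with an r x r mixer between the factors: A @ mixer @ B."""
    return matmul(matmul(A, mixer), B)


rng = random.Random(0)
d1, d2, r = 6, 5, 2              # toy sizes (assumed); real layers are far larger
W0 = rand_matrix(d1, d2, rng)    # frozen pretrained weight
A = rand_matrix(d1, r, rng)      # down-projection factor
B = rand_matrix(r, d2, rng)      # up-projection factor
mixer = rand_matrix(r, r, rng)   # stands in for a trained mixer
x = rand_matrix(1, d1, rng)      # one input row vector

# Adapter kept separate: x W0 + ((x A) mixer) B
y_unmerged = add(matmul(x, W0), matmul(matmul(matmul(x, A), mixer), B))

# Adapter merged into the weight once, then a single matrix product
W_merged = add(W0, moslora_delta(A, mixer, B))
y_merged = matmul(x, W_merged)

merge_gap = max_abs_diff(y_unmerged, y_merged)
identity_gap = max_abs_diff(moslora_delta(A, identity(r), B), lora_delta(A, B))
print(merge_gap < 1e-9, identity_gap < 1e-12)
```

In practice the factors and the mixer would be trainable tensors in a deep-learning framework. Plain lists are used here only to keep the sketch dependency-free.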

Paper Structure (33 sections, 18 equations, 10 figures, 8 tables)

This paper contains 33 sections, 18 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Comparison between vanilla LoRA and proposed MoSLoRA. In MoSLoRA, we employ learnable weights to mix more subspaces with negligible parameters (i.e., $(d_1+d_2+r)r$ vs $(d_1+d_2)r$ and $d_1+d_2 \gg r$ typically).
  • Figure 2: Overview of decomposing vanilla LoRA into two subspaces and mixing them. Compared to vanilla LoRA, two-subspaces-mixing LoRA contains two extra entries.
  • Figure 3: The subspace view (rank=1) and composed view for vanilla LoRA, two-subspaces-mixing LoRA, and proposed MoSLoRA. In MoSLoRA, we employ a learnable mixer to fuse more information and more flexibly.
  • Figure 4: Comparison of MoSLoRA and LoRA on the HellaSwag benchmark with fewer training samples.
  • Figure 5: Normalized performance on 6 ability dimensions in MMBench EN/CN for QLoRA and QMoSLoRA when fine-tuning InternLM2. MoSLoRA significantly improves the reasoning ability over LoRA.
  • ...and 5 more figures
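
The parameter counts in the Figure 1 caption can be checked with a few lines of arithmetic. The sketch below evaluates $(d_1+d_2)r$ and $(d_1+d_2+r)r$ for one layer. The sizes $d_1 = d_2 = 4096$ and $r = 16$ are hypothetical values picked for illustration, not taken from the paper, and the function names are likewise made up.

```python
def lora_params(d1, d2, r):
    """Trainable parameters of vanilla LoRA for one layer: (d1 + d2) * r."""
    return (d1 + d2) * r


def moslora_params(d1, d2, r):
    """Trainable parameters with an r x r mixer added: (d1 + d2 + r) * r."""
    return (d1 + d2 + r) * r


# Hypothetical layer sizes, chosen only to illustrate d1 + d2 >> r.
d1, d2, r = 4096, 4096, 16

base = lora_params(d1, d2, r)
mixed = moslora_params(d1, d2, r)
extra = mixed - base             # equals r * r, the size of the mixer
relative_overhead = extra / base

print(base, mixed, extra, relative_overhead)
```

Under these assumed sizes the mixer adds $r^2 = 256$ parameters to the 131072 of vanilla LoRA, roughly 0.2% extra, which is what the caption means by negligible when $d_1+d_2 \gg r$.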