
Sparse Spectral LoRA: Routed Experts for Medical VLMs

Omid Nejati Manzari, Hojat Asgariandehkordi, Taha Koleilat, Yiming Xiao, Hassan Rivaz

Abstract

Large vision-language models (VLMs) excel on general benchmarks but often lack robustness in medical imaging, where heterogeneous supervision induces cross-dataset interference and sensitivity to data regime (i.e., how the supervisory signals are mixed). In realistic clinical workflows, data and tasks arrive sequentially, so naive continual training further leads to catastrophic forgetting. To address these challenges, we propose MedQwen, a parameter-efficient medical VLM that couples a spectrally routed Mixture-of-Experts (MoE) with a theoretically grounded scaling rule that aligns low-rank updates with a full-rank, fully fine-tuned MoE, without changing the base architecture. Concretely, we initialize each expert from non-overlapping singular value decomposition (SVD) segments of the pretrained weight and introduce a residual compensation and scaling scheme to enable stable expert specialization and consistent routing under distribution shift. Across 23 medical datasets covering visual question answering, report generation, radiology classification, and hallucination mitigation, MedQwen achieves strong, reliable performance: it approaches full fine-tuning on zero-shot classification with 339$\times$ fewer trainable parameters, and reduces sequential forgetting to $\sim$5\% where strong baselines degrade by $>$20-50\%.
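The abstract's expert initialization, where each expert is seeded from a non-overlapping SVD segment of the pretrained weight and a frozen residual keeps the effective weight equal to the original before training, can be sketched as follows. This is a hedged illustration only: the function name, the `B @ A` factorization convention, and the square-root split of the singular values are assumptions, not the paper's exact implementation.

```python
import numpy as np

def svd_expert_init(W0, num_experts=4, rank=8):
    """Seed each expert from a non-overlapping SVD segment of W0 and
    compute a frozen residual so the effective weight equals W0 at init.
    (Illustrative sketch; conventions are assumptions.)"""
    U, S, Vt = np.linalg.svd(W0, full_matrices=False)
    experts = []
    for i in range(num_experts):
        lo, hi = i * rank, (i + 1) * rank  # non-overlapping singular-value segment
        # LoRA-style factors B (d_out x r) and A (r x d_in); B @ A reproduces the segment
        B = U[:, lo:hi] * np.sqrt(S[lo:hi])
        A = np.sqrt(S[lo:hi])[:, None] * Vt[lo:hi, :]
        experts.append((B, A))
    # Residual compensation: subtract what the experts already represent,
    # so W_res + sum_i B_i A_i == W0 before any optimization step
    W_res = W0 - sum(B @ A for B, A in experts)
    return experts, W_res

rng = np.random.default_rng(0)
W0 = rng.standard_normal((64, 32))
experts, W_res = svd_expert_init(W0)
W_eff = W_res + sum(B @ A for B, A in experts)
assert np.allclose(W_eff, W0)  # equivalent weight matches W0 at initialization
```

The check at the end mirrors the role of $W_{res}$ described in Figure 4: before optimization, the routed experts plus the residual reconstruct the pretrained weight exactly, so specialization begins from the base model's behavior.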

Paper Structure

This paper contains 54 sections, 10 theorems, 62 equations, 12 figures, 17 tables, 1 algorithm.

Key Result

Theorem 1

Let $\tilde{W}_t$ denote the effective LoRA weight and $\tilde{g}_t$ its effective gradient, with $W_t$ and $g_t$ the corresponding quantities for full FT. Alignment follows from the pair of conditions:
$$\tilde{W}_0 = W_0 \quad \text{and} \quad \tilde{g}_t = g_t \ \ \text{for all } t.$$
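Why a scaling rule is needed at all can be motivated by a standard LoRA gradient-flow computation (a hedged sketch under assumed notation, not the paper's derivation: $s$ is the scaling, $B_t A_t$ the low-rank factors, $\eta$ the learning rate, and $\tilde{g}_t = \partial L / \partial \tilde{W}_t$). With effective weight $\tilde{W}_t = W_{\mathrm{res}} + s\, B_t A_t$, one SGD step on the factors induces, to first order,
$$\Delta \tilde{W}_t = s\left(\Delta B_t\, A_t + B_t\, \Delta A_t\right) = -\eta\, s^{2}\left(\tilde{g}_t A_t^{\top} A_t + B_t B_t^{\top} \tilde{g}_t\right),$$
whereas the full-FT step is $\Delta W_t = -\eta\, g_t$. The quadratic dependence on $s$ and on the factor Gram matrices is what the scaling rule must compensate for if the low-rank update is to track the full-rank, fully fine-tuned MoE.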

Figures (12)

  • Figure 1: Model performances on three benchmarks when trained with different data configurations.
  • Figure 2: An overview of the proposed MedQwen approach.
  • Figure 3: SVD initialization vs. scaling $s$ and rank $r$.
  • Figure 4: Optimization with SVD-structured MoE by separately aligning each expert. $W_{res}$ ensures the equivalent weight equals $W^{(0)}$ before optimization. Scaling aligns each expert’s equivalent gradients to those of full MoE FT.
  • Figure 5: Comparison of catastrophic forgetting across different fine-tuning methods in sequential learning. The plot illustrates the accuracy decay over 15 epochs for LoRA, MoELoRA, and our method.
  • ...and 7 more figures

Theorems & Definitions (15)

  • Theorem 1: Single model alignment.
  • Theorem 2: MoE alignment.
  • Theorem 3: Router moments.
  • Theorem 4: Residual matching objective.
  • Theorem 5
  • Theorem 1
  • Proof
  • Theorem 2
  • Proof
  • Theorem 3
  • ...and 5 more