FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts

Heming Zou; Yunliang Zang; Wutong Xu; Yao Zhu; Xiangyang Ji

TL;DR

FlyLoRA tackles parameter interference and inefficiency in MoE-based LoRA by introducing an implicit, rank-wise MoE where a fixed sparse random projection acts as the router. By activating only the top-k rank-1 components after projection, FlyLoRA achieves intra-task decoupling without explicit routing parameters, and theoretical results show distance preservation and reduced gradient covariance. It also enables training-free multi-task model merging via approximate orthogonality between independent random projections, mitigating inter-task interference. Empirically, FlyLoRA improves accuracy across knowledge, science, math, and code tasks with lower activated parameter counts and demonstrates strong robustness in single-task and multi-task settings. The work blends neuroscience-inspired design with PEFT, offering a scalable, efficient approach to decoupled task learning and merging.
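The routing idea in the summary above can be illustrated with a minimal sketch: a frozen sparse random matrix projects the input, and only the k largest-magnitude rank components are passed to the trainable up-projection. This is an illustration under stated assumptions, not the authors' implementation. The names `sparse_random_projection` and `flylora_delta` are hypothetical, as are the fixed number of Gaussian nonzeros per row and the choice to rank components by absolute value. Scaling factors and any other details of the actual method are omitted.

```python
import random


def sparse_random_projection(r, d_in, nnz_per_row, rng):
    """Build a frozen r x d_in matrix with nnz_per_row Gaussian entries per row.

    The sparsity pattern used here is an assumption for illustration.
    """
    A = [[0.0] * d_in for _ in range(r)]
    for row in A:
        for j in rng.sample(range(d_in), nnz_per_row):
            row[j] = rng.gauss(0.0, 1.0)
    return A


def matvec(M, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def flylora_delta(x, A, B, k):
    """Return B @ topk(A @ x) and the indices of the active ranks.

    A (r x d_in) is frozen and doubles as the router; B (d_out x r) is the
    trainable up-projection. Selecting by magnitude is an assumption.
    """
    y = matvec(A, x)
    active = set(sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)[:k])
    y_sparse = [v if i in active else 0.0 for i, v in enumerate(y)]
    return matvec(B, y_sparse), sorted(active)


rng = random.Random(0)
d_in, d_out, r, k = 16, 8, 8, 2
A = sparse_random_projection(r, d_in, nnz_per_row=4, rng=rng)
# Identity stand-in for a trained B, so the output exposes the masked projection.
B = [[1.0 if i == j else 0.0 for j in range(r)] for i in range(d_out)]
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
delta, active = flylora_delta(x, A, B, k)
```

Because the selection depends only on the frozen projection, no separate router weights are created or trained in this sketch; only the columns of `B` indexed by `active` contribute to `delta`.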

Abstract

Low-Rank Adaptation (LoRA) is a widely used parameter-efficient fine-tuning method for foundation models, but it suffers from parameter interference, resulting in suboptimal performance. Although Mixture-of-Experts (MoE)-based LoRA variants show promise in mitigating intra-task correlations in single-task instruction tuning, they introduce additional router parameters and remain ineffective in multi-task model merging where inter-task interference arises. Inspired by the fly olfactory circuit, we propose FlyLoRA, an implicit MoE-based LoRA variant that introduces: (1) rank-wise expert activation in the up-projection matrix, and (2) an implicit router that unifies expert routing and down-projection, where a frozen sparse random projection matrix replaces the traditional dense trainable version. This design resolves the trade-off between intra-task decorrelation and computational efficiency by eliminating the need for an explicit router, while inherently mitigating inter-task interference due to the orthogonality property of random matrices. Extensive experiments across four domains -- general knowledge understanding, scientific question answering, mathematical reasoning, and code generation -- demonstrate consistent performance improvements over existing methods. Beyond empirical gains, FlyLoRA highlights how biological structures can inspire innovations in AI technologies. Code is available at https://github.com/gfyddha/FlyLoRA.
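The orthogonality property mentioned in the abstract can be checked numerically with a small sketch. It draws two independent sparse random matrices, standing in for the frozen projections of two tasks, and measures the largest absolute cosine similarity between any row of one and any row of the other. The dimensions, sparsity level, and the helper names `sparse_gaussian_rows` and `max_abs_cosine` are assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random


def sparse_gaussian_rows(r, d, nnz, rng):
    """Return r sparse rows of length d, each with nnz Gaussian entries."""
    rows = []
    for _ in range(r):
        row = [0.0] * d
        for j in rng.sample(range(d), nnz):
            row[j] = rng.gauss(0.0, 1.0)
        rows.append(row)
    return rows


def max_abs_cosine(rows_a, rows_b):
    """Largest |cosine similarity| over all pairs (one row from each set)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    worst = 0.0
    for u in rows_a:
        for v in rows_b:
            c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
            worst = max(worst, abs(c))
    return worst


rng = random.Random(0)
A_task1 = sparse_gaussian_rows(8, 2000, 50, rng)  # projection for a first task
A_task2 = sparse_gaussian_rows(8, 2000, 50, rng)  # independent projection for a second task

cross = max_abs_cosine(A_task1, A_task2)     # expected to be close to 0
self_sim = max_abs_cosine(A_task1, A_task1)  # includes each row paired with itself
```

In high dimensions the cross-task value stays far below the self-similarity of 1, which is the intuition behind merging independently trained adapters with little interference between their subspaces.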
