Low-Rank Interconnected Adaptation across Layers
Yibo Zhong, Jinman Zhao, Yao Zhou
TL;DR
The paper tackles the limited expressiveness of LoRA, which arises from rigid low-rank updates under a fixed parameter budget. It introduces Lily, an interconnected PEFT framework in which locally shared $A$ adapters and globally shared $B$ experts are connected via a data-dependent router. This design yields higher-rank updates $\Delta W$ with the same or fewer parameters and lets information flow across layers. In experiments across NLP, vision, and multimodal tasks, Lily consistently outperforms baselines while preserving hardware efficiency, and the accompanying analysis offers insight into rank, granularity, and selectivity mechanisms. These results suggest that Lily is a flexible, architecture-agnostic PEFT method with practical potential for scalable fine-tuning of large foundation models.
Abstract
Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates $\Delta W = AB$ for pretrained weights $W$ through low-rank adapters $A$ and $B$. While LoRA ensures hardware efficiency, its low-rank weight updates limit adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared $A$ and globally shared $B$ experts. This structure eliminates redundant per-layer $AB$ pairs, enabling higher-rank $\Delta W$ with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine $A$-$B$ interconnections, preventing $B$ experts from converging to the same behavior and improving representational power across domains. Experiments across modalities, architectures, and model sizes demonstrate Lily's superior performance and efficiency. GitHub: https://github.com/yibozhong/lily
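To make the "higher-rank $\Delta W$ with equal or fewer parameters" claim concrete, the following is a minimal parameter-budget sketch, not the authors' implementation. It assumes square $d \times d$ weights and one small router per $A$ adapter that maps the rank-dimensional projection to one logit per $B$ expert. The function names, the router shape, and the example sizes (24 layers, $d = 1024$, 4 adapters, 4 experts) are all hypothetical and chosen only for illustration.

```python
def lora_params(num_layers, d_in, d_out, rank):
    """Trainable parameters for plain LoRA: one (A, B) pair per adapted layer.

    A has shape (d_in, rank) and B has shape (rank, d_out), so Delta W = A @ B.
    """
    return num_layers * rank * (d_in + d_out)


def lily_params(num_a, num_b, d_in, d_out, rank):
    """Trainable parameters for a Lily-style setup (illustrative assumption).

    num_a locally shared A adapters of shape (d_in, rank),
    num_b globally shared B experts of shape (rank, d_out),
    plus one assumed router per A adapter of shape (rank, num_b).
    """
    adapters = num_a * rank * d_in + num_b * rank * d_out
    routers = num_a * rank * num_b
    return adapters + routers


def max_lily_rank(budget, num_a, num_b, d_in, d_out):
    """Largest rank whose Lily-style parameter count fits within `budget`."""
    params_per_rank_unit = num_a * d_in + num_b * d_out + num_a * num_b
    return budget // params_per_rank_unit


# Hypothetical setting: 24 adapted layers, hidden size 1024, LoRA rank 8.
budget = lora_params(num_layers=24, d_in=1024, d_out=1024, rank=8)

# Share 4 A adapters and 4 B experts across those 24 layers instead.
rank = max_lily_rank(budget, num_a=4, num_b=4, d_in=1024, d_out=1024)

print("LoRA budget:", budget)            # 393216
print("Lily-style rank in budget:", rank)  # 47
print("Lily-style params:", lily_params(4, 4, 1024, 1024, rank))  # 385776
```

Under these assumed sizes, replacing 24 per-layer $AB$ pairs with a small pool of shared adapters and experts frees enough parameters to raise the rank from 8 to 47 within the same budget. The actual gain depends on how many $A$ adapters and $B$ experts are used and on the real router design.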
