Low-Rank Interconnected Adaptation across Layers

Yibo Zhong, Jinman Zhao, Yao Zhou

TL;DR

The paper tackles the limited expressiveness of LoRA, which stems from rigid low-rank updates under a fixed parameter budget. It introduces Lily, an interconnected PEFT framework in which locally shared $A$ adapters and globally shared $B$ experts are connected via a data-dependent router. This design allows higher-rank updates $\Delta W$ with the same or fewer parameters and lets information flow across layers. In experiments across NLP, vision, and multimodal tasks, Lily consistently outperforms baselines while preserving hardware efficiency, and the paper analyzes the roles of rank, granularity, and selectivity. These results suggest Lily is a flexible, architecture-agnostic PEFT method with practical potential for scalable fine-tuning of large foundation models.

Abstract

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates $\Delta W = AB$ for pretrained weights $W$ through low-rank adapters $A$ and $B$. While LoRA ensures hardware efficiency, its low-rank weight updates limit adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared $A$ and globally shared $B$ experts. This structure eliminates redundant per-layer $AB$ pairs, enabling higher-rank $\Delta W$ with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine $A$-$B$ interconnections, preventing $B$ experts from converging to the same behavior and improving representational power across domains. Experiments across modalities, architectures, and model sizes demonstrate Lily's superior performance and efficiency. GitHub: https://github.com/yibozhong/lily
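
The interconnection described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the Lily repository: the function name `lily_delta`, the softmax router, the averaging of router scores over tokens, and all shapes are assumptions chosen only to show how one shared $A$ projection could be combined with a router-weighted mixture of $B$ experts.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def lily_delta(x, A, B_experts, router_W):
    """Hypothetical Lily-style update for one layer (illustrative only).

    x:         (tokens, d_in)       layer input
    A:         (d_in, r)            down-projection shared by nearby layers
    B_experts: (n_experts, r, d_out) up-projections shared by all layers
    router_W:  (r, n_experts)       assumed linear router on the projected input
    """
    h = x @ A                                  # (tokens, r)
    scores = softmax(h @ router_W, axis=-1)    # (tokens, n_experts)
    weights = scores.mean(axis=0)              # assumption: average over tokens
    # Data-dependent mixture of the B experts, shape (r, d_out).
    B_mixed = np.tensordot(weights, B_experts, axes=1)
    return h @ B_mixed, weights

rng = np.random.default_rng(0)
d, r, n_experts, tokens = 16, 4, 3, 5          # toy sizes, not from the paper
x = rng.normal(size=(tokens, d))
A = rng.normal(size=(d, r)) * 0.1
B_experts = rng.normal(size=(n_experts, r, d)) * 0.1
router_W = rng.normal(size=(r, n_experts))

delta, weights = lily_delta(x, A, B_experts, router_W)
print(delta.shape, weights.round(3))
```

Because the router weights depend on the input, two different inputs generally mix the $B$ experts differently, which is the mechanism the abstract credits with keeping the experts from converging to the same behavior.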

Paper Structure (46 sections, 7 equations, 14 figures, 17 tables)

Figures (14)

  • Figure 1: Dynamics of LoRA and Lily. In this 6-layer example with a fixed overall parameter budget, LoRA allocates the same parameter budget to each layer, resulting in low-rank updates for the weights. Lily overcomes this by employing a small number of shared adapters with a much larger rank, achieving higher-rank updates while using the same or even a smaller parameter budget. Considering the different characteristics, and to make the adaptation more dynamic, the adapters are mixed according to a data-dependent router, represented by $R$.
  • Figure 2: Qualitative results of subject-driven generation. Lily's results align better with prompts, featuring more accurate color, environment, and shape.
  • Figure 3: Actual rank of the weight updates. The weight updates are of shape $768 \times 768$. We run 20 epochs for CoLA, MRPC, and STS-B, and 3 epochs for SST-2. The weight updates from Lily have notably higher rank than those from LoRA. Note that the reported rank is computed from weight updates accumulated over multiple epochs.
  • Figure 4: Visualization of the accumulated weights that a router assigns to the $B$ experts across various layers. The example uses the layers at indices 2, 13, and 22 to represent shallow, middle, and deep layers. The reported values are based on router outputs accumulated over multiple epochs.
  • Figure 5: Impact of attention granularity (i.e., the choice of how many $A$s and $B$s) on performance. We evaluate on 12 of the 19 VTAB-1K tasks for a comprehensive understanding.
  • ...and 9 more figures
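
The budget argument in the Figure 1 caption can be made concrete with a little arithmetic. The sketch below is a hedged illustration: the 6 layers and the width of 768 come from the captions above, but the LoRA rank of 8, the choice of 2 shared $A$ adapters and 2 $B$ experts, the router size, and the helper names are hypothetical values picked for the example, not settings reported by the paper.

```python
def lora_params(num_layers, d, rank):
    # One (A, B) pair per layer, each of size d x rank.
    return num_layers * 2 * d * rank

def lily_params(num_a, num_b, d, rank):
    # Shared A adapters and B experts, each of size d x rank.
    adapters = (num_a + num_b) * d * rank
    # Assumption: one linear router per A adapter, of size rank x num_b.
    routers = num_a * rank * num_b
    return adapters + routers

def max_lily_rank(budget, num_a, num_b, d):
    # Largest rank whose total cost stays within the budget.
    per_rank = (num_a + num_b) * d + num_a * num_b
    return budget // per_rank

num_layers, d, lora_rank = 6, 768, 8      # lora_rank is a hypothetical choice
budget = lora_params(num_layers, d, lora_rank)
lily_rank = max_lily_rank(budget, num_a=2, num_b=2, d=d)
print(budget, lily_rank, lily_params(2, 2, d, lily_rank))
# prints: 73728 23 70748
```

Under these assumed settings, the shared adapters can use rank 23 instead of 8 while staying below the LoRA budget, so the upper bound on the rank of each layer's $\Delta W$ rises accordingly.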