Low-Rank Interconnected Adaptation across Layers

Yibo Zhong; Jinman Zhao; Yao Zhou

TL;DR

The paper tackles the limited expressiveness of LoRA, whose low-rank updates are rigid under a fixed parameter budget. It introduces Lily, an interconnected PEFT framework in which locally shared $A$ adapters and globally shared $B$ experts are connected through a data-dependent router. This design yields higher-rank updates $ΔW$ with the same or fewer parameters and lets information flow across layers. In extensive experiments on NLP, vision, and multimodal tasks, Lily consistently outperforms baselines while preserving hardware efficiency, and the accompanying analysis offers insight into its rank, granularity, and selectivity. These results position Lily as a flexible, architecture-agnostic PEFT method with practical potential for scalable fine-tuning of large foundation models.
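To make the budget argument concrete, here is a small parameter-count calculation. It is a sketch under stated assumptions, not the paper's own accounting: 6 layers (as in Figure 1) with 768 × 768 target weights (the shape reported for Figure 3), two $A$ adapters each shared by three adjacent layers, two globally shared $B$ experts, and a per-layer router of shape $r \times$ (number of $B$ experts). The helpers `lora_params`, `lily_params`, and `max_lily_rank` are hypothetical.

```python
# Hypothetical setting: 6 layers (as in Figure 1) with 768 x 768 target weights
# (the shape reported in Figure 3). All adapter counts below are assumptions.
D_IN = D_OUT = 768
NUM_LAYERS = 6


def lora_params(rank: int, num_layers: int = NUM_LAYERS) -> int:
    """LoRA: one A (D_IN x rank) and one B (rank x D_OUT) for every layer."""
    return num_layers * (D_IN * rank + rank * D_OUT)


def lily_params(rank: int, num_a: int, num_b: int, num_layers: int = NUM_LAYERS) -> int:
    """Lily sketch: num_a locally shared A adapters, num_b globally shared B experts,
    and one (rank x num_b) router per layer (router shape is an assumption)."""
    a_params = num_a * D_IN * rank
    b_params = num_b * rank * D_OUT
    router_params = num_layers * rank * num_b
    return a_params + b_params + router_params


def max_lily_rank(budget: int, num_a: int, num_b: int) -> int:
    """Largest shared rank whose Lily parameter count fits within `budget`."""
    rank = 0
    while lily_params(rank + 1, num_a, num_b) <= budget:
        rank += 1
    return rank


budget = lora_params(rank=8)                    # rank-8 LoRA on all 6 layers
print(budget)                                   # 73728
print(max_lily_rank(budget, num_a=2, num_b=2))  # 23: two A's (3 layers each), two B experts
```

Under these assumptions, the 73,728 parameters of a rank-8 LoRA let Lily afford a shared rank of 23. The exact figure depends on how many $A$ and $B$ adapters are used and on the router shape.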

Abstract

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning (PEFT) method that learns weight updates $ΔW = AB$ for pretrained weights $W$ through low-rank adapters $A$ and $B$. While LoRA is hardware-efficient, the low rank of its weight updates limits adaptation performance. In this paper, we propose low-rank interconnected adaptation across layers (Lily), a novel PEFT method that introduces an interconnected framework with locally shared $A$ adapters and globally shared $B$ experts. This structure eliminates redundant per-layer $AB$ pairs, enabling higher-rank $ΔW$ with equal or fewer parameters. To enhance expressiveness, we use data-dependent routers to determine the $A$-$B$ interconnections, which prevents the $B$ experts from converging to the same behavior and improves representational power across domains. Experiments across modalities, architectures, and model sizes demonstrate Lily's superior performance and efficiency. GitHub: https://github.com/yibozhong/lily
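The interconnected forward pass described above can be sketched in NumPy. This is an illustration, not the reference implementation at the GitHub link. It assumes that one $A$ (input dim × $r$) is shared by neighbouring layers, that all layers reuse the same $B$ experts ($r$ × output dim), that each layer owns a router of shape $r \times N$ which reads the projected features, and that routing is per token via a softmax. Weights are randomly initialized for illustration. `LilyAdapter`, `softmax`, and all sizes are hypothetical. Following the convention $ΔW = AB$, inputs are row vectors, so a layer computes $h = xW + xΔW$.

```python
import numpy as np

D, R, NUM_B = 768, 32, 4  # hypothetical hidden size, shared rank, number of B experts
rng = np.random.default_rng(0)

# Globally shared B experts: one (R x D) upward projection per expert, reused by every layer.
B_EXPERTS = rng.normal(scale=0.02, size=(NUM_B, R, D))


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


class LilyAdapter:
    """Hypothetical per-layer Lily update (a sketch under the assumptions above)."""

    def __init__(self, shared_a: np.ndarray, b_experts: np.ndarray, rng: np.random.Generator):
        self.a = shared_a   # (D, R): locally shared downward projection
        self.b = b_experts  # (NUM_B, R, D): globally shared upward experts
        # Per-layer router over the B experts, reading the R-dim projected features (assumed).
        self.router = rng.normal(scale=0.02, size=(shared_a.shape[1], b_experts.shape[0]))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        z = x @ self.a                                    # (T, R) low-rank features
        w = softmax(z @ self.router)                      # (T, NUM_B) per-token routing weights
        per_expert = np.einsum("tr,erd->ted", z, self.b)  # (T, NUM_B, D): z @ B_i for each expert
        return np.einsum("te,ted->td", w, per_expert)     # (T, D): weighted mix over experts


# Two neighbouring layers share A and the B experts but keep their own routers.
A_SHARED = rng.normal(scale=0.02, size=(D, R))
layer_3 = LilyAdapter(A_SHARED, B_EXPERTS, rng)
layer_4 = LilyAdapter(A_SHARED, B_EXPERTS, rng)

x = rng.normal(size=(10, D))             # 10 tokens
W = rng.normal(scale=0.02, size=(D, D))  # frozen pretrained weight
h = x @ W + layer_3(x)                   # per token: Delta W = A (sum_i w_i B_i)
print(h.shape)                           # (10, 768)
```

Because the routing weights depend on the input, the effective update for a token is $A\left(\sum_i w_i B_i\right)$. Two layers that share $A$ and the $B$ experts can still adapt differently through their own routers.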

Paper Structure (46 sections, 7 equations, 14 figures, 17 tables)

Introduction
Related Work
Methodology
Downward Projection and Selective Weight Allocation
Weighted Mixture of Experts and Upward Projection
Experiments
Common Sense Reasoning
Natural Language Understanding
Subject-driven Image Generation
Visual Adaptation Benchmark
Understanding Lily
Does It Have High-Rank Weight Updates?
What's the Influence of Adapter Granularity?
Does It Exhibit Selectivity?
What's the Hardware Efficiency?
(31 further sections not listed)

Figures (14)

Figure 1: Dynamics of LoRA and Lily. In this 6-layer example with a fixed overall parameter budget, LoRA allocates the same parameter budget to each layer, which results in low-rank updates for every weight. Lily overcomes this by employing a small number of shared adapters with a much larger rank, achieving higher-rank updates with the same or an even smaller parameter budget. Because layers have different characteristics, and to make the adaptation more dynamic, the adapters are mixed by a data-dependent router, denoted $R$.
Figure 2: Qualitative results of subject-driven generation. Lily's results align better with prompts, featuring more accurate color, environment, and shape.
Figure 3: Actual rank of the weight updates. The weight updates have shape $768 \times 768$. We train for 20 epochs on CoLA, MRPC, and STS-B, and for 3 epochs on SST-2. The weight updates from Lily have notably higher rank than those from LoRA. Note that the reported rank is computed from weight updates accumulated over multiple epochs.
Figure 4: Visualization of the accumulated weight that a router assigns to each $B$ expert across layers. Layers 2, 13, and 22 represent shallow, middle, and deep layers, respectively. The reported values are router outputs accumulated over multiple epochs.
Figure 5: Impact of adapter granularity (i.e., the number of $A$s and $B$s) on performance. We choose 12 of the 19 VTAB-1K tasks for a comprehensive understanding.
(9 further figures not listed)
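Figure 3 reports the actual rank of $768 \times 768$ weight updates. A common way to measure this is to count the singular values above a relative tolerance. The NumPy sketch below does this on random matrices. The ranks 8 (LoRA) and 23 (Lily), the two fixed routing weights, and the tolerance are illustrative assumptions, and the paper's accumulation of updates across epochs is not reproduced. `numerical_rank` is a hypothetical helper.

```python
import numpy as np


def numerical_rank(delta_w: np.ndarray, rtol: float = 1e-6) -> int:
    """Number of singular values larger than rtol times the largest one."""
    singular_values = np.linalg.svd(delta_w, compute_uv=False)  # sorted in descending order
    return int(np.sum(singular_values > rtol * singular_values[0]))


rng = np.random.default_rng(0)
D = 768

# LoRA-style update for one layer: Delta W = A B with an (assumed) per-layer rank of 8.
lora_dw = rng.normal(size=(D, 8)) @ rng.normal(size=(8, D))

# Lily-style update for one layer: a shared rank-23 A times a router-weighted mix of
# two B experts. The rank and the fixed routing weights are illustrative assumptions.
lily_a = rng.normal(size=(D, 23))
lily_b_experts = rng.normal(size=(2, 23, D))
router_weights = np.array([0.7, 0.3])
lily_dw = lily_a @ np.tensordot(router_weights, lily_b_experts, axes=1)

print(numerical_rank(lora_dw), numerical_rank(lily_dw))  # 8 23
```

Under this sketch's assumptions (one $A$ per layer and fixed routing weights), a layer's Lily update is $A$ times a mixture of $B$ experts, so its rank is bounded by the shared rank rather than by a small per-layer LoRA rank. The higher ranks in Figure 3 are consistent with sharing making a larger rank affordable.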