Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation

Xi Xiao, Chenrui Ma, Yunbei Zhang, Chen Liu, Zhuxuanzi Wang, Yanshu Li, Lin Zhao, Guosheng Hu, Tianyang Wang, Hao Xu

Abstract

Low-Rank Adaptation (LoRA) has become a cornerstone of parameter-efficient fine-tuning (PEFT). Yet its efficacy is hampered by two fundamental limitations: semantic drift, which arises from treating all update directions as equally important, and structural incoherence, which arises from adapting layers independently. Together they lead to suboptimal, uncoordinated updates. To remedy these, we propose StructLoRA, a framework that addresses both limitations through a principled, dual-component design: (1) an Information Bottleneck-guided filter that prunes task-irrelevant directions to mitigate semantic drift, and (2) a lightweight, training-only graph-based coordinator that enforces inter-layer consistency to resolve structural incoherence. Extensive experiments across large language models, vision-language models, and vision models (including LLaMA, LLaVA, and ViT) demonstrate that StructLoRA consistently establishes a new state of the art, outperforming not only vanilla LoRA but also advanced dynamic rank allocation and sparsity-based methods. The benefits are particularly pronounced in challenging low-rank and low-data regimes. Because the proposed modules operate only during training, StructLoRA improves performance with zero additional inference cost, shifting the focus of PEFT from mere parameter compression to a more holistic optimization of information quality and structural integrity.

Paper Structure (68 sections, 1 theorem, 19 equations, 7 figures, 13 tables)

This paper contains 68 sections, 1 theorem, 19 equations, 7 figures, 13 tables.

Key Result

Theorem 1

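Let $\mathcal{E}(\mathbf{U})$ be the drift energy defined in the energy equation (eq:s3 energy), and let $\mathbf{U}^{+}$ be the update produced by the gradient step (eq:s4 gradstep) with step size $\eta\in(0,\,1/\lambda_{\max}(\mathbf{L}))$. Then $\mathcal{E}$ strictly decreases in one step, i.e. $\mathcal{E}(\mathbf{U}^{+}) < \mathcal{E}(\mathbf{U})$, unless $(\mathbf{L} \otimes \mathbf{I})\mathbf{U}=\mathbf{0}$.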
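
As a numerical illustration of the statement above, the following minimal sketch checks the one-step decrease on a toy example. The referenced equations are not reproduced in this summary, so the sketch assumes the energy has the standard Dirichlet form $\mathcal{E}(\mathbf{U}) = \tfrac{1}{2}\,\mathrm{tr}(\mathbf{U}^{\top}\mathbf{L}\mathbf{U})$ and that the update is the plain gradient step $\mathbf{U}^{+} = \mathbf{U} - \eta\,\mathbf{L}\mathbf{U}$; the paper's exact definitions may differ. Stacking one row of $\mathbf{U}$ per layer makes $(\mathbf{L} \otimes \mathbf{I})\,\mathrm{vec}(\mathbf{U})$ equivalent to the matrix product $\mathbf{L}\mathbf{U}$. The chain graph, the sizes, and all function names are hypothetical.

```python
import numpy as np

def chain_laplacian(n):
    """Laplacian of a path graph linking adjacent layers (hypothetical topology)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def drift_energy(L, U):
    """Assumed Dirichlet form: 0.5 * tr(U^T L U)."""
    return 0.5 * np.trace(U.T @ L @ U)

def grad_step(L, U, eta):
    """Assumed gradient step on the energy: U - eta * L U."""
    return U - eta * (L @ U)

rng = np.random.default_rng(0)
n_layers, dim = 6, 4
L = chain_laplacian(n_layers)
U = rng.standard_normal((n_layers, dim))  # one row per layer

lam_max = np.linalg.eigvalsh(L)[-1]       # eigenvalues are returned in ascending order
eta = 0.5 / lam_max                       # inside the interval (0, 1/lambda_max)

e_before = drift_energy(L, U)
e_after = drift_energy(L, grad_step(L, U, eta))

# Stationary case: identical rows give L @ U = 0, so the step changes nothing.
U_const = np.ones((n_layers, dim))
e_const_before = drift_energy(L, U_const)
e_const_after = drift_energy(L, grad_step(L, U_const, eta))

print(f"energy before: {e_before:.4f}, after: {e_after:.4f}")
```

Under these assumptions, a generic $\mathbf{U}$ loses energy in one step, while a $\mathbf{U}$ whose rows are all equal is left unchanged, which matches the exceptional case in the theorem.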

Figures (7)

  • Figure 1: Architectural comparison between LoRA and StructLoRA. The left illustrates the standard LoRA architecture with uniform low-rank updates, while the right shows our StructLoRA, which introduces an Information Bottleneck (IB) filter and a Graph-based Coordination mechanism. These modules selectively retain task-relevant update directions and align layer-wise updates through message passing. Both operate only during training and are removed at inference, preserving LoRA’s zero-latency efficiency.
  • Figure 2: Analysis of filtering strategies. We compare our IB-guided filter with two heuristics under the same keep ratio: Random Masking and Top-$k$ Norm (scored by $\|a_j\|_2\|b_j\|_2$ for each rank-one direction). Bars show mean performance across three random seeds; error bars indicate 95% confidence intervals.
  • Figure 3: Visual attention comparison between LoRA and StructLoRA. The top row shows Grad-CAM (Selvaraju et al., 2017) heatmaps from the baseline LoRA model, while the bottom row corresponds to StructLoRA. StructLoRA produces more concentrated and semantically aligned activation regions.
  • Figure 4: Layer-wise cosine similarity of updates. StructLoRA induces a coherent block-diagonal structure, while LoRA exhibits noisy and fragmented activation patterns. See the cosine-similarity table (tab:cosine-similarity) for more results.
  • Figure 5: Accuracy vs. sequence length. StructLoRA consistently outperforms LoRA as input sequences grow longer, showing stronger robustness under extended context lengths.
  • ...and 2 more figures
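
The Top-$k$ Norm and Random Masking baselines named in the Figure 2 caption can be sketched as follows. This is only an illustration of the two heuristics, not of the IB-guided filter itself, whose scoring rule is not given in this summary. It assumes the usual LoRA factorization $\Delta W = BA$ with $A \in \mathbb{R}^{r \times d_{\text{in}}}$ and $B \in \mathbb{R}^{d_{\text{out}} \times r}$, so that the $j$-th rank-one direction is $b_j a_j^{\top}$ (column $j$ of $B$, row $j$ of $A$). The function names, shapes, and the rounding rule for the keep ratio are hypothetical.

```python
import numpy as np

def num_kept(r, keep_ratio):
    """Number of directions to keep; the rounding rule is an assumption."""
    return max(1, int(round(keep_ratio * r)))

def direction_scores(A, B):
    """Score each rank-one direction by ||a_j||_2 * ||b_j||_2."""
    return np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=0)

def topk_norm_mask(A, B, keep_ratio):
    """Boolean mask keeping the directions with the largest norm products."""
    r = A.shape[0]
    keep = np.argsort(direction_scores(A, B))[-num_kept(r, keep_ratio):]
    mask = np.zeros(r, dtype=bool)
    mask[keep] = True
    return mask

def random_mask(r, keep_ratio, rng):
    """Boolean mask keeping the same number of directions, chosen at random."""
    mask = np.zeros(r, dtype=bool)
    mask[rng.choice(r, size=num_kept(r, keep_ratio), replace=False)] = True
    return mask

def masked_update(A, B, mask):
    """Delta W restricted to kept directions: sum over kept j of b_j a_j^T."""
    return B[:, mask] @ A[mask, :]

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 5, 4
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))

scores = direction_scores(A, B)
mask_topk = topk_norm_mask(A, B, keep_ratio=0.5)
mask_rand = random_mask(r, 0.5, rng)
delta_w = masked_update(A, B, mask_topk)
```

Both masks keep the same number of directions, which mirrors the "same keep ratio" condition in the caption; only the selection rule differs.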

Theorems & Definitions (2)

  • Theorem 1: One-step decrease of drift energy
  • proof