Not All Directions Matter: Toward Structured and Task-Aware Low-Rank Adaptation

Xi Xiao; Chenrui Ma; Yunbei Zhang; Chen Liu; Zhuxuanzi Wang; Yanshu Li; Lin Zhao; Guosheng Hu; Tianyang Wang; Hao Xu

Abstract

Low-Rank Adaptation (LoRA) has become a cornerstone of parameter-efficient fine-tuning (PEFT). Yet its efficacy is hampered by two fundamental limitations: semantic drift, which arises from treating all update directions as equally important, and structural incoherence, which arises from adapting layers independently and results in suboptimal, uncoordinated updates. To remedy these, we propose StructLoRA, a framework that addresses both limitations through a principled, dual-component design: (1) an Information Bottleneck-guided filter that prunes task-irrelevant directions to mitigate semantic drift, and (2) a lightweight, training-only graph-based coordinator that enforces inter-layer consistency to resolve structural incoherence. Extensive experiments across large language models, vision-language models, and vision models (including LLaMA, LLaVA, and ViT) demonstrate that StructLoRA consistently establishes a new state of the art, outperforming not only vanilla LoRA but also advanced dynamic rank allocation and sparsity-based methods. Notably, the benefits are particularly pronounced in challenging low-rank and low-data regimes. Crucially, since our proposed modules operate only during training, StructLoRA enhances performance with zero additional inference cost, shifting the focus of PEFT from mere parameter compression to a more holistic optimization of information quality and structural integrity.

Paper Structure (68 sections, 1 theorem, 19 equations, 7 figures, 13 tables)

Introduction
Related Work
The Landscape of PEFT
The Evolution of LoRA
Methodology
Preliminaries: Low-Rank Adaptation
Stage 1: Information Bottleneck-Guided Directional Filtering
Stage 2: Graph-Based Layer Coordination
Graph construction
Message passing and reconstruction
A Minimal Theoretical View: Coordination as Laplacian Smoothing
Objective, Training, and Inference
Experiments
Experimental Setup
Models and Architectures
...and 53 more sections

Key Result

Theorem 1

Let $\mathcal{E}(\mathbf{U})$ be the drift energy defined in eq:s3 energy, and let $\mathbf{U}^{+}$ be obtained from $\mathbf{U}$ by the gradient step eq:s4 gradstep with $\eta\in(0,\,1/\lambda_{\max}(\mathbf{L}))$. Then $\mathcal{E}$ strictly decreases, i.e. $\mathcal{E}(\mathbf{U}^{+}) < \mathcal{E}(\mathbf{U})$, unless $(\mathbf{L} \otimes \mathbf{I})\mathbf{U}=\mathbf{0}$.
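
The statement of Theorem 1 can be checked numerically with a small sketch. The referenced equations are not reproduced on this page, so the code below assumes the standard Laplacian-smoothing forms: energy $\mathcal{E}(\mathbf{U}) = \tfrac{1}{2}\,\mathrm{tr}(\mathbf{U}^\top \mathbf{L}\mathbf{U})$ and gradient step $\mathbf{U}^{+} = \mathbf{U} - \eta\,\mathbf{L}\mathbf{U}$, where row $i$ of $\mathbf{U}$ holds the flattened update of layer $i$. The path-graph topology and the helper names (`path_laplacian`, `drift_energy`, `smoothing_step`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian L = D - A of a path graph over n layers (assumed topology)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def drift_energy(L, U):
    """Assumed energy: E(U) = 0.5 * tr(U^T L U)."""
    return 0.5 * np.trace(U.T @ L @ U)

def smoothing_step(L, U, eta):
    """Assumed step: U+ = U - eta * L U, the matrix form of (L kron I) acting on stacked rows."""
    return U - eta * (L @ U)

rng = np.random.default_rng(0)
n_layers, dim = 6, 8
L = path_laplacian(n_layers)
lam_max = np.linalg.eigvalsh(L)[-1]   # eigvalsh returns eigenvalues in ascending order
eta = 0.5 / lam_max                   # inside the interval (0, 1 / lambda_max)

# Generic case: L U != 0, so the energy should strictly decrease.
U = rng.standard_normal((n_layers, dim))
energy_before = drift_energy(L, U)
energy_after = drift_energy(L, smoothing_step(L, U, eta))

# Degenerate case: identical updates in every layer give L U = 0, so nothing changes.
U_const = np.ones((n_layers, dim))
const_before = drift_energy(L, U_const)
const_after = drift_energy(L, smoothing_step(L, U_const, eta))
```

Under these assumptions, each eigen-component of $\mathbf{U}$ with eigenvalue $\lambda_i > 0$ is scaled by $(1 - \eta\lambda_i)$, whose magnitude is below one for the stated step-size range, which is why the energy can only stay constant when $\mathbf{L}\mathbf{U} = \mathbf{0}$.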

Figures (7)

Figure 1: Architectural comparison between LoRA and StructLoRA. The left illustrates the standard LoRA architecture with uniform low-rank updates, while the right shows our StructLoRA, which introduces an Information Bottleneck (IB) filter and a Graph-based Coordination mechanism. These modules selectively retain task-relevant update directions and align layer-wise updates through message passing. Both operate only during training and are removed at inference, preserving LoRA’s zero-latency efficiency.
Figure 2: Analysis of filtering strategies. We compare our IB-guided filter with two heuristics under the same keep ratio: Random Masking and Top-$k$ Norm (scored by $\|a_j\|_2\|b_j\|_2$ for each rank-one direction). Bars show mean performance across three random seeds; error bars indicate 95% confidence intervals.
Figure 3: Visual attention comparison between LoRA and StructLoRA. The top row shows Grad-CAM (Selvaraju et al., 2017) heatmaps from the baseline LoRA model, while the bottom row corresponds to StructLoRA. StructLoRA produces more concentrated and semantically aligned activation regions.
Figure 4: Layer-wise cosine similarity of updates. StructLoRA induces a coherent block-diagonal structure, while LoRA exhibits noisy and fragmented activation patterns. See the cosine-similarity table (tab:cosine-similarity) for more results.
Figure 5: Accuracy vs. sequence length. StructLoRA consistently outperforms LoRA as input sequences grow longer, showing stronger robustness under extended context lengths.
...and 2 more figures
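
The Top-$k$ Norm heuristic from the Figure 2 caption can be sketched as follows. This is the baseline the paper compares against, not the IB-guided filter itself. The sketch assumes the usual LoRA factorization $\Delta W = BA$ with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, so that rank-one direction $j$ is the outer product of column $b_j$ of $B$ and row $a_j$ of $A$. The function names and the rounding of the keep ratio are hypothetical choices made for illustration.

```python
import numpy as np

def topk_norm_mask(A, B, keep_ratio):
    """Score each rank-one direction by ||a_j||_2 * ||b_j||_2 and keep the top fraction."""
    r = A.shape[0]
    scores = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=0)
    k = max(1, int(round(keep_ratio * r)))   # assumed rounding rule; keeps at least one direction
    keep = np.argsort(scores)[-k:]           # indices of the k largest scores
    mask = np.zeros(r, dtype=bool)
    mask[keep] = True
    return mask, scores

def masked_update(A, B, mask):
    """Low-rank update that uses only the kept directions (dropped columns of B are zeroed)."""
    return (B * mask) @ A

rng = np.random.default_rng(0)
d, k_in, r = 16, 12, 8
B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k_in))

mask, scores = topk_norm_mask(A, B, keep_ratio=0.5)
delta_w = masked_update(A, B, mask)
```

A Random Masking baseline under the same keep ratio would replace the `argsort` selection with a random choice of `k` indices, which makes the two heuristics directly comparable.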

Theorems & Definitions (2)

Theorem 1: One-step decrease of drift energy
proof
