NoRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion

Hung-Hsuan Chen

TL;DR

NoRA (Non-linear Rank Adaptation) is a weight-level parallel adapter that injects SiLU gating and structural dropout to induce manifold expansion. It activates the dormant tail of the singular value spectrum, preventing the rank collapse observed in linear methods.

Abstract

Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a critical "linear ceiling" in complex reasoning tasks: simply increasing the rank yields diminishing returns due to intrinsic linear constraints. We introduce NoRA (Non-linear Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and structural dropout to induce manifold expansion. On the SlimOrca benchmark, NoRA breaks this linear barrier: at rank 64 (PPL 3.89) it outperforms LoRA at rank 512 (PPL 3.90), demonstrating superior spectral efficiency. This advantage generalizes to mathematical reasoning, where NoRA achieves a perplexity of 1.97 on MathInstruct, surpassing LoRA's saturation point of 2.07. Mechanism analysis via Singular Value Decomposition (SVD) confirms that NoRA activates the dormant tail of the singular value spectrum, preventing the rank collapse observed in linear methods.
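
To make the contrast between a linear and a non-linear weight-level adapter concrete, the following is a minimal NumPy sketch. The LoRA branch follows the $\Delta W = BA$ form. The non-linear branch is only an assumed form, $y = Wx + B\,\mathcal{D}(\mathrm{SiLU}(Ax))$, since the exact placement of the gate and the dropout is not specified in this excerpt. The function names `lora_forward` and `nora_like_forward` are hypothetical and are not taken from the paper.

```python
import numpy as np


def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def lora_forward(x, W, A, B):
    """Linear low-rank update: y = W x + B (A x), i.e. Delta W = B A."""
    return x @ W.T + (x @ A.T) @ B.T


def nora_like_forward(x, W, A, B, drop_p=0.0, rng=None):
    """ASSUMED non-linear variant: y = W x + B * Dropout(SiLU(A x)).

    The gate and dropout placement are illustrative guesses, not the
    paper's confirmed formulation.
    """
    h = silu(x @ A.T)  # non-linearity applied in the rank-r bottleneck
    if drop_p > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Inverted dropout on the bottleneck activations (training mode).
        mask = rng.random(h.shape) >= drop_p
        h = h * mask / (1.0 - drop_p)
    return x @ W.T + h @ B.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_out, r = 8, 6, 2
    W = rng.standard_normal((d_out, d_in))  # frozen base weight
    A = rng.standard_normal((r, d_in))      # down-projection
    B = rng.standard_normal((d_out, r))     # up-projection
    x1 = rng.standard_normal((4, d_in))
    x2 = rng.standard_normal((4, d_in))

    # The LoRA branch is additive in its input; the gated branch is not.
    lora_gap = lora_forward(x1 + x2, W, A, B) - (
        lora_forward(x1, W, A, B) + lora_forward(x2, W, A, B))
    nora_gap = nora_like_forward(x1 + x2, W, A, B) - (
        nora_like_forward(x1, W, A, B) + nora_like_forward(x2, W, A, B))
    print("max |LoRA additivity gap|:", np.abs(lora_gap).max())
    print("max |gated additivity gap|:", np.abs(nora_gap).max())
```

Because the LoRA path is a composition of two linear maps, its contribution can never leave a rank-$r$ linear subspace; the gated path does not satisfy additivity, which is the property the "linear ceiling" argument relies on.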

Paper Structure

This paper contains 51 sections, 3 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Architectural evolution of adaptation methods. Top Row (Module-Level): Traditional adapters operate on the output of the entire attention block. (Top-left) Serial Adapters create latency bottlenecks. (Top-right) Parallel Adapters improve efficiency but lack fine-grained control over internal projections. Bottom Row (Weight-Level): (Bottom-left) LoRA injects linear updates ($\Delta W = BA$) directly into $W_q$ and $W_v$. (Bottom-right) NoRA (Ours) operates at the same fine-grained level but introduces non-linear modeling via SiLU gating ($\sigma$) and structural Dropout ($\mathcal{D}$). This architecture combines the efficiency of weight-level injection with the high-capacity expressivity of non-linear manifolds.
  • Figure 2: Capacity scaling on SlimOrca. LoRA plateaus rapidly, whereas NoRA continues to improve with rank. In particular, NoRA at rank 64 outperforms LoRA at rank 512, breaking the linear ceiling.
  • Figure 3: Scaling behavior on MathInstruct. NoRA consistently outperforms LoRA across all ranks, with the gap widening at higher ranks ($r=512$). While LoRA continues to improve, it remains strictly dominated by NoRA's non-linear adaptation curve.
  • Figure 4: Spectral Signature of the SlimOrca dataset. LoRA exhibits "rank collapse," where singular values drop precipitously, indicating under-utilization of the rank budget. NoRA maintains a "heavy tail," activating a broader subspace.
  • Figure 5: Effective Rank Analysis on SlimOrca. We compare the spectral utilization of LoRA and NoRA across increasing rank budgets ($r \in \{16, 64, 128, 512\}$). As the rank budget increases to 512, LoRA's effective rank saturates, plateauing at $\sim$60. This supports the "linear ceiling" hypothesis, where linear constraints prevent the model from utilizing the additional parameter budget. Conversely, NoRA demonstrates superior spectral efficiency: its effective rank scales with the budget, reaching over 330 at $r=512$. This validates that our non-linear architecture successfully expands the representation manifold, avoiding the collapse observed in linear adapters.
  • ...and 4 more figures
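
The spectral analysis in Figures 4 and 5 can be illustrated with a short sketch of an effective-rank computation. This excerpt does not define the effective-rank measure or state which matrix is decomposed, so the code below assumes one common choice: the exponential of the Shannon entropy of the normalized singular values, applied to a given update matrix. The function name `effective_rank` is hypothetical.

```python
import numpy as np


def effective_rank(delta_w, eps=1e-12):
    """Entropy-based effective rank of a matrix (ASSUMED definition).

    Computes exp(H(p)), where p are the singular values normalized to
    sum to 1. The paper's exact measure may differ.
    """
    s = np.linalg.svd(delta_w, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r = 64, 4
    B = rng.standard_normal((d, r))
    A = rng.standard_normal((r, d))
    # A linear update B @ A has at most r non-zero singular values,
    # so its effective rank cannot exceed r.
    print("effective rank of B @ A:", effective_rank(B @ A))
    # A flat spectrum uses the full budget.
    print("effective rank of identity:", effective_rank(np.eye(d)))
```

Under this definition, a spectrum concentrated in a few directions yields an effective rank far below the nominal rank $r$, while a heavy-tailed spectrum pushes it toward $r$, which matches the qualitative reading of the LoRA and NoRA curves described in the captions.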