D2-LoRA: A Synergistic Approach to Differential and Directional Low-Rank Adaptation

Nozomu Fujisawa; Masaaki Kondo

TL;DR

D2-LoRA achieves 76.4 percent average accuracy across eight question answering and reading comprehension benchmarks using only 5k training samples per task and two epochs, while preserving algebraic mergeability at inference with near-exact numerical equivalence.

Abstract

We systematically investigate the parameter-efficient fine-tuning design space under practical data and compute constraints, and propose D2-LoRA. D2-LoRA achieves 76.4 percent average accuracy across eight question answering and reading comprehension benchmarks using only 5k training samples per task and two epochs, while preserving algebraic mergeability at inference with near-exact numerical equivalence. The method combines signed low-rank residual updates with additive and subtractive components, together with a train-time column-wise projection that keeps each column close to its original norm. After training, the adapter is merged into a single weight matrix, adding zero inference latency. Compared with LoRA, D2-LoRA improves average accuracy by 2.2 percentage points; at matched parameter counts (LoRA rank 2r versus D2-LoRA rank r), the improvement is 1.6 points, indicating gains from architectural design rather than increased parameterization. Compared with DoRA, it matches or exceeds performance on most tasks. Beyond QA and reading comprehension, D2-LoRA improves generative tasks (plus 1.2 ROUGE-L and plus 1.1 percent win rate) and shows 36 percent lower training volatility. The merge preserves numerical fidelity (mean gap about 0.03 percentage points) and recovers about 1.91x evaluation throughput. Training overhead is 19 percent, comparable to DoRA, and decreases with longer input sequences. We provide a geometric analysis explaining how the projection stabilizes training, together with ablation studies isolating the contribution of each design component.
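The matched-parameter comparison in the abstract (LoRA rank 2r versus D2-LoRA rank r) can be checked with simple counting. The sketch below assumes that each low-rank factor pair holds `rank * (d_in + d_out)` parameters, that D2-LoRA trains two such pairs (one additive, one subtractive), and that no other trainable parameters are counted. The function names and the layer size are illustrative and not taken from the paper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one low-rank factor pair: A is d_in x rank, B is rank x d_out."""
    return rank * (d_in + d_out)


def d2_lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Assumed count for D2-LoRA: two factor pairs, (A_+, B_+) and (A_-, B_-)."""
    return 2 * lora_param_count(d_in, d_out, rank)


# Illustrative layer size and rank (hypothetical values for this sketch).
d_in, d_out, r = 3072, 3072, 32
lora_2r = lora_param_count(d_in, d_out, 2 * r)
d2_r = d2_lora_param_count(d_in, d_out, r)
print(lora_2r, d2_r)  # the two counts coincide under the assumptions above
```

Under these assumptions the two configurations have identical budgets, which is why the 1.6-point gap is attributed to the design rather than to extra parameters.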

Paper Structure (81 sections, 5 theorems, 15 equations, 3 figures, 23 tables, 1 algorithm)

Introduction
Contributions.
Related Work
D2-LoRA model architecture
Signed low-rank residual.
Directional projection (train time only).
Why add $\Delta W$ twice?
Configuration Knobs.
Rationale for small-minus initialization.
Theory
D2-LoRA Stability
Geometric role of $\tau$.
Benefits of directional constraints in low-data regimes.
Merge-time equivalence.
On choosing the scale τ
...and 66 more sections

Key Result

Proposition 1

For rank $r$, LoRA admits updates $\Delta W$ with $\operatorname{rank}(\Delta W)\le r$. By contrast, D2-LoRA produces $\Delta W^\top=\frac{\alpha}{r}(A_+B_+-\tau A_-B_-)$ with $\operatorname{rank}(\Delta W)\le 2r$. Moreover, for any rank-$r$ matrix $M$, there exist $(A_\pm,B_
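The rank bound stated above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: it assumes $A_\pm$ has shape `(d_in, r)` and $B_\pm$ has shape `(r, d_out)`, and the dimensions, `alpha`, and `tau` values are arbitrary choices for illustration. With generic random factors the signed update typically reaches rank $2r$, and zeroing the subtractive branch leaves a LoRA-style update of rank at most $r$.

```python
import numpy as np


def signed_delta_t(A_pos, B_pos, A_neg, B_neg, alpha, tau):
    """Return Delta W^T = (alpha / r) * (A_+ B_+ - tau * A_- B_-)."""
    r = A_pos.shape[1]
    return (alpha / r) * (A_pos @ B_pos - tau * (A_neg @ B_neg))


rng = np.random.default_rng(0)
d_in, d_out, r = 16, 12, 2  # illustrative sizes, not from the paper

A_pos = rng.standard_normal((d_in, r))
B_pos = rng.standard_normal((r, d_out))
A_neg = rng.standard_normal((d_in, r))
B_neg = rng.standard_normal((r, d_out))

# Signed update with both branches active: rank is bounded by 2r.
delta_t = signed_delta_t(A_pos, B_pos, A_neg, B_neg, alpha=16.0, tau=0.5)
rank_signed = np.linalg.matrix_rank(delta_t)

# Zeroing the subtractive branch leaves a single low-rank product: rank <= r.
lora_like_t = signed_delta_t(
    A_pos, B_pos, np.zeros_like(A_neg), B_neg, alpha=16.0, tau=0.5
)
rank_lora_like = np.linalg.matrix_rank(lora_like_t)

print(rank_signed, rank_lora_like)
```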

Figures (3)

Figure 1: Overview of D2-LoRA. Left: columnwise magnitude $\,\mathbf m=\|W_0\|_{2,\mathrm{col}}\,$ from the pretrained weight $W_0$. Middle: differential signed low-rank residual with scale $\tau$ and a train-time directional projection that preserves the column norms of $W_0$. Right: inference uses the merged weight $\widehat{W}=W^\star+\Delta W$, so latency equals a single linear layer.
Figure 2: Training dynamics on BoolQ (Llama-3.2-3B-Instruct, $r{=}32$). (a) Loss curves: D2-LoRA converges to a lower final loss (1.894) than LoRA (1.931) or DoRA (1.907). (b) Rolling standard deviation indicates 36% lower volatility for D2-LoRA, reflecting more stable optimization. (c) Loss distribution during the final phase (steps 200--312) shows D2-LoRA exhibits a tighter concentration around convergence.
Figure 3: Fixed $\tau$ stability. Macro averages for varying $\tau$ (from Table \ref{tab:tau-combined}).
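The pipeline in Figure 1 can be sketched in NumPy as follows. This is a hedged illustration rather than the authors' implementation: it assumes weights are stored as `(d_out, d_in)` with column norms taken over axis 0, and it assumes that $W^\star$ is obtained by rescaling each column of $W_0+\Delta W$ back to the pretrained column norm $\mathbf m$. The caption does not spell out that definition, so treat it as an assumption. The helper names `column_norms` and `project_columns` are hypothetical.

```python
import numpy as np


def column_norms(W):
    """Column-wise L2 norms, kept as a (1, d_in) row for broadcasting."""
    return np.linalg.norm(W, axis=0, keepdims=True)


def project_columns(W, target_norms, eps=1e-12):
    """Rescale each column of W so that its L2 norm equals target_norms."""
    return W * (target_norms / np.maximum(column_norms(W), eps))


rng = np.random.default_rng(0)
d_out, d_in = 6, 8  # illustrative sizes
W0 = rng.standard_normal((d_out, d_in))             # stand-in for a pretrained weight
delta = 0.1 * rng.standard_normal((d_out, d_in))    # stand-in for the signed residual
x = rng.standard_normal((4, d_in))                  # a small batch of inputs

m = column_norms(W0)                     # columnwise magnitude from W0
W_star = project_columns(W0 + delta, m)  # ASSUMED form of the directional projection
W_hat = W_star + delta                   # merged weight from the Figure 1 caption

# Two-branch forward pass versus the single merged linear layer.
y_unmerged = x @ W_star.T + x @ delta.T
y_merged = x @ W_hat.T
max_gap = np.abs(y_unmerged - y_merged).max()
print(max_gap)
```

Because the merge is a plain matrix sum, the two forward passes differ only by floating-point rounding, which is consistent with the near-exact equivalence described in the abstract.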

Theorems & Definitions (13)

Proposition 1: Expressivity gap of signed low-rank updates
proof : Proof sketch
Proposition 2: Norm preservation
proof : Proof sketch
Lemma 1: Smoothness and Lipschitz control
proof : Proof sketch
proof : Proof of Proposition 1
proof : Proof of Proposition 2
proof : Proof of Lemma 1
Theorem 1: Projected SGD on the product of spheres
...and 3 more
