Preference-Aligned LoRA Merging: Preserving Subspace Coverage and Addressing Directional Anisotropy

Wooseong Jeong, Wonyoung Lee, Kuk-Jin Yoon

Abstract

Merging multiple Low-Rank Adaptation (LoRA) modules is promising for constructing general-purpose systems, yet challenging because LoRA update directions span different subspaces and contribute unevenly. When merged naively, such mismatches can weaken the directions most critical to certain task losses while overemphasizing relatively less important ones, ultimately reducing the model's ability to represent all tasks faithfully. We revisit this problem through two perspectives: subspace coverage, which captures how broadly LoRA directions cover diverse representational directions, and anisotropy, which reflects the imbalance of influence across those directions. We propose TARA-Merging (Task-Rank Anisotropy Alignment), which aligns merging weights using a preference-weighted cross-entropy pseudo-loss while preserving task-relevant LoRA subspaces. This ensures broad subspace coverage and mitigates anisotropy via direction-wise reweighting. Across eight vision and six NLI benchmarks, TARA-Merging consistently outperforms vanilla and LoRA-aware baselines, demonstrating strong robustness and generalization, and highlighting the importance of addressing both subspace coverage and anisotropy in LoRA merging.

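As background for the setting, the sketch below shows plain coefficient-weighted LoRA merging for a single layer. It is not the paper's TARA-Merging procedure (the preference-weighted cross-entropy pseudo-loss and direction-wise reweighting are not reproduced here); all shapes, names, and coefficients are illustrative assumptions.

```python
import numpy as np

def merge_lora(base_W, loras, coeffs):
    """Merge several LoRA modules into one weight matrix.

    base_W : (d_out, d_in) pretrained weight
    loras  : list of (B, A) pairs, B: (d_out, r), A: (r, d_in)
    coeffs : per-task merging coefficients (fixed constants here;
             TARA-Merging instead aligns such weights to task preferences)
    """
    merged = base_W.copy()
    for (B, A), c in zip(loras, coeffs):
        merged += c * (B @ A)   # each task's low-rank update Delta W = B A
    return merged

# Hypothetical shapes and uniform coefficients for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4
base = rng.standard_normal((d_out, d_in))
loras = [(rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in)))
         for _ in range(3)]
W_merged = merge_lora(base, loras, coeffs=[1 / 3, 1 / 3, 1 / 3])
```

As described in the abstract, TARA-Merging goes beyond such fixed module-level coefficients: it aligns the merging weights with task preferences via a pseudo-loss and reweights individual LoRA directions rather than whole modules.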

Paper Structure

This paper contains 42 sections, 2 theorems, 20 equations, 11 figures, 15 tables.

Key Result

Proposition 1

Let $\sigma_{\max}(\bm J)$ and $\sigma_{\min}(\bm J)$ denote the largest and smallest singular values of $\bm J$. Then, for any coefficient vector $\bm\phi$, the induced change satisfies $\sigma_{\min}(\bm J)\,\lVert\bm\phi\rVert_2 \le \lVert\Delta \bm f\rVert_2 \le \sigma_{\max}(\bm J)\,\lVert\bm\phi\rVert_2$. When $\kappa(\bm J)=\sigma_{\max}(\bm J)/\sigma_{\min}(\bm J)$ is large, the map $\bm\phi\!\mapsto\!\Delta \bm f$ is anisotropic, and equal-norm LoRA-direction updates need not yield proportional task-loss changes.
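
The gap between $\sigma_{\max}$ and $\sigma_{\min}$ can be made concrete numerically. The snippet below is an illustration only (not code from the paper; sizes and names are assumptions): it builds a matrix with a prescribed condition number and shows that two equal-norm coefficient vectors can induce changes whose norms differ by a factor of $\kappa(\bm J)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: J maps n LoRA-direction coefficients to an m-dimensional
# change Delta f (shapes are illustrative only).
m, n = 64, 8
sigmas = np.geomspace(10.0, 0.1, n)            # sigma_max / sigma_min = 100
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
J = U @ np.diag(sigmas) @ V.T

# Two unit-norm coefficient vectors: aligned with the most and least
# loss-sensitive directions (top and bottom right singular vectors).
phi_top, phi_bot = V[:, 0], V[:, -1]

print(np.linalg.norm(J @ phi_top))   # ~= sigma_max = 10.0
print(np.linalg.norm(J @ phi_bot))   # ~= sigma_min = 0.1
print(np.linalg.cond(J))             # ~= kappa(J) = 100
```

Both inputs have unit norm, yet the induced changes differ by two orders of magnitude, which is exactly the anisotropy the proposition formalizes.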

Figures (11)

  • Figure 1: Effective rank across layers for an attention projection layer (e.g., query). The gap between $\Delta \bm X$ stack and Rank-1 stack reflects merge-induced collapse.
  • Figure 2: Directional-sensitivity misalignment $\boldsymbol{\xi(\rho_1,\rho_2)}$. Larger $\xi$ indicates stronger change in loss-sensitive directions when switching preferences.
  • Figure 3: Two-Task Merging Results.
  • Figure 5: Effective Rank across layers for each module. Higher curves indicate broader subspace coverage. The gap between $\Delta \bm W$ stack and Rank-1 stack reflects merge-induced collapse.
  • Figure 6: Condition Number Anisotropy $\boldsymbol{\kappa}$ (RAW Basis). Layer-wise condition number $\kappa(\bm\rho)$ per module under the non-orthogonal LoRA basis (RAW). Larger $\kappa$ indicates stronger within-preference directional concentration of loss sensitivity. (Here, $\bm\rho_1$ uniformly weights all tasks, whereas $\bm\rho_2$ assigns all weight to a single task, i.e., is one-hot.) A computation sketch for $\kappa$ and the effective rank follows this figure list.
  • ...and 6 more figures
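
For reference, here is a minimal sketch of the two diagnostics these figures plot, under commonly used definitions: effective rank as the exponential of the spectral entropy, and the condition number $\kappa$ as the ratio of extreme singular values. The stacking of per-task LoRA updates and all shapes below are assumptions; the paper's exact construction may differ.

```python
import numpy as np

def effective_rank(M):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular-value spectrum (a common definition, assumed here)."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def condition_number(M):
    """kappa = sigma_max / sigma_min over the numerically nonzero spectrum."""
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > s.max() * 1e-10]
    return float(s.max() / s.min())

# Hypothetical illustration: per-task LoRA updates Delta W_t = B_t @ A_t for
# one layer, stacked side by side versus summed into a single merged update.
rng = np.random.default_rng(0)
d, r, n_tasks = 128, 4, 3
deltas = [rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
          for _ in range(n_tasks)]

stacked = np.concatenate(deltas, axis=1)   # keeps every task's directions
merged = sum(deltas)                        # naive unweighted merge

print("effective rank, per-task stack:", effective_rank(stacked))
print("effective rank, naive merge:   ", effective_rank(merged))
print("condition number, naive merge: ", condition_number(merged))
```

In the terms used by the captions above, a merged update whose effective rank falls well below that of the per-task stack signals merge-induced subspace collapse, while a large $\kappa$ signals directional anisotropy.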

Theorems & Definitions (2)

  • Proposition 1: Anisotropy Bounds
  • Proof