NP-LoRA: Null Space Projection Unifies Subject and Style in LoRA Fusion

Chuheng Chen, Xiaofei Zhou, Geyuan Zhang, Yong Huang

TL;DR

NP-LoRA is a training-free, projection-based method for fusing independently trained content and style LoRAs while keeping the content update out of the style subspace. It identifies the principal style directions via SVD and projects the content LoRA into their orthogonal null space; a soft projection controlled by $\mu$ balances subject fidelity against style consistency. Across SDXL and FLUX backbones and 32 content-style pairs, NP-LoRA achieves superior image fidelity and style coherence compared to strong baselines, while maintaining comparable efficiency and generalizing to unseen backbones. The approach provides a clear geometric interpretation of LoRA fusion and works as a plug-in improvement for diffusion model personalization without retraining.

Abstract

Low-Rank Adaptation (LoRA) fusion has emerged as a key technique for reusing and composing learned subject and style representations for controllable generation without costly retraining. However, existing methods rely on weight-based merging, where one LoRA often dominates the other, leading to interference and degraded fidelity. This interference is structural: separately trained LoRAs occupy low-rank subspaces of a high-dimensional space, and these subspaces are non-orthogonal and overlapping. In this work, we analyze the internal structure of LoRAs and find that their generative behavior is dominated by a few principal directions in the low-rank subspace, which should remain free from interference during fusion. To achieve this, we propose Null Space Projection LoRA (NP-LoRA), a projection-based framework for LoRA fusion that enforces subspace separation to prevent structural interference among principal directions. Specifically, we first extract principal style directions via singular value decomposition (SVD) and then project the subject LoRA into their orthogonal null space. Furthermore, we introduce a soft projection mechanism that enables smooth control over the trade-off between subject fidelity and style consistency. Experiments show that NP-LoRA consistently improves fusion quality over strong baselines, as measured by DINO and CLIP-based metrics and by human and LLM preference scores, and that it applies broadly across backbones and LoRA pairs without retraining.
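
The two steps named in the abstract (SVD of the style LoRA, then projection of the subject LoRA into the orthogonal null space) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hard_null_space_merge`, the choice of the top-`k` right singular vectors as the "principal style directions", the right-multiplication by the projector, and the random matrices standing in for trained LoRA updates are all hypothetical.

```python
import numpy as np

def hard_null_space_merge(delta_w_c, delta_w_s, k):
    """Remove the top-k style directions from the content update, then add the style update.

    Assumption: style directions are the top-k right singular vectors of delta_w_s.
    """
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(delta_w_s, full_matrices=False)
    v_k = vt[:k].T  # shape (d_in, k), orthonormal columns
    # Orthogonal projector onto the complement of span(v_k).
    projector = np.eye(delta_w_s.shape[1]) - v_k @ v_k.T
    delta_w_c_proj = delta_w_c @ projector
    return delta_w_s + delta_w_c_proj, v_k

rng = np.random.default_rng(0)
d_out, d_in, rank = 16, 12, 4
# Hypothetical rank-4 updates standing in for trained content and style LoRAs.
delta_w_c = rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))
delta_w_s = rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))

merged, v_k = hard_null_space_merge(delta_w_c, delta_w_s, k=2)
# Along the principal style directions, the merged update equals the style update alone.
print(np.allclose(merged @ v_k, delta_w_s @ v_k))
```

The final check expresses the goal stated in the abstract: after projection, the content update contributes nothing along the chosen style directions, so those directions are left untouched by the fusion.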

Paper Structure

This paper contains 31 sections, 2 theorems, 35 equations, 14 figures, 4 tables.

Key Result

Proposition 1

Given content LoRA $\Delta W_c$ and style LoRA $\Delta W_s$, their weighted combination cannot, in general, isolate the content feature from the style-critical subspace. As a result, content-induced interference is unavoidable, and stylistic consistency cannot be guaranteed.
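
A short calculation suggests why the proposition is plausible; it is an illustrative sketch, not the paper's proof. The symbols $\alpha$ and $\beta$ (merge weights), $\Delta W_m = \alpha\,\Delta W_c + \beta\,\Delta W_s$ (merged update), and $V_k$ (a matrix whose orthonormal columns span the top-$k$ style directions) are notation assumed here for illustration.

$$
\Delta W_m V_k = (\alpha\,\Delta W_c + \beta\,\Delta W_s)\,V_k = \underbrace{\alpha\,\Delta W_c V_k}_{\text{content leakage}} + \beta\,\Delta W_s V_k
$$

The leakage term vanishes only if $\alpha = 0$, which discards the content LoRA, or if $\Delta W_c V_k = 0$, which independently trained LoRAs have no reason to satisfy. Rescaling $\alpha$ and $\beta$ changes the magnitude of the leakage but cannot remove it.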

Figures (14)

  • Figure 1: Illustration of our motivation. (a) The task is to combine a content LoRA capturing subject identity with a style LoRA encoding artistic appearance, enabling the reuse and composition of learned generative knowledge. (b) Weighted addition of two LoRAs destroys style features in the merged results. Independently trained LoRAs, sharing the pretrained diffusion feature space, tend to occupy correlated low-rank high-dimensional subspaces (e.g., texture or color), which causes representational interference that degrades the fidelity of generated content. (c) The content LoRA is projected onto the null space of the style LoRA, effectively avoiding overlap and preserving stylistic features.
  • Figure 2: Overview of the proposed method. NP-LoRA takes pretrained content and style LoRAs as inputs. The style LoRA is decomposed via singular value decomposition (SVD) to construct a null space, onto which the content LoRA is projected. This design enables effective fusion without extra training or hyperparameter tuning.
  • Figure 3: Singular value spectrum of a LoRA and perturbation effects. We first visualize the singular value spectrum, then perturb the principal and minor directions respectively. Perturbations on principal directions destroy style consistency, whereas those on minor directions have little impact.
  • Figure 4: Visualization of our method. (a) Content image. (b) Style image. (c) Results of direct weight-based merging, which causes style distortion and interference. (d) Results of hard projection (the base formulation of our method), which removes interference but suppresses content details. (e) Results of soft projection (ours), achieving a balanced fusion of subject fidelity and style consistency.
  • Figure 5: (a) and (e) are the content and style references, respectively. (c) shows the result of direct merging, which corresponds to $\mu = 0$ (i.e., the degenerate case of NP-LoRA). (d)-(g) show results for $\mu = 0.1$, $0.5$, $1$, $10$, and $\infty$, where $\mu = \infty$ represents the hard projection case of our method. Small $\mu$ values cause content-style interference, whereas large $\mu$ values preserve style at the expense of content fidelity.
  • ...and 9 more figures
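
The $\mu$ sweep described in the Figure 5 caption can be sketched numerically. The exact soft-projection formula is not reproduced on this page, so the code below assumes one simple form, $I - \frac{\mu}{1+\mu} V_k V_k^\top$, chosen only because it matches the two limits the caption states: the identity (direct merging) at $\mu = 0$ and the hard projector as $\mu \to \infty$. The helper `soft_projector`, the value of `k`, and the random stand-in matrices are hypothetical.

```python
import numpy as np

def soft_projector(v_k, mu):
    """Assumed soft projector: I - mu/(1+mu) * V_k V_k^T.

    mu = 0 gives the identity (direct merging); mu = inf gives the hard projector.
    """
    weight = 1.0 if np.isinf(mu) else mu / (1.0 + mu)
    return np.eye(v_k.shape[0]) - weight * (v_k @ v_k.T)

rng = np.random.default_rng(0)
d_out, d_in, rank, k = 16, 12, 4, 2
# Hypothetical rank-4 updates standing in for trained content and style LoRAs.
delta_w_c = rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))
delta_w_s = rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))

# Assumption: style directions are the top-k right singular vectors of the style update.
_, _, vt = np.linalg.svd(delta_w_s, full_matrices=False)
v_k = vt[:k].T

leak = {}
for mu in [0.0, 0.1, 0.5, 1.0, 10.0, np.inf]:
    projected = delta_w_c @ soft_projector(v_k, mu)
    # Norm of the content update that still lies along the style directions.
    leak[mu] = np.linalg.norm(projected @ v_k)
    print(f"mu={mu}: content energy along style directions = {leak[mu]:.4f}")
```

Under this assumed form, the content energy along the style directions shrinks by a factor of $1/(1+\mu)$, which mirrors the trade-off in the caption: small $\mu$ leaves content-style interference in place, while large $\mu$ removes it along with whatever content detail lived in those directions.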

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof