
Expanding the Content-Style Frontier: a Balanced Subspace Blending Approach for Content-Style LoRA Fusion

Linhao Huang

TL;DR

The paper tackles the entanglement of content and style in diffusion-based personalization and shows that increasing style strength degrades content, narrowing the content–style frontier. It introduces Content-Style Subspace Blending with learnable cross-subspace mixing weights, a Content-Style Balance loss, and a Non-linear Content-Style Blending strategy, which together provide continuous control over the content–style trade-off. During training, the fused LoRA update is $\Delta W = A_cB_c + A_sB_s + A_sW_{21}B_c + A_cW_{12}B_s$, where the learnable matrices $W_{12}$ and $W_{21}$ couple the content ($A_c, B_c$) and style ($A_s, B_s$) LoRA subspaces. During inference, a time-dependent blending schedule is applied. Empirical results on SDXL v1.0 show that the method achieves the lowest IGD and GD of all compared methods and yields better content preservation and style expression across 0–100% style intensities, supported by both quantitative metrics and qualitative images. An ablation study confirms the contributions of subspace blending, the balance losses, and non-linear inference, highlighting the approach's robustness and effectiveness for flexible, scalable personalization of arbitrary content–style pairs in diffusion models.
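The subspace-blending update can be checked numerically. A short algebraic restatement (ours, not taken from the paper) shows that it equals $[A_c\ A_s]\,M\,[B_c; B_s]$ with the block mixing matrix $M = \begin{bmatrix} I & W_{12} \\ W_{21} & I \end{bmatrix}$. The NumPy sketch below compares the term-by-term form with this block form. It assumes $A_c, A_s$ have shape $(d, r)$, $B_c, B_s$ have shape $(r, k)$, and $W_{12}, W_{21}$ are the square-or-rectangular rank-space mixers implied by the formula. The function names and toy sizes are hypothetical, and neither the training of $W_{12}, W_{21}$ nor the Balance loss is shown.

```python
import numpy as np


def blended_delta(A_c, B_c, A_s, B_s, W_12, W_21):
    """Delta W = A_c B_c + A_s B_s + A_s W_21 B_c + A_c W_12 B_s, term by term."""
    return A_c @ B_c + A_s @ B_s + A_s @ W_21 @ B_c + A_c @ W_12 @ B_s


def blended_delta_block(A_c, B_c, A_s, B_s, W_12, W_21):
    """Same update as [A_c A_s] @ M @ [B_c; B_s] with M = [[I, W_12], [W_21, I]]."""
    r_c, r_s = A_c.shape[1], A_s.shape[1]
    A = np.hstack([A_c, A_s])  # shape (d, r_c + r_s)
    B = np.vstack([B_c, B_s])  # shape (r_c + r_s, k)
    M = np.block([[np.eye(r_c), W_12],
                  [W_21, np.eye(r_s)]])  # shape (r_c + r_s, r_c + r_s)
    return A @ M @ B


# Toy sizes chosen for illustration only (not SDXL layer sizes).
rng = np.random.default_rng(0)
d, k, r_c, r_s = 16, 12, 4, 4
A_c, A_s = rng.normal(size=(d, r_c)), rng.normal(size=(d, r_s))
B_c, B_s = rng.normal(size=(r_c, k)), rng.normal(size=(r_s, k))
W_12, W_21 = rng.normal(size=(r_c, r_s)), rng.normal(size=(r_s, r_c))
print(np.allclose(blended_delta(A_c, B_c, A_s, B_s, W_12, W_21),
                  blended_delta_block(A_c, B_c, A_s, B_s, W_12, W_21)))  # → True
```

With $W_{12} = W_{21} = 0$, the update reduces to the plain sum $A_cB_c + A_sB_s$ of the two LoRAs. The cross terms are therefore the only place where the content and style subspaces interact.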

Abstract

Recent advances in text-to-image diffusion models have significantly improved the personalization and stylization of generated images. However, previous studies have assessed content similarity only under a single style intensity. In our experiments, we observe that increasing style intensity leads to a significant loss of content features, resulting in a suboptimal content-style frontier. To address this, we propose a novel approach to expand the content-style frontier by leveraging Content-Style Subspace Blending and a Content-Style Balance loss. Our method improves content similarity across varying style intensities, significantly broadening the content-style frontier. Extensive experiments demonstrate that our approach outperforms existing techniques in both qualitative and quantitative evaluations, achieving a superior content-style trade-off with significantly lower Inverted Generational Distance (IGD) and Generational Distance (GD) scores than current methods.
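GD and IGD come from multi-objective optimization and compare a method's set of trade-off points with a reference front. The sketch below uses the common variant that averages nearest-neighbor Euclidean distances. Some definitions instead aggregate with a p-norm, and this section states neither which variant the paper uses nor how its reference front is built. As an assumption, each point here is a 2-D (content similarity, style similarity) pair measured at one style intensity. The function names are hypothetical.

```python
import numpy as np


def _min_distances(points: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """For each row of `points`, the Euclidean distance to its nearest row in `targets`."""
    diff = points[:, None, :] - targets[None, :, :]   # shape (n, m, dims)
    return np.linalg.norm(diff, axis=-1).min(axis=1)  # shape (n,)


def gd(front, reference) -> float:
    """Generational Distance: mean distance from each obtained point to the reference front."""
    front = np.asarray(front, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(_min_distances(front, reference).mean())


def igd(front, reference) -> float:
    """Inverted Generational Distance: mean distance from each reference point to the obtained front."""
    front = np.asarray(front, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(_min_distances(reference, front).mean())


# Made-up toy points, not results from the paper.
print(gd([[0, 0], [1, 0]], [[0, 0]]), igd([[0, 0], [1, 0]], [[0, 0]]))  # → 0.5 0.0
```

Lower is better for both metrics. GD penalizes obtained points that lie far from the reference front. IGD penalizes regions of the reference front that the method fails to reach, such as the high-style end where content collapses.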

Paper Structure

This paper contains 13 sections, 24 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: In our experiments, existing models either lack the ability to adjust style intensity or suffer a significant loss of content features as style intensity increases, which results in a suboptimal content–style frontier. In contrast, our objective is to expand this frontier.
  • Figure 2: Overview of our method. Our approach consists of two stages: training and inference. In the training phase, we introduce Content-Style Subspace Blending, which uses learnable mixing matrices to fuse the content ($\mathbf{A}_\text{c}, \mathbf{B}_\text{c}$) and style ($\mathbf{A}_\text{s}, \mathbf{B}_\text{s}$) LoRA subspaces. This process is guided by our proposed Content-Style Balance loss. In the inference phase, we apply a Non-linear Content-Style Blending strategy, which uses dynamic, time-dependent weights during the denoising process. This approach exploits the tendency of diffusion models to form content structure in early denoising steps and style details in later ones, thereby achieving a superior content-style trade-off.
  • Figure 3: The intermediate states produced by the Direct Merge method often lose both content and style characteristics. In contrast, our method ensures that the intermediate states retain at least one of either the content or the style features. To ensure a fair comparison, we do not employ Non-linear Content-Style Blending, but instead utilize a weighted summation method akin to Direct Merge.
  • Figure 4: Quantitative comparisons. Comparison of alignment results across different methods.
  • Figure 5: Qualitative comparisons. We present images generated by our method and the compared methods.
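The Figure 2 caption describes Non-linear Content-Style Blending only as dynamic, time-dependent weights that favor content early in denoising and style later. The actual schedule, and how the weights enter the LoRA update, are not given in this section. The sketch below is therefore a hypothetical logistic schedule: `blend_weights`, `constant_weights`, `sharpness`, and `midpoint` are all assumptions. It is contrasted with the constant weights of the weighted summation used for the fair comparison in Figure 3.

```python
import math


def blend_weights(step: int, num_steps: int, alpha: float,
                  sharpness: float = 10.0, midpoint: float = 0.5) -> tuple[float, float]:
    """Hypothetical (content, style) weights for one denoising step.

    Step 0 is the first, noisiest step. `alpha` in [0, 1] is the requested style
    intensity. The style weight ramps from near 0 toward `alpha` as denoising
    progresses, so early steps (coarse structure) favor content and late steps
    (fine detail) favor style.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    progress = step / (num_steps - 1)  # 0.0 at the first step, 1.0 at the last
    ramp = 1.0 / (1.0 + math.exp(-sharpness * (progress - midpoint)))  # logistic ramp
    w_style = alpha * ramp
    return 1.0 - w_style, w_style


def constant_weights(alpha: float) -> tuple[float, float]:
    """Linear weighted summation: the same (content, style) weights at every step."""
    return 1.0 - alpha, alpha


for step in (0, 12, 24, 37, 49):
    w_c, w_s = blend_weights(step, num_steps=50, alpha=0.8)
    print(f"step {step:2d}: content={w_c:.3f} style={w_s:.3f}")
```

A logistic ramp is just one monotone choice. Any schedule that keeps the style weight low while the layout forms and raises it afterward illustrates the same idea.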