Dynamic Training-Free Fusion of Subject and Style LoRAs

Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang

TL;DR

This work proposes a dynamic training-free fusion framework that operates throughout the generation process. It integrates two complementary mechanisms, feature-level selection and metric-guided latent adjustment, across the entire diffusion timeline to achieve coherent subject-style synthesis without any retraining.

Abstract

Recent studies have explored the combination of multiple LoRAs to simultaneously generate user-specified subjects and styles. However, most existing approaches fuse LoRA weights using static statistical heuristics that deviate from LoRA's original purpose of learning adaptive feature adjustments and ignore the randomness of sampled inputs. To address this, we propose a dynamic training-free fusion framework that operates throughout the generation process. During the forward pass, at each LoRA-applied layer, we dynamically compute the KL divergence between the base model's original features and those produced by subject and style LoRAs, respectively, and adaptively select the most appropriate weights for fusion. In the reverse denoising stage, we further refine the generation trajectory by dynamically applying gradient-based corrections derived from objective metrics such as CLIP and DINO scores, providing continuous semantic and stylistic guidance. By integrating these two complementary mechanisms (feature-level selection and metric-guided latent adjustment) across the entire diffusion timeline, our method dynamically achieves coherent subject-style synthesis without any retraining. Extensive experiments across diverse subject-style combinations demonstrate that our approach consistently outperforms state-of-the-art LoRA fusion methods both qualitatively and quantitatively.
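The feature-level selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the softmax normalization, the KL direction (LoRA output relative to base), and the rule "keep the branch that diverges more from the base features" are all assumptions made for illustration, since the abstract does not specify them. The function names and toy feature vectors are hypothetical.

```python
import math

def softmax(values):
    """Turn a raw feature vector into a probability distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_lora_features(base, subject, style):
    """Pick one LoRA branch's features for a single layer.

    Assumption: the branch whose output diverges more from the base
    features is kept. The abstract does not state the selection rule.
    """
    p_base = softmax(base)
    kl_subject = kl_divergence(softmax(subject), p_base)
    kl_style = kl_divergence(softmax(style), p_base)
    if kl_subject >= kl_style:
        return "subject", subject, (kl_subject, kl_style)
    return "style", style, (kl_subject, kl_style)

# Toy per-layer features (hypothetical values).
base = [0.2, 0.1, -0.3, 0.4]
subject = [0.3, 0.1, -0.2, 0.5]   # small perturbation of base
style = [1.5, -1.0, 0.8, -0.6]    # large perturbation of base

choice, features, (kl_subject, kl_style) = select_lora_features(base, subject, style)
print(choice, round(kl_subject, 4), round(kl_style, 4))
```

Because the decision is recomputed from the actual features at every layer and every step, it depends on the sampled input rather than on fixed properties of the LoRA weights, which is the contrast the abstract draws with static heuristics.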

Paper Structure (16 sections, 18 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: (a) Existing methods rely directly on properties of the LoRA weights to achieve fusion. (b) Our method integrates feature-level selection in the forward pass and latent-level refinement in the reverse process to enable dynamic training-free LoRA fusion.
  • Figure 1: Additional results generated using FLUX. Each image corresponds to the object label indicated above and the style reference on the left. The results demonstrate the effects of applying different LoRA modules through our proposed method.
  • Figure 2: Overview of our method. By performing dynamic feature selection based on representation perturbation and applying metric-guided refinement throughout the denoising process, our framework enables training-free fusion of subject and style LoRAs.
  • Figure 2: Additional results generated using FLUX. Each image corresponds to the object label indicated above and the style reference on the left. The results demonstrate the effects of applying different LoRA modules through our proposed method.
  • Figure 3: Qualitative comparisons. We present images generated by our method and by the compared generation methods. Through input-adaptive, representation-aware decisions throughout the generation process, our method effectively enables training-free fusion of subject and style LoRAs.
  • ...and 2 more figures
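
The metric-guided refinement mentioned in the abstract and in the Figure 2 caption can be pictured as a gradient step on the latent that raises a similarity score. The sketch below is a toy illustration only: a cosine similarity against a fixed reference vector stands in for a CLIP or DINO score, the analytic gradient replaces backpropagation through a real encoder, and the step size, vectors, and function names are hypothetical. How the paper schedules and weights these corrections during denoising is not stated in this summary.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_score(latent, reference):
    """Toy stand-in for a CLIP- or DINO-style similarity score."""
    norm_l = math.sqrt(dot(latent, latent))
    norm_r = math.sqrt(dot(reference, reference))
    return dot(latent, reference) / (norm_l * norm_r)

def cosine_score_grad(latent, reference):
    """Analytic gradient of cosine_score with respect to the latent."""
    norm_l = math.sqrt(dot(latent, latent))
    norm_r = math.sqrt(dot(reference, reference))
    score = dot(latent, reference) / (norm_l * norm_r)
    return [r / (norm_l * norm_r) - score * l / (norm_l ** 2)
            for l, r in zip(latent, reference)]

def guided_step(latent, reference, step_size=0.5):
    """Move the latent along the score gradient (gradient ascent)."""
    grad = cosine_score_grad(latent, reference)
    return [l + step_size * g for l, g in zip(latent, grad)]

# Hypothetical latent and reference embedding.
latent = [1.0, 0.0, 0.5]
reference = [0.2, 1.0, 0.4]

scores = [cosine_score(latent, reference)]
for _ in range(10):  # one correction per (toy) denoising step
    latent = guided_step(latent, reference)
    scores.append(cosine_score(latent, reference))

print([round(s, 3) for s in scores])
```

In a real pipeline the score would come from an image encoder applied to the decoded latent, and the gradient would be obtained by automatic differentiation rather than a closed-form expression.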