LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers

Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag

TL;DR

The paper tackles multi-concept personalization editing in diffusion models by addressing LoRA cross-talk without retraining. It introduces LoRAShop, a training-free pipeline that extracts region-specific subject priors from Flux rectified-flow transformers and performs per-token residual blending to insert multiple concepts into an image. Subject priors $\hat{M}_{c'}$ are derived from the last double-stream block via $M_{c'} = \operatorname{softmax}(Q_i K_{c'}^{\mathsf T}/\sqrt{d})$, smoothed and binarized to obtain non-overlapping $\hat{M}_u$, which guide blending with per-token weights $\alpha_{c'}(p) = \frac{\hat{M}_{c'}(p)}{\sum_u \hat{M}_u(p) + \varepsilon}$. Experiments show improved identity preservation and natural composition for single and multiple concepts, supporting real and generated image editing in a practical, training-free framework that enables rapid creative iteration.
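The prior-extraction step summarized above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`subject_priors`, `blend_weights`), the 3x3 box filter used for smoothing, and the binarization rule (above the map's own mean and strongest concept at that position) are all hypothetical choices, since the summary only says the maps are "smoothed and binarized". The softmax is taken over the key axis, as in standard attention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def box_smooth(m, k=3):
    # k x k mean filter with edge padding (assumed stand-in for the smoothing step)
    pad = k // 2
    padded = np.pad(m, pad, mode="edge")
    h, w = m.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def subject_priors(Q_img, K_txt, concept_ids, hw):
    """Q_img: (h*w, d) image-token queries; K_txt: (T, d) text-token keys;
    concept_ids: index of the key token standing for each concept."""
    h, w = hw
    d = Q_img.shape[-1]
    # M = softmax(Q K^T / sqrt(d)), normalized over the key axis
    attn = softmax(Q_img @ K_txt.T / np.sqrt(d), axis=-1)  # (h*w, T)
    maps = np.stack([box_smooth(attn[:, t].reshape(h, w)) for t in concept_ids])
    # Assumed binarization: keep positions above the map's mean ...
    above = maps > maps.mean(axis=(1, 2), keepdims=True)
    # ... and give each position to at most one concept (non-overlapping masks)
    winner = maps.argmax(axis=0)[None] == np.arange(len(concept_ids))[:, None, None]
    return (above & winner).astype(float)  # (C, h, w)

def blend_weights(masks, eps=1e-6):
    # alpha_c(p) = M_c(p) / (sum_u M_u(p) + eps)
    return masks / (masks.sum(axis=0, keepdims=True) + eps)
```

Because the masks are made disjoint before normalization, each weight $\alpha_{c'}(p)$ is close to 1 inside its own region and exactly 0 elsewhere; the $\varepsilon$ term only guards against division by zero at positions that belong to no concept.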

Abstract

We introduce LoRAShop, the first framework for multi-concept image editing with LoRA models. LoRAShop builds on a key observation about the feature interaction patterns inside Flux-style diffusion transformers: concept-specific transformer features activate spatially coherent regions early in the denoising process. We harness this observation to derive a disentangled latent mask for each concept in a prior forward pass and blend the corresponding LoRA weights only within regions bounding the concepts to be personalized. The resulting edits seamlessly integrate multiple subjects or styles into the original scene while preserving global context, lighting, and fine details. Our experiments demonstrate that LoRAShop delivers better identity preservation compared to baselines. By eliminating retraining and external constraints, LoRAShop turns personalized diffusion models into a practical "photoshop-with-LoRAs" tool and opens new avenues for compositional visual storytelling and rapid creative iteration.
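To make the idea of blending only within concept regions concrete, here is a small NumPy sketch of one plausible per-token blend. It is an assumption-laden illustration rather than the paper's exact rule: the function name `blend_residuals`, the tensor layout, and the convex-combination form (base output keeps whatever share is not claimed by a concept) are hypothetical.

```python
import numpy as np

def blend_residuals(r_base, r_lora, alpha):
    """Per-token blend of block outputs (hypothetical formulation).

    r_base: (N, d) residual from the base model for N image tokens
    r_lora: (C, N, d) residuals computed with each of C LoRA adapters
    alpha:  (C, N) per-token weights, with alpha.sum(axis=0) <= 1
    """
    keep = 1.0 - alpha.sum(axis=0)                    # share left to the base model, (N,)
    mixed = (alpha[..., None] * r_lora).sum(axis=0)   # weighted sum of adapter residuals, (N, d)
    return keep[:, None] * r_base + mixed
```

With disjoint masks, a token inside a concept's region takes that adapter's residual, and a token outside every region keeps the base model's residual unchanged, which is one way the surrounding scene could be preserved.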

Paper Structure

This paper contains 20 sections, 5 equations, 15 figures, 9 tables.

Figures (15)

  • Figure 1: LoRAShop. We present LoRAShop, a training-free framework enabling the simultaneous use of multiple LoRA adapters for generation and editing. By identifying the coarse boundaries of personalized concepts as subject priors, we allow the use of multiple LoRA adapters by eliminating the "cross-talk" between different adapters.
  • Figure 2: LoRAShop Framework. LoRAShop enables multi-subject generation and editing through a two-stage, training-free pipeline. First, we extract the subject prior $\hat{M}_{c'}$, which gives a coarse-level prior on where the concept of interest, $c'$, is located. Next, we introduce a blending mechanism over the transformer block residuals, which both enables seamless blending of customized features and bounds the region of interest for each LoRA adapter.
  • Figure 3: Editing Generated & Real Images with LoRAShop. We provide qualitative editing results with different human concepts. LoRAShop can edit both real and generated images. Owing to the non-intersecting subject prior extraction scheme of our framework, LoRAShop can perform edits with multiple concepts in one denoising pass.
  • Figure 4: Ablation Study. Ablation on transformer blocks, where Block 19 shows superior ability for separation between subjects.
  • Figure 5: Qualitative Comparisons. We provide qualitative comparisons on three mainstream tasks: single-subject generation, multi-subject generation and face swapping. Over all of the benchmarked tasks, LoRAShop provides superior performance against competing approaches.
  • ...and 10 more figures