Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron
TL;DR
This work tackles the challenge of generating images that fuse multiple personalized concepts in text-to-image diffusion models. It introduces Concept Weaver, a framework that composes individually customized concept models at inference time: it first creates a template image aligned with the semantics of the prompt, then performs region-aware concept fusion on that template. The approach combines concept bank training, inversion-based guidance, region masks, and a multi-concept sampling strategy that uses feature injection and concept-aware conditioning to preserve the template's structure while matching the appearance of each concept. Experiments show higher concept fidelity than alternative approaches, scalability to more than two concepts, and applicability to real-image editing; the approach can also potentially be extended efficiently through LoRA fine-tuning.
Abstract
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with the semantics of input prompts, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity compared to alternative approaches. Furthermore, the method is shown to seamlessly handle more than two concepts and closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
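The idea of fusing concepts region by region, without blending appearances across subjects, can be illustrated with a toy sketch. This is not the paper's implementation: it assumes each concept model has already produced a per-pixel prediction, and that each concept has a binary region mask that does not overlap the others. The function name `fuse_by_region` and the list-of-lists `Grid` type are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Grid = List[List[float]]  # single-channel H x W array (illustrative stand-in)


def fuse_by_region(
    background: Grid,
    concept_preds: Sequence[Grid],
    masks: Sequence[Sequence[Sequence[int]]],
) -> Grid:
    """Copy each concept's prediction into its own masked region.

    Pixels not claimed by any mask keep the background prediction.
    Overlapping masks are rejected, so no pixel mixes two concepts.
    """
    if len(concept_preds) != len(masks):
        raise ValueError("need exactly one mask per concept prediction")

    height, width = len(background), len(background[0])
    fused = [row[:] for row in background]  # start from the background
    claimed = [[False] * width for _ in range(height)]

    for pred, mask in zip(concept_preds, masks):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    if claimed[y][x]:
                        raise ValueError("region masks must not overlap")
                    claimed[y][x] = True
                    fused[y][x] = pred[y][x]
    return fused


# Toy usage: two concepts on a 2 x 2 grid.
background = [[0.0, 0.0], [0.0, 0.0]]
concept_a = [[1.0, 1.0], [1.0, 1.0]]
concept_b = [[2.0, 2.0], [2.0, 2.0]]
mask_a = [[1, 0], [0, 0]]  # concept A owns the top-left pixel
mask_b = [[0, 1], [0, 0]]  # concept B owns the top-right pixel
fused = fuse_by_region(background, [concept_a, concept_b], [mask_a, mask_b])
```

In this sketch the masks alone decide which concept controls each pixel, which is one simple way to keep subjects from bleeding into each other; the actual method additionally relies on feature injection and concept-aware conditioning, which are not modeled here.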
