
Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations

Haoming Liu, Yuanhe Guo, Shengjie Wang, Hongyi Wen

TL;DR

This work proposes Diffusion Cocktail (Ditail), a training-free method that transfers style and content information between multiple diffusion models, resulting in novel images unobtainable by a single model.

Abstract

Diffusion models, capable of high-quality image generation, have gained unparalleled popularity owing to their ease of extension. Active users have created a massive collection of domain-specific diffusion models by fine-tuning base models on self-collected datasets. Recent work has focused on improving a single diffusion model by uncovering semantic and visual information encoded in various architecture components. However, those methods overlook the vast set of available fine-tuned diffusion models and therefore miss the opportunity to utilize their combined capacity for novel generation. In this work, we propose Diffusion Cocktail (Ditail), a training-free method that transfers style and content information between multiple diffusion models. This allows us to produce diverse generations using a set of diffusion models, resulting in novel images unobtainable by a single model. Ditail also offers fine-grained control over the generation process, which enables flexible manipulation of style and content. With these properties, Ditail excels in numerous applications, including style transfer guided by diffusion models, novel-style image generation, and image manipulation via prompts or collage inputs.
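Figure 2(a) describes the transfer as diffusion inversion of the content image with one diffusion model (DM), followed by denoising with a style DM. The toy sketch below illustrates only that two-model round trip. It assumes deterministic DDIM inversion and sampling (a common choice; whether Ditail uses exactly this scheme is an assumption here) and a linear noise schedule. The noise predictors are hypothetical stand-ins for real UNets, latents are plain Python lists, and Ditail's content injection during denoising and its scaling factor $\alpha$ are omitted.

```python
import math
from typing import Callable, List

Vector = List[float]
# Hypothetical stand-in for a DM's UNet: (noisy latent, timestep index) -> predicted noise.
NoisePredictor = Callable[[Vector, int], Vector]


def alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Cumulative alpha products for a linear beta schedule; index 0 (= 1.0) is the clean latent."""
    out, prod = [1.0], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_step(x: Vector, eps: Vector, a_from: float, a_to: float) -> Vector:
    """Deterministic (eta = 0) DDIM update between two noise levels, in either direction."""
    out = []
    for xi, ei in zip(x, eps):
        x0_hat = (xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)  # predicted clean value
        out.append(math.sqrt(a_to) * x0_hat + math.sqrt(1.0 - a_to) * ei)
    return out


def invert(x0: Vector, content_model: NoisePredictor, abar: List[float]) -> Vector:
    """DDIM inversion: walk the clean latent up to the noisiest level with the content DM."""
    x = list(x0)
    for t in range(len(abar) - 1):
        x = ddim_step(x, content_model(x, t), abar[t], abar[t + 1])
    return x


def denoise(x_t: Vector, style_model: NoisePredictor, abar: List[float]) -> Vector:
    """DDIM sampling: walk the inverted latent back down with the (possibly different) style DM."""
    x = list(x_t)
    for t in range(len(abar) - 1, 0, -1):
        x = ddim_step(x, style_model(x, t), abar[t], abar[t - 1])
    return x


if __name__ == "__main__":
    abar = alpha_bars(50)
    content_latent = [0.5, -1.0, 2.0]
    # Hypothetical predictors standing in for two fine-tuned DMs.
    content_dm = lambda x, t: [0.2, -0.1, 0.3]
    style_dm = lambda x, t: [0.5 * xi for xi in x]
    noisy = invert(content_latent, content_dm, abar)
    print("same-model round trip:", denoise(noisy, content_dm, abar))
    print("mixed-model output:   ", denoise(noisy, style_dm, abar))
```

With the same predictor on both sides and a constant predicted noise, the round trip returns the original latent up to floating-point error; handing the inverted latent to a different predictor for denoising is what lets a second model reinterpret the content.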

Paper Structure (21 sections, 2 equations, 16 figures, 2 tables, 2 algorithms)

Figures (16)

  • Figure 1: Diffusion Cocktail demo use cases. Ditail primarily focuses on transferring content information between diffusion models. We control the generation process of multiple diffusion models by injecting the source image's structure, producing novel images with various contents and styles. This core idea extends to a range of use cases. Best viewed in color; zoom in for detail.
  • Figure 2: (a) Ditail for model-centric style transfer and content manipulation. The content image first goes through diffusion inversion, ideally using a diffusion model (DM) matching the content image's domain. The resulting latents are then fed into the style DM for denoising and content injection. Given DMs of various styles, we generate images in their styles while preserving the content image's structure; (b) Ditail extends naturally to transforming collages into the same target style/domain; (c) Ditail allows tailoring novel target styles by merging existing style DMs via an interpolation hyperparameter $\gamma$. Best viewed in color; zoom in for detail.
  • Figure 3: (a) Visualization of the noisy latent after DM inversion on real and generated images with different scaling factors $\alpha$. A larger $\alpha$ embeds a stronger structure prior in the noisy latent, giving a more structure-preserving starting point for the reverse sampling process; (b) Demonstration of style transfer across different DMs. The diagonal images are unmodified images generated by each diffusion model from the same textual prompt. Each column corresponds to the images generated by one diffusion model using contents from the other models. Best viewed in color; zoom in for detail.
  • Figure 4: Qualitative comparison of style transfer results on generated images drawn from GEMRec-18K [guo2023gemrec]. The results of ControlNet [zhang2023adding] and PnP [tumanyan2023plug] are obtained with the target DM checkpoint. Best viewed in color; zoom in for detail.
  • Figure 5: Qualitative results and ablation studies on real images from COCO Captions [chen2015microsoft]. Best viewed in color; zoom in for detail.
  • ...and 11 more figures
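Figure 2(c) says novel target styles come from merging existing style DMs via an interpolation hyperparameter $\gamma$, without giving the formula. A minimal sketch follows, assuming plain per-parameter linear interpolation of two checkpoints that share one architecture, as common checkpoint-merging tools do; whether Ditail merges exactly this way is an assumption. The function name `merge_style_models`, the convention that $\gamma$ weights the first model, and the list-based `StateDict` format are all hypothetical; real checkpoints store tensors.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flattened weights.
StateDict = Dict[str, List[float]]


def merge_style_models(theta_a: StateDict, theta_b: StateDict, gamma: float) -> StateDict:
    """Return gamma * theta_a + (1 - gamma) * theta_b, parameter by parameter."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if theta_a.keys() != theta_b.keys():
        raise ValueError("both style DMs must share one architecture (same parameter names)")
    merged: StateDict = {}
    for name, w_a in theta_a.items():
        w_b = theta_b[name]
        if len(w_a) != len(w_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [gamma * a + (1.0 - gamma) * b for a, b in zip(w_a, w_b)]
    return merged


if __name__ == "__main__":
    anime_dm = {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]}
    watercolor_dm = {"conv.weight": [3.0, 4.0], "conv.bias": [1.0]}
    print(merge_style_models(anime_dm, watercolor_dm, gamma=0.5))
    # prints {'conv.weight': [2.0, 3.0], 'conv.bias': [0.5]}
```

Setting $\gamma$ to 1 or 0 recovers one of the original checkpoints, while intermediate values blend the two; the merged weights would then serve as the style DM in the denoising stage.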