How to Blend Concepts in Diffusion Models

Lorenzo Olearo, Giorgio Longari, Simone Melzi, Alessandro Raganato, Rafael Peñaloza

TL;DR

We explore concept blending with diffusion models and find that blending concepts through latent-space manipulation is possible, although the best strategy depends on the context of the blend.

Abstract

For the last decade, there has been a push to use multi-dimensional (latent) spaces to represent concepts, yet how to manipulate these concepts or reason with them remains largely unclear. Some recent methods exploit multiple latent representations and the connections between them, making this research question even more entangled. Our goal is to understand how operations in the latent space affect the underlying concepts. To that end, we explore the task of concept blending through diffusion models. Diffusion models are based on a connection between a latent representation of textual prompts and a latent space that enables image reconstruction and generation. This task allows us to try different text-based combination strategies and to evaluate them easily through visual analysis. Our conclusion is that concept blending through latent-space manipulation is possible, although the best strategy depends on the context of the blend.
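
To make the idea of a text-based combination strategy concrete, the following is a minimal sketch of one possible approach: linear interpolation between the embeddings of two prompts. It is an illustration only. The abstract does not name the strategies the paper evaluates, so the function `blend_embeddings`, the mixing weight `alpha`, and the random arrays standing in for real text-encoder outputs are all assumptions, not the authors' method.

```python
import numpy as np


def blend_embeddings(emb_a, emb_b, alpha=0.5):
    """Linearly interpolate two prompt embeddings of equal shape.

    Hypothetical helper for illustration: alpha=0 returns emb_a,
    alpha=1 returns emb_b, and values in between mix the two.
    """
    emb_a = np.asarray(emb_a, dtype=np.float64)
    emb_b = np.asarray(emb_b, dtype=np.float64)
    if emb_a.shape != emb_b.shape:
        raise ValueError("embeddings must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * emb_a + alpha * emb_b


# Stand-ins for text-encoder outputs of the prompts "dog" and "rabbit".
# The (77, 768) shape is an assumed tokens-by-features layout, and the
# values are random rather than produced by a real encoder.
rng = np.random.default_rng(0)
dog = rng.standard_normal((77, 768))
rabbit = rng.standard_normal((77, 768))

blend = blend_embeddings(dog, rabbit, alpha=0.5)
```

In a real pipeline, the blended array would replace the single-prompt conditioning passed to the denoising network, and the initial noise would be held fixed across runs so that differences in the output come from the blend alone.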

Paper Structure

This paper contains 13 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: A visualization of the proposed analysis. From left to right: (a) given two input textual concepts ("dog", "rabbit"), (b) four different techniques are applied to explore multiple ways of blending them through Stable Diffusion, and (c) the resulting outputs are compared through qualitative analysis and a user study.
  • Figure 2: Samples of two blends per category.
  • Figure 3: Comparison of the blending methods. On the left, the individual prompts; on the right, the results of the blending methods. All images are generated from the same initial noise.
  • ...and 5 more figures