Multi-Subject Personalization

Arushi Jain, Shubham Paliwal, Monika Sharma, Vikram Jamwal, Lovekesh Vig

TL;DR

This work implements MSP using Stable Diffusion and assesses the approach against other text-to-image models, showcasing its consistent generation of good-quality images representing intended subjects and interactions.

Abstract

Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges when producing images featuring multiple personalized subjects. For example, they distort the rendering of the subjects, or they fail to render the subject interactions described in the text coherently. We present Multi-Subject Personalization (MSP) to alleviate some of these challenges. We implement MSP using Stable Diffusion and assess our approach against other text-to-image models, showcasing its consistent generation of good-quality images representing the intended subjects and interactions.

Paper Structure (10 sections, 6 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: (A) Overview of the MSP-Diffusion architecture, (B) Sample generations from the method
  • Figure 2: Sample training images, each linked to a distinct subject, alongside the unique identifier and class name used to fine-tune the Dreambooth model.
  • Figure 3: Qualitative comparison of MSP-Diffusion against Dreambooth, Textual Inversion and Custom Diffusion. Columns A, B, C and D show images containing 1, 2, 3 and more than 3 personalized subjects, respectively. Dreambooth, Textual Inversion and Custom Diffusion struggle to generate images with multiple personalized subjects and produce poor-quality images (missing subjects, hybrid subjects exhibiting the characteristics of multiple subjects, and subjects with the wrong appearance).
  • Figure 4: Sample input to the MSP-Diffusion method during inference. (A) presents the input format for scenarios both with and without a scaffolding image, while (B) provides additional details on the Openpose image used for ControlNet conditioning with MSP-Diffusion. Note that conceptI can be substituted with the specific subject, such as "sks male cartoon".
  • Figure 5: Qualitative results of the proposed MSP-Diffusion model (A) with a scaffolding image, (B) without a scaffolding image and (C) with ControlNet conditioning. Here, $n$ refers to the number of personalized subjects in the generated image.
  • ...and 2 more figures