OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo

TL;DR

OMG tackles occlusion and identity degradation in multi-concept diffusion-based image generation with a two-stage sampling framework. Stage 1 generates the layout and collects visual comprehension information; Stage 2 then injects multiple concepts via Concept Noise Blending, guided by stored cross-attention maps to preserve occlusion layouts. The method is plug-and-play with existing single-concept personalization approaches such as LoRA and InstantID and requires no additional fine-tuning. Quantitative and qualitative experiments demonstrate superior performance in multi-concept personalization, with strong identity preservation and visual harmony, and ablations validate the contribution of layout preservation and noise blending.

Abstract

Personalization is an important topic in text-to-image generation, especially the challenging case of multi-concept personalization. Current multi-concept methods struggle with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts within a single image. We propose a novel two-stage sampling solution. The first stage handles layout generation and collects visual comprehension information for handling occlusions. The second stage uses the acquired visual comprehension information and the designed noise blending to integrate multiple concepts while accounting for occlusions. We also observe that the initiation denoising timestep for noise blending is the key to identity preservation and layout. Moreover, our method can be combined with various single-concept models, such as LoRA and InstantID, without additional tuning. In particular, LoRA models from civitai.com can be used directly. Extensive experiments demonstrate that OMG exhibits superior performance in multi-concept personalization.
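The second-stage noise blending described in the abstract can be illustrated with a minimal sketch. The abstract does not give the blending formula, so everything here is an assumption: the names `blend_noise` and `should_blend` are hypothetical, the "visual comprehension information" is taken to be one non-overlapping binary mask per concept, and blending is assumed to be active only for timesteps below the initiation timestep (so an initiation timestep of 0 disables it).

```python
import numpy as np


def blend_noise(eps_base, eps_concepts, masks):
    """Blend per-concept noise predictions into the base prediction.

    Illustrative only; not the paper's exact formulation.

    eps_base:     (C, H, W) noise predicted by the base model.
    eps_concepts: list of (C, H, W) predictions, one per concept model.
    masks:        list of (H, W) binary arrays, 1 where the concept is
                  visible. Masks are assumed not to overlap.
    """
    blended = eps_base.copy()
    for eps_k, mask_k in zip(eps_concepts, masks):
        region = mask_k.astype(bool)
        # Inside a concept's visible region, take that concept's prediction;
        # everywhere else the base prediction is left untouched.
        blended[:, region] = eps_k[:, region]
    return blended


def should_blend(t, t_init):
    """Assumed schedule: blend only for timesteps below t_init."""
    return t < t_init


# Toy usage: two concepts on a 2x2 latent with 4 channels.
eps_base = np.zeros((4, 2, 2))
eps_a = np.ones((4, 2, 2))
eps_b = np.full((4, 2, 2), 2.0)
mask_a = np.array([[1, 0], [0, 0]])
mask_b = np.array([[0, 1], [0, 0]])
out = blend_noise(eps_base, [eps_a, eps_b], [mask_a, mask_b])
```

Because each mask marks only the visible part of its concept, an occluded region simply belongs to whichever concept is in front, which is one way a layout with occlusion could be carried from the first stage into the second.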

Paper Structure (14 sections, 7 equations, 10 figures, 2 tables)

This paper contains 14 sections, 7 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: We present OMG, an occlusion-friendly method for multi-concept personalization with strong identity preservation and harmonious illumination. The visual examples are generated by using LoRA models downloaded from https://civitai.com/.
  • Figure 2: Existing methods face identity degradation and occlusion problems. (a) Given two text prompts with identifiers, "A $[v1]$ man" and "A $[v2]$ woman", we generate $100$ images for each concept separately (separate generation) and calculate the Identity Alignment between the generated images and the reference images. We then use another text prompt, "A $[v1]$ man and a $[v2]$ woman", to randomly generate $100$ images containing both concepts (simultaneous generation) and calculate Identity Alignment again. Simultaneous generation of the two concepts lowers Identity Alignment, i.e., it causes identity degradation. (b) Given spatial conditions with occlusion between concepts, Mix-of-Show [gu2023mix] cannot generate a complete image and suffers from identity degradation.
  • Figure 3: Overview of the proposed OMG, which consists of two stages during sampling. The first stage handles layout generation and collects visual comprehension information for handling occlusions. Using this information, the second stage injects concept identities during multi-concept personalized denoising via the proposed latent-level and attention-level noise blending.
  • Figure 4: Overview of Multi-concept Personalized Denoising. This stage uses the acquired visual comprehension information and the designed concept noise blending method to integrate multiple concepts while accounting for occlusions.
  • Figure 5: Effect of the initiation timestep for concept noise blending. The initiation timestep for concept noise blending influences both the image layout and illumination. When the initiation timestep is $0$, there is no concept noise blending operation during sampling, resulting in the same generation result for both stages.
  • ...and 5 more figures
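
The separate-versus-simultaneous comparison in the Figure 2 caption can be sketched as follows. The caption does not define Identity Alignment, so this assumes it is the mean cosine similarity between identity embeddings of the generated images and the embedding of a reference image. The function name `identity_alignment` is hypothetical, and the embeddings are assumed to be precomputed by some identity encoder.

```python
import numpy as np


def identity_alignment(generated, reference):
    """Mean cosine similarity between generated embeddings and a reference.

    Assumed definition, for illustration only.

    generated: (N, D) array, one identity embedding per generated image.
    reference: (D,) array, the identity embedding of the reference image.
    """
    gen = generated / np.linalg.norm(generated, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference)
    return float(np.mean(gen @ ref))


# Toy usage with 2-D embeddings: one generated image matches the
# reference direction exactly, the other is orthogonal to it.
reference = np.array([1.0, 0.0])
generated = np.array([[1.0, 0.0], [0.0, 1.0]])
score = identity_alignment(generated, reference)
```

Under this assumed definition, identity degradation would show up as a lower score for the $100$ simultaneously generated images than for the $100$ separately generated ones.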